From the log snippet:

[2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
[2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
[2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:

the last log entry indicates that we hit this code path in gd_addbr_validate_replica_count ():

        if (replica_count == volinfo->replica_count) {
                if (!(total_bricks % volinfo->dist_leaf_count)) {
                        ret = 1;
                        goto out;
                }
        }

@Pranith, Ravi - Milos was trying to convert a distribute (1 x 1) volume into a replicate (1 x 2) volume using add-brick and hit this issue, where the add-brick failed. The cluster is running 3.7.6. Could you help identify in what scenario this code path can be hit? One straightforward issue I do see here is the missing err_str in this path.
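For reference, here is that check in isolation as a small standalone sketch (not the actual glusterd code); the concrete values for replica_count, total_bricks and dist_leaf_count are assumptions, chosen only to show one way the branch can be taken without err_str ever being filled in:

    #include <stdio.h>

    int main(void)
    {
            /* Hypothetical values: suppose volinfo already records replica 2
             * (e.g. left behind by the earlier, partially applied add-brick)
             * and the request asks for "replica 2" again with 2 bricks in
             * total. */
            int requested_replica_count = 2;  /* replica count given on the CLI      */
            int volinfo_replica_count   = 2;  /* replica count stored in volinfo     */
            int volinfo_dist_leaf_count = 2;  /* bricks per distribute leaf          */
            int total_bricks            = 2;  /* existing bricks + bricks in request */

            if (requested_replica_count == volinfo_replica_count) {
                    if (!(total_bricks % volinfo_dist_leaf_count)) {
                            /* Mirrors the "ret = 1; goto out;" path above:
                             * we bail out without ever setting err_str. */
                            printf("validation branch taken, err_str left unset\n");
                            return 1;
                    }
            }

            printf("validation branch not taken\n");
            return 0;
    }

Whether volinfo actually ends up in that state after the failed add-brick is part of the question above.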
On Wed, Dec 7, 2016 at 7:56 PM, Miloš Čučulović - MDPI <cuculo...@mdpi.com> wrote:

> Sure Atin, logs are attached.
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculo...@mdpi.com
> Skype: milos.cuculovic.mdpi
>
> On 07.12.2016 11:32, Atin Mukherjee wrote:
>
>> Milos,
>>
>> Giving snippets wouldn't help much; could you get me all the log files
>> (/var/log/glusterfs/*) from both the nodes?
>>
>> On Wed, Dec 7, 2016 at 3:54 PM, Miloš Čučulović - MDPI <cuculo...@mdpi.com> wrote:
>>
>>     Thanks, here is the log after volume force:
>>
>>     [2016-12-07 10:23:39.157234] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-storage-server: disconnecting connection from storage2-23175-2016/12/07-10:14:56:951307-storage-client-0-0-0
>>     [2016-12-07 10:23:39.157301] I [MSGID: 101055] [client_t.c:419:gf_client_unref] 0-storage-server: Shutting down connection storage2-23175-2016/12/07-10:14:56:951307-storage-client-0-0-0
>>     [2016-12-07 10:23:40.187805] I [login.c:81:gf_auth] 0-auth/login: allowed user names: ef4e608d-487b-49a3-85dd-0b36b3554312
>>     [2016-12-07 10:23:40.187848] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-storage-server: accepted client from storage2-23679-2016/12/07-10:23:40:160327-storage-client-0-0-0 (version: 3.7.6)
>>     [2016-12-07 10:23:52.817529] E [MSGID: 113001] [posix-helpers.c:1177:posix_handle_pair] 0-storage-posix: /data/data-cluster/dms/submissions/User - 226485: key:glusterfs.preop.parent.keyflags: 1 length:22 [Operation not supported]
>>     [2016-12-07 10:23:52.817598] E [MSGID: 113001] [posix.c:1384:posix_mkdir] 0-storage-posix: setting xattrs on /data/data-cluster/dms/submissions/User - 226485 failed [Operation not supported]
>>     [2016-12-07 10:23:52.821388] E [MSGID: 113001] [posix-helpers.c:1177:posix_handle_pair] 0-storage-posix: /data/data-cluster/dms/submissions/User - 226485/815a39ccc2cb41dadba45fe7c1e226d4: key:glusterfs.preop.parent.keyflags: 1 length:22 [Operation not supported]
>>     [2016-12-07 10:23:52.821434] E [MSGID: 113001] [posix.c:1384:posix_mkdir] 0-storage-posix: setting xattrs on /data/data-cluster/dms/submissions/User - 226485/815a39ccc2cb41dadba45fe7c1e226d4 failed [Operation not supported]
>>
>>     - Kindest regards,
>>
>>     Milos Cuculovic
>>     IT Manager
>>
>>     ---
>>     MDPI AG
>>     Postfach, CH-4020 Basel, Switzerland
>>     Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>     Tel. +41 61 683 77 35
>>     Fax +41 61 302 89 18
>>     Email: cuculo...@mdpi.com
>>     Skype: milos.cuculovic.mdpi
>>
>>     On 07.12.2016 11:19, Atin Mukherjee wrote:
>>
>>         You are referring to the wrong log file, which is for the self-heal daemon. You'd need to get back with the brick log file.
>>
>>         On Wed, Dec 7, 2016 at 3:45 PM, Miloš Čučulović - MDPI <cuculo...@mdpi.com> wrote:
>>
>>             This is the log file after force command:
>>
>>             [2016-12-07 10:14:55.945937] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x770a) [0x7fe9d905570a] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x40810d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x4d) [0x407f8d] ) 0-: received signum (15), shutting down
>>             [2016-12-07 10:14:56.960573] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.6 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/2599dc977214c2895ef1b090a26c1518.socket --xlator-option *replicate*.node-uuid=7c988af2-9f76-4843-8e6f-d94866d57bb0)
>>             [2016-12-07 10:14:56.968437] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>             [2016-12-07 10:14:56.969774] I [graph.c:269:gf_add_cmdline_options] 0-storage-replicate-0: adding option 'node-uuid' for volume 'storage-replicate-0' with value '7c988af2-9f76-4843-8e6f-d94866d57bb0'
>>             [2016-12-07 10:14:56.985257] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>             [2016-12-07 10:14:56.986105] I [MSGID: 114020] [client.c:2118:notify] 0-storage-client-0: parent translators are ready, attempting connect on transport
>>             [2016-12-07 10:14:56.986668] I [MSGID: 114020] [client.c:2118:notify] 0-storage-client-1: parent translators are ready, attempting connect on transport
>>             Final graph:
>>             +------------------------------------------------------------------------------+
>>               1: volume storage-client-0
>>               2:     type protocol/client
>>               3:     option ping-timeout 42
>>               4:     option remote-host storage2
>>               5:     option remote-subvolume /data/data-cluster
>>               6:     option transport-type socket
>>               7:     option username ef4e608d-487b-49a3-85dd-0b36b3554312
>>               8:     option password dda0bdbf-95c1-4206-a57d-686756210170
>>               9: end-volume
>>              10:
>>              11: volume storage-client-1
>>              12:     type protocol/client
>>              13:     option ping-timeout 42
>>              14:     option remote-host storage
>>              15:     option remote-subvolume /data/data-cluster
>>              16:     option transport-type socket
>>              17:     option username ef4e608d-487b-49a3-85dd-0b36b3554312
>>              18:     option password dda0bdbf-95c1-4206-a57d-686756210170
>>              19: end-volume
>>              20:
>>              21: volume storage-replicate-0
>>              22:     type cluster/replicate
>>              23:     option node-uuid 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>              24:     option background-self-heal-count 0
>>              25:     option metadata-self-heal on
>>              26:     option data-self-heal on
>>              27:     option entry-self-heal on
>>              28:     option self-heal-daemon enable
>>              29:     option iam-self-heal-daemon yes
>>             [2016-12-07 10:14:56.987096] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-storage-client-0: changing port to 49152 (from 0)
>>              30:     subvolumes storage-client-0 storage-client-1
>>              31: end-volume
>>              32:
>>              33: volume glustershd
>>              34:     type debug/io-stats
>>              35:     subvolumes storage-replicate-0
>>              36: end-volume
>>              37:
>>             +------------------------------------------------------------------------------+
>>             [2016-12-07 10:14:56.987685] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-storage-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
>>             [2016-12-07 10:14:56.987766] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-storage-client-1: disconnected from storage-client-1. Client process will keep trying to connect to glusterd until brick's port is available
>>             [2016-12-07 10:14:56.988065] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-storage-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>             [2016-12-07 10:14:56.988387] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-storage-client-0: Connected to storage-client-0, attached to remote volume '/data/data-cluster'.
>>             [2016-12-07 10:14:56.988409] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-storage-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>             [2016-12-07 10:14:56.988476] I [MSGID: 108005] [afr-common.c:3841:afr_notify] 0-storage-replicate-0: Subvolume 'storage-client-0' came back up; going online.
>>             [2016-12-07 10:14:56.988581] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-storage-client-0: Server lk version = 1
>>
>>             - Kindest regards,
>>
>>             Milos Cuculovic
>>             IT Manager
>>
>>             ---
>>             MDPI AG
>>             Postfach, CH-4020 Basel, Switzerland
>>             Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>             Tel. +41 61 683 77 35
>>             Fax +41 61 302 89 18
>>             Email: cuculo...@mdpi.com
>>             Skype: milos.cuculovic.mdpi
>>
>>             On 07.12.2016 11:09, Atin Mukherjee wrote:
>>
>>                 On Wed, Dec 7, 2016 at 3:37 PM, Miloš Čučulović - MDPI <cuculo...@mdpi.com> wrote:
>>
>>                     Hi Atin,
>>
>>                     thanks for your reply.
>>
>>                     I was trying to debug it since yesterday, and today I completely purged glusterfs-server from the storage server.
>>
>>                     I installed it again and checked the firewall; the current status is as follows now:
>>
>>                     On storage2, I am running:
>>                     sudo gluster volume add-brick storage replica 2 storage:/data/data-cluster
>>                     Answer => volume add-brick: failed: Operation failed
>>
>>                     cmd_history says:
>>                     [2016-12-07 09:57:28.471009] : volume add-brick storage replica 2 storage:/data/data-cluster : FAILED : Operation failed
>>
>>                     glustershd.log => no new entry on running the add-brick command.
>>
>>                     etc-glusterfs-glusterd.vol.log =>
>>                     [2016-12-07 10:01:56.567564] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
>>                     [2016-12-07 10:01:56.567626] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
>>                     [2016-12-07 10:01:56.567655] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:
>>
>>                     Logs from storage (the new server): there is no relevant log entry when I run the add-brick command on storage2.
>>
>>                     Now, after reinstalling glusterfs-server on storage, I can see on storage2:
>>
>>                     Status of volume: storage
>>                     Gluster process                              TCP Port  RDMA Port  Online  Pid
>>                     ------------------------------------------------------------------------------
>>                     Brick storage2:/data/data-cluster            49152     0          Y       2160
>>                     Self-heal Daemon on localhost                N/A       N/A        Y       7906
>>
>>                     Task Status of Volume storage
>>                     ------------------------------------------------------------------------------
>>                     There are no active volume tasks
>>
>>                     By running "gluster volume start storage force", do I risk breaking storage2? This is a production server and needs to stay live.
>>
>>                 No, it's going to bring up the brick process(es) if it's not up.
>>
>>                     - Kindest regards,
>>
>>                     Milos Cuculovic
>>                     IT Manager
>>
>>                     ---
>>                     MDPI AG
>>                     Postfach, CH-4020 Basel, Switzerland
>>                     Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>                     Tel. +41 61 683 77 35
>>                     Fax +41 61 302 89 18
>>                     Email: cuculo...@mdpi.com
>>                     Skype: milos.cuculovic.mdpi
>>
>>                     On 07.12.2016 10:44, Atin Mukherjee wrote:
>>
>>                         On Tue, Dec 6, 2016 at 10:08 PM, Miloš Čučulović - MDPI <cuculo...@mdpi.com> wrote:
>>
>>                             Dear All,
>>
>>                             I have two servers, storage and storage2. storage2 had a volume called storage. I then decided to add a replica brick (storage).
>>
>>                             I did this in the following way:
>>
>>                             1. sudo gluster peer probe storage (on server storage2)
>>                             2. sudo gluster volume add-brick storage replica 2 storage:/data/data-cluster
>>
>>                             Then I was getting the following error:
>>                             volume add-brick: failed: Operation failed
>>
>>                             But it seems the brick was somehow added, as when checking on storage2 with:
>>                             sudo gluster volume info storage
>>                             I am getting:
>>                             Status: Started
>>                             Number of Bricks: 1 x 2 = 2
>>                             Transport-type: tcp
>>                             Bricks:
>>                             Brick1: storage2:/data/data-cluster
>>                             Brick2: storage:/data/data-cluster
>>
>>                             So it seems OK here; however, when doing:
>>                             sudo gluster volume heal storage info
>>                             I am getting:
>>                             Volume storage is not of type replicate/disperse
>>                             Volume heal failed.
>>
>>                             Also, when doing:
>>                             sudo gluster volume status all
>>                             I am getting:
>>                             Status of volume: storage
>>                             Gluster process                              TCP Port  RDMA Port  Online  Pid
>>                             ------------------------------------------------------------------------------
>>                             Brick storage2:/data/data-cluster            49152     0          Y       2160
>>                             Brick storage:/data/data-cluster             N/A       N/A        N       N/A
>>                             Self-heal Daemon on localhost                N/A       N/A        Y       7906
>>                             Self-heal Daemon on storage                  N/A       N/A        N       N/A
>>
>>                             Task Status of Volume storage
>>                             ------------------------------------------------------------------------------
>>
>>                             Any idea please?
>>
>>                         It looks like the brick didn't come up during an add-brick.
>>                         Could you share cmd_history, glusterd and the new brick log file from both the nodes? As a workaround, could you try 'gluster volume start storage force' and see if the issue persists?
>>
>>                             --
>>                             - Kindest regards,
>>
>>                             Milos Cuculovic
>>                             IT Manager
>>
>>                             ---
>>                             MDPI AG
>>                             Postfach, CH-4020 Basel, Switzerland
>>                             Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>                             Tel. +41 61 683 77 35
>>                             Fax +41 61 302 89 18
>>                             Email: cuculo...@mdpi.com
>>                             Skype: milos.cuculovic.mdpi
>>
>>                             _______________________________________________
>>                             Gluster-users mailing list
>>                             Gluster-users@gluster.org
>>                             http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>                         --
>>                         ~ Atin (atinm)

--
~ Atin (atinm)
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users