Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-30 Thread Amudhan P
Hi Atin,

These are the exact steps I performed that caused the failure. In addition,
node3's OS drive was running out of space when the service failed, so I
cleared some space on the OS drive, but the service still failed to start.

I am trying to simulate a situation where the volume stopped abnormally and
the entire cluster was restarted with some missing disks.

My test cluster is set up with 3 nodes, each with four disks, and I have
created a volume with disperse 4+2.
In Node-3, 2 disks failed; to replace them I shut down all systems.

Below are the steps I performed:

1. unmounted the volume from the client machine
2. shut down all systems by running `shutdown -h now` (without stopping the
volume or the glusterd service)
3. replaced the faulty disks in Node-3
4. powered on all systems
5. formatted the replaced drives and mounted all drives
6. started the glusterd service on all nodes (success)
7. ran the `volume status` command from node-3
output : [2019-01-15 16:52:17.718422]  : v status : FAILED : Staging failed
on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
8. ran the `volume start gfs-tst` command from node-3
output : [2019-01-15 16:53:19.410252]  : v start gfs-tst : FAILED : Volume
gfs-tst already started

9. ran `gluster v status` on another node; it shows all bricks available,
but the 'self-heal daemon' is not running on node-3
@gfstst-node2:~$ sudo gluster v status
Status of volume: gfs-tst
Gluster process                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------
Brick IP.2:/media/disk1/brick1          49152     0          Y       1517
Brick IP.4:/media/disk1/brick1          49152     0          Y       1668
Brick IP.2:/media/disk2/brick2          49153     0          Y       1522
Brick IP.4:/media/disk2/brick2          49153     0          Y       1678
Brick IP.2:/media/disk3/brick3          49154     0          Y       1527
Brick IP.4:/media/disk3/brick3          49154     0          Y       1677
Brick IP.2:/media/disk4/brick4          49155     0          Y       1541
Brick IP.4:/media/disk4/brick4          49155     0          Y       1683
Self-heal Daemon on localhost           N/A       N/A        Y       2662
Self-heal Daemon on IP.4                N/A       N/A        Y       2786

10. given the 'volume already started' output above, I ran the `reset-brick`
command:
   v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3
commit force

output : [2019-01-15 16:57:37.916942]  : v reset-brick gfs-tst
IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED :
/media/disk3/brick3 is already part of a volume
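
For reference: as far as I understand, the documented reset-brick flow is a
two-step sequence. A minimal sketch, assuming the 4.1 CLI and my brick paths:

    # take the brick offline before touching the disk
    gluster volume reset-brick gfs-tst IP.3:/media/disk3/brick3 start
    # after the new disk is formatted and mounted, re-add the same path
    gluster volume reset-brick gfs-tst IP.3:/media/disk3/brick3 \
        IP.3:/media/disk3/brick3 commit force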

11. the reset-brick command did not work, so I tried stopping the volume and
starting it with the force option
output : [2019-01-15 17:01:04.570794]  : v start gfs-tst force : FAILED :
Pre-validation failed on localhost. Please check log file for details

12. I then stopped the service on all nodes and tried starting it again.
Except for node-3, the service on the other nodes started successfully
without any issues.

On node-3, I receive the following message:

sudo service glusterd start
 * Starting glusterd service glusterd                              [fail]
/usr/local/sbin/glusterd: option requires an argument -- 'f'
Try `glusterd --help' or `glusterd --usage' for more information.
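
One way to sidestep the init script's argument handling is to launch the
daemon directly with the same args the service normally passes (they are
visible in the log below); a minimal sketch:

    /usr/local/sbin/glusterd -p /var/run/glusterd.pid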

13. checking the glusterd log file, I found that the OS drive had run out of
space:
output : [2019-01-15 16:51:37.210792] W [MSGID: 101012]
[store.c:372:gf_store_save_value] 0-management: fflush failed. [No space
left on device]
 [2019-01-15 16:51:37.210874] E [MSGID: 106190]
[glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management:
Unable to write volume values for gfs-tst
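
To confirm the space issue, a quick check of the partition holding the
glusterd state helps; a minimal sketch, assuming standard coreutils:

    df -h /var/lib/glusterd    # block usage of the partition with glusterd config
    df -i /var/lib/glusterd    # inode usage; writes can also fail when inodes run out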

14. I cleared some space on the OS drive, but the service is still not
running. Below is the error logged in glusterd.log:

[2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main]
0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
[2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init]
0-management: Maximum allowed open file descriptors set to 65536
[2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init]
0-management: Using /var/lib/glusterd as working directory
[2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init]
0-management: Using /var/run/gluster as pid file working directory
[2019-01-15 17:50:13.964437] W [MSGID: 103071]
[rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
channel creation failed [No such device]
[2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init]
0-rdma.management: Failed to initialize IB Device
[2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-30 Thread Atin Mukherjee
I'm not very sure how you ended up in a state where one of the nodes lost
the information of one peer from the cluster. I suspect that while doing a
replace-node operation you somehow landed in this situation through an
incorrect step. Unless you can elaborate on all the steps you have
performed in the cluster, it'd be difficult to figure out the exact cause.

On Wed, Jan 30, 2019 at 7:25 PM Amudhan P  wrote:

> Hi Atin,
>
> yes, it worked out thank you.
>
> what would be the cause of this issue?
>
>
>
> On Fri, Jan 25, 2019 at 1:56 PM Atin Mukherjee 
> wrote:
>
>> Amudhan,
>>
>> So here's the issue:
>>
>> In node3, 'cat /var/lib/glusterd/peers/* ' doesn't show up node2's
>> details and that's why glusterd wasn't able to resolve the brick(s) hosted
>> on node2.
>>
>> Can you please pick up 0083ec0c-40bf-472a-a128-458924e56c96 file from
>> /var/lib/glusterd/peers/ from node 4 and place it in the same location in
>> node 3 and then restart glusterd service on node 3?
>>
>>
>> On Thu, Jan 24, 2019 at 11:57 AM Amudhan P  wrote:
>>
>>> Atin,
>>>
>>> Sorry, i missed to send entire `glusterd` folder.  Now attached zip
>>> contains `glusterd` folder from all nodes.
>>>
>>> the problem node is node3 IP 10.1.2.3, `glusterd` log file is inside
>>> node3 folder.
>>>
>>> regards
>>> Amudhan
>>>
>>> On Wed, Jan 23, 2019 at 11:02 PM Atin Mukherjee 
>>> wrote:
>>>
 Amudhan,

 I see that you have provided the content of the configuration of the
 volume gfs-tst where the request was to share the dump of
 /var/lib/glusterd/* . I can not debug this further until you share the
 correct dump.

 On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee 
 wrote:

> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
> Instead of doing too many back and forth I suggest you to share the 
> content
> of /var/lib/glusterd from all the nodes. Also do mention which particular
> node the glusterd service is unable to come up.
>
> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P 
> wrote:
>
>> I have created the folder in the path as said but still, service
>> failed to start below is the error msg in glusterd.log
>>
>> [2019-01-16 14:50:14.555742] I [MSGID: 100030]
>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running
>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p
>> /var/run/glusterd.pid)
>> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
>> 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>> channel creation failed [No such device]
>> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
>> 0-rdma.management: Failed to initialize IB Device
>> [2019-01-16 14:50:14.563882] W
>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma'
>> initialization failed
>> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
>> 0-management: creation of 1 listeners failed, continuing with succeeded
>> transport
>> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>> op-version: 40100
>> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>> connect returned 0
>> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 
>> 0-glusterd:
>> Failed to get tcp-user-timeout
>> [2019-01-16 14:50:15.675451] I
>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
>> frame-timeout to 600
>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>> brick failed in restore*
>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>> 'management' failed, review your volfile again*
>> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>> failed
>> 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-30 Thread Amudhan P
Hi Atin,

Yes, it worked out, thank you.

What would be the cause of this issue?



On Fri, Jan 25, 2019 at 1:56 PM Atin Mukherjee  wrote:

> Amudhan,
>
> So here's the issue:
>
> In node3, 'cat /var/lib/glusterd/peers/* ' doesn't show up node2's details
> and that's why glusterd wasn't able to resolve the brick(s) hosted on node2.
>
> Can you please pick up 0083ec0c-40bf-472a-a128-458924e56c96 file from
> /var/lib/glusterd/peers/ from node 4 and place it in the same location in
> node 3 and then restart glusterd service on node 3?
>
>
> On Thu, Jan 24, 2019 at 11:57 AM Amudhan P  wrote:
>
>> Atin,
>>
>> Sorry, i missed to send entire `glusterd` folder.  Now attached zip
>> contains `glusterd` folder from all nodes.
>>
>> the problem node is node3 IP 10.1.2.3, `glusterd` log file is inside
>> node3 folder.
>>
>> regards
>> Amudhan
>>
>> On Wed, Jan 23, 2019 at 11:02 PM Atin Mukherjee 
>> wrote:
>>
>>> Amudhan,
>>>
>>> I see that you have provided the content of the configuration of the
>>> volume gfs-tst where the request was to share the dump of
>>> /var/lib/glusterd/* . I can not debug this further until you share the
>>> correct dump.
>>>
>>> On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee 
>>> wrote:
>>>
 Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
 Instead of doing too many back and forth I suggest you to share the content
 of /var/lib/glusterd from all the nodes. Also do mention which particular
 node the glusterd service is unable to come up.

 On Thu, Jan 17, 2019 at 11:34 AM Amudhan P  wrote:

> I have created the folder in the path as said but still, service
> failed to start below is the error msg in glusterd.log
>
> [2019-01-16 14:50:14.555742] I [MSGID: 100030]
> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running
> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p
> /var/run/glusterd.pid)
> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
> 0-management: Maximum allowed open file descriptors set to 65536
> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
> 0-management: Using /var/lib/glusterd as working directory
> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
> 0-management: Using /var/run/gluster as pid file working directory
> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
> channel creation failed [No such device]
> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
> 0-rdma.management: Failed to initialize IB Device
> [2019-01-16 14:50:14.563882] W
> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma'
> initialization failed
> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
> 0-rpc-service: cannot create listener, initing the transport failed
> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
> 0-management: creation of 1 listeners failed, continuing with succeeded
> transport
> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
> op-version: 40100
> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
> d6bf51a7-c296-492f-8dac-e81efa9dd22d
> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
> connect returned 0
> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 
> 0-glusterd:
> Failed to get tcp-user-timeout
> [2019-01-16 14:50:15.675451] I
> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
> frame-timeout to 600
> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
> brick failed in restore*
> *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
> [xlator.c:720:xlator_init] 0-management: Initialization of volume
> 'management' failed, review your volfile again*
> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
> failed
> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
> received signum (-1), shutting down
>
>
> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee 
> wrote:
>
>> If 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-25 Thread Atin Mukherjee
Amudhan,

So here's the issue:

In node3, 'cat /var/lib/glusterd/peers/*' doesn't show node2's details,
and that's why glusterd wasn't able to resolve the brick(s) hosted on node2.

Can you please pick up the 0083ec0c-40bf-472a-a128-458924e56c96 file from
/var/lib/glusterd/peers/ on node 4, place it in the same location on
node 3, and then restart the glusterd service on node 3?
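
A minimal sketch of the above, assuming root access and passwordless ssh
between the nodes (hostnames here are placeholders; adjust to your setup):

    # on node 4: copy the missing peer file over to node 3
    scp /var/lib/glusterd/peers/0083ec0c-40bf-472a-a128-458924e56c96 \
        node3:/var/lib/glusterd/peers/
    # on node 3: restart the management daemon
    sudo service glusterd restart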


On Thu, Jan 24, 2019 at 11:57 AM Amudhan P  wrote:

> Atin,
>
> Sorry, i missed to send entire `glusterd` folder.  Now attached zip
> contains `glusterd` folder from all nodes.
>
> the problem node is node3 IP 10.1.2.3, `glusterd` log file is inside node3
> folder.
>
> regards
> Amudhan
>
> On Wed, Jan 23, 2019 at 11:02 PM Atin Mukherjee 
> wrote:
>
>> Amudhan,
>>
>> I see that you have provided the content of the configuration of the
>> volume gfs-tst where the request was to share the dump of
>> /var/lib/glusterd/* . I can not debug this further until you share the
>> correct dump.
>>
>> On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee 
>> wrote:
>>
>>> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
>>> Instead of doing too many back and forth I suggest you to share the content
>>> of /var/lib/glusterd from all the nodes. Also do mention which particular
>>> node the glusterd service is unable to come up.
>>>
>>> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P  wrote:
>>>
 I have created the folder in the path as said but still, service failed
 to start below is the error msg in glusterd.log

 [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
 version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
 [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
 0-management: Maximum allowed open file descriptors set to 65536
 [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
 0-management: Using /var/lib/glusterd as working directory
 [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
 0-management: Using /var/run/gluster as pid file working directory
 [2019-01-16 14:50:14.563834] W [MSGID: 103071]
 [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
 channel creation failed [No such device]
 [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
 0-rdma.management: Failed to initialize IB Device
 [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
 0-rpc-transport: 'rdma' initialization failed
 [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
 0-rpc-service: cannot create listener, initing the transport failed
 [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
 0-management: creation of 1 listeners failed, continuing with succeeded
 transport
 [2019-01-16 14:50:15.565868] I [MSGID: 106513]
 [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
 op-version: 40100
 [2019-01-16 14:50:15.642532] I [MSGID: 106544]
 [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
 d6bf51a7-c296-492f-8dac-e81efa9dd22d
 [2019-01-16 14:50:15.675333] I [MSGID: 106498]
 [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
 connect returned 0
 [2019-01-16 14:50:15.675421] W [MSGID: 106061]
 [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
 Failed to get tcp-user-timeout
 [2019-01-16 14:50:15.675451] I
 [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
 frame-timeout to 600
 *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
 [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
 brick failed in restore*
 *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
 [xlator.c:720:xlator_init] 0-management: Initialization of volume
 'management' failed, review your volfile again*
 [2019-01-16 14:50:15.676973] E [MSGID: 101066]
 [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
 failed
 [2019-01-16 14:50:15.676986] E [MSGID: 101176]
 [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
 [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
 (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
 -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
 -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
 received signum (-1), shutting down


 On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee 
 wrote:

> If gluster volume info/status shows the brick to be
> /media/disk4/brick4 then you'd need to mount the same path and hence you'd
> need to create the brick4 directory explicitly. I fail to understand the
> rationale how only /media/disk4 can be used as the mount path 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-23 Thread Atin Mukherjee
Amudhan,

I see that you have provided the configuration content of the volume
gfs-tst, whereas the request was to share the dump of /var/lib/glusterd/*.
I cannot debug this further until you share the correct dump.

On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee  wrote:

> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
> Instead of doing too many back and forth I suggest you to share the content
> of /var/lib/glusterd from all the nodes. Also do mention which particular
> node the glusterd service is unable to come up.
>
> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P  wrote:
>
>> I have created the folder in the path as said but still, service failed
>> to start below is the error msg in glusterd.log
>>
>> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
>> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
>> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
>> 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>> channel creation failed [No such device]
>> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
>> 0-rdma.management: Failed to initialize IB Device
>> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
>> 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
>> 0-management: creation of 1 listeners failed, continuing with succeeded
>> transport
>> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>> op-version: 40100
>> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>> connect returned 0
>> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>> Failed to get tcp-user-timeout
>> [2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>> brick failed in restore*
>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>> 'management' failed, review your volfile again*
>> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>> failed
>> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>> received signum (-1), shutting down
>>
>>
>> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee 
>> wrote:
>>
>>> If gluster volume info/status shows the brick to be /media/disk4/brick4
>>> then you'd need to mount the same path and hence you'd need to create the
>>> brick4 directory explicitly. I fail to understand the rationale how only
>>> /media/disk4 can be used as the mount path for the brick.
>>>
>>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P  wrote:
>>>
 Yes, I did mount bricks but the folder 'brick4' was still not created
 inside the brick.
 Do I need to create this folder because when I run replace-brick it
 will create folder inside the brick. I have seen this behavior before when
 running replace-brick or heal begins.

 On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee 
 wrote:

>
>
> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P  wrote:
>
>> Atin,
>> I have copied the content of 'gfs-tst' from vol folder in another
>> node. when starting service again fails with error msg in glusterd.log 
>> file.
>>
>> [2019-01-15 20:16:59.513023] I [MSGID: 100030]
>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running
>> /usr/local/sbin/glusterd version 4.1.6 (args: 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-19 Thread Amudhan P
Ok, no problem.

On Sat 19 Jan, 2019, 7:55 AM Atin Mukherjee wrote:

> I have received but haven’t got a chance to look at them. I can only come
> back on this sometime early next week based on my schedule.
>
> On Fri, 18 Jan 2019 at 16:52, Amudhan P  wrote:
>
>> Hi Atin,
>>
>> I have sent files to your email directly in other mail. hope you have
>> received.
>>
>> regards
>> Amudhan
>>
>> On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee 
>> wrote:
>>
>>> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
>>> Instead of doing too many back and forth I suggest you to share the content
>>> of /var/lib/glusterd from all the nodes. Also do mention which particular
>>> node the glusterd service is unable to come up.
>>>
>>> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P  wrote:
>>>
 I have created the folder in the path as said but still, service failed
 to start below is the error msg in glusterd.log

 [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
 version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
 [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
 0-management: Maximum allowed open file descriptors set to 65536
 [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
 0-management: Using /var/lib/glusterd as working directory
 [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
 0-management: Using /var/run/gluster as pid file working directory
 [2019-01-16 14:50:14.563834] W [MSGID: 103071]
 [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
 channel creation failed [No such device]
 [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
 0-rdma.management: Failed to initialize IB Device
 [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
 0-rpc-transport: 'rdma' initialization failed
 [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
 0-rpc-service: cannot create listener, initing the transport failed
 [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
 0-management: creation of 1 listeners failed, continuing with succeeded
 transport
 [2019-01-16 14:50:15.565868] I [MSGID: 106513]
 [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
 op-version: 40100
 [2019-01-16 14:50:15.642532] I [MSGID: 106544]
 [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
 d6bf51a7-c296-492f-8dac-e81efa9dd22d
 [2019-01-16 14:50:15.675333] I [MSGID: 106498]
 [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
 connect returned 0
 [2019-01-16 14:50:15.675421] W [MSGID: 106061]
 [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
 Failed to get tcp-user-timeout
 [2019-01-16 14:50:15.675451] I
 [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
 frame-timeout to 600
 *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
 [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
 brick failed in restore*
 *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
 [xlator.c:720:xlator_init] 0-management: Initialization of volume
 'management' failed, review your volfile again*
 [2019-01-16 14:50:15.676973] E [MSGID: 101066]
 [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
 failed
 [2019-01-16 14:50:15.676986] E [MSGID: 101176]
 [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
 [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
 (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
 -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
 -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
 received signum (-1), shutting down


 On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee 
 wrote:

> If gluster volume info/status shows the brick to be
> /media/disk4/brick4 then you'd need to mount the same path and hence you'd
> need to create the brick4 directory explicitly. I fail to understand the
> rationale how only /media/disk4 can be used as the mount path for the
> brick.
>
> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P  wrote:
>
>> Yes, I did mount bricks but the folder 'brick4' was still not created
>> inside the brick.
>> Do I need to create this folder because when I run replace-brick it
>> will create folder inside the brick. I have seen this behavior before 
>> when
>> running replace-brick or heal begins.
>>
>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P 
>>> wrote:
>>>
 Atin,

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-18 Thread Atin Mukherjee
I have received them but haven’t got a chance to look at them. I can only
come back on this sometime early next week, based on my schedule.

On Fri, 18 Jan 2019 at 16:52, Amudhan P  wrote:

> Hi Atin,
>
> I have sent files to your email directly in other mail. hope you have
> received.
>
> regards
> Amudhan
>
> On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee 
> wrote:
>
>> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
>> Instead of doing too many back and forth I suggest you to share the content
>> of /var/lib/glusterd from all the nodes. Also do mention which particular
>> node the glusterd service is unable to come up.
>>
>> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P  wrote:
>>
>>> I have created the folder in the path as said but still, service failed
>>> to start below is the error msg in glusterd.log
>>>
>>> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
>>> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
>>> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
>>> 0-management: Maximum allowed open file descriptors set to 65536
>>> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
>>> 0-management: Using /var/lib/glusterd as working directory
>>> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
>>> 0-management: Using /var/run/gluster as pid file working directory
>>> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>>> channel creation failed [No such device]
>>> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
>>> 0-rdma.management: Failed to initialize IB Device
>>> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
>>> 0-rpc-transport: 'rdma' initialization failed
>>> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
>>> 0-rpc-service: cannot create listener, initing the transport failed
>>> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
>>> 0-management: creation of 1 listeners failed, continuing with succeeded
>>> transport
>>> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>>> op-version: 40100
>>> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
>>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>>> connect returned 0
>>> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
>>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>>> Failed to get tcp-user-timeout
>>> [2019-01-16 14:50:15.675451] I
>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
>>> frame-timeout to 600
>>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
>>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>>> brick failed in restore*
>>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>>> 'management' failed, review your volfile again*
>>> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>>> failed
>>> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>>> received signum (-1), shutting down
>>>
>>>
>>> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee 
>>> wrote:
>>>
 If gluster volume info/status shows the brick to be /media/disk4/brick4
 then you'd need to mount the same path and hence you'd need to create the
 brick4 directory explicitly. I fail to understand the rationale how only
 /media/disk4 can be used as the mount path for the brick.

 On Wed, Jan 16, 2019 at 5:24 PM Amudhan P  wrote:

> Yes, I did mount bricks but the folder 'brick4' was still not created
> inside the brick.
> Do I need to create this folder because when I run replace-brick it
> will create folder inside the brick. I have seen this behavior before when
> running replace-brick or heal begins.
>
> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee 
> wrote:
>
>>
>>
>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P 
>> wrote:
>>
>>> Atin,
>>> I have copied the content of 'gfs-tst' from vol folder in another
>>> node. when starting service again fails with error msg in glusterd.log 
>>> file.
>>>
>>> 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-18 Thread Amudhan P
Hi Atin,

I have sent the files directly to your email in another mail. I hope you
have received them.

regards
Amudhan

On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee  wrote:

> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
> Instead of doing too many back and forth I suggest you to share the content
> of /var/lib/glusterd from all the nodes. Also do mention which particular
> node the glusterd service is unable to come up.
>
> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P  wrote:
>
>> I have created the folder in the path as said but still, service failed
>> to start below is the error msg in glusterd.log
>>
>> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
>> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
>> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
>> 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>> channel creation failed [No such device]
>> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
>> 0-rdma.management: Failed to initialize IB Device
>> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
>> 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
>> 0-management: creation of 1 listeners failed, continuing with succeeded
>> transport
>> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>> op-version: 40100
>> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>> connect returned 0
>> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>> Failed to get tcp-user-timeout
>> [2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>> brick failed in restore*
>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>> 'management' failed, review your volfile again*
>> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>> failed
>> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>> received signum (-1), shutting down
>>
>>
>> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee 
>> wrote:
>>
>>> If gluster volume info/status shows the brick to be /media/disk4/brick4
>>> then you'd need to mount the same path and hence you'd need to create the
>>> brick4 directory explicitly. I fail to understand the rationale how only
>>> /media/disk4 can be used as the mount path for the brick.
>>>
>>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P  wrote:
>>>
 Yes, I did mount bricks but the folder 'brick4' was still not created
 inside the brick.
 Do I need to create this folder because when I run replace-brick it
 will create folder inside the brick. I have seen this behavior before when
 running replace-brick or heal begins.

 On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee 
 wrote:

>
>
> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P  wrote:
>
>> Atin,
>> I have copied the content of 'gfs-tst' from vol folder in another
>> node. when starting service again fails with error msg in glusterd.log 
>> file.
>>
>> [2019-01-15 20:16:59.513023] I [MSGID: 100030]
>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running
>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p
>> /var/run/glusterd.pid)
>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-17 Thread Atin Mukherjee
Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
Instead of doing too much back and forth, I suggest you share the content
of /var/lib/glusterd from all the nodes. Also mention on which particular
node the glusterd service is unable to come up.
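
Something along these lines should collect everything requested; a minimal
sketch, assuming the default log location (adjust the paths if your build
logs elsewhere):

    glusterd -LDEBUG                       # run glusterd with debug log level
    tar czf glusterd-dump.tar.gz /var/lib/glusterd \
        /var/log/glusterfs/glusterd.log    # bundle config dump and the log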

On Thu, Jan 17, 2019 at 11:34 AM Amudhan P  wrote:

> I have created the folder in the path as said but still, service failed to
> start below is the error msg in glusterd.log
>
> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
> 0-management: Maximum allowed open file descriptors set to 65536
> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
> 0-management: Using /var/lib/glusterd as working directory
> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
> 0-management: Using /var/run/gluster as pid file working directory
> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
> channel creation failed [No such device]
> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
> 0-rdma.management: Failed to initialize IB Device
> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
> 0-rpc-transport: 'rdma' initialization failed
> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
> 0-rpc-service: cannot create listener, initing the transport failed
> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
> 0-management: creation of 1 listeners failed, continuing with succeeded
> transport
> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
> op-version: 40100
> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
> d6bf51a7-c296-492f-8dac-e81efa9dd22d
> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
> connect returned 0
> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
> Failed to get tcp-user-timeout
> [2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
> brick failed in restore*
> *[2019-01-16 14:50:15.676956] E [MSGID: 101019] [xlator.c:720:xlator_init]
> 0-management: Initialization of volume 'management' failed, review your
> volfile again*
> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
> failed
> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
> received signum (-1), shutting down
>
>
> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee 
> wrote:
>
>> If gluster volume info/status shows the brick to be /media/disk4/brick4
>> then you'd need to mount the same path and hence you'd need to create the
>> brick4 directory explicitly. I fail to understand the rationale how only
>> /media/disk4 can be used as the mount path for the brick.
>>
>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P  wrote:
>>
>>> Yes, I did mount bricks but the folder 'brick4' was still not created
>>> inside the brick.
>>> Do I need to create this folder because when I run replace-brick it will
>>> create folder inside the brick. I have seen this behavior before when
>>> running replace-brick or heal begins.
>>>
>>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee 
>>> wrote:
>>>


 On Wed, Jan 16, 2019 at 5:02 PM Amudhan P  wrote:

> Atin,
> I have copied the content of 'gfs-tst' from vol folder in another
> node. when starting service again fails with error msg in glusterd.log 
> file.
>
> [2019-01-15 20:16:59.513023] I [MSGID: 100030]
> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running
> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p
> /var/run/glusterd.pid)
> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init]
> 0-management: Maximum allowed open file descriptors set to 65536
> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init]
> 0-management: Using /var/lib/glusterd as working directory
> [2019-01-15 20:16:59.517283] I [MSGID: 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-16 Thread Amudhan P
I have created the folder in the path as instructed, but the service still
failed to start. Below is the error message in glusterd.log:

[2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
[2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
0-management: Maximum allowed open file descriptors set to 65536
[2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
0-management: Using /var/lib/glusterd as working directory
[2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
0-management: Using /var/run/gluster as pid file working directory
[2019-01-16 14:50:14.563834] W [MSGID: 103071]
[rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
channel creation failed [No such device]
[2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
0-rdma.management: Failed to initialize IB Device
[2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport
[2019-01-16 14:50:15.565868] I [MSGID: 106513]
[glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
op-version: 40100
[2019-01-16 14:50:15.642532] I [MSGID: 106544]
[glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
d6bf51a7-c296-492f-8dac-e81efa9dd22d
[2019-01-16 14:50:15.675333] I [MSGID: 106498]
[glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
connect returned 0
[2019-01-16 14:50:15.675421] W [MSGID: 106061]
[glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
Failed to get tcp-user-timeout
[2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
*[2019-01-16 14:50:15.676912] E [MSGID: 106187]
[glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
brick failed in restore*
*[2019-01-16 14:50:15.676956] E [MSGID: 101019] [xlator.c:720:xlator_init]
0-management: Initialization of volume 'management' failed, review your
volfile again*
[2019-01-16 14:50:15.676973] E [MSGID: 101066]
[graph.c:367:glusterfs_graph_init] 0-management: initializing translator
failed
[2019-01-16 14:50:15.676986] E [MSGID: 101176]
[graph.c:738:glusterfs_graph_activate] 0-graph: init failed
[2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
(-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
-->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
-->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
received signum (-1), shutting down


On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee  wrote:

> If gluster volume info/status shows the brick to be /media/disk4/brick4
> then you'd need to mount the same path and hence you'd need to create the
> brick4 directory explicitly. I fail to understand the rationale how only
> /media/disk4 can be used as the mount path for the brick.
>
> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P  wrote:
>
>> Yes, I did mount bricks but the folder 'brick4' was still not created
>> inside the brick.
>> Do I need to create this folder because when I run replace-brick it will
>> create folder inside the brick. I have seen this behavior before when
>> running replace-brick or heal begins.
>>
>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P  wrote:
>>>
 Atin,
 I have copied the content of 'gfs-tst' from vol folder in another node.
 when starting service again fails with error msg in glusterd.log file.

 [2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main]
 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
 version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
 [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init]
 0-management: Maximum allowed open file descriptors set to 65536
 [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init]
 0-management: Using /var/lib/glusterd as working directory
 [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init]
 0-management: Using /var/run/gluster as pid file working directory
 [2019-01-15 20:16:59.521508] W [MSGID: 103071]
 [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
 channel creation failed [No such device]
 [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init]
 0-rdma.management: Failed to initialize IB Device
 [2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load]
 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-16 Thread Atin Mukherjee
If gluster volume info/status shows the brick as /media/disk4/brick4, then
you'd need to mount the same path, and hence you'd need to create the
brick4 directory explicitly. I fail to understand the rationale for how
only /media/disk4 could be used as the mount path for the brick.
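
In other words, something along these lines on the affected node; a sketch
in which the device name is a placeholder for the actual replaced disk:

    mount /dev/sdX /media/disk4      # mount the replaced disk at the original path
    mkdir -p /media/disk4/brick4     # recreate the brick directory explicitly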

On Wed, Jan 16, 2019 at 5:24 PM Amudhan P  wrote:

> Yes, I did mount bricks but the folder 'brick4' was still not created
> inside the brick.
> Do I need to create this folder because when I run replace-brick it will
> create folder inside the brick. I have seen this behavior before when
> running replace-brick or heal begins.
>
> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee 
> wrote:
>
>>
>>
>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P  wrote:
>>
>>> Atin,
>>> I have copied the content of 'gfs-tst' from vol folder in another node.
>>> when starting service again fails with error msg in glusterd.log file.
>>>
>>> [2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main]
>>> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
>>> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init]
>>> 0-management: Maximum allowed open file descriptors set to 65536
>>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init]
>>> 0-management: Using /var/lib/glusterd as working directory
>>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init]
>>> 0-management: Using /var/run/gluster as pid file working directory
>>> [2019-01-15 20:16:59.521508] W [MSGID: 103071]
>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>>> channel creation failed [No such device]
>>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init]
>>> 0-rdma.management: Failed to initialize IB Device
>>> [2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load]
>>> 0-rpc-transport: 'rdma' initialization failed
>>> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener]
>>> 0-rpc-service: cannot create listener, initing the transport failed
>>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init]
>>> 0-management: creation of 1 listeners failed, continuing with succeeded
>>> transport
>>> [2019-01-15 20:17:00.529390] I [MSGID: 106513]
>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>>> op-version: 40100
>>> [2019-01-15 20:17:00.608354] I [MSGID: 106544]
>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>> [2019-01-15 20:17:00.650911] W [MSGID: 106425]
>>> [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed
>>> to get statfs() call on brick /media/disk4/brick4 [No such file or
>>> directory]
>>>
>>
>> This means that underlying brick /media/disk4/brick4 doesn't exist. You
>> already mentioned that you had replaced the faulty disk, but have you not
>> mounted it yet?
>>
>>
>>> [2019-01-15 20:17:00.691240] I [MSGID: 106498]
>>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>>> connect returned 0
>>> [2019-01-15 20:17:00.691307] W [MSGID: 106061]
>>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>>> Failed to get tcp-user-timeout
>>> [2019-01-15 20:17:00.691331] I
>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
>>> frame-timeout to 600
>>> [2019-01-15 20:17:00.692547] E [MSGID: 106187]
>>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>>> brick failed in restore
>>> [2019-01-15 20:17:00.692582] E [MSGID: 101019]
>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>>> 'management' failed, review your volfile again
>>> [2019-01-15 20:17:00.692597] E [MSGID: 101066]
>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>>> failed
>>> [2019-01-15 20:17:00.692607] E [MSGID: 101176]
>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit]
>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>>> received signum (-1), shutting down
>>>
>>>
>>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee 
>>> wrote:
>>>
 This is a case of partial write of a transaction and as the host ran
 out of space for the root partition where all the glusterd related
 configurations are persisted, the transaction couldn't be written and hence
 the new (replaced) brick's information wasn't persisted in the
 configuration. The workaround for this is to copy the content of
 /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted
 storage pool to the node where glusterd service fails to come up and post
 that restarting the glusterd service should be 

Re: [Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-16 Thread Amudhan P
Yes, I did mount the bricks, but the folder 'brick4' was still not created
inside the brick mount. Do I need to create this folder? When I run
replace-brick it creates the folder inside the brick; I have seen this
behavior before when replace-brick runs or when heal begins.

On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee  wrote:

>
>
> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P  wrote:
>
>> Atin,
>> I have copied the content of 'gfs-tst' from vol folder in another node.
>> when starting service again fails with error msg in glusterd.log file.
>>
>> [2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main]
>> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
>> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init]
>> 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-15 20:16:59.521508] W [MSGID: 103071]
>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>> channel creation failed [No such device]
>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init]
>> 0-rdma.management: Failed to initialize IB Device
>> [2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load]
>> 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init]
>> 0-management: creation of 1 listeners failed, continuing with succeeded
>> transport
>> [2019-01-15 20:17:00.529390] I [MSGID: 106513]
>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>> op-version: 40100
>> [2019-01-15 20:17:00.608354] I [MSGID: 106544]
>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-15 20:17:00.650911] W [MSGID: 106425]
>> [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed
>> to get statfs() call on brick /media/disk4/brick4 [No such file or
>> directory]
>>
>
> This means that underlying brick /media/disk4/brick4 doesn't exist. You
> already mentioned that you had replaced the faulty disk, but have you not
> mounted it yet?
>
>
>> [2019-01-15 20:17:00.691240] I [MSGID: 106498]
>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>> connect returned 0
>> [2019-01-15 20:17:00.691307] W [MSGID: 106061]
>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>> Failed to get tcp-user-timeout
>> [2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>> [2019-01-15 20:17:00.692547] E [MSGID: 106187]
>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>> brick failed in restore
>> [2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init]
>> 0-management: Initialization of volume 'management' failed, review your
>> volfile again
>> [2019-01-15 20:17:00.692597] E [MSGID: 101066]
>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>> failed
>> [2019-01-15 20:17:00.692607] E [MSGID: 101176]
>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit]
>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>> received signum (-1), shutting down
>>
>>
>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee wrote:
>>
>>> This is a case of a partial write of a transaction: the host ran out of
>>> space on the root partition, where all the glusterd-related configuration
>>> is persisted, so the transaction couldn't be written and the new
>>> (replaced) brick's information wasn't persisted in the configuration. The
>>> workaround is to copy the content of /var/lib/glusterd/vols/gfs-tst/ from
>>> one of the other nodes in the trusted storage pool to the node where the
>>> glusterd service fails to come up; after that, restarting the glusterd
>>> service should make peer status report all nodes healthy and connected.
>>>
>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P  wrote:
>>>
 Hi,

 In short, when I start the glusterd service I get the following error
 message in the glusterd.log file on one server. What needs to be done?

 Error logged in glusterd.log:

 [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main]
 

[Gluster-users] glusterfs 4.1.6 error in starting glusterd service

2019-01-16 Thread Amudhan P
Hi,

In short, when I start the glusterd service I get the following error
message in the glusterd.log file on one server. What needs to be done?

Error logged in glusterd.log:

[2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main]
0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
[2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init]
0-management: Maximum allowed open file descriptors set to 65536
[2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init]
0-management: Using /var/lib/glusterd as working directory
[2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init]
0-management: Using /var/run/gluster as pid file working directory
[2019-01-15 17:50:13.964437] W [MSGID: 103071]
[rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
channel creation failed [No such device]
[2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init]
0-rdma.management: Failed to initialize IB Device
[2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport
[2019-01-15 17:50:14.967681] I [MSGID: 106513]
[glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
op-version: 40100
[2019-01-15 17:50:14.973931] I [MSGID: 106544]
[glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
d6bf51a7-c296-492f-8dac-e81efa9dd22d
[2019-01-15 17:50:15.046620] E [MSGID: 101032]
[store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such
file or directory]
[2019-01-15 17:50:15.046685] E [MSGID: 106201]
[glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management:
Unable to restore volume: gfs-tst
[2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init]
0-management: Initialization of volume 'management' failed, review your
volfile again
[2019-01-15 17:50:15.046732] E [MSGID: 101066]
[graph.c:367:glusterfs_graph_init] 0-management: initializing translator
failed
[2019-01-15 17:50:15.046741] E [MSGID: 101176]
[graph.c:738:glusterfs_graph_activate] 0-graph: init failed
[2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit]
(-->/usr/local/sbin/glusterd(glusterfs_volumes
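
The two store errors above point at glusterd's on-disk volume store under
/var/lib/glusterd. A way to see which brick entries actually survived the
out-of-space write, using the exact path from the log (a diagnostic sketch,
not a step taken in this thread):

# list the per-brick store files; one file per brick is expected,
# named HOST:-path-to-brick
ls -l /var/lib/glusterd/vols/gfs-tst/bricks/

# inspect the entry glusterd reports as missing
cat /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3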



In long: I am trying to simulate a situation where the volume stopped
abnormally and the entire cluster was restarted with some missing disks.

My test cluster is set up with 3 nodes, each with four disks, and I have
set up a volume with disperse 4+2 (see the sketch below).
In Node-3, 2 disks failed; to replace them I shut down all systems.
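
For reference, a 4+2 disperse volume of this shape (12 bricks forming two
6-brick subvolumes) could be created roughly as follows; the IP.x names and
brick paths mirror the status output in step 9, and the exact brick ordering
is an assumption:

gluster volume create gfs-tst disperse-data 4 redundancy 2 \
  IP.2:/media/disk1/brick1 IP.3:/media/disk1/brick1 IP.4:/media/disk1/brick1 \
  IP.2:/media/disk2/brick2 IP.3:/media/disk2/brick2 IP.4:/media/disk2/brick2 \
  IP.2:/media/disk3/brick3 IP.3:/media/disk3/brick3 IP.4:/media/disk3/brick3 \
  IP.2:/media/disk4/brick4 IP.3:/media/disk4/brick4 IP.4:/media/disk4/brick4 \
  force   # force: each node holds two bricks of the same disperse set
gluster volume start gfs-tst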

Below are the steps performed.

1. unmounted from the client machine
2. shut down all systems by running `shutdown -h now` (without stopping the
volume or the glusterd service)
3. replaced the faulty disks in Node-3
4. powered ON all systems
5. formatted the replaced drives and mounted all drives
6. started the glusterd service on all nodes (success)
7. ran `volume status` from node-3
output : [2019-01-15 16:52:17.718422]  : v status : FAILED : Staging failed
on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
8. ran `volume start gfs-tst` from node-3
output : [2019-01-15 16:53:19.410252]  : v start gfs-tst : FAILED : Volume
gfs-tst already started

9. ran `gluster v status` on another node: the listed bricks are online, but
Node-3's bricks and its 'self-heal daemon' are missing from the output
@gfstst-node2:~$ sudo gluster v status
Status of volume: gfs-tst
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick IP.2:/media/disk1/brick1  49152 0  Y   1517
Brick IP.4:/media/disk1/brick1  49152 0  Y   1668
Brick IP.2:/media/disk2/brick2  49153 0  Y   1522
Brick IP.4:/media/disk2/brick2  49153 0  Y   1678
Brick IP.2:/media/disk3/brick3  49154 0  Y   1527
Brick IP.4:/media/disk3/brick3  49154 0  Y   1677
Brick IP.2:/media/disk4/brick4  49155 0  Y   1541
Brick IP.4:/media/disk4/brick4  49155 0  Y   1683
Self-heal Daemon on localhost   N/A   N/A  Y   2662
Self-heal Daemon on IP.4        N/A   N/A  Y   2786

10. per the output above the volume is 'already started', so I ran the
`reset-brick` command:
   v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3
commit force

output : [2019-01-15 16:57:37.916942]  : v reset-brick gfs-tst
IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED :
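
For completeness: reset-brick (like add-brick and replace-brick) typically
refuses a path that is already recorded as a brick or that still carries
brick metadata, namely the trusted.glusterfs.volume-id extended attribute
and the .glusterfs directory. A hedged cleanup sketch, only for a brick that
is intentionally being wiped (common practice, not something suggested in
this thread):

# inspect the gluster-related extended attributes on the brick root
getfattr -d -m . -e hex /media/disk3/brick3

# strip the brick identity so the path can be reused
setfattr -x trusted.glusterfs.volume-id /media/disk3/brick3
setfattr -x trusted.gfid /media/disk3/brick3
rm -rf /media/disk3/brick3/.glusterfs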