Re: [Gluster-users] heal: Not able to fetch volfile from glusterd

2019-05-07 Thread Łukasz Michalski


Does the same issue occur if you try to resolve the split-brain on the
gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 using the "gluster volume heal
 split-brain" CLI?
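For reference, a hedged sketch of the split-brain resolution CLI forms being suggested, using the "cluster" volume and brick names from this thread; the file path is a placeholder, not one taken from this report:

```shell
# Keep the copy with the newest modification time:
gluster volume heal cluster split-brain latest-mtime /shared/somefile

# Keep the larger of the two copies:
gluster volume heal cluster split-brain bigger-file /shared/somefile

# Explicitly pick one brick's copy as the heal source, addressing the
# file by the gfid reported in the logs:
gluster volume heal cluster split-brain source-brick \
    ixmed1:/glusterfs-bricks/cluster/cluster \
    gfid:2584a0e2-c0fa-4fde-8537-5d5b6a5a4635
```

These run against a live cluster, so the paths and the choice of policy have to match the actual split-brain state reported by "gluster volume heal cluster info".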




Many thanks for responding!

gluster volume info:

Volume Name: cluster
Type: Replicate
Volume ID: 8787d95e-8e66-4476-a990-4e27fc47c765
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ixmed2:/glusterfs-bricks/cluster/cluster
Brick2: ixmed1:/glusterfs-bricks/cluster/cluster
Options Reconfigured:
network.ping-timeout: 5
user.smb: disable
transport.address-family: inet
nfs.disable: on

The problem was network.ping-timeout being set to 5 seconds. It is set
to such a short value to prevent the SMB session from disconnecting when
one node goes offline.


It seems that for split-brain resolution and management I have to
temporarily set this value to 30 seconds or more.
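A hedged sketch of that workaround, assuming the "cluster" volume from this thread and the 30-second value suggested above:

```shell
# Temporarily relax the short SMB-failover timeout so the heal CLI
# (glfsheal) does not hit the 5-second ping timeout seen in the logs:
gluster volume set cluster network.ping-timeout 30

# Inspect and resolve the split-brain while the longer timeout is active:
gluster volume heal cluster info

# Restore the 5-second timeout needed for fast SMB failover:
gluster volume set cluster network.ping-timeout 5
```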


Regards,
Łukasz





___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] heal: Not able to fetch volfile from glusterd

2019-05-07 Thread Ravishankar N


On 06/05/19 6:43 PM, Łukasz Michalski wrote:

Hi,

I have a problem resolving split-brain in one of my installations.
[Gluster-users] heal: Not able to fetch volfile from glusterd

2019-05-06 Thread Łukasz Michalski

Hi,

I have a problem resolving split-brain in one of my installations.

CentOS 7, glusterfs 3.10.12, replica on two nodes:

[root@ixmed1 iscsi]# gluster volume status cluster
Status of volume: cluster
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ixmed2:/glusterfs-bricks/cluster/cluster  49153     0          Y       3028
Brick ixmed1:/glusterfs-bricks/cluster/cluster  49153     0          Y       2917
Self-heal Daemon on localhost                   N/A       N/A        Y       112929
Self-heal Daemon on ixmed2                      N/A       N/A        Y       57774

Task Status of Volume cluster
------------------------------------------------------------------------------
There are no active volume tasks

When I try to access one file, glusterd reports a split-brain:

[2019-05-06 12:36:43.785098] E [MSGID: 108008] 
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: 
Failing READ on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: split-brain 
observed. [Input/output error]
[2019-05-06 12:36:43.787952] E [MSGID: 108008] 
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: 
Failing FGETXATTR on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: 
split-brain observed. [Input/output error]
[2019-05-06 12:36:43.788778] W [MSGID: 108027] 
[afr-common.c:2722:afr_discover_done] 0-cluster-replicate-0: no read 
subvols for (null)
[2019-05-06 12:36:43.790123] W [fuse-bridge.c:2254:fuse_readv_cbk] 
0-glusterfs-fuse: 3352501: READ => -1 
gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde0803f390 
(Input/output error)
[2019-05-06 12:36:43.794979] W [fuse-bridge.c:2254:fuse_readv_cbk] 
0-glusterfs-fuse: 3352506: READ => -1 
gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 
(Input/output error)
[2019-05-06 12:36:43.800468] W [fuse-bridge.c:2254:fuse_readv_cbk] 
0-glusterfs-fuse: 3352508: READ => -1 
gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 
(Input/output error)


The problem is that "gluster volume heal info" hangs for 10 seconds and 
returns:


    Not able to fetch volfile from glusterd
    Volume heal failed

glfsheal.log contains:

[2019-05-06 12:40:25.589879] I [afr.c:94:fix_quorum_options] 
0-cluster-replicate-0: reindeer: incoming qtype = none
[2019-05-06 12:40:25.589967] I [afr.c:116:fix_quorum_options] 
0-cluster-replicate-0: reindeer: quorum_count = 0
[2019-05-06 12:40:25.593294] W [MSGID: 101174] 
[graph.c:361:_log_if_unknown_option] 0-cluster-readdir-ahead: option 
'parallel-readdir' is not recognized
[2019-05-06 12:40:25.593895] I [MSGID: 104045] [glfs-master.c:91:notify] 
0-gfapi: New graph 69786d65-6431-2d32-3037-3739322d3230 (0) coming up
[2019-05-06 12:40:25.593972] I [MSGID: 114020] [client.c:2352:notify] 
0-cluster-client-0: parent translators are ready, attempting connect on 
transport
[2019-05-06 12:40:25.607836] I [MSGID: 114020] [client.c:2352:notify] 
0-cluster-client-1: parent translators are ready, attempting connect on 
transport
[2019-05-06 12:40:25.608556] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 
0-cluster-client-0: changing port to 49153 (from 0)
[2019-05-06 12:40:25.618167] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 
0-cluster-client-1: changing port to 49153 (from 0)
[2019-05-06 12:40:25.629595] I [MSGID: 114057] 
[client-handshake.c:1451:select_server_supported_programs] 
0-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version 
(330)
[2019-05-06 12:40:25.632031] I [MSGID: 114046] 
[client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-0: 
Connected to cluster-client-0, attached to remote volume 
'/glusterfs-bricks/cluster/cluster'.
[2019-05-06 12:40:25.632100] I [MSGID: 114047] 
[client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-0: 
Server and Client lk-version numbers are not same, reopening the fds
[2019-05-06 12:40:25.632263] I [MSGID: 108005] 
[afr-common.c:4817:afr_notify] 0-cluster-replicate-0: Subvolume 
'cluster-client-0' came back up; going online.
[2019-05-06 12:40:25.637707] I [MSGID: 114057] 
[client-handshake.c:1451:select_server_supported_programs] 
0-cluster-client-1: Using Program GlusterFS 3.3, Num (1298437), Version 
(330)
[2019-05-06 12:40:25.639285] I [MSGID: 114046] 
[client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-1: 
Connected to cluster-client-1, attached to remote volume 
'/glusterfs-bricks/cluster/cluster'.
[2019-05-06 12:40:25.639341] I [MSGID: 114047] 
[client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-1: 
Server and Client lk-version numbers are not same, reopening the fds
[2019-05-06 12:40:31.564407] C 
[rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-0: 
server 10.0.104.26:49153 has not responded in the last 5 seconds, 
disconnecting.
[2019-05-06 12:40:31.565764] C 
[rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-1: 
server 10.0.7.26:49153 has not responded in the