Hi gluster users,

I just upgraded from 3.2.5 to 3.3.1 on a Distributed-Replicate volume with
about 2M directories in order to get a working replace-brick. Running
replace-brick now hangs the entire gluster volume for all clients for
several minutes, and afterwards the glusterfs process for the destination
brick hangs as well.

I suspect the volume-wide hang is related to
https://bugzilla.redhat.com/show_bug.cgi?id=832609 "Glusterfsd hangs if
brick filesystem becomes unresponsive, causing all clients to lock up".

The hung glusterfs process for the destination brick sits at 100% CPU, and
attaching strace to it shows no output.

gluster volume replace-brick xxx status
Number of files migrated = 3       Current file= /xxx 

%CPU %MEM    TIME+  P COMMAND
100  0.2   2238:48 2 //sbin/glusterfs -f/var/lib/glusterd/vols/vol01/rb_dst_brick.vol ...
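
A process that stays at 100% CPU while strace prints nothing is most likely
spinning in user space without making system calls. One way to check that,
and to see which thread is busy, is to compare per-thread user and system
CPU time from /proc. This is only a minimal sketch, assuming a Linux /proc
filesystem; parse_stat and thread_cpu are hypothetical helper names, not
part of gluster, and the pid has to be supplied by hand.

```python
import os

def parse_stat(stat_text):
    """Return (comm, utime_ticks, stime_ticks) from one /proc stat line."""
    # comm is parenthesised and may itself contain spaces or parentheses,
    # so split at the last ')' instead of splitting on whitespace.
    head, _, tail = stat_text.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # tail starts at field 3 (state); utime is field 14, stime is field 15.
    return comm, int(fields[11]), int(fields[12])

def thread_cpu(pid):
    """Yield (tid, comm, user_seconds, system_seconds) per thread of pid."""
    ticks = os.sysconf("SC_CLK_TCK")
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                comm, utime, stime = parse_stat(fh.read())
        except FileNotFoundError:
            continue  # the thread exited while we were iterating
        yield int(tid), comm, utime / ticks, stime / ticks
```

Calling thread_cpu twice a few seconds apart on the pid of the hung
glusterfs and comparing the numbers would show whether a single thread is
accumulating user time while its system time stays flat.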

The target brick received about 1% of the intended directories.

The log file -etc-glusterfs-glusterd.vol.log shows only that the
replace-brick has started:

I [glusterd-replace-brick.c:98:glusterd_handle_replace_brick] 0-glusterd: Received replace brick req
I [glusterd-replace-brick.c:147:glusterd_handle_replace_brick] 0-glusterd: Received replace brick status request
I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by 3*
I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: s1:/g/c
I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 2 peers
I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: s1:/g/c
I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
I [glusterd-replace-brick.c:1288:rb_update_dstbrick_port] 0-: adding dst-brick port no
I [glusterd-op-sm.c:2384:glusterd_op_ac_send_commit_op] 0-management: Sent op req to 2 peers
I [glusterd-rpc-ops.c:1317:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-rpc-ops.c:1317:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:607:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:607:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-op-sm.c:2653:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
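
The excerpt shows the management transaction going through lock, stage,
commit and unlock, with both peers acknowledging each step. That can be
checked mechanically with something like the following. It is a minimal
sketch, assuming each entry sits on a single line in the timestamp-stripped
form quoted here ("I [file:line:function] domain: message"); parse_entry
and acks_by_phase are hypothetical helper names, not part of gluster.

```python
import re
from collections import defaultdict

# Matches the timestamp-stripped form quoted in this mail:
#   LEVEL [source.c:LINE:function] domain: message
ENTRY_RE = re.compile(
    r"^(?P<level>[A-Z]) \[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>[^:]*): (?P<msg>.*)$"
)

def parse_entry(text):
    """Return a dict of fields for one log entry, or None if it does not match."""
    m = ENTRY_RE.match(text.strip())
    return m.groupdict() if m else None

def acks_by_phase(lines):
    """Map each callback function name to the peer uuids that sent ACC."""
    acks = defaultdict(list)
    for text in lines:
        entry = parse_entry(text)
        if entry is None:
            continue  # not a log entry in the expected form
        m = re.match(r"Received ACC from uuid: (\S+)", entry["msg"])
        if m:
            acks[entry["func"]].append(m.group(1))
    return dict(acks)

sample = [
    "I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 9*",
    "I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: c*",
    "I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: s1:/g/c",
    "I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: c*",
]

for func, peers in acks_by_phase(sample).items():
    print(func, peers)
```

A phase with fewer ACCs than peers would point at glusterd itself; here
every phase is fully acknowledged, which fits the hang being in the brick
process and not in the management layer.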

Any hints on how to proceed from here and get replace-brick to work are welcome.

regards,
   Hans Lambermont
-- 
Hans Lambermont | Senior Architect
(t) +31407370104 (w) www.shapeways.com
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users