Hello,
while playing around with the new elastic glusterfs setup (via 'glusterd'; previously I had been using glusterfs with a static configuration) I have stumbled upon the following problem:

1. I have a test system with 12 nodes in a distributed/replicated setup (replica count 3):

Volume Name: storage
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp


2. One of the brick servers had a simulated hardware failure (its disks were wiped) and was brought back up as a fresh install.


3. When the server ('glusterd') came back up, the rest of the nodes logged something like:

Jul 25 17:10:45 snode182 GlusterFS[3371]: [2011-07-25 17:10:45.435786] C [glusterd-rpc-ops.c:748:glusterd3_1_cluster_lock_cbk] 0-: Lock response received from unknown peer: 4ecec354-1d02-4709-8f1e-607a735dbe62

Obviously, because of the full crash/reinstall, the peer UUID in glusterd.info differs from the UUID that the rest of the cluster has on record for this node.
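
For reference, a minimal sketch of how the mismatch can be seen; paths assume the 3.2-era working directory /etc/glusterd (newer builds use /var/lib/glusterd), and I'm going by my reading of the on-disk layout:

# On the reinstalled node: the UUID freshly generated by glusterd
cat /etc/glusterd/glusterd.info

# On a surviving node: the UUID the cluster still has on record for the peer
# (the file under peers/ is named after that peer's old UUID)
grep -l 10.0.0.149 /etc/glusterd/peers/*
gluster peer status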

Peer status shows:

Hostname: 10.0.0.149
Uuid: f9ea651e-68da-40fa-80d9-6bee7779aa97
State: Peer Rejected (Connected)

4. While the info commands work fine, anything that involves changing volume settings fails with an error saying the volume doesn't exist (judging from the logs, the error comes from the reinstalled node):

[2011-07-25 17:08:54.579631] E [glusterd-op-sm.c:1237:glusterd_op_stage_set_volume] 0-: Volume storage does not exist
[2011-07-25 17:08:54.579769] E [glusterd-op-sm.c:7107:glusterd_op_ac_stage_op] 0-: Validate failed: -1
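
A quick way to confirm that the reinstalled node simply has no volume definition (again assuming the /etc/glusterd working directory):

# On the reinstalled node: glusterd knows nothing about the volume
gluster volume info storage
ls /etc/glusterd/vols/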



So my question is how to correctly reintroduce the box into the glusterfs cluster, since (the commands in question are sketched after this list):

1. I can't run 'peer probe 10.0.0.149' because gluster says the peer is already in the cluster.
2. I can't remove (detach) the peer because it is part of a volume.
3. I can't remove the brick from the volume because gluster asks me to remove 3 bricks (i.e. the whole replica set, which would also mean data loss).
4. I imagine that replace-brick won't work even if I fake the new node with a different IP/hostname, since the source brick will be down (or would it replicate from the surviving copies?).
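
Roughly the commands behind points 1-4; the brick paths and the 10.0.0.150 address are placeholders, not my actual ones, and the comments just restate the refusals described above:

gluster peer probe 10.0.0.149
# -> refused, the peer is already part of the cluster

gluster peer detach 10.0.0.149
# -> refused, the peer still holds bricks of the 'storage' volume

gluster volume remove-brick storage 10.0.0.149:/export/brick1
# -> refused, a whole replica set of 3 bricks would have to go

gluster volume replace-brick storage 10.0.0.149:/export/brick1 10.0.0.150:/export/brick1 start
# -> presumably pointless, since the source brick no longer exists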



I tried simply changing the UUID back to the one listed by the rest of the nodes (peer status), but apparently that was not enough: the node itself didn't see any of the other servers and wasn't able to sync volume information from the remote brick(s), complaining that they are not its friends.

Then, after I manually copied over the peers/* files from a running brick node and restarted glusterd, the node reverted to the 'Peer in Cluster' state.
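
In other words, the sequence that ended up working for me looks roughly like this. This is only a sketch: the /etc/glusterd path, the service invocation, the 'healthy-node' name and the <old-uuid-of-this-node> placeholder all depend on the actual setup, and I'm not sure whether the peers/ entry describing the node itself should be removed or whether a 'volume sync' is needed at all:

# on the reinstalled node
service glusterd stop                       # or your distro's equivalent

# put back the UUID that the other nodes still expect
# (as shown by 'gluster peer status' on a healthy node)
echo "UUID=<old-uuid-of-this-node>" > /etc/glusterd/glusterd.info

# copy the peer definitions from a healthy node
scp healthy-node:/etc/glusterd/peers/* /etc/glusterd/peers/
# presumably the entry describing this very node should not stay here
rm -f /etc/glusterd/peers/<old-uuid-of-this-node>

service glusterd start

# if the volume definition doesn't get imported during the handshake,
# perhaps it can be pulled over explicitly:
gluster volume sync healthy-node all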


Is this the way?
Or am I doing something totally wrong?


wbr
rr

