I had a two-node distributed-replicated volume spread across server1:/bricks/1,
server2:/bricks/1, server1:/bricks/2 and server2:/bricks/2. I powered down
server2 in order to re-rack it and make room for server3. server2 then failed to
come up, for reasons having nothing to do with gluster, so I decided to go ahead,
bring up server3 and move server2's bricks to it. I saw conflicting information
on how to do that with a completely dead node and a new node with a different
name. Basically, I did a peer probe server3, then volume replace-brick share name
server2:/bricks/1 server3:/bricks/1. Then I did a volume replace-brick <blah>
commit force.

This was probably a bad thing. Then I tried to do the replace-brick for the
second brick (bricks/2). It fails to start, saying a replace-brick is already
running on the volume. Now I'm stuck: the data from bricks/1 DOES appear on the
new node, but I can't do anything with bricks/2.

If I try to do a commit, it says bricks/1 isn't on server2, and if I try to do
anything else, it says a replace-brick is running. I did a rebalance, hoping that
would fix it, but it has not. I tried to stop the volume, but it said I couldn't
until the replace-brick was committed or aborted. I cannot abort either; it says
"replace-brick abort failed". Now what? Mind you, this is a temporary setup that
has a complex directory structure but no data as yet. We are looking to put it
into production VERY soon, and (a) I'm not sure I have time to rebuild
everything, and, more importantly, (b) I need to be able to demonstrate to
management that "look, a node failed and we replaced it with no data loss".

So, what's my next step to untangle this mess and get the data safely onto my
new node?
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
