Re: [Gluster-users] How to re-sync

2010-03-07 Thread Liam Slusser
Assuming you used raid1 (replicate), you DO bring up the new machine
and start gluster.  Then run ls -alR on one of your gluster mounts,
and it will resync the new node.  The gluster clients are smart enough
to get the files from the first node.
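
For illustration only, here is a minimal Python sketch of what a recursive
listing does: it stats every entry below the mount point, and per the advice
above that lookup is what prompts the resync. The function name touch_all and
the mount path in the usage note are hypothetical, and the sketch does not
talk to gluster directly.

```python
import os


def touch_all(mount_point):
    """Stat every directory and file under mount_point, like `ls -alR`.

    Returns the number of entries that were stat'ed successfully.
    """
    count = 0
    for dirpath, dirnames, filenames in os.walk(mount_point):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, matching `ls -l` behaviour
                os.lstat(full_path)
                count += 1
            except OSError as err:
                # Report and continue; one unreadable entry should not stop the walk
                print(f"could not stat {full_path}: {err}")
    return count
```

It would be run against the client mount, never the backend export
directory, for example touch_all("/mnt/glusterfs") with whatever mount path
is actually in use.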

liam

On Sat, Mar 6, 2010 at 11:48 PM, Chad ccolu...@hotmail.com wrote:
 OK, so assume you have N glusterfsd servers (say 2, because the number does
 not really matter).
 Now one of the servers dies.
 You repair the machine and bring it back up.

 I think 2 things:
 1. You should not start glusterfsd on boot (you need to sync the HD first)
 2. When it is up how do you re-sync it?

 Do you rsync the underlying mount points?
 If it is a busy gluster cluster it will be getting new files all the time.
 So how do you sync and bring it back up safely so that clients don't connect
 to an incomplete server?

 ^C
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users



Re: [Gluster-users] How to re-sync

2010-03-07 Thread Stephan von Krawczynski
I love top-post ;-)

Generally, you are right. But in real life you cannot rely on this
smartness. We tested exactly this and found that the clients
do not always select the correct file version (i.e. the latest) automatically.
Our test case was to bring down a node, update its kernel, and revive
it, just as you would for a kernel update in the real world.
We found that some files were afterwards taken from the node that had been
down, and the new contents on the other node were in fact overwritten.
This does not happen in general, of course, but it does happen. We could only
stop this behaviour by setting favorite-child. But that does not really help
much, since we will want to take down every node at some point.
This is in fact one of our show-stoppers.
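
One way to see whether copies have diverged before a revived node goes back
into service is to compare the two backend export directories directly. The
following is a rough sketch under stated assumptions: both export directories
are readable from one machine, the function names are hypothetical, and
modification time is used only as a heuristic for "latest", which is not
necessarily what replicate itself consults.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def regular_files(root):
    """Return the set of regular-file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.isfile(full_path) and not os.path.islink(full_path):
                found.add(os.path.relpath(full_path, root))
    return found


def divergent_files(export_a, export_b):
    """List (relative_path, status) for files that differ between two exports."""
    a_files = regular_files(export_a)
    b_files = regular_files(export_b)
    report = []
    for rel in sorted(a_files | b_files):
        if rel not in b_files:
            report.append((rel, "only on A"))
        elif rel not in a_files:
            report.append((rel, "only on B"))
        else:
            path_a = os.path.join(export_a, rel)
            path_b = os.path.join(export_b, rel)
            if file_digest(path_a) != file_digest(path_b):
                mtime_a = os.path.getmtime(path_a)
                mtime_b = os.path.getmtime(path_b)
                if mtime_a > mtime_b:
                    status = "differs, newer mtime on A"
                elif mtime_b > mtime_a:
                    status = "differs, newer mtime on B"
                else:
                    status = "differs, same mtime"
                report.append((rel, status))
    return report
```

An empty result means the two exports hold the same regular files with the
same contents; anything else is a list of files worth inspecting by hand
before trusting either copy.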




-- 
Kind regards,
Stephan von Krawczynski


--
ith Kommunikationstechnik GmbH

Delivery address  : Reiterstrasse 24, D-94447 Plattling
Phone             : +49 9931 9188 0
Fax               : +49 9931 9188 44
Managing director : Stephan von Krawczynski
Register court    : Deggendorf HRB 1625
--



Re: [Gluster-users] How to re-sync

2010-03-07 Thread Tejas N. Bhise
Chad, Stephan - thank you for your feedback.

Just to clarify what you wrote, do you mean the following?

1) The setup is a replicate setup, with each file being written to multiple nodes.
2) One of these nodes is brought down.
3) A replicated file that has a copy on the downed node is written to.
4) The other copies are updated as writes happen while this node is still down.
5) After this node is brought back up, the client sometimes sees the old file on that node instead of picking the file from a node that has the latest copy.
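
The five steps above can be sketched as a toy model. This is not GlusterFS
code and does not claim to reflect how replicate chooses a copy: the Replica
class, the per-file version counter, and both read functions are hypothetical,
and exist only to make the suspected failure mode concrete.

```python
class Replica:
    """A toy replica holding one file; not a model of a real brick."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.content = None
        self.version = 0  # hypothetical per-file write counter


def write(replicas, data):
    """Write to every online replica; offline replicas miss the update."""
    for replica in replicas:
        if replica.online:
            replica.content = data
            replica.version += 1


def read_naive(replicas):
    """Read from the first online replica, even if it is stale."""
    return next(r for r in replicas if r.online).content


def read_latest(replicas):
    """Read from the online replica with the highest write counter."""
    online = [r for r in replicas if r.online]
    return max(online, key=lambda r: r.version).content


def run_scenario():
    node_a, node_b = Replica("A"), Replica("B")
    replicas = [node_a, node_b]
    write(replicas, "v1")    # step 1: file is written to both nodes
    node_a.online = False    # step 2: node A is brought down
    write(replicas, "v2")    # steps 3-4: only node B receives the new write
    node_a.online = True     # step 5: node A comes back with the old copy
    return read_naive(replicas), read_latest(replicas)
```

In this model run_scenario() returns ("v1", "v2"): the naive reader sees the
stale copy on the revived node, while a reader that compares versions gets
the latest one.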

If the above is correct, a few quick questions:

1) What versions are you using?
2) Can you share your volume files? Were they generated using volgen?
3) Did you notice any pattern in the files for which the wrong copy was picked? For example, were they open when the node was brought down?
4) Is there any other way to reproduce the problem?
5) Did you observe any other patterns when the problem occurs?
6) Do you have listings of the problem file(s) from the replica nodes?

If, however, my understanding is not correct, then please let me know, with
some examples.

Regards,
Tejas.

- Original Message -
From: Chad ccolu...@hotmail.com
To: Stephan von Krawczynski sk...@ithnet.com
Cc: gluster-users@gluster.org
Sent: Sunday, March 7, 2010 9:32:27 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: Re: [Gluster-users] How to re-sync

I actually do prefer top post.

Well, this overwriting behavior is what I saw as well, and that is a REALLY,
REALLY bad thing.
Which is why I asked my question in the first place.

Is there a gluster developer out there working on this problem specifically?
Could we add some kind of "sync done" command that has to be run manually, so
that the failed node is not used until it has been run?
The bottom line for me is that I would much rather run on a
performance-degraded array until a sysadmin intervenes than lose any data.

^C


