Hi GlusterFS users!
I have a replicated volume with two bricks:
s1 ~ # gluster volume info
Volume Name: data-ns
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: s1:/mnt/gluster/data-ns
Brick2: s2:/mnt/gluster/data-ns
Options Reconfigured:
performance.cache-refresh-timeout: 1
performance.io-thread-count: 32
auth.allow: 10.*
performance.cache-size: 1073741824
There are 5 clients that have mounted the volume from the s1 server.
We had a hardware failure on the s2 box, and it was down for about one
week. During that time all read/write operations went to s1.
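For reference, the clients mount the volume along these lines (the
mount point /mnt/data-ns is just an example):

  client ~ # mount -t glusterfs s1:/data-ns /mnt/data-ns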
Now that s2 is operable again, I would like to synchronize all files
onto it. I have started the GlusterFS server and triggered the
self-heal process ("find with stat" on the GlusterFS mount from the
s2 box).
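Concretely, I ran something like the usual trigger command on a client
mount (the mount point is again an example):

  client ~ # find /mnt/data-ns -noleaf -print0 | xargs --null stat >/dev/null

Running stat on every file forces the replicate translator to check
each one and heal it from the good copy.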
During the replication process I saw very strange behaviour from
GlusterFS. Some clients tried to fetch lots of files from the s2
server, but those files either did not exist there yet or had a size
of 0 bytes. This caused a lot of "disk wait" on the web servers (the
clients that have mounted the volume from s1), and eventually 503 HTTP
responses were sent.
My question is: how can I avoid serving files from the s2 box until
all files have been replicated correctly from the s1 server?
I have installed GlusterFS 3.2.6-1 from the Debian repository.
Thanks a lot in advance,
Jimmy
Dear Jimmy,
I have had problems re-synchronising out-of-date servers myself. I
posted the following query last year:
http://gluster.org/pipermail/gluster-users/2011-October/008933.html
In my case I was mainly worried about the self-heal process causing
excessive load, which I suspected of making my fairly
low-specification servers hang. Following that posting I received some
advice off list concerning the use of rsync to re-synchronise
out-of-date servers that have been offline for repairs for a long
period of time. I was advised that it is safe to use rsync, provided
that the -X (--xattrs) option is used to preserve extended attributes;
it is also necessary to use the --delete option in order to remove
files that were deleted from the live server while the repaired one
was down. When I do this I stop the glusterd service while the rsync
is taking place, although I have not been advised that this is
essential.
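For the record, the command I run is roughly the following sketch,
using the brick paths from your volume info, with glusterd stopped on
s2 first:

  s1 ~ # rsync -aX --delete /mnt/gluster/data-ns/ s2:/mnt/gluster/data-ns/

The -a option preserves permissions, ownership and timestamps, and the
trailing slash on the source makes rsync copy the contents of the
brick directory rather than the directory itself.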
It is possible that files on the live server may be modified while the
rsync is in progress, so I always follow up with a targeted self-heal
in order to bring the repaired server fully up to date. The targeted
self-heal procedure is described in the following Gluster Community
article:
http://community.gluster.org/a/howto-targeted-self-heal-repairing-less-than-the-whole-volume/
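In short, it is the same find/stat trick restricted to the part of the
volume that you know is out of date, for example (paths are examples):

  client ~ # find /mnt/data-ns/some/subdir -noleaf -print0 | xargs --null stat >/dev/null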
When the resynchronisation process is complete I have noticed that the
volume of data in replicated bricks can differ by up to 100MB. I find
this a bit worrying, but I haven't had time to find out exactly which
files are on these bricks and why the volume of data reported by df
differs on the two servers.
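If anyone wants to investigate, a quick way to compare two bricks
would be something like this (assuming ssh access to both servers and
the brick paths above):

  diff <(ssh s1 'cd /mnt/gluster/data-ns && find . -type f | sort') \
       <(ssh s2 'cd /mnt/gluster/data-ns && find . -type f | sort')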
The problem with the rsync approach is that it can take a very long
time if there are a large number of files to synchronise, probably
because rsync is single-threaded. I recently had one rsync running for
two weeks
and it still didn't finish, and I discovered that the bricks in question
had more than 2.5 million files. I couldn't wait any longer to bring my
repaired server back into service so I killed the rsync and started
glusterd, and I then ran a targeted self-heal on the unsynchronised
bricks to continue the resynchronisation. That is still going on now,
but I am not seeing excessive load and haven't noticed any replication
errors (but I haven't got the time to check thoroughly). This might be
because most of the file transfer has already taken place or because
most of the files in these particular bricks are small.
My conclusion from this experience is that if a server goes down for a
long time and becomes significantly out of date, it is best to use rsync
(with glusterd disabled) to do as much of the file transfer as
possible. Once that has been done, the GlusterFS self-heal mechanism
can finish off the resynchronisation without any problematic side
effects. I will follow that procedure next time and report any other
problems or observations.
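To summarise the procedure as I would run it next time (a sketch; the
init script name matches the Debian glusterfs-server package, so check
your own system):

  s2 ~ # /etc/init.d/glusterfs-server stop
  s1 ~ # rsync -aX --delete /mnt/gluster/data-ns/ s2:/mnt/gluster/data-ns/
  s2 ~ # /etc/init.d/glusterfs-server start
  client ~ # find /mnt/data-ns -noleaf -print0 | xargs --null stat >/dev/null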
-Dan.
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users