Serious issue: rsync and hardlinks are dangerous...

2012-08-09 Thread Jesus Cea

Rsync 3.0.9 here.

I am using an rsync script like:

"""
rsync -z --numeric-ids -a -H --inplace --delete --delete-excluded \
      --stats --progress -v --itemize-changes SOURCE DESTINATION
"""

I detected the following issue when RSYNCing a bunch of Mercurial
repositories. It is very dangerous, because it will corrupt files.

When cloning a local repository, Mercurial uses hardlinks for
performance and disk use. When one of the clones updates a file, the
file is "unlinked" and replaced by a new file, so history can diverge
gracefully.
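
As a rough illustration of that unlink-and-replace pattern, here is a minimal Python sketch. It only mimics the behaviour described above with plain file operations; it is not Mercurial's actual code, and the file names and the replace_by_unlink() helper are made up for the example.

```python
import os
import tempfile

def same_file(p, q):
    """True if both paths refer to the same inode on the same device."""
    a, b = os.stat(p), os.stat(q)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def replace_by_unlink(path, data):
    """Hypothetical helper: drop this name first, then write a brand-new file."""
    os.unlink(path)
    with open(path, "w") as f:
        f.write(data)

workdir = tempfile.mkdtemp()
orig = os.path.join(workdir, "orig.txt")
clone = os.path.join(workdir, "clone.txt")

with open(orig, "w") as f:
    f.write("revision 1\n")
os.link(orig, clone)                  # the "clone" shares the inode
linked_before = same_file(orig, clone)

replace_by_unlink(clone, "revision 2\n")
linked_after = same_file(orig, clone)

with open(orig) as f:
    orig_content = f.read()           # untouched by the update to the clone

print(linked_before, linked_after, repr(orig_content))
```

After the update the two names no longer share an inode, which is exactly the state rsync later has to replicate.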

The problem can be trivially reproduced like this:

1. Create a text file "a.txt" with a bunch of characters inside.

2. Create a hardlink to that file, called "b.txt".

3. Use "rsync -z --numeric-ids -a -H --inplace --delete
--delete-excluded --stats --progress -v --itemize-changes SOURCE
DESTINATION" to replicate the directory.

4. Verify that a new directory is created, with two files "a.txt" and
"b.txt", hardlinked. Nice.

5. Now delete the original "b.txt" and create a new file "b.txt", with
new DIFFERENT content. So you now have two different files in the source.

6. Rerun the "rsync" script.

7. In the destination directory you will have two files, "a.txt" and
"b.txt". They are still the same file, hardlinked, so both have the
same content: that of either the source "a.txt" *OR* the source "b.txt".

8. Rerun the "rsync" script a few times. Each time, the destination
will have two hardlinked files, with the same content, alternating
between the "a.txt" and "b.txt" files.
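
The steps above are consistent with what an in-place write does to a hardlinked file: the data goes into the existing inode, so it shows up under every name linked to it. The Python sketch below simulates only that effect with plain file operations. It does not call rsync, update_in_place() is a made-up helper, and the order in which rsync really processes the two files is not modelled.

```python
import os
import tempfile

def update_in_place(src, dst):
    """Hypothetical helper: overwrite dst's existing inode with src's bytes."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fout.write(fin.read())
        fout.truncate()               # drop leftover bytes if src is shorter

def read(path):
    with open(path) as f:
        return f.read()

def write(path, text):
    with open(path, "w") as f:
        f.write(text)

root = tempfile.mkdtemp()
source = os.path.join(root, "source")
dest = os.path.join(root, "dest")
os.mkdir(source)
os.mkdir(dest)

# Source after step 5: two independent files with different content.
write(os.path.join(source, "a.txt"), "AAAA\n")
write(os.path.join(source, "b.txt"), "BBBBBBBB\n")

# Destination after step 4: both names share one inode.
write(os.path.join(dest, "a.txt"), "AAAA\n")
os.link(os.path.join(dest, "a.txt"), os.path.join(dest, "b.txt"))

# Updating b.txt in place also changes what a.txt shows ...
update_in_place(os.path.join(source, "b.txt"), os.path.join(dest, "b.txt"))
a_after_b_update = read(os.path.join(dest, "a.txt"))

# ... and "fixing" a.txt in place then clobbers b.txt again.
update_in_place(os.path.join(source, "a.txt"), os.path.join(dest, "a.txt"))
b_after_a_update = read(os.path.join(dest, "b.txt"))

print(repr(a_after_b_update), repr(b_after_a_update))
```

Whichever name was written last wins for both, so the destination can never match a source where the two files differ.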

So origin and destination will never synchronize (each time you rsync,
destination will alternate content), and destination will be
"corrupt", since different files in the origin are the same file in
the destination. Two years of backups are spoiled, because of this :-(.

I know that source and destination files can have a different link
count for a variety of valid reasons, but when using "-H" rsync should
notice that two files hardlinked in the destination are no longer
hardlinked in the origin. That should be quite easy to detect, since
rsync already tracks inodes (when using "-H").
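
To make the idea concrete, here is a small Python sketch of such a check, run from outside rsync. It is only an assumption about how the detection could look, not how rsync works internally; link_pairs() and stale_links() are invented names. It groups regular files by (device, inode) in each tree and reports pairs that are linked in the destination but not in the source.

```python
import os
import stat
import tempfile
from collections import defaultdict
from itertools import combinations

def link_pairs(root):
    """Return every pair of regular files under root that share an inode."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISREG(st.st_mode):
                groups[(st.st_dev, st.st_ino)].append(os.path.relpath(full, root))
    pairs = set()
    for names in groups.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs

def stale_links(source, dest):
    """Pairs hardlinked in dest that are not hardlinked in source."""
    return sorted(link_pairs(dest) - link_pairs(source))

def write(path, text):
    with open(path, "w") as f:
        f.write(text)

root = tempfile.mkdtemp()
source = os.path.join(root, "source")
dest = os.path.join(root, "dest")
os.mkdir(source)
os.mkdir(dest)

# Source: a.txt and b.txt have diverged; c.txt and d.txt are still linked.
write(os.path.join(source, "a.txt"), "AAAA\n")
write(os.path.join(source, "b.txt"), "BBBBBBBB\n")
write(os.path.join(source, "c.txt"), "CCCC\n")
os.link(os.path.join(source, "c.txt"), os.path.join(source, "d.txt"))

# Destination: both pairs are still hardlinked from an earlier run.
write(os.path.join(dest, "a.txt"), "AAAA\n")
os.link(os.path.join(dest, "a.txt"), os.path.join(dest, "b.txt"))
write(os.path.join(dest, "c.txt"), "CCCC\n")
os.link(os.path.join(dest, "c.txt"), os.path.join(dest, "d.txt"))

report = stale_links(source, dest)
print(report)
```

In this toy tree only the a.txt/b.txt pair is reported; the c.txt/d.txt pair is linked on both sides and is left alone. Such a report could also be used to find which pairs in an existing backup need their links broken.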

Even if I stop using "-H", which I would rather not do, the destination
will stay corrupted UNTIL we delete it and start over again.

In my particular case, not using "-H" will explode my disk usage, but
using "-H" will corrupt the destination.

-- 
Jesus Cea Avion
j...@jcea.es - http://www.jcea.es/
jabber / xmpp:j...@jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Serious issue: rsync and hardlinks are dangerous...

2012-08-09 Thread Jesus Cea

On 10/08/12 03:08, Jesus Cea wrote:
> In my particular case, not using "-H" will explode my disk usage,
> but using "-H" will corrupt the destination.

Looks like not using "--inplace" would work, but I have multi-megabyte
files to transfer through a flaky residential ADSL line :-(
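
The reason dropping "--inplace" helps is that the update then goes to a new file which is moved over the old name, so the other hardlink keeps its own data. A minimal Python sketch of that write-then-rename strategy follows; it simulates the effect only, does not call rsync, and update_via_temp() is a made-up helper.

```python
import os
import tempfile

def update_via_temp(dst, data):
    """Hypothetical helper: write a temp file beside dst, then rename it over dst."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dst))
    with os.fdopen(fd, "w") as f:
        f.write(data)
    os.replace(tmp, dst)              # dst now names a new inode

dest = tempfile.mkdtemp()
a = os.path.join(dest, "a.txt")
b = os.path.join(dest, "b.txt")

with open(a, "w") as f:
    f.write("AAAA\n")
os.link(a, b)                         # stale hardlink from an earlier run

update_via_temp(b, "BBBBBBBB\n")

with open(a) as f:
    a_content = f.read()
with open(b) as f:
    b_content = f.read()
still_linked = os.path.samefile(a, b)

print(repr(a_content), repr(b_content), still_linked)
```

The price is the one mentioned above: the whole file has to arrive before the rename, which hurts on an unreliable link.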

Also, this is already reported, but a bit "forgotten" (last comment
was three years ago): 



Re: Serious issue: rsync and hardlinks are dangerous...

2012-08-10 Thread grarpamp
> Also, this is already reported, but a bit "forgotten" (last comment
> was three years ago): 

I think if you search the Bugzilla, you will find quite a few tickets on
hardlink issues still open :( As in some of those tickets, add your
reproduction command sequence and its results to the ticket. Don't forget
the 'ls -ali' output too.