On 9/10/2013 10:27 AM, Vortex wrote:
> On 06.09.2013 11:17, Dave Howorth wrote:
>> I suppose that it should not be necessary to run hardlink after dirvish,
>> in theory. dirvish uses rsync and instructs it to make hard links
>> between the backups. Any dupes in the original data are better fixed by
>> running hardlink or similar on the original data, not the backup.
> I suppose that dirvish only links identical files across images, NOT
> multiples inside the same image. For that, hardlink may be useful.

Right, dirvish/rsync will not hardlink duplicate files within an image, though the full picture is a bit more complicated than that. Hardlinking across images happens because the new image starts out 100% identical to the previous one before the sync part of rsync runs. If I understand rsync correctly, any data or metadata change breaks that hardlink and replaces it with a complete, independent copy of the current version of the file. That means, IIRC, that even an mtime change could produce a duplicate file while the contents are still identical. Even if mtime doesn't do it, a change in permissions or ownership surely will, regardless of whether the file contents are identical.

So, here's the big question that needs to be answered: does hardlink(1) check for metadata differences? And if it does, how does it determine which version to keep? Since all hardlinks to the same file share the same inode, they also share the same metadata (permissions, times, ownership, etc.). A good backup solution should preserve that metadata for every successful image.

There are also three levels of duplication that I can see.

The first is duplication of files on a single filesystem. If there are duplicated files on a server, they should be hardlinked on the original filesystem, and that linking will then carry over to dirvish/rsync automatically. That can only be done if it's acceptable for the files to have the same metadata. It doesn't work if they need different ownership, for example because of some kind of per-user jail.

The second is duplication between images. Dirvish/rsync should handle this automatically and only create duplicate copies of data when there is a metadata change.

The third is duplication between vaults due to identical software installed on multiple servers. This results in duplication whenever files are changed or added, but it can be squashed post-rsync with a command like hardlink(1). Again, though, this squashes the metadata down to one version. While permissions will probably be the same, times may not be. Is it OK to squash this information?
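
To make the shared-inode point concrete, here is a minimal Python sketch. It is not what dirvish, rsync, or hardlink(1) actually do, and the names metadata_differences and demo are made up for illustration. It only shows that a hardlink shares mode and times with its target, and lists which metadata fields would be squashed if two content-identical files were linked together.

```python
import filecmp
import os
import tempfile


def metadata_differences(path_a, path_b):
    """Return the names of the stat fields that differ between two files."""
    a, b = os.stat(path_a), os.stat(path_b)
    fields = ("st_mode", "st_uid", "st_gid", "st_mtime_ns")
    return [f for f in fields if getattr(a, f) != getattr(b, f)]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        link = os.path.join(tmp, "link")
        copy = os.path.join(tmp, "copy")

        # Two independent files with identical contents.
        for path in (original, copy):
            with open(path, "w") as fh:
                fh.write("same contents\n")

        # Give them different permissions and modification times.
        os.chmod(original, 0o644)
        os.utime(original, ns=(1_000_000_000, 1_000_000_000))
        os.chmod(copy, 0o600)
        os.utime(copy, ns=(2_000_000_000, 2_000_000_000))

        # A hardlink is just a second name for the same inode.
        os.link(original, link)
        os.chmod(link, 0o640)  # changing one name changes both

        same_inode = os.stat(original).st_ino == os.stat(link).st_ino
        mode_via_original = os.stat(original).st_mode & 0o777
        same_contents = filecmp.cmp(original, copy, shallow=False)
        # Fields that would be lost if "copy" were replaced by a link.
        diffs = metadata_differences(original, copy)
        return same_inode, mode_via_original, same_contents, diffs


if __name__ == "__main__":
    print(demo())
```

A check like metadata_differences could be run on candidate pairs before linking, to see what information would be lost in each case.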

> Cheers
>
> V.

-- 
Loren M. Lang
[email protected]
http://www.alzatex.com/

Public Key: ftp://ftp.tallye.com/pub/lorenl_pubkey.asc
Fingerprint: 10A0 7AE2 DAF5 4780 888A 3FA4 DCEE BB39 7654 DE5B

_______________________________________________
Dirvish mailing list
[email protected]
http://www.dirvish.org/mailman/listinfo/dirvish
