We have been using rsync for several years, but for the past couple of months we have also used it to back up remote servers, some with more than 200GB capacity.
Windows users in particular sometimes have the (bad) habit of renaming a directory that has a huge amount of data below it. We see the same nasty results you are talking about:

* rsync "thinks" that the old directory name has disappeared and deletes the directory on the target machine, throwing away the expensive transmission
* the new directory name initiates a fresh, full (re)transmission, sometimes taking days, while the "real work" could be done in minutes
* the servers we back up have between 20GB and 200GB capacity
* all rsyncs are run in parallel; average sync time is 1.5 hours for 900GB
* when a "user" behaves as described, it takes days to a week to resync

It is a tricky problem to deal with, I think. It is tempting to keep a checksummed file/directory list on both sides, with information like:

* a fingerprint/signature/checksum to identify each file or directory
* inode number
* timestamp
* file size

If a file appears to have been deleted because its name or path changed, it could possibly be identified by its fingerprint and synced cleverly ;-) The thought here is to expand --fuzzy, giving it more functionality (hint).

For some time I have been experimenting with a solution to this problem: a sort of "preprocessor" that tries to identify files in the way described and creates hardlinks (ln) so that rsync thinks the files are already in the new location. I traverse the directory trees on both sides (remote and local), producing a file with the information described above, but it is still a work in progress...

The cost of keeping a database in this scenario would be truly justified for me. That rsync then deletes the files in the old location is no longer a problem for me.

But... I am just a user with needs, also looking for a solution to a problem, hoping this can be solved by the clever developers ;-) Maybe there is already a solution available, and we are chasing shadows?
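As a rough illustration of the preprocessor idea above, here is a minimal Python sketch. It is not the actual work in progress, only an assumption of how such a tool might look. The function names (fingerprint, build_index, prelink_renames) are hypothetical, SHA-256 is an arbitrary choice of checksum, and for simplicity both trees are read as local paths; in a real setup the source index would have to be produced on the source machine and shipped to the backup side. Matching is done on file size plus checksum only; the inode number from the list above is left out.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Map each relative file path under root to (size, mtime, checksum)."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Only regular files are indexed; symlinks are skipped.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            info = os.stat(full)
            rel = os.path.relpath(full, root)
            index[rel] = (info.st_size, int(info.st_mtime), fingerprint(full))
    return index


def prelink_renames(source_index, target_root):
    """Hardlink existing target files to the paths they now have on the source.

    Returns the list of (old_rel, new_rel) pairs for which a link was created.
    """
    target_index = build_index(target_root)

    # Look up existing target files by (size, checksum).
    by_content = {}
    for rel, (size, _mtime, checksum) in target_index.items():
        by_content.setdefault((size, checksum), rel)

    created = []
    for new_rel, (size, _mtime, checksum) in sorted(source_index.items()):
        if new_rel in target_index:
            continue  # path already exists on the target; leave it to rsync
        old_rel = by_content.get((size, checksum))
        if old_rel is None:
            continue  # genuinely new content; rsync has to transfer it
        new_full = os.path.join(target_root, new_rel)
        os.makedirs(os.path.dirname(new_full), exist_ok=True)
        # The hardlink shares the inode, so the data survives when rsync
        # --delete later removes the old path.
        os.link(os.path.join(target_root, old_rel), new_full)
        created.append((old_rel, new_rel))
    return created
```

Such a script would run on the backup side just before rsync. Because a hardlink keeps the modification time of the file it points to, rsync's default size-and-mtime check should then find the renamed files already in place, provided the earlier runs preserved times (as -a does).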
Thanks,
Nico

Frank Thomas wrote:
> Good day,
>
> I’ve got a question regarding the usage of rsync that I just cannot
> figure out. I’ve done a fair hunt for the answer, but I’m stumped.
>
> Here is the situation.
>
> I have two PCs running Linux and using rsync to perform a backup from
> server1 to server2. For example:
>
> rsync -avzr -e 'ssh -i/root/.ssh/id_rsa' --delete /home/samba/admin/software www.some-server.com:/home/RemoteSystems/company/home/samba/admin
>
> Let’s say I have a directory within rsync’s scope to sync called
> Directory1.
>
> Rsync is run and Directory1 is synced from server1 to server2. Also,
> a file named File1 is synced because it is in the directory being
> synced.
>
>     server1         server2
>     Directory1      Directory1
>       File1           File1
>
> Now, let’s say a user comes and changes the name of Directory1 on
> server1 to DirectoryNew. Rsync performs the following actions:
>
> 1. rsync recognizes that Directory1 is not on server1, but it is on
>    server2, so it flags it and its contents for deletion on server2.
>
> 2. rsync recognizes that DirectoryNew is on server1, but not on
>    server2, so it flags it and its contents for copying to server2.
>
> 3. rsync performs these actions to make the two directories the same.
>
> This action is the simplest method of performing an rsync, but it
> would be nice for rsync to be intelligent enough to recognize a
> name change but not an inode change on the source. So the actions
> performed would be:
>
> 1. rsync recognizes that Directory1 is not on server1, but its inode
>    still is. Rsync reads the new directory name and flags the name
>    change from Directory1 to DirectoryNew on server1.
>
> 2. Rsync reads server2 and sees that Directory1 exists, and flags a
>    pending name change on server2 from Directory1 to DirectoryNew.
>
> 3. Name is changed on server2.
> No files or directories are deleted and re-transferred from source to
> destination, as the structure under the directory has not changed.
>
> Why go through all this work? I’ve had personnel change the name of a
> directory that has several gigabytes of data in it without notifying
> me, and at night rsync tries to perform the directory and file dance
> and fails simply because the volume is so great. It would be nice to
> either recognize a large discrepancy between the source and
> destination before anything occurs, by giving a message with the
> number of bytes that would potentially be transferred (this doesn’t
> work with the dry-run option), or do the fancy dance by recognizing a
> name change over a deletion of a directory.
>
> Thanks.
>
> *Frank Thomas*

--
Behandeld door / Handled by: N.J. van der Horn (Nico)
---
ICT Support Vanderhorn IT-works, www.vanderhorn.nl,
Voorstraat 55, 3135 HW Vlaardingen, The Netherlands,
Tel +31 10 2486060, Fax +31 10 2486061