Hi. When I discovered rsync, it immediately became one of my most indispensable utilities. It's a real godsend on bandwidth-limited links, especially digital cellular.
It works remarkably well in the general case, but I think the algorithm could be improved for one very important special case. Many (or even most) of the updated files I transfer with rsync change only by having new data appended to the end. Examples of such files include system logs and (especially) email archives in mbox format.

Rsync correctly handles these files, of course, but I think it could do so more efficiently. Right now, the receiver sends back a list of checksums for the blocks it has, and this checksum list can grow quite long when the file is large. I often see transfers of large mailboxes where appending one small email message to the sender's copy results in a reverse transfer of block checksums that is much larger than the new message.

It seems to me that this situation is common enough that the rsync protocol should look for it as a special case. Once the protocol has determined from differing timestamps and/or lengths that a file needs to be synchronized, the receiver should return a hash (and length) of its copy of the entire file to the sender. The sender then computes the hash for the corresponding leading segment of its copy. If they match, the sender simply sends the newly appended data and instructs the receiver to append it to its copy. If they don't match, the transfer would presumably proceed with the normal block-checksum exchange.

I just joined this list, and I couldn't find any obvious discussion of this issue in the archives. My apologies if it has already been discussed.

Phil Karn
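
For illustration, the proposed exchange can be sketched in Python roughly as follows. This is only a sketch under assumptions, not rsync code: the function names `receiver_summary` and `sender_plan` are made up, SHA-256 stands in for whatever hash the protocol would actually use, and whole files are held in memory as bytes to keep the example short.

```python
import hashlib
from typing import Optional, Tuple


def receiver_summary(receiver_copy: bytes) -> Tuple[int, bytes]:
    """Receiver side: report the length and whole-file hash of its copy.

    Hypothetical helper; SHA-256 is an assumed stand-in hash.
    """
    return len(receiver_copy), hashlib.sha256(receiver_copy).digest()


def sender_plan(sender_copy: bytes, recv_len: int,
                recv_digest: bytes) -> Optional[bytes]:
    """Sender side: decide whether the append-only shortcut applies.

    Returns the bytes to append if the receiver's file is exactly a
    leading segment of the sender's file, or None to signal a fallback
    to the ordinary block-checksum algorithm.
    """
    # The receiver's file cannot be a prefix if it is longer than ours.
    if recv_len > len(sender_copy):
        return None
    # Hash the corresponding leading segment of the sender's copy.
    prefix_digest = hashlib.sha256(sender_copy[:recv_len]).digest()
    if prefix_digest != recv_digest:
        return None
    # Prefix matches: only the newly appended data needs to be sent.
    return sender_copy[recv_len:]


if __name__ == "__main__":
    old_mbox = b"From alice\n\nfirst message\n"
    new_mbox = old_mbox + b"From bob\n\nsecond message\n"

    length, digest = receiver_summary(old_mbox)
    tail = sender_plan(new_mbox, length, digest)

    # The receiver rebuilds the sender's file by appending the tail.
    assert tail is not None and old_mbox + tail == new_mbox
```

In this sketch the reverse traffic is one length plus one fixed-size digest, regardless of file size, which is the saving the proposal is after. A real implementation would hash the files incrementally rather than load them into memory.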