On Mon, Jan 08, 2007 at 01:37:45AM -0600, Evan Harris wrote:
> I've been playing with rsync and very large files approaching and
> surpassing 100GB, and have found that rsync has excessively very poor
> performance on these very large files, and the performance appears to
> degrade the larger the file gets.

Yes, this is caused by the current hashing algorithm that the sender uses to find matches for moved data. The current hash table has a fixed size of 65536 slots, and it can get overloaded for really large files.

There is a diff in the patches dir that makes rsync work better with large files: dynamic_hash.diff. This makes the size of the hash table depend on how many blocks there are in the transfer. It does speed up the transfer of large files significantly, but since it introduces a mod (%) operation on a per-byte basis, it slows down the transfer of normal-sized files significantly.

I'm going to look into using a hash algorithm with a table that is always a power of 2 in size as an alternative implementation of this dynamic hash algorithm. That will hopefully not bloat the CPU time for normal-sized files. Alternatively, the hashing algorithm could be made to vary depending on the file's size. I'm hoping to have this improved in the upcoming 3.0.0 release.

And one final thought that occurred to me: it would also be possible for the sender to segment a really large file into several chunks, handling each one without overlap, all without the generator or the receiver knowing that it was happening. The upside is that huge files could be handled this way, but the downside is that the incremental-sync algorithm would not find matches spanning the chunks. It would be interesting to test this and see whether the rsync algorithm would be better served by segmenting the file into a larger number of smaller chunks, rather than by using a smaller number of much larger chunks while considering the file as a whole.

..wayne..
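
For anyone following along, here is a rough sketch of the power-of-2 idea mentioned above. It is only an illustration, not rsync's code or the contents of dynamic_hash.diff: the function names are made up, and using 65536 as the minimum table size is an assumption taken from the current fixed size. The point is that when the table size is a power of 2, the slot can be picked with a bitwise AND instead of a mod (%), and both give the same slot.

```python
# Hypothetical sketch: names and the 65536 floor are assumptions,
# not taken from rsync's source or from dynamic_hash.diff.

MIN_TABLE_SIZE = 1 << 16  # 65536 slots, the current fixed table size


def table_size_for(block_count, min_size=MIN_TABLE_SIZE):
    """Return the smallest power of 2 that is >= block_count,
    but never smaller than min_size."""
    size = min_size
    while size < block_count:
        size <<= 1
    return size


def slot_by_mod(checksum, table_size):
    """Slot lookup using a mod, which works for any table size."""
    return checksum % table_size


def slot_by_mask(checksum, table_size):
    """Slot lookup using a bitwise AND.
    Only valid when table_size is a power of 2."""
    return checksum & (table_size - 1)
```

With this, a normal-sized file keeps the 65536-slot table, while a transfer with, say, 300000 blocks would get a 524288-slot table, and in both cases the per-byte lookup stays a single AND.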