Hi all-

My test results so far indicate a solid improvement in overall rsync 
performance when a slightly more sophisticated rolling checksum is used.

The attached patch has the required changes (in hindsight, I should have 
compressed this using zlib with the new algorithm :-) ).

Some things to know about the patch:

First, it is against the zlib library - NOT the gzip application.

By default, rsyncable computations are turned on, and the new rolling checksum 
algorithm (Z_RSYNCABLE_RSSUM) is used.  The window and reset block sizes are 
set to 30 bytes and 4096 bytes, respectively.  I've found that these values give 
much better rsync performance with the Z_RSYNCABLE_RSSUM checksum algorithm.  If 
you want to experiment with Z_RSYNCABLE_SIMPLESUM and keep your window sizes 
small, be sure to test several different window sizes: you'll be amazed at how 
much the compression ratio and rsync performance vary for small windows with 
that algorithm.  With Z_RSYNCABLE_RSSUM, the compression ratio and rsync 
performance are quite well behaved, even for window sizes as small as 10 or 15 
bytes, but 30 seems like a safe value for the time being.
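To make the window/reset interaction concrete, here is a minimal C sketch of how an rsyncable compressor might decide where to reset. It assumes that (1) a reset is triggered whenever the rolling checksum over the last `window` bytes is a multiple of the reset block size, (2) Z_RSYNCABLE_SIMPLESUM behaves like a plain byte sum over the window, and (3) Z_RSYNCABLE_RSSUM behaves like the position-weighted component of rsync's own rolling checksum. The `rs_algo` enum, `count_resets()`, and these formulas are hypothetical stand-ins, not code taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's Z_RSYNCABLE_* selectors. */
enum rs_algo { RS_OFF, RS_SIMPLESUM, RS_RSSUM };

/*
 * Count the positions at which an rsyncable compressor would force a
 * reset, using a rolling checksum over the last `window` bytes.
 * RS_OFF never resets (i.e. stock zlib behavior).
 */
size_t count_resets(const unsigned char *buf, size_t len, enum rs_algo algo,
                    size_t window, uint64_t reset_block)
{
    uint64_t a = 0; /* plain sum of the bytes currently in the window */
    uint64_t b = 0; /* weighted sum; once full, the oldest byte has weight `window` */
    size_t resets = 0;

    if (algo == RS_OFF || window == 0 || reset_block == 0)
        return 0;

    for (size_t i = 0; i < len; i++) {
        a += buf[i];
        if (i >= window) {
            /* Slide: drop the byte that just left the window. */
            uint64_t out = buf[i - window];
            a -= out;
            b -= (uint64_t)window * out;
        }
        b += a; /* rsync-style update: b' = b - window*out + a' */

        if (i + 1 < window)
            continue; /* window not full yet, no reset decision */

        uint64_t sum = (algo == RS_SIMPLESUM) ? a : b;
        if (sum % reset_block == 0)
            resets++;
    }
    return resets;
}
```

Under these assumptions the sketch also hints at why the simple sum is so sensitive to small windows: a plain sum over 30 bytes can never exceed 30 x 255 = 7650, so with a 4096-byte reset block only window sums of exactly 0 or 4096 can trigger a reset, whereas the weighted sum spans a much wider range.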

In my test runs, I'm seeing roughly a 20-30% reduction in the total number of 
changed bytes identified by the rsync algorithm, compared to the simpler 
rolling checksum algorithm, with no impact on the zlib compression ratio.  Your 
results, of course, may vary :-)

The attached patch includes the original patch that adds rsyncable behavior, 
plus my changes.  If you just want the basic patch without my changes, it is 
located at 
https://svn.uhulinux.hu/packages/dev/zlib/patches/02-rsync.patch

You can configure the rsyncable behavior (which checksum to use, the window 
size, and the reset block size) dynamically, instead of adjusting the #define 
lines at the beginning of deflate.c, by calling the deflateSetRsyncParameters() 
function immediately after stream initialization and before writing anything to 
the stream.  This is handy if you want to run parametric studies, etc.
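The calling-order rule (configure right after initialization, before any data is written) can be modeled with a small toy stream. Everything here (`toy_stream`, `toy_init()`, `toy_set_rsync_params()`, `toy_write()`, and the `RS_*` values) is hypothetical and only illustrates the rule and the defaults described above; it does not reproduce the actual signature of deflateSetRsyncParameters().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the Z_RSYNCABLE_* selectors. */
enum rs_algo { RS_OFF, RS_SIMPLESUM, RS_RSSUM };

struct rs_params {
    enum rs_algo algo;
    size_t window;      /* rolling-window size in bytes */
    size_t reset_block; /* reset block size in bytes */
};

/* Toy stream: only tracks the parameters and how much data was written. */
struct toy_stream {
    struct rs_params params;
    size_t total_in;
};

/* Defaults from the post: new checksum, 30-byte window, 4096-byte reset block. */
void toy_init(struct toy_stream *s)
{
    s->params.algo = RS_RSSUM;
    s->params.window = 30;
    s->params.reset_block = 4096;
    s->total_in = 0;
}

/* Returns 0 on success, -1 if data was already written or sizes are invalid. */
int toy_set_rsync_params(struct toy_stream *s, enum rs_algo algo,
                         size_t window, size_t reset_block)
{
    if (s->total_in != 0)
        return -1; /* too late: the stream has already seen data */
    if (algo != RS_OFF && (window == 0 || reset_block == 0))
        return -1;
    s->params.algo = algo;
    s->params.window = window;
    s->params.reset_block = reset_block;
    return 0;
}

/* Stand-in for feeding data to the compressor; it only counts bytes. */
void toy_write(struct toy_stream *s, size_t nbytes)
{
    s->total_in += nbytes;
}
```

Rejecting late changes is one reasonable design choice: switching the checksum or window mid-stream would presumably shift the reset points partway through the output and muddy any parametric comparison.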

If you set the rolling checksum algorithm to Z_RSYNCABLE_OFF, you will get 
exactly the same behavior as unpatched zlib.  It will be a hair slower, but 
compared to everything else zlib is doing, the overhead should be quite 
negligible.

I'd love to hear feedback/comments!

Cheers,

- Kevin

Attachment: rsyncable_checksum.patch
Description: Binary data

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html