On Mon, Oct 14, 2002 at 10:45:44PM +1000, Donovan Baarda wrote:
> In conclusion, a blocksize of 700 with the current 48bit signature blocksum
> has an unacceptable failure rate (>5%) for any file larger than 100M, unless
> the file being synced is almost identical.
> 
> Increasing the blocksize will help, with the following minimum sizes being
> recommended for a <5% failure rate;
> 
> file  block
>  100M   1K
>  200M   3K
>  400M  12K
>  800M  48K
>    1G  75K
>    2G 300K
>    4G 1.2M
> 
> Note that the required block size is growing faster than the file size is,
> so the number of blocks in the signature is shrinking as the file grows. We
> absolutely need to increase the signature checksum size as the filesize
> increases.
> 
> > If my new hypothesis is correct we definitely need to increase the size
> > of the first-pass checksum for files bigger than maybe 50MB.
> 
> Does the first pass signature block checksum really only use 2 bytes of the
> md4sum? That seems pretty damn small to me. For 100M~1G you need at least
> 56bits, for 1G~10G you need 64bits. If you go above 10G you need more than
> 64bits, but you should probably increase the block size as well/instead.

It is worth remembering that increasing the block size with
a fixed checksum size increases the likelihood of two
unequal blocks having the same checksums.

I think we want both the block and checksum sizes to
increase with file size.  Increasing block size alone yields
diminishing returns, while increasing checksum size alone
causes a non-linear increase in bandwidth requirement.
Increasing both in tandem is appropriate.  Larger files call
for larger blocks, and larger blocks deserve larger
checksums.

I do think we want a ceiling on block size unless we layer
the algorithm.  The idea of transmitting 300K because a 4K
block in a 2GB DB file was modified is unsettling.
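
To make the tandem idea concrete, here is a rough sketch, not
a proposal for the actual code.  It assumes a simple model in
which each of about file_size byte offsets is tested against
every block signature and the combined checksum behaves like
a uniformly random value; that model seems roughly consistent
with the figures quoted above, but it is my assumption.  The
function names, the square-root block rule and the 64K
ceiling are all hypothetical, picked only for illustration.

```python
import math


def required_checksum_bits(file_size, block_size, max_failure=0.05):
    """Total signature bits per block (rolling + strong) so that the
    expected number of false matches stays below max_failure.

    Assumed model: false matches ~= offsets * blocks / 2**bits.
    """
    n_blocks = math.ceil(file_size / block_size)
    return math.ceil(math.log2(file_size * n_blocks / max_failure))


def choose_sizes(file_size, min_block=700, max_block=64 * 1024,
                 max_failure=0.05):
    """Pick (block_size, checksum_bits) that both grow with file size.

    The block size follows sqrt(file_size) but is clamped to the
    hypothetical range [min_block, max_block]; the checksum width
    then absorbs whatever the capped block size cannot.
    """
    block = int(math.sqrt(file_size))
    block = max(min_block, min(block, max_block))
    return block, required_checksum_bits(file_size, block, max_failure)


if __name__ == "__main__":
    for size in (100 * 2**20, 2**30, 2 * 2**30, 16 * 2**30):
        block, bits = choose_sizes(size)
        print(f"{size:>12} bytes: block={block:>6} bits={bits}")
```

With the ceiling in place the block size stops growing for
very large files and only the checksum width keeps
increasing, which is the trade-off described above.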

Note for rsync2 or superlifter:
        We may want to layer the algorithm so that large
        files get a first pass with large blocks, and
        modified blocks are then handled by a second pass
        using smaller blocks.
        i.e. a 2GB file is checked with 500KB blocks, and a
        500KB block that changed is checked with 700B
        blocks, so rsyncing the file would be almost like
        rsyncing a directory with -c.
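
A minimal sketch of the layering, under heavy assumptions: it
compares blocks at fixed offsets only (no rolling checksum,
so it does not find shifted data), it has both copies
available locally instead of exchanging signatures, and it
uses MD5 from hashlib as a stand-in for MD4.  The names
changed_ranges and layered_diff are hypothetical.

```python
import hashlib


def changed_ranges(old, new, block_size):
    """Return (offset, length) for each fixed-offset block of `new`
    whose digest differs from the block at the same offset in `old`."""
    ranges = []
    for off in range(0, len(new), block_size):
        new_block = new[off:off + block_size]
        old_block = old[off:off + block_size]
        # MD5 is only a stand-in for the strong block checksum.
        if hashlib.md5(new_block).digest() != hashlib.md5(old_block).digest():
            ranges.append((off, len(new_block)))
    return ranges


def layered_diff(old, new, coarse=500 * 1024, fine=700):
    """First pass with coarse blocks, second pass with fine blocks
    inside each coarse block that changed."""
    result = []
    for off, length in changed_ranges(old, new, coarse):
        old_part = old[off:off + length]
        new_part = new[off:off + length]
        for sub_off, sub_len in changed_ranges(old_part, new_part, fine):
            result.append((off + sub_off, sub_len))
    return result


if __name__ == "__main__":
    old = bytes(2 * 1024 * 1024)
    new = bytearray(old)
    new[1_000_000] = 1  # a single modified byte
    print(layered_diff(old, bytes(new)))
```

In this sketch a one-byte change costs one fine block of
literal data rather than a whole coarse block, at the price
of a second round of checksums for the changed coarse block.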

-- 
________________________________________________________________
        J.W. Schultz            Pegasystems Technologies
        email address:          [EMAIL PROTECTED]

                Remember Cernan and Schmitt