On Mon, Oct 14, 2002 at 10:45:44PM +1000, Donovan Baarda wrote:
> In conclusion, a blocksize of 700 with the current 48bit signature blocksum
> has an unacceptable failure rate (>5%) for any file larger than 100M, unless
> the file being synced is almost identical.
>
> Increasing the blocksize will help, with the following minimum sizes being
> recommended for a <5% failure rate;
>
>     file    block
>     100M    1K
>     200M    3K
>     400M    12K
>     800M    48K
>     1G      75K
>     2G      300K
>     4G      1.2M
>
> Note that the required block size is growing faster than the file size is,
> so the number of blocks in the signature is shrinking as the file grows. We
> absolutely need to increase the signature checksum size as the filesize
> increases.
>
> > If my new hypothesis is correct we definitely need to increase the size
> > of the first-pass checksum for files bigger than maybe 50MB.
>
> Does the first pass signature block checksum really only use 2 bytes of the
> md4sum? That seems pretty damn small to me. For 100M~1G you need at least
> 56bits, for 1G~10G you need 64bits. If you go above 10G you need more than
> 64bits, but you should probably increase the block size as well/instead.
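For reference, the block counts implied by the quoted table can be worked out directly. This is only a quick sketch of the arithmetic, assuming K/M/G are 1024-based units (the quoted message does not say) and counting a trailing partial block as one block:

```python
# Block counts implied by the quoted table of minimum block sizes.
# Assumption: K/M/G are binary (1024-based) units; the quoted post does not say.
K, M, G = 1024, 1024**2, 1024**3

# (file size, minimum block size) pairs copied from the quoted table.
TABLE = [
    (100 * M, 1 * K),
    (200 * M, 3 * K),
    (400 * M, 12 * K),
    (800 * M, 48 * K),
    (1 * G, 75 * K),
    (2 * G, 300 * K),
    (4 * G, int(1.2 * M)),
]


def block_count(file_size, block_size):
    """Number of blocks in the signature, counting a final partial block."""
    return -(-file_size // block_size)  # ceiling division


counts = [block_count(f, b) for f, b in TABLE]

if __name__ == "__main__":
    for (f, b), n in zip(TABLE, counts):
        print(f"{f // M:>5}M file, {b // K:>5}K blocks -> {n} blocks")
```

The count falls from row to row, which is the "number of blocks in the signature is shrinking" effect described in the quote.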
It is worth remembering that increasing the block size with a fixed
checksum size increases the likelihood of two unequal blocks having
the same checksum.

I think we want both the block size and the checksum size to increase
with file size. Increasing only the block size yields diminishing
returns, while increasing only the checksum size causes a non-linear
increase in bandwidth requirement. Increasing both in tandem is
appropriate: larger files call for larger blocks, and larger blocks
deserve larger checksums.

I do think we want a ceiling on block size unless we layer the
algorithm. The idea of transmitting 300K because a 4K block in a 2GB
DB file was modified is unsettling.

Note for rsync2 or superlifter:
	We may want to layer the algorithm so that large files get a
	first pass with large blocks, and modified blocks are then
	handled by a second pass using smaller blocks. I.e. a 2GB file
	is checked with 500KB blocks, and a 500KB block that changed is
	checked again with 700B blocks, so rsyncing the file would be
	almost like rsyncing a directory with -c.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		[EMAIL PROTECTED]

		Remember Cernan and Schmitt
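P.S. To make the layering idea concrete, here is a rough sketch in Python. It is not rsync's algorithm: there is no rolling checksum, blocks are compared only at identical offsets (so it only models in-place modifications), and MD5 from hashlib stands in for MD4. The function names and the 500KB/700B defaults are just illustrative choices taken from the example above.

```python
import hashlib


def changed_blocks(old, new, block_size, base=0):
    """Return (offset, length) for each fixed-offset block of `new` whose
    checksum differs from the block at the same offset in `old`.

    `base` is added to every offset so nested calls report absolute positions.
    """
    result = []
    for off in range(0, len(new), block_size):
        a = bytes(old[off:off + block_size])
        b = bytes(new[off:off + block_size])
        # MD5 is a stand-in here; it is an assumption, not what rsync uses.
        if hashlib.md5(a).digest() != hashlib.md5(b).digest():
            result.append((base + off, len(b)))
    return result


def layered_diff(old, new, coarse=500 * 1024, fine=700):
    """First pass with large blocks, second pass with small blocks
    inside each large block that changed."""
    to_send = []
    for off, length in changed_blocks(old, new, coarse):
        to_send.extend(
            changed_blocks(old[off:off + length], new[off:off + length],
                           fine, base=off)
        )
    return to_send


if __name__ == "__main__":
    old = bytes(2 * 1024 * 1024)   # 2MB of zeros as a stand-in file
    new = bytearray(old)
    new[1_000_000] = 1             # modify a single byte in place
    print(layered_diff(old, bytes(new)))
```

With a single modified byte, only one 700-byte block is flagged for transfer instead of the whole 500KB block, at the cost of a second round of checksums for that block.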