On Sat, Oct 12, 2002 at 11:13:50AM -0700, Derek Simkowiak wrote:
> > My theory is that this is expected behavior given the check sum size.
>
> Craig,
>       Excellent analysis!
I was a bit concerned about his maths at first, but I did it myself
from scratch using a different approach and got the same figures...

For those who are interested, an OK approximation for the maths turns
out to be:

        p = (c^2) / 2^(b+1)

where:

        p is the probability of a "collision"
        c is the number of blocks
        b is the number of bits in the checksum

provided c is significantly less than 2^b. As c approaches 2^b, this
approximation overestimates p (when c = 2^b, p should be 1, but the
formula gives 2^(b-1)).

> Assuming your hypothesis is correct, I like the adaptive checksum
> idea. But how much extra processor overhead is there with a larger
> checksum bit size? Is it worth the extra code and testing to use an
> adaptive algorithm?

There is no extra processor overhead, because the full md4sums are
currently calculated anyway, before they are truncated down to
minimise the signature size.

> I'd be more inclined to say "This ain't the 90's anymore", realize
> that overall filesizes have increased (MP3, MS-Office, CD-R .iso, and DV)
> and that people are moving from dialup to DSL/Cable, and then make either
> the default (a) initial checksum size, or (b) block size, a bit larger.

librsync has options to specify the "strong sum" size, though I think
it currently just uses the full md4sum size. pysync also uses the full
md4sum size.

I'm not sure what the best approach is, but there should be a
relationship between block size, signature size, and file size.
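To make that relationship concrete, here is a short Python sketch of
the maths above. It is mine, not code from rsync, librsync, or pysync:
the function names and the 1e-10 target probability are illustrative
choices, not anything those tools actually use. It evaluates the
approximation p = c^2 / 2^(b+1), and inverts it to get a minimum
strong-sum size in bits from a file size and block size:

import math

def collision_probability(num_blocks, sum_bits):
    # Birthday-bound approximation from above: p ~= c^2 / 2^(b+1).
    # Only valid while num_blocks is much smaller than 2^sum_bits.
    return (num_blocks ** 2) / 2.0 ** (sum_bits + 1)

def required_sum_bits(file_size, block_size, max_probability=1e-10):
    # Smallest checksum size (in bits) that keeps the collision
    # probability for one file below max_probability (an illustrative
    # target, not a value any of the tools discussed here use).
    c = max(1, file_size // block_size)  # number of blocks
    # Invert p = c^2 / 2^(b+1)  =>  b = 2*log2(c) - log2(p) - 1
    b = 2 * math.log2(c) - math.log2(max_probability) - 1
    return math.ceil(b)

if __name__ == "__main__":
    # e.g. a 650MB .iso transferred with 1KB blocks:
    c = (650 * 2 ** 20) // 1024
    print(collision_probability(c, 64))            # ~1.2e-08
    print(required_sum_bits(650 * 2 ** 20, 1024))  # 71

The point the numbers make: with 64-bit truncated sums even a CD-sized
file stays comfortably safe, while shrinking the sum much below that
lets p grow quadratically with the number of blocks.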