G'day again,

From: "Eran Tromer" <[EMAIL PROTECTED]>
[...]
> > if the checksum_seed is based on something like the whole file
> > md4sum, it becomes repeatable, but unpredictable. You can't
> > manipulate individual blocks without it affecting every other
> > blocksum, but the signature for the same file is always the same.
> > Nice :-)
>
> Nice indeed, but the cost is enormous: you'll have to read the file
> twice. When syncing a mostly-unchanged file that's larger than the
> disk cache, that means doubling the runtime (and disk load) on the
> receiver's side. Also, it means 'rdiff signature' and equivalents
> won't work on
But the vast majority of the load is in the delta calculation on the
sender's side.

> arbitrary streams. Currently the receiver can tee the output of
> 'rdiff patch' into both the output file and an instance of 'rdiff
> signature', in order to save the IO of re-reading the file upon the
> next sync; this won't work anymore. (How about a built-in option for
> that, BTW?)

True... that is a nice feature... you are slowly turning me off the
whole-file checksum as seed :-)

> > More thoughts on using the whole-file sum as the seed; at first I
> > thought this would still be vulnerable to case 3). Using any single
> > block as the original file and trying to find any block that
> > matches means you still have "birthday algorithm" numbers of blocks
> > to check (ie 2^(n/2)). However, each block "comparison" requires
> > the recalculation of the "target" blocksum using the "original"
> > blocksum as the seed, resulting in a non-birthday-algorithm number
> > of checksum calculations (ie 2^n).
>
> I'm afraid it's still vulnerable to case 3 (a pair of "target" and
> "original" files with matching blocks). For simplicity consider
> single-block files. In this case what you've done is simply to
> replace the hash function
>
>     f(x) = truncate(MD4(x,fixed_seed))
>
> with the hash function
>
>     f'(x) = truncate(MD4(x,MD4(x,fixed_seed)))

Not quite... it's:

    f(x,y) = truncate(MD4(x,MD4(y,fixed_seed)))

where x and y are the two blocks to be compared. This means you have
to re-calculate the hash for every compare, not just once for every
block.

> which happens to be the same as hashing two copies of x in sequence.
> But the birthday attack doesn't care what's the hash function; it
> only cares about its input and output sizes (which we didn't change)
> and that the function is "sufficiently random".

It also assumes that the hash is done once per sample input, not once
per compare. Sure, you still only need to try 2^(n/2) blocks, but you
need to calculate 2^n hashes, and that's what really hurts.
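To make the cost argument concrete, here is a small Python sketch of
the two schemes, using MD5 purely as a stand-in for rsync's seeded MD4
(the function names, seed construction, and truncation width are
illustrative assumptions, not rsync's actual code):

```python
import hashlib

def md4_like(data: bytes, seed: bytes) -> bytes:
    # Stand-in for a seeded MD4: hash seed || data.
    return hashlib.md5(seed + data).digest()

FIXED_SEED = b"\x00" * 4
TRUNC = 4  # truncate to 4 bytes, i.e. n = 32 bits of blocksum

def f(block: bytes) -> bytes:
    # Fixed-seed scheme: f(x) = truncate(MD4(x, fixed_seed)).
    # One hash per candidate block, so a birthday attack needs only
    # ~2^(n/2) hash calculations for ~2^(n/2) blocks.
    return md4_like(block, FIXED_SEED)[:TRUNC]

def f2(x: bytes, y: bytes) -> bytes:
    # Per-compare scheme: f(x,y) = truncate(MD4(x, MD4(y, fixed_seed))).
    # The seed depends on the "original" block y, so every (x, y) pair
    # costs a fresh hash: ~2^n calculations even though only ~2^(n/2)
    # candidate blocks are ever tried.
    return md4_like(x, md4_like(y, FIXED_SEED))[:TRUNC]
```

Under the fixed-seed scheme an attacker hashes each block once and
compares truncated digests; under the per-compare scheme the work is
per *pair*, which is the 2^(n/2)-blocks-but-2^n-hashes point above.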
----------------------------------------------------------------
Donovan Baarda                  http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------

--
To unsubscribe or change options:
http://lists.samba.org/mailman/listinfo/rsync
Before posting, read:
http://www.catb.org/~esr/faqs/smart-questions.html