On Thu, Oct 15, 2009 at 07:27:07PM -0700, Garrett D'Amore wrote:
> Scott Rotondo wrote:
> >Perhaps it's worth pointing out that both statements above are
> >correct, but they are answers to different questions. 10^-77 is the
> >probability of a hash collision for a particular pair of blocks. For
> >ZFS, we care if there is a collision between *any* pair of unequal
> >blocks. That probability depends on the number of blocks, as Krishna
> >points out. Finally, both of these calculations rely upon the implicit
> >assumption that the 2^256 possible hash values are uniformly
> >distributed; that assumption is widely accepted to be at least
> >approximately true, but I'm not aware of a mathematical proof.
> >
> >In any case, I think it's safe to conclude that SHA-256 is more than
> >adequate for filesystem block equality comparisons.
>
> That's true today. At what point will Moore's law catch up though?
> (In other words, how long will it take for storage densities to reach
> the point where the risk of a collision becomes significant?)
> Start from a petabyte (probably about the largest practical filesystem
> size in use today), and double every 12 months. (I think storage has
> been outpacing Moore somewhat.)

It's not. Brute-forcing a security system with 128 bits of security,
and storing 2^128 bits, both run into fundamental physical limits.

Still, if you have 2^48 bits of storage, the likelihood of pair-wise
conflicts with a 256-bit hash is going to be more than 2^-128: ~2^-97
if we assume a block size of 128KB. 2^-97 is still extremely unlikely.
If we up the storage amount to 2^64 and block sizes to 1MB, we have a
2^-88 probability of collisions. Still comfortable, but if SHA-256
turns out to have weaknesses, then 2^-88 begins to get uncomfortable.
Of course, by the time anyone has 2^64 bits of storage we'll have
switched to a larger hash function for zfs send streams.

The problem for me is not that 128 bits is not enough -- it sure seems
like enough. One problem is that we don't know that SHA-256 has a
uniform distribution of outputs for any random set of inputs, but
let's assume that it does. The bigger problem for me is that ZFS had
never before used checksums for equality comparison, and I just wanted
to make sure that this one new use of checksums for equality comparison
didn't happen by accident. Since the i-team has indicated that this
design point is purposeful, I'm done.

Nico
--
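
For anyone who wants to redo the arithmetic, here is a minimal sketch
of the usual birthday approximation, p ~= n(n-1)/2 / 2^b, for n blocks
and a b-bit hash. It assumes uniformly distributed hash outputs and
storage sizes given in bytes; the function name and parameters are
made up for illustration. Its results need not match the rough
estimates above, which start from a 2^-128 baseline.

```python
import math


def log2_collision_probability(storage_bytes: int, block_bytes: int,
                               hash_bits: int = 256) -> float:
    """Return log2 of the approximate probability that any two distinct
    blocks share a hash value (birthday approximation).

    Assumes uniformly distributed hash outputs. The approximation is only
    meaningful while the number of block pairs is far below 2**hash_bits.
    """
    n = storage_bytes // block_bytes      # number of blocks
    if n < 2:
        raise ValueError("need at least two blocks to have a collision")
    pairs = n * (n - 1) // 2              # exact: n*(n-1) is always even
    return math.log2(pairs) - hash_bits   # log2(pairs / 2**hash_bits)


if __name__ == "__main__":
    # 2^48 bytes of storage in 128KB blocks
    print(f"{log2_collision_probability(2**48, 128 * 1024):.1f}")  # -195.0
    # 2^64 bytes of storage in 1MB blocks
    print(f"{log2_collision_probability(2**64, 2**20):.1f}")       # -169.0
```

Working in log2 keeps the numbers readable and makes it easy to see
that each doubling of the block count adds about 2 to the exponent.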