2009]

Nicolas Williams Fri, 16 Oct 2009 11:36:16 -0500

On Thu, Oct 15, 2009 at 07:27:07PM -0700, Garrett D'Amore wrote:
> Scott Rotondo wrote:
> >Perhaps it's worth pointing out that both statements above are 
> >correct, but they are answers to different questions. 10^-77 is the 
> >probability of a hash collision for a particular pair of blocks. For 
> >ZFS, we care if there is a collision between *any* pair of unequal 
> >blocks. That probability depends on the number of blocks, as Krishna 
> >points out. Finally, both of these calculations rely upon the implicit 
> >assumption that the 2^256 possible hash values are uniformly 
> >distributed; that assumption is widely accepted to be at least 
> >approximately true, but I'm not aware of a mathematical proof.
> >
> >In any case, I think it's safe to conclude that SHA-256 is more than 
> >adequate for filesystem block equality comparisons.
> 
> That's true today.   At what point will Moore's law catch up though?   
> (In other words, how long will it take for storage densities to reach 
> the point where the risk of a collision becomes significant?)  
> Start from a petabyte (probably about the largest practical filesystem 
> size in use today), and double every 12 months.  (I think storage has 
> been outpacing Moore somewhat.)


It's not.  Brute forcing a security system with 128 bits of security,
and storing 2^128 bits runs into fundamental physical limits.  Still, if
you have 2^48 bits of storage the likelihood of pair-wise conflicts with
a 256-bit hash is going to be more than 2^-128: ~ 2^-97 if we assume a
block size of 128KB.  2^-97 is still extremely unlikely.  If we up the
storage amount to 2^64 and block sizes to 1MB we have a 2^-88
probability of collisions.  Still comfortable, but if SHA-256 turns out
to have weaknesses, then 2^-88 begins to get uncomfortable.  Of course,
by the time anyone has 2^64 bits of storage we'll have switched to a
larger hash function for zfs send streams.
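
For reference, here is a rough Python sketch of the birthday-bound
arithmetic behind estimates of this kind.  It assumes uniformly
distributed hash outputs and reads the storage and block sizes as
powers of two in bytes.  The function name and its parameters are made
up for illustration.  The exponents it produces depend on those
assumptions (bits vs. bytes, which bound is used) and need not match
the figures quoted above.

```python
import math

def log2_collision_bound(storage_log2, block_log2, hash_bits=256):
    """Return log2 of the birthday upper bound on the probability that
    *any* two distinct blocks share a hash value.

    Hypothetical helper: sizes are given as log2 of a byte count, and
    hash outputs are assumed to be uniformly distributed.
    """
    n = 2 ** (storage_log2 - block_log2)   # number of blocks
    if n < 2:
        raise ValueError("need at least two blocks to have a collision")
    pairs = n * (n - 1) // 2               # number of unordered block pairs
    # Union bound: P(any collision) <= pairs / 2^hash_bits
    return math.log2(pairs) - hash_bits

# 2^48 bytes of storage in 128KB (2^17-byte) blocks
print(log2_collision_bound(48, 17))
# 2^64 bytes of storage in 1MB (2^20-byte) blocks
print(log2_collision_bound(64, 20))
```

The bound grows with the square of the block count, which is why the
"any pair" probability is so much larger than the single-pair figure
even though both remain tiny.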

The problem for me is not that 128 bits is not enough -- it sure seems
like enough.  One problem is that we don't know that SHA-256 has a
uniform distribution of outputs for any random set of inputs, but let's
assume that SHA-256 does.  The bigger problem for me is that ZFS had
never before used checksums for equality comparison, and I just wanted
to make sure that the fact that ZFS would now have one use case of
checksums for equality comparison didn't happen by accident.  Since the
i-team has indicated that this design point is purposeful, I'm done.

Nico
--

ZFS send dedup [PSARC/2009/557 FastTrack timeout 10/21/2009]
