From: "Russ Cox" <[EMAIL PROTECTED]>
> sure.  use sha-256 and your probability of collision goes
> down even further.  but *you* (probably) still won't be *sure*.

I should probably not put my 2 cents worth in here,
but my resistance is weak...

It is true that you cannot be sure that there won't be a
collision in venti, regardless of what hashing function
you use.  The guarantee is only probabilistic: a collision
could happen tomorrow, or might not happen until
the sun burns out.  But it seems to me that there's
a bigger picture.  The reason we would not want a collision
is that it would, in effect, be a form of data corruption.
But it's only one possible source.  It's possible that
network communication could be corrupted but still
pass the CRC checks (if they're even present).  It's
possible that the disk could be corrupted in such a
way that a block is in error, but still passes the
ECC check.  It's possible that a bit in the main
memory might flip (or two bits if we have parity
memory).  In the end, we have to rely on the fact
that these are all very unlikely to happen; their
probabilities are quite low.  A higher probability
of damage comes from a potential fire in the machine
room.  We often add some form of off-site backup
to handle this.  But it can't make us sure that
an earthquake won't hit the off-site backup location
at the same time we have a fire locally.  Rather,
the probability of both is low enough we accept it.
The amount of effort we put into mitigating an error
is proportional to the probability of that error
occurring and the amount of harm the error would
cause.

What does all this mean for venti?  If we want to
reduce the overall probability of data corruption,
we want to put our efforts into addressing the one
with the highest probability.  Making the others
better won't appreciably help the overall probability.
And a venti collision is not the one with the
highest probability among those I've listed.  In
fact, I'd suspect it's the one with the lowest
probability.  So putting attention on making it
less likely is really misplaced effort from a
practical standpoint.
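
To put rough numbers on that argument, here is a minimal sketch
using the standard birthday-bound approximation.  Everything
numeric in it is an assumption for illustration only: a 160-bit
hash (SHA-1 size), a store of 2**40 distinct blocks, and a
placeholder hardware-corruption probability of 1e-9 that is not
a measured figure.  The function names are hypothetical too.

```python
import math


def birthday_collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Approximate chance that any two of num_blocks distinct blocks
    hash to the same value, using the birthday bound
    p ~= 1 - exp(-n(n-1)/2 / 2**bits)."""
    pairs = num_blocks * (num_blocks - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


def combined_probability(probabilities) -> float:
    """Chance that at least one of several independent failure
    sources strikes: 1 - product(1 - p_i), computed in log space."""
    log_survival = sum(math.log1p(-p) for p in probabilities)
    return -math.expm1(log_survival)


# Assumed, illustrative inputs (not taken from any measurement).
NUM_BLOCKS = 2 ** 40          # hypothetical number of stored blocks
P_HARDWARE = 1e-9             # placeholder for undetected hardware corruption

p_sha1 = birthday_collision_probability(NUM_BLOCKS, 160)
p_sha256 = birthday_collision_probability(NUM_BLOCKS, 256)

# Overall risk with each hash; the hardware term dominates both.
overall_sha1 = combined_probability([p_sha1, P_HARDWARE])
overall_sha256 = combined_probability([p_sha256, P_HARDWARE])

print("collision, 160-bit hash:", p_sha1)
print("collision, 256-bit hash:", p_sha256)
print("overall with 160-bit hash:", overall_sha1)
print("overall with 256-bit hash:", overall_sha256)
```

Under these assumptions the overall figure is essentially the
hardware term either way, so widening the hash changes the
total by an amount far below anything we could notice.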

The truth is that the first time I read the venti
papers, I was bothered the same way.  Yes, there
can be problems, but generally we design systems
where the design itself doesn't contain any known
sources of failure.  In venti, there is one.  And it
bugged me for quite a while.  But when I finally
realized objectively that the probability of a
hardware failure is orders of magnitude greater
than a collision, I started to accept that venti
is a very well-designed system, and is as reliable
as any other form of archive.

BLS
