Re: Data Deduplication with the help of an online filesystem check

Dmitri Nikulin Tue, 28 Apr 2009 14:26:19 -0700

On Wed, Apr 29, 2009 at 3:43 AM, Chris Mason <chris.ma...@oracle.com> wrote:
> So you need an extra index either way.  It makes sense to keep the
> crc32c csums for fast verification of the data read from disk and only
> use the expensive csums for dedup.


What about self-healing? With only a CRC32 to distinguish a good block
from a bad one, statistically you can expect an incorrectly healed
block about once in every few billion corrupted blocks, since a random
bad block passes a 32-bit checksum with probability 2^-32. That may not
happen on your machine, but it will on somebody's, since the
probability is far too high for it never to happen to anyone. Even a
64-bit checksum would cut the probability dramatically, but I'd really
only start at 128 bits. NetApp does 64 for 4k of data, ZFS does 256
bits per block, and the ZFS checksums chain back to the root like a
highly dynamic Merkle tree.
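
To put rough numbers on that, here is a minimal sketch of the
arithmetic. It assumes (my simplification, not anything measured) that
a corrupted block matches its stored checksum independently with
probability 2^-bits; a real CRC does better than that on short burst
errors. The function name is made up for illustration.

```python
import math


def false_heal_probability(checksum_bits: int, corrupted_blocks: int) -> float:
    """Chance that at least one corrupted block passes its checksum.

    Assumption: each corrupted block matches the stored checksum
    independently with probability 2**-checksum_bits.
    """
    per_block = 2.0 ** -checksum_bits
    # 1 - (1 - p)**n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(corrupted_blocks * math.log1p(-per_block))


if __name__ == "__main__":
    # Compare checksum widths for a fleet-wide total of 1e9 corrupted blocks.
    for bits in (32, 64, 128, 256):
        p = false_heal_probability(bits, 10**9)
        print(f"{bits:3d}-bit checksum, 1e9 corrupted blocks: {p:.3e}")
```

The point is the fleet-wide view: the per-block chance is tiny, but
summed over every corrupted block on every machine it stops being
negligible at 32 bits.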

In the CRC case the only safe redundancy is one that has 3+ copies of
the block, to compare the raw data itself, at which point you may as
well have just been using healing RAID1 without checksums.
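
The raw-data comparison amounts to a majority vote across the copies.
A minimal sketch, assuming each copy is already in memory as bytes
(majority_block is a hypothetical helper, not anything in btrfs):

```python
from collections import Counter
from typing import Optional, Sequence


def majority_block(copies: Sequence[bytes]) -> Optional[bytes]:
    """Return the content held by a strict majority of copies, else None."""
    if not copies:
        return None
    content, votes = Counter(copies).most_common(1)[0]
    # Require more than half the copies to agree before trusting the data.
    return content if votes * 2 > len(copies) else None
```

With only two copies that disagree there is no majority and the
function returns None, which is why it takes 3+ copies to heal without
a checksum you can trust.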

-- 
Dmitri Nikulin

Centre for Synchrotron Science
Monash University
Victoria 3800, Australia
