On 2015-10-20 15:59, Austin S Hemmelgarn wrote:
As mentioned in my other reply to this, I did the math wrong (bit of a difference between kilobit and kilobyte), so here's a (hopefully) correct and more thorough analysis:

On 2015-10-20 15:20, Duncan wrote:
Yes, there's some small but not infinitesimal chance the checksum may be wrong, but if there's two copies of the data and the checksum on one is wrong while the checksum on the other verifies... yes, there's still that small chance that the one that verifies is wrong too, but that it's any worse than the one that does not verify? /That's/ getting close to infinitesimal, or at least close enough for the purposes of a mailing-list claim without links to supporting evidence by someone who has already characterized it as not mathematically rigorous... and for me, personally. I'm not spending any serious time thinking about getting hit by lightning, either, tho by the same token I don't go out flying kites or waving long metal rods around in lightning storms, either.

With a 32-bit checksum and a 4k block (the math is easier with smaller numbers), that's 4128 bits, which means that a random single bit error will have an approximately 0.24% chance of occurring in a given bit, which translates to an approximately 7.75% chance that it will occur in one of the checksum bits. For a 16k block it's smaller of course (around 1.8% I think, but that's just a guess), but it's still sufficiently statistically likely that it should be considered.
For 4 KiB blocks (32768 bits): there are a total of 32800 bits when including a 32-bit checksum stored outside the block. This makes the chance that any given bit is the one hit by a single bit error roughly 0.003%, which in turn means an approximately 0.098% (32/32800) chance that a single bit error lands in the checksum.
For 16 KiB blocks (131072 bits): there are a total of 131104 bits when including a 32-bit checksum stored outside the block. This makes the chance that any given bit is the one hit by a single bit error roughly 0.0008%, which in turn means an approximately 0.024% (32/131104) chance that a single bit error lands in the checksum.
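For anyone who wants to double-check the arithmetic, here's a quick Python sketch. It assumes exactly one bit error, located uniformly at random across the data block plus its out-of-block 32-bit checksum, and just computes how often that error lands in the checksum:

CSUM_BITS = 32

for block_bytes in (4096, 16384):
    block_bits = block_bytes * 8
    total_bits = block_bits + CSUM_BITS
    per_bit = 1 / total_bits          # chance any given bit is the flipped one
    in_csum = CSUM_BITS / total_bits  # chance the flip lands in the checksum
    print(f"{block_bytes}-byte block: {total_bits} bits total, "
          f"per-bit {per_bit:.4%}, in-checksum {in_csum:.4%}")

This prints the same figures as above (0.0030%/0.0976% for 4 KiB, 0.0008%/0.0244% for 16 KiB).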
This all of course assumes a naive interpretation of how modern block storage devices work. All modern hard drives and SSDs include at a minimum the ability to correct single-bit errors per byte and detect double-bit errors per byte, which means that we need a triple-bit error in the same byte to get bad data back. That in turn makes the numbers small enough that it's impractical to represent them without scientific notation (on the order of 10^-5).
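To make the shape of that calculation concrete (actual raw error rates and ECC schemes vary by device, so the per-bit probabilities below are purely made-up illustrative numbers, not measurements), the chance of three or more bad bits landing in the same byte under an independent-error assumption is just a binomial tail:

from math import comb

def prob_3_or_more_bad_bits(p, bits=8):
    # probability of at least 3 flipped bits in one byte, assuming each bit
    # flips independently with probability p
    return sum(comb(bits, k) * p**k * (1 - p)**(bits - k) for k in range(3, bits + 1))

for p in (1e-3, 1e-4, 1e-5):  # hypothetical raw bit error rates
    print(f"p={p:g}: P(>=3 bad bits in a byte) = {prob_3_or_more_bad_bits(p):.3e}")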
That in turn assumes zero correlation beyond what's required to get bad data back from the storage; if there is enough correlation for that to happen, however, it's statistically likely that there will be other errors very close by. This in turn means that the checksum is more likely to be either completely correct or wildly wrong, which increases the chances that the metadata block containing the checksum will not appear to have an incorrect checksum itself (because checksums are good at detecting proportionately small errors, but only mediocre at detecting very big errors).
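That last property is easy to see with an ordinary CRC. The sketch below uses zlib's standard CRC-32 rather than the crc32c that btrfs actually uses, but the behaviour in question is the same in kind: any single flipped bit is guaranteed to change the checksum, while a block that has been scrambled wholesale still has roughly a 1-in-2^32 chance of matching the old value.

import os
import zlib

block = os.urandom(4096)
good = zlib.crc32(block)

# A single flipped bit is always detected by a CRC.
corrupted = bytearray(block)
corrupted[123] ^= 0x01
assert zlib.crc32(bytes(corrupted)) != good

# A completely rewritten block, on the other hand, collides with the old
# checksum with probability about 2**-32 -- small, but not zero.
print(f"collision chance for a wholesale-scrambled block: {2**-32:.2e}")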
The approximate proportionate chances of an error in the data versus the checksum are still roughly the same, however, irrespective of how small the chances of getting any error are. Based on this, the ratio of the size of the checksum to the size of the data is a tradeoff that needs to be considered: the closer the ratio is to 1, the higher the chance of having an error in the checksum, but the less data you need to correct/verify when there is an error.
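Put numerically (again just a sketch, reusing the 32-bit checksum size from above), the two sides of that tradeoff look like this for a few plausible block sizes:

CSUM_BITS = 32

for block_bytes in (1024, 4096, 16384, 65536):
    total_bits = block_bytes * 8 + CSUM_BITS
    csum_share = CSUM_BITS / total_bits  # how likely a random bit error hits the checksum
    print(f"{block_bytes:6d}-byte block: checksum is {csum_share:.4%} of the protected bits, "
          f"but an error means re-reading/verifying all {block_bytes} bytes")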