On Sat, 2013-03-16 at 20:41 -0400, Tom Lane wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
> > On 15 March 2013 13:08, Andres Freund <and...@2ndquadrant.com> wrote:
> >> I commented on this before, I personally think this property makes
> >> Fletcher a not so good fit for this. It's not uncommon for parts of a
> >> block to be all-zero, and many disk corruptions actually change whole
> >> runs of bytes.

[ referring to Ants's comment that the existing algorithm doesn't
distinguish between 0x00 and 0xFF ]
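For reference, here is the blind spot, assuming the textbook Fletcher-16 definition: b_1, ..., b_n are the page bytes and both running sums are reduced modulo 255. The patch under discussion may differ in details such as accumulator width.

```latex
s_1 = \Big(\sum_{i=1}^{n} b_i\Big) \bmod 255, \qquad
s_2 = \Big(\sum_{i=1}^{n} (n - i + 1)\, b_i\Big) \bmod 255

\texttt{0xFF} = 255 \equiv 0 = \texttt{0x00} \pmod{255}
\;\Longrightarrow\; \text{swapping any } b_i \text{ between 0x00 and 0xFF leaves } s_1 \text{ and } s_2 \text{ unchanged.}
```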

> Meh.  I don't think that argument holds a lot of water.  The point of
> having checksums is not so much to notice corruption as to be able to
> point the finger at flaky hardware.  If we have an 8K page with only
> 1K of data in it, and we fail to notice that the hardware dropped a lot
> of bits in the other 7K, we're not doing our job; and that's not really
> something to write off, because it would be a lot better if we complain
> *before* the hardware manages to corrupt something valuable.

I will go back to verifying the page hole as well.

There are a few approaches:

1. Verify that the page hole is zero before write and after read.
2. Include it in the calculation (if we think there are some corner
cases where the hole might not be all zero).
3. Zero the page hole before write, and verify that it's zero on read.
This can be done during the memcpy at no performance penalty in
PageSetChecksumOnCopy(), but that won't work for
PageSetChecksumInplace().
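A minimal C sketch of options 1 and 3 follows. It assumes the hole is the byte range [pd_lower, pd_upper) of an 8K page and passes those offsets in directly rather than reading a real page header. The names page_hole_is_zero() and copy_page_zero_hole() are hypothetical and are not PostgreSQL functions; in the real code, the zeroing would be folded into the copy done by PageSetChecksumOnCopy().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192 /* default PostgreSQL page size */

/* Option 1 (and the read side of option 3): check that the hole
 * [lower, upper) contains only zero bytes. An invalid range also fails. */
static bool page_hole_is_zero(const uint8_t *page, size_t lower, size_t upper)
{
    if (lower > upper || upper > BLCKSZ)
        return false;
    for (size_t i = lower; i < upper; i++)
        if (page[i] != 0)
            return false;
    return true;
}

/* Option 3 (write side): copy the page for writing, zeroing the hole in
 * the same pass instead of copying it. This only helps when a copy is
 * being made anyway; an in-place checksum has no copy to piggyback on. */
static bool copy_page_zero_hole(uint8_t *dst, const uint8_t *src,
                                size_t lower, size_t upper)
{
    if (lower > upper || upper > BLCKSZ)
        return false;
    memcpy(dst, src, lower);                          /* header + line pointers */
    memset(dst + lower, 0, upper - lower);            /* the hole */
    memcpy(dst + upper, src + upper, BLCKSZ - upper); /* tuples + special space */
    assert(page_hole_is_zero(dst, lower, upper));     /* compiled out under NDEBUG */
    return true;
}
```

The assert() at the end of the copy is one way to get an all-zero check only in assert-enabled builds.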

With option #2 or #3, we might also verify that the hole is all-zero if
asserts are enabled.

> So I think we'd be best off to pick an algorithm whose failure modes
> don't line up so nicely with probable hardware failure modes.  It's
> worth noting that one of the reasons that CRCs are so popular is
> precisely that they were designed to detect burst errors with high
> probability.
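To make the burst-error point concrete, here is a plain bitwise CRC-32 in C. It uses the IEEE 802.3 polynomial (reflected form 0xEDB88320, as in zlib), chosen only for illustration because nothing in the thread settles on a particular CRC. The name crc32_ieee() is hypothetical. Any CRC whose degree-32 generator has a nonzero constant term detects every error burst of 32 bits or fewer, so overwriting any four consecutive bytes of a page with different values is always caught.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE polynomial, reflected, init and final XOR
 * 0xFFFFFFFF). Slow but short; practical versions use lookup tables
 * or hardware instructions. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* (0u - (crc & 1u)) is all ones when the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The bitwise loop only shows the math. Whether a CRC is fast enough would depend on a table-driven or hardware-assisted variant; for example, the SSE4.2 crc32 instruction computes CRC-32C, which uses a different polynomial.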

Another option is to use a different modulus. The page
http://en.wikipedia.org/wiki/Fletcher%27s_checksum suggests that a prime
number can be a good modulus for Fletcher-32. Perhaps we could use 251
instead of 255? That would make it less likely to miss a common form of
hardware failure, although it would also reduce the number of possible
checksums slightly (about 4% fewer than 2^16).
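A small C sketch compares the two moduli. It assumes a Fletcher-16-style checksum with two running sums packed into 16 bits. The helper fletcher16_mod() is hypothetical and is not the patch's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16-style checksum with a configurable modulus (255 for the
 * classic form, 251 for the prime alternative). For modulus <= 256, both
 * sums fit in 8 bits; the result is (sum2 << 8) | sum1. */
static uint16_t fletcher16_mod(const uint8_t *data, size_t len, uint32_t modulus)
{
    uint32_t sum1 = 0, sum2 = 0;

    for (size_t i = 0; i < len; i++)
    {
        sum1 = (sum1 + data[i]) % modulus;
        sum2 = (sum2 + sum1) % modulus;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}
```

With modulus 255, an all-0x00 page and an all-0xFF page both produce 0x0000. With 251, they differ: for an 8K all-0xFF page, sum1 is 4 * 8192 mod 251 = 138. The cost is range. There are 251 * 251 = 63,001 possible values versus 2^16 = 65,536, about 3.9% fewer, whereas 255 * 255 = 65,025. Note that 251 narrows the aliasing rather than removing it. Byte values 0xFB through 0xFF are congruent to 0x00 through 0x04, so a run of 0xFB would still look like zeros, although that is presumably a much rarer failure pattern than all-ones.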

I'm leaning toward this option now, or a CRC of some kind if the
performance is reasonable.

Regards,
        Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
