Paul Schlie wrote:
... if that doesn't fix the problem, assume a single bit error, and
iteratively flip single bits until the check sum matches (hopefully not
making the problem worse, as may be the case if many bits were actually
already in error), write the data back, and proceed as normal, possibly
logging the action; otherwise presume the data is unrecoverable and in
error, and somehow mark it as such, so that subsequent queries which may
use any portion of it know it may be corrupt. My best initial guess is
that this is better done not on file-system blocks but on logical rows,
or even individual entries if they are very large; it is likely to
measurably affect performance when enabled, and I haven't a clue how the
resulting query should/could be identified as potentially corrupt without
confusing the client which requested it.
This can actually be done much faster if you're using a CRC checksum
(i.e., division modulo a generator polynomial over GF(2)). Because the
CRC is linear, an error flipping bit n always produces the same xor
between the computed CRC and the stored CRC, regardless of the page
contents. So you can precompute a table: for each n, record the xor
value that an error in bit n creates; sort the table by xor value; then
a binary search on the observed xor tells you exactly which bit was
wrong.
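Here's a rough sketch in C of what I mean, in case it helps. This isn't
PostgreSQL code: crc32_buf(), build_syndrome_table() and
try_fix_single_bit() are names I just made up, and it uses a plain
bitwise CRC-32 so it's self-contained. As far as I know, the standard
CRC-32 polynomial detects all 2-bit errors well past 8K, so all 65536
single-bit xor values should be distinct and the lookup unambiguous.

/*
 * Minimal sketch of the syndrome-table idea.  Not PostgreSQL code; the
 * CRC is a plain bitwise reflected CRC-32 chosen for brevity.
 */
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES 8192
#define PAGE_BITS  (PAGE_BYTES * 8)     /* 65536 possible single-bit errors */

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320); slow but self-contained. */
static uint32_t
crc32_buf(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* One table entry: flipping bit "bit" always produces xor value "syndrome". */
typedef struct
{
    uint32_t syndrome;      /* computed CRC xor stored CRC */
    uint32_t bit;           /* bit position within the page */
} SyndromeEntry;

static int
cmp_syndrome(const void *a, const void *b)
{
    uint32_t sa = ((const SyndromeEntry *) a)->syndrome;
    uint32_t sb = ((const SyndromeEntry *) b)->syndrome;

    return (sa > sb) - (sa < sb);
}

/*
 * Because the CRC is linear over GF(2), the xor caused by flipping bit n
 * does not depend on the page contents, so the table can be built once
 * against an all-zero page.  (Re-CRCing the whole page 65536 times is
 * deliberately naive; this only has to run once.)
 */
static SyndromeEntry *
build_syndrome_table(void)
{
    static uint8_t page[PAGE_BYTES];    /* all zeroes */
    SyndromeEntry *table = malloc(PAGE_BITS * sizeof(SyndromeEntry));
    uint32_t       base;

    if (table == NULL)
        return NULL;
    base = crc32_buf(page, PAGE_BYTES);
    for (uint32_t bit = 0; bit < PAGE_BITS; bit++)
    {
        page[bit / 8] ^= (uint8_t) (1u << (bit % 8));   /* flip bit n */
        table[bit].syndrome = crc32_buf(page, PAGE_BYTES) ^ base;
        table[bit].bit = bit;
        page[bit / 8] ^= (uint8_t) (1u << (bit % 8));   /* restore */
    }
    qsort(table, PAGE_BITS, sizeof(SyndromeEntry), cmp_syndrome);
    return table;
}

/*
 * Given a page whose computed CRC disagrees with the stored CRC, look the
 * xor up in the table.  Returns the bit that was flipped back, or -1 if
 * the xor matches no single-bit error (i.e. probably a larger corruption).
 */
static long
try_fix_single_bit(uint8_t *page, uint32_t stored_crc,
                   const SyndromeEntry *table)
{
    SyndromeEntry  key = { crc32_buf(page, PAGE_BYTES) ^ stored_crc, 0 };
    SyndromeEntry *hit = bsearch(&key, table, PAGE_BITS,
                                 sizeof(SyndromeEntry), cmp_syndrome);

    if (hit == NULL)
        return -1;
    page[hit->bit / 8] ^= (uint8_t) (1u << (hit->bit % 8));
    return (long) hit->bit;
}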
This is actually probably fairly safe: for an 8K page there are only
65536 possible bit positions, so only 65536 of the 2^32 xor values
correspond to single-bit errors. Assuming a 32-bit CRC, a larger
corruption is much more likely to produce one of the other
4,294,901,760 (2^32 - 2^16) values - 99.998% likely, in fact - in which
case the lookup simply fails instead of "fixing" the wrong bit.
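Spelling out the arithmetic (under the usual assumption that a multi-bit
corruption produces an essentially uniform random 32-bit xor, which the
CRC doesn't strictly guarantee):

    P(random xor collides with a single-bit entry) = 2^16 / 2^32
                                                   = 1/65536 ~ 0.0015%
    P(lookup fails, page reported unrecoverable)   = 1 - 1/65536 ~ 99.998%

so the correction step rarely turns a big corruption into a silently
"fixed" wrong bit.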
Brian