Tom Lane wrote:
> Paul Schlie writes:
>> - yes, if you're willing to compute true CRC's as opposed to simpler
>> checksums, which may be worth the price if in fact many/most data
>> check failures are truly caused by single bit errors somewhere in the
>> chain,
> 
> FWIW, not one of the corrupted-data problems I've investigated has ever
> looked like a single-bit error.  So the theoretical basis for using a
> CRC here seems pretty weak.  I doubt we'd even consider automatic repair
> attempts anyway.

Although I accept that you may be correct in your assessment that most
errors are in fact multi-bit, I've never seen any hard data to corroborate
either that or my own suspicion that most errors are single-bit in nature
(if they occur within the read/processing/write paths from storage). I do
agree that errors arising within an otherwise ECC'd memory subsystem would
have to be multi-bit in nature.

However, in systems which record very few corrected single-bit errors, and
few if any uncorrectable double-bit errors, it seems unlikely that multi-bit
errors resulting from memory failure can account for the number of integrity
check failures seen for data stored in file systems. So I strongly suspect
that the failures you've had occasion to investigate were predominantly
those catastrophic enough to catch your attention, with most of the more
subtle integrity errors simply sneaking below the radar.

(It seems clear that, statistically, hardware failure will inject single-bit
errors into data more frequently than multi-bit ones, and that such errors
will go undetected unless every communication boundary the data traverses
is provisioned to at least detect them, if not correct them.)
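
To make the single-bit case concrete, here is a minimal Python sketch of
what locating (and so repairing) a single flipped bit with a CRC could look
like. It is illustrative only and is not PostgreSQL code: the function name
find_single_bit_error is hypothetical, and zlib.crc32 merely stands in for
whichever CRC would actually be stored alongside the data.

```python
import zlib


def find_single_bit_error(block: bytes, stored_crc: int):
    """Try to explain a CRC mismatch as exactly one flipped bit.

    Returns -1 if the block already matches stored_crc, the index of the
    flipped bit if a single flip restores the match, or None if no single
    bit flip explains the mismatch (i.e. the damage is multi-bit).
    Assumes stored_crc itself is intact (an assumption of this sketch).
    """
    if zlib.crc32(block) == stored_crc:
        return -1

    buf = bytearray(block)
    for bit in range(len(buf) * 8):
        mask = 1 << (bit % 8)
        buf[bit // 8] ^= mask          # flip one candidate bit
        if zlib.crc32(buf) == stored_crc:
            return bit                 # this flip restores the CRC
        buf[bit // 8] ^= mask          # undo and try the next bit
    return None


# Usage: corrupt one bit of a small block and locate it again.
block = bytes(range(64))
stored_crc = zlib.crc32(block)

damaged = bytearray(block)
damaged[10] ^= 0x04                    # flips bit index 10 * 8 + 2
located = find_single_bit_error(bytes(damaged), stored_crc)
```

The brute-force search is deliberately naive (one CRC computation per
candidate bit), and a match is strong evidence rather than proof that the
original damage was a single bit. Whether attempting such a repair is ever
worthwhile depends on exactly the question above: how often real-world
corruption is single-bit.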



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
