Tom Lane wrote:
> Paul Schlie writes:
>> - yes, if you're willing to compute true CRC's as opposed to simpler
>> checksums, which may be worth the price if in fact many/most data
>> check failures are truly caused by single bit errors somewhere in the
>> chain,
>
> FWIW, not one of the corrupted-data problems I've investigated has ever
> looked like a single-bit error.  So the theoretical basis for using a
> CRC here seems pretty weak.  I doubt we'd even consider automatic repair
> attempts anyway.

- I accept that you may be correct that most errors are in fact multi-bit, but I've never seen any hard data to corroborate either that or my own suspicion that most errors are single-bit in nature (when they occur within the read/processing/write paths from storage). I agree that errors occurring within an otherwise ECC'd memory subsystem would have to be multi-bit. However, in systems which record very few corrected single-bit errors, and few if any uncorrectable double-bit errors, it seems unlikely that multi-bit errors resulting from memory failure can account for the number of integrity check failures seen for data stored in file systems.

So I strongly suspect that the failures you've had occasion to investigate were predominantly so catastrophic that they were obvious enough to catch your attention, while most of the more subtle integrity errors simply sneak below the radar. (It seems clear that, statistically, hardware failure will inject single-bit errors into data more frequently than multi-bit ones, and that these will go undetected unless each communication boundary the data traverses is provisioned to at least detect them, if not correct them.)
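
To make the single-bit case concrete, here is a rough sketch of what "detect, and possibly repair" could look like with a true CRC. It is only an illustration: it uses Python's `zlib.crc32` (CRC-32), and the helper names `flip_bit` and `repair_single_bit` and the 1 KiB test page are assumptions made up for this example, not anything PostgreSQL implements or proposes.

```python
import zlib
from typing import Optional


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def repair_single_bit(data: bytes, expected_crc: int) -> Optional[bytes]:
    """Brute-force search for a single bit flip that restores expected_crc.

    Returns the repaired bytes, or None if no single flip matches
    (for example, when the damage covers more than one bit).
    """
    for i in range(len(data) * 8):
        candidate = flip_bit(data, i)
        if zlib.crc32(candidate) == expected_crc:
            return candidate
    return None


# Hypothetical 1 KiB page and the CRC stored alongside it.
page = bytes(range(256)) * 4
stored_crc = zlib.crc32(page)

# Simulate a single-bit error somewhere in the read/write path.
damaged = flip_bit(page, 1234)

# The CRC mismatch detects the error ...
assert zlib.crc32(damaged) != stored_crc
# ... and trying each bit in turn recovers the original page.
assert repair_single_bit(damaged, stored_crc) == page
```

The search is linear in the page size and only helps if the damage really is a single bit, which is exactly the assumption under dispute above; for the multi-bit corruption Tom describes, it would simply report that no repair was found.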