Simon Riggs <si...@2ndquadrant.com> writes:
> On Mon, 2009-11-30 at 16:49 -0500, Aidan Van Dyk wrote:
>> No, I believe the torn-page problem is exactly the thing that made the
>> checksum talks stall out last time... The torn page isn't currently a
>> problem on only-hint-bit-dirty writes, because if you get
>> half-old/half-new, the only change is the hint bit - no big loss, the
>> data is still the same.

> A good argument, but we're missing some proportion.

No, I think you are.  The problem with the described behavior is exactly
that it converts a non-problem into a problem --- a big problem, in
fact: uncorrectable data loss.  Loss of hint bits is expected and
tolerated in the current system design.  But a block with bad CRC is not
going to have any automated recovery path.

So the difficulty is that in the name of improving system reliability
by detecting infrequent corruption events, we'd be decreasing system
reliability by *creating* infrequent corruption events, added onto
whatever events we were hoping to detect.  There is no strong argument
you can make that this isn't a net loss --- you'd need to pull some
error-rate numbers out of the air to even try to make the argument,
and in any case the fact remains that more data gets lost with the CRC
than without it.  The only thing the CRC is really buying is giving
the PG project a more plausible argument for blaming data loss on
somebody else; it's not helping the user whose data got lost.

It's hard to justify the amount of work and performance hit we'd take
to obtain a "feature" like that.

			regards, tom lane
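
For readers who want to see the failure mode concretely, here is a
minimal sketch of a torn write on a hint-bit-only change.  Everything
about the layout is an assumption made for illustration: the 8192-byte
page, the 512-byte sector, the CRC stored in the first four bytes, and
the hint byte at offset 6000 are hypothetical and are not PostgreSQL's
actual page format.  The sketch uses zlib.crc32 from the Python standard
library as a stand-in checksum.

```python
import zlib

# All of these constants are assumptions for illustration only.
PAGE_SIZE = 8192     # assumed page size in bytes
SECTOR_SIZE = 512    # assumed unit the disk writes atomically
CRC_BYTES = 4        # hypothetical: CRC kept in the first 4 bytes
HINT_OFFSET = 6000   # hypothetical position of a hint-bit byte


def with_crc(body: bytes) -> bytes:
    """Prefix the page body with a CRC computed over that body."""
    return zlib.crc32(body).to_bytes(CRC_BYTES, "big") + body


def crc_ok(page: bytes) -> bool:
    """Check the stored CRC against one recomputed from the body."""
    stored = int.from_bytes(page[:CRC_BYTES], "big")
    return stored == zlib.crc32(page[CRC_BYTES:])


def set_hint_bit(page: bytes) -> bytes:
    """Set one hint bit and recompute the CRC, as a checksummed write would."""
    body = bytearray(page[CRC_BYTES:])
    body[HINT_OFFSET - CRC_BYTES] |= 0x01
    return with_crc(bytes(body))


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first sectors of `new` reached disk."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


old_page = with_crc(bytes(PAGE_SIZE - CRC_BYTES))  # page with no hint bits set
new_page = set_hint_bit(old_page)                  # only a hint bit differs

# Crash halfway: the new CRC is on disk, the hint bit is not.
torn = torn_write(old_page, new_page, sectors_written=8)

print("old page CRC ok: ", crc_ok(old_page))
print("new page CRC ok: ", crc_ok(new_page))
print("torn page CRC ok:", crc_ok(torn))
print("torn body equals old body:", torn[CRC_BYTES:] == old_page[CRC_BYTES:])
```

In this sketch the torn page carries exactly the old body, so without a
checksum nothing of value was lost, yet the CRC check rejects the page.
That is the conversion of a non-problem into a problem described above.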