On Fri, 2008-10-17 at 12:26 -0300, Alvaro Herrera wrote:

> So this discussion died with no solution arising to the
> hint-bit-setting-invalidates-the-CRC problem.
>
> Apparently the only solution in sight is to WAL-log hint bits. Simon
> opines it would be horrible from a performance standpoint to WAL-log
> every hint bit set, and I think we all agree with that. So we need to
> find an alternative mechanism to WAL log hint bits.

It occurred to me that maybe we don't need to WAL-log the CRC checks.

Proposal

* We reserve enough space on a disk block for a CRC check. When a dirty
  block is written to disk we calculate and annotate the CRC value,
  though this is *not* WAL logged.

* In normal running we re-check the CRC when we read the block back into
  shared_buffers.

* In recovery we will overwrite the last image of a block from WAL, so
  we ignore the block CRC check, since the WAL record was already CRC
  checked. If full_page_writes = off, we ignore and zero the block's CRC
  for any block touched during recovery. We do those things because the
  block CRC in the WAL is likely to be different to that on disk, due to
  hints.

* We also re-check the CRC on a block immediately before we dirty the
  block (for any reason). This minimises the possibility of in-memory
  data corruption for blocks.

So in the typical case all blocks moving from disk <-> memory and from
clean -> dirty are CRC checked. With full_page_writes = on we have a
good CRC every time. With full_page_writes = off we are exposed only on
the blocks that changed during the last checkpoint cycle, and only if
we crash. That seems good because most databases are up 99% of the
time, so any corruptions are likely to occur in normal running, not as
a result of crashes.

This would be a run-time option.

Like it?

-- 
 Simon Riggs           www.2ndQuadrant.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
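
For illustration only, the write/read/recovery rules in the proposal above
could be sketched roughly as follows. This is not PostgreSQL code: the
8192-byte block, the 4-byte CRC field at offset 0, the use of zlib.crc32,
the "zero means unchecked" convention, and every function name here are
assumptions made for the sketch.

```python
import zlib

PAGE_SIZE = 8192  # assumption: 8 kB block
CRC_SIZE = 4      # assumption: CRC kept in the first 4 bytes of the block


def page_crc(page: bytes) -> int:
    """CRC over the block contents, excluding the reserved CRC field."""
    return zlib.crc32(page[CRC_SIZE:])


def stamp_crc(page: bytearray) -> None:
    """Run when a dirty block is written to disk; nothing is WAL-logged."""
    page[:CRC_SIZE] = page_crc(bytes(page)).to_bytes(CRC_SIZE, "little")


def zero_crc(page: bytearray) -> None:
    """Run on blocks touched in recovery when full_page_writes = off."""
    page[:CRC_SIZE] = bytes(CRC_SIZE)


def verify_crc(page: bytes, in_recovery: bool = False) -> bool:
    """Run when reading a block into the buffer cache and before dirtying it."""
    if in_recovery:
        # The block is rebuilt from WAL records that were already CRC checked.
        return True
    stored = int.from_bytes(page[:CRC_SIZE], "little")
    if stored == 0:
        # Assumption: a zeroed field marks a block whose CRC was discarded.
        return True
    return stored == page_crc(page)
```

In this sketch a hint bit set in memory never needs its own WAL record:
the block is simply re-stamped by stamp_crc() the next time it is written
out, so the on-disk CRC always matches the on-disk contents.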