On Tuesday 01 December 2009 15:26:21 Aidan Van Dyk wrote:
> * Andres Freund <and...@anarazel.de> [091201 08:42]:
> > On Tuesday 01 December 2009 14:38:26 marcin mank wrote:
> > > On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas
> > > <heikki.linnakan...@enterprisedb.com> wrote:
> > > > Simon Riggs wrote:
> > > >> Proposal
> > > >>
> > > >> * We reserve enough space on a disk block for a CRC check. When a
> > > >> dirty block is written to disk we calculate and annotate the CRC
> > > >> value, though this is *not* WAL logged.
> > > >
> > > > Imagine this:
> > > > 1. A hint bit is set. It is not WAL-logged, but the page is dirtied.
> > > > 2. The buffer is flushed out of the buffer cache to the OS. A new CRC
> > > > is calculated and stored on the page.
> > > > 3. Half of the page is flushed to disk (aka torn page problem). The
> > > > CRC made it to disk but the flipped hint bit didn't.
> > > >
> > > > You now have a page with incorrect CRC on disk.
> > >
> > > What if we treated the hint bits as all-zeros for the purpose of CRC
> > > calculation? This would exclude them from the checksum.
> >
> > That sounds like doing a complete copy of the wal page zeroing specific
> > fields and then doing wal - rather expensive I would say. Both, during
> > computing the checksum and checking it...
> No, it has nothing to do with WAL, it has to do with when writing
> "pages" out... You already double-buffer them (to avoid the page
> changing while you checksum it) before calling write, but the code
> writing (and then reading) pages doesn't currently have to know all the
> internal "stuff" needed decide what's a hint bit and what's not...

Err, yes. That "WAL" slipped in, sorry. But it would still mean either a
third copy of the page or some rather complex jumping around on the page...

Andres
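To make the two options concrete, here is a rough sketch in Python (not PostgreSQL code) of what "treat the hint bits as zero" could look like. Everything in it is an assumption for illustration: the `hint_bits` list of `(byte_offset, bit_mask)` pairs is a hypothetical stand-in for whatever the page-writing code would need to know about tuple headers, the offsets are assumed to be distinct, and `zlib.crc32` stands in for whichever CRC would actually be used.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size


def crc_copy_and_mask(page, hint_bits):
    """Option 1: take another copy of the page, clear the hint bits
    in the copy, and checksum the copy in one pass."""
    scratch = bytearray(page)  # the additional page copy
    for offset, mask in hint_bits:
        scratch[offset] &= ~mask & 0xFF
    return zlib.crc32(scratch)


def crc_in_place(page, hint_bits):
    """Option 2: no copy, but the checksum loop has to jump around
    the page, substituting a masked byte at every hint-bit offset.
    Assumes each offset appears only once in hint_bits."""
    view = memoryview(page)
    crc = 0
    pos = 0
    for offset, mask in sorted(hint_bits):
        # Bytes between the previous hint byte and this one, unchanged.
        crc = zlib.crc32(view[pos:offset], crc)
        # The hint byte itself, with the hint bits forced to zero.
        crc = zlib.crc32(bytes([view[offset] & ~mask & 0xFF]), crc)
        pos = offset + 1
    # Remainder of the page after the last hint byte.
    return zlib.crc32(view[pos:], crc)
```

Both functions return the same value for the same page, and that value does not change when only masked bits flip, so a torn write that loses a hint-bit update would no longer invalidate the stored CRC. The cost is the one discussed above: either the extra copy, or a checksum loop that has to know where every hint bit lives, on write and again on read.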