* Andres Freund <and...@anarazel.de> [091201 08:42]:
> On Tuesday 01 December 2009 14:38:26 marcin mank wrote:
> > On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas
> > <heikki.linnakan...@enterprisedb.com> wrote:
> > > Simon Riggs wrote:
> > >> Proposal
> > >>
> > >> * We reserve enough space on a disk block for a CRC check. When a dirty
> > >> block is written to disk we calculate and annotate the CRC value, though
> > >> this is *not* WAL logged.
> > >
> > > Imagine this:
> > > 1. A hint bit is set. It is not WAL-logged, but the page is dirtied.
> > > 2. The buffer is flushed out of the buffer cache to the OS. A new CRC is
> > > calculated and stored on the page.
> > > 3. Half of the page is flushed to disk (aka torn page problem). The CRC
> > > made it to disk but the flipped hint bit didn't.
> > >
> > > You now have a page with incorrect CRC on disk.
> >
> > What if we treated the hint bits as all-zeros for the purpose of CRC
> > calculation? This would exclude them from the checksum.
> That sounds like doing a complete copy of the wal page zeroing specific
> fields and then doing wal - rather expensive I would say. Both, during
> computing the checksum and checking it...

No, it has nothing to do with WAL; it has to do with writing "pages"
out... You already double-buffer them (to avoid the page changing while
you checksum it) before calling write, but the code that writes (and
later reads) pages doesn't currently have to know all the internal
"stuff" needed to decide what's a hint bit and what's not... And adding
that information to the buffer in/out path would be a huge wart on the
modularity of the PG code...

a.

--
Aidan Van Dyk                                  Create like a god,
ai...@highrise.ca                              command like a king,
http://www.highrise.ca/                        work like a slave.
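
For anyone following along, here is a rough sketch of the torn-page scenario quoted above and of the "treat hint bits as zeros" idea. It is a toy model only: the page layout (a 4-byte CRC at offset 0, one hint-bit byte at offset 6000), the constants, and every function name are made up for illustration and are not PostgreSQL's real page format or code. The CRC comes from Python's zlib.crc32, which is just a convenient stand-in.

```python
import zlib

PAGE_SIZE = 8192
# Hypothetical layout, for illustration only (not PostgreSQL's page format):
CRC_OFFSET = 0      # assumed: 4-byte CRC stored at the start of the page
CRC_LEN = 4
HINT_OFFSET = 6000  # assumed: one byte holding a hint bit, in the second half
HINT_MASK = 0x01


def page_crc(page, mask_hints=False):
    """CRC of a private copy of the page, with the CRC field zeroed."""
    buf = bytearray(page)  # the copy plays the role of the double buffer
    buf[CRC_OFFSET:CRC_OFFSET + CRC_LEN] = bytes(CRC_LEN)
    if mask_hints:
        # This is the part that needs to know where the hint bits live.
        buf[HINT_OFFSET] &= ~HINT_MASK & 0xFF
    return zlib.crc32(bytes(buf))


def stamp(page, mask_hints=False):
    """Compute the CRC and store it in the page, as done at write time."""
    crc = page_crc(page, mask_hints)
    page[CRC_OFFSET:CRC_OFFSET + CRC_LEN] = crc.to_bytes(CRC_LEN, "little")


def verify(page, mask_hints=False):
    """Recompute the CRC on read and compare it with the stored value."""
    stored = int.from_bytes(page[CRC_OFFSET:CRC_OFFSET + CRC_LEN], "little")
    return stored == page_crc(page, mask_hints)


def torn_write(old, new, split=PAGE_SIZE // 2):
    """Only the first `split` bytes of `new` reach disk; the rest stays old."""
    return new[:split] + old[split:]


def simulate(mask_hints):
    """Run steps 1-3 from the quoted scenario; return whether the CRC checks."""
    on_disk = bytearray(PAGE_SIZE)
    stamp(on_disk, mask_hints)
    in_memory = bytearray(on_disk)
    in_memory[HINT_OFFSET] |= HINT_MASK  # 1. hint bit set, not WAL-logged
    stamp(in_memory, mask_hints)         # 2. new CRC computed at flush
    torn = torn_write(bytes(on_disk), bytes(in_memory))  # 3. half reaches disk
    return verify(torn, mask_hints)
```

In this toy model, simulate(False) fails verification (the new CRC reached disk, the flipped hint bit did not), while simulate(True) passes. Note that page_crc only manages this because it hard-codes HINT_OFFSET, which is exactly the knowledge the page read/write code does not have today.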