On Fri, Dec 30, 2011 at 11:58 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On 12/29/11, Ants Aasma <ants.aa...@eesti.ee> wrote:
>> Unless I'm missing something, double-writes are needed for all writes,
>> not only the first page after a checkpoint. Consider this sequence of
>> events:
>>
>> 1. Checkpoint
>> 2. Double-write of page A (DW buffer write, sync, heap write)
>> 3. Sync of heap, releasing DW buffer for new writes.
>> ... some time goes by
>> 4. Regular write of page A
>> 5. OS writes one part of page A
>> 6. Crash!
>>
>> Now recovery comes along, page A is broken in the heap with no
>> double-write buffer backup nor anything to recover it by in the WAL.
>
> Isn't 3 the very definition of a checkpoint, meaning that 4 is not
> really a regular write as it is the first one after a checkpoint?

I think you nailed it.

> But it doesn't seem safe to me to replace a page from the DW buffer and
> then apply WAL to that replaced page which preceded the age of the
> page in the buffer.

That's what LSNs are for. If we write the page to the checkpoint
buffer just once per checkpoint, recovery can restore the
double-written versions of the pages and then begin WAL replay, which
will restore all the subsequent changes made to the page. Recovery
may also need to do additional double-writes if it encounters pages
for which we wrote WAL but never flushed the buffer, because a crash
during recovery can also create torn pages. When we reach a
restartpoint, we fsync everything down to disk and then nuke the
double-write buffer. Similarly, in normal running, we can nuke the
double-write buffer at checkpoint time, once the fsyncs are complete.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
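
For anyone who wants to poke at the recovery ordering described above, here is a rough toy model in Python. It is only a sketch under stated assumptions, not PostgreSQL code: the names Page, WalRecord and recover are made up for illustration, pages are plain dicts, and the WAL carries no full-page images. It also leaves out the additional double-writes that recovery itself may need. The point is just to show the double-written copy being restored first, with the page LSN then deciding which WAL records still need to be applied.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical in-memory stand-in for a heap page."""
    lsn: int = 0                 # LSN of the last WAL record reflected in the page
    data: dict = field(default_factory=dict)
    torn: bool = False           # True if a crash interrupted a write of this page


@dataclass
class WalRecord:
    """Hypothetical WAL record: set data[key] = value on one page."""
    lsn: int
    page_id: str
    key: str
    value: int


def recover(heap, dw_buffer, wal):
    """Toy recovery: restore double-written pages, replay WAL, drop the buffer."""
    # Step 1: put back the double-written version of every page in the buffer.
    # This repairs any page that was torn by a later write.
    for page_id, copy in dw_buffer.items():
        heap[page_id] = Page(lsn=copy.lsn, data=dict(copy.data))

    # Step 2: replay WAL written since the last checkpoint, in LSN order.
    for rec in sorted(wal, key=lambda r: r.lsn):
        page = heap.setdefault(rec.page_id, Page())
        if page.torn:
            # Assumption of this toy model: nothing else can repair a torn page.
            raise ValueError(f"page {rec.page_id} is torn and has no backup")
        if rec.lsn <= page.lsn:
            continue             # change is already in the restored page; skip it
        page.data[rec.key] = rec.value
        page.lsn = rec.lsn

    # Step 3: at the restartpoint everything is assumed to be synced to disk,
    # so the double-write buffer is no longer needed.
    dw_buffer.clear()
    return heap


# Page A was double-written once after the checkpoint (at LSN 11), then a later
# plain write (carrying LSN 12) was torn by the crash.
heap = {"A": Page(lsn=12, data={"x": 2}, torn=True)}
dw_buffer = {"A": Page(lsn=11, data={"x": 2})}
wal = [WalRecord(11, "A", "x", 2), WalRecord(12, "A", "y", 5)]

recovered = recover(heap, dw_buffer, wal)
print(recovered["A"].lsn, recovered["A"].data)
```

In this model the record at LSN 11 is skipped because the restored copy already reflects it, and only the record at LSN 12 is applied on top of the restored page.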