On Apr 25, 2007, at 2:48 PM, Heikki Linnakangas wrote:
In recovery, with full_pages_writes=on, we read in each page only to overwrite the contents with a full page image. That's a waste of time, and can have a surprisingly large effect on recovery time.

As a quick test on my laptop, I initialized a DBT-2 test with 5 warehouses, and let it run for 2 minutes without think-times to generate some WAL. Then I did a "kill -9 postmaster", and took a copy of the data directory to use for testing recovery.

With CVS HEAD, the recovery took ~ 2 minutes. With the attached patch, it took 5 seconds. (yes, I used the same not-yet-recovered data directory in both tests, and cleared the os cache with "echo 1 > /proc/sys/vm/drop_caches").

I was surprised how big a difference it makes, but when you think about it it's logical. Without the patch, it's doing roughly the same I/O as the test itself, reading in pages, modifying them, and writing them back. With the patch, all the reads are done sequentially from the WAL, and then written back in a batch at the end of the WAL replay which is a lot more efficient.

It's interesting that (with the patch) full_page_writes can *shorten* your recovery time. I've always thought it to have a purely negative effect on performance.

I'll leave it up to the jury if this tiny little change is appropriate after feature freeze...

While working on this, this comment in ReadBuffer caught my eye:

        /*
         * During WAL recovery, the first access to any data page should
         * overwrite the whole page from the WAL; so a clobbered page
         * header is not reason to fail.  Hence, when InRecovery we may
         * always act as though zero_damaged_pages is ON.
         */
        if (zero_damaged_pages || InRecovery)
        {

But that assumption only holds if full_page_writes is enabled, right? I changed that in the attached patch as well, but if it isn't accepted that part of it should still be applied, I think.

So what happens if a backend is running with full_page_writes = off, someone edits postgresql.conf to turns it on and forgets to reload/ restart, and then we crash? You'll come up in recovery mode thinking that f_p_w was turned on, when in fact it wasn't.

ISTM that we need to somehow log what the status of full_page_writes is, if it's going to affect how recovery works.
--
Jim Nasby                                            [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to