On Apr 25, 2007, at 2:48 PM, Heikki Linnakangas wrote:
In recovery, with full_pages_writes=on, we read in each page only
to overwrite the contents with a full page image. That's a waste of
time, and can have a surprisingly large effect on recovery time.
As a quick test on my laptop, I initialized a DBT-2 test with 5
warehouses, and let it run for 2 minutes without think-times to
generate some WAL. Then I did a "kill -9 postmaster", and took a
copy of the data directory to use for testing recovery.
With CVS HEAD, the recovery took ~ 2 minutes. With the attached
patch, it took 5 seconds. (yes, I used the same not-yet-recovered
data directory in both tests, and cleared the os cache with "echo 1
> /proc/sys/vm/drop_caches").
I was surprised how big a difference it makes, but when you think
about it it's logical. Without the patch, it's doing roughly the
same I/O as the test itself, reading in pages, modifying them, and
writing them back. With the patch, all the reads are done
sequentially from the WAL, and then written back in a batch at the
end of the WAL replay which is a lot more efficient.
It's interesting that (with the patch) full_page_writes can
*shorten* your recovery time. I've always thought it to have a
purely negative effect on performance.
I'll leave it up to the jury if this tiny little change is
appropriate after feature freeze...
While working on this, this comment in ReadBuffer caught my eye:
/*
* During WAL recovery, the first access to any data page should
* overwrite the whole page from the WAL; so a clobbered page
* header is not reason to fail. Hence, when InRecovery we may
* always act as though zero_damaged_pages is ON.
*/
if (zero_damaged_pages || InRecovery)
{
But that assumption only holds if full_page_writes is enabled,
right? I changed that in the attached patch as well, but if it
isn't accepted that part of it should still be applied, I think.
So what happens if a backend is running with full_page_writes = off,
someone edits postgresql.conf to turns it on and forgets to reload/
restart, and then we crash? You'll come up in recovery mode thinking
that f_p_w was turned on, when in fact it wasn't.
ISTM that we need to somehow log what the status of full_page_writes
is, if it's going to affect how recovery works.
--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match