As already mentioned in the broader discussion at http://archives.postgresql.org/message-id/4d4c4610.1030...@2ndquadrant.com , I'm seeing no solid performance swing in the checkpoint sorting code itself. Better sometimes, worse others, but never by a large amount.

Here's what the statistics part derived from the sorted data looks like on a real checkpoint spike:

2011-02-04 07:02:51 EST: LOG:  checkpoint starting: xlog
2011-02-04 07:02:51 EST: DEBUG: BufferSync 10 dirty blocks in relation.segment_fork 17216.0_2
2011-02-04 07:02:51 EST: DEBUG: BufferSync 159 dirty blocks in relation.segment_fork 17216.0_1
2011-02-04 07:02:51 EST: DEBUG: BufferSync 10 dirty blocks in relation.segment_fork 17216.3_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 548 dirty blocks in relation.segment_fork 17216.4_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 808 dirty blocks in relation.segment_fork 17216.5_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 799 dirty blocks in relation.segment_fork 17216.6_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 807 dirty blocks in relation.segment_fork 17216.7_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 716 dirty blocks in relation.segment_fork 17216.8_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 3857 buffers to write, 8 total dirty segment file(s) expected to need sync
2011-02-04 07:03:31 EST: DEBUG: checkpoint sync: number=1 file=base/16384/17216.5 time=1324.614 msec
2011-02-04 07:03:31 EST: DEBUG: checkpoint sync: number=2 file=base/16384/17216.4 time=0.002 msec
2011-02-04 07:03:31 EST: DEBUG: checkpoint sync: number=3 file=base/16384/17216_fsm time=0.001 msec
2011-02-04 07:03:47 EST: DEBUG: checkpoint sync: number=4 file=base/16384/17216.10 time=16446.753 msec
2011-02-04 07:03:53 EST: DEBUG: checkpoint sync: number=5 file=base/16384/17216.8 time=5804.252 msec
2011-02-04 07:03:53 EST: DEBUG: checkpoint sync: number=6 file=base/16384/17216.7 time=0.001 msec
2011-02-04 07:03:54 EST: DEBUG: compacted fsync request queue from 32768 entries to 2 entries
2011-02-04 07:03:54 EST: CONTEXT: writing block 1642223 of relation base/16384/17216
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=7 file=base/16384/17216.11 time=6350.577 msec
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=8 file=base/16384/17216.9 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=9 file=base/16384/17216.6 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=10 file=base/16384/17216.3 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=11 file=base/16384/17216_vm time=0.001 msec
2011-02-04 07:04:00 EST: LOG: checkpoint complete: wrote 3813 buffers (11.6%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=39.073 s, sync=29.926 s, total=69.003 s; sync files=11, longest=16.446 s, average=2.720 s

You can see that it ran out of fsync absorption space in the middle of the sync phase, which is usually when compaction is needed, but the recent patch to fix that kicked in and did its thing.
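For anyone who hasn't read that compaction patch, the core idea is just deduplication of the pending requests so that at most one entry per segment file survives. Here's a minimal standalone sketch of that idea; the struct layout and names are made up for illustration, and the real code works against the shared-memory request queue rather than a plain array:

/*
 * Toy sketch of compacting an fsync request queue: drop duplicate
 * requests so at most one entry per segment file remains.  The tag
 * layout and names here are illustrative, not the real structures.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct FsyncTag
{
    unsigned    rel;            /* relation file node */
    unsigned    seg;            /* 1GB segment number */
    unsigned    fork;           /* main, fsm, or vm fork */
} FsyncTag;

static bool
tag_equal(const FsyncTag *a, const FsyncTag *b)
{
    return a->rel == b->rel && a->seg == b->seg && a->fork == b->fork;
}

/* Compact queue[] in place and return the new entry count. */
static size_t
compact_fsync_queue(FsyncTag *queue, size_t n)
{
    size_t      keep = 0;

    for (size_t i = 0; i < n; i++)
    {
        bool        dup = false;

        for (size_t j = 0; j < keep; j++)
        {
            if (tag_equal(&queue[i], &queue[j]))
            {
                dup = true;
                break;
            }
        }
        if (!dup)
            queue[keep++] = queue[i];
    }
    return keep;
}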

Couple of observations:

-The total number of buffers I'm computing while sorting the checkpoint writes is not a perfect match for the number reported by the "checkpoint complete" status line. Sometimes they are the same, sometimes not. Not sure why yet.
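For reference, the per-segment counts in the log above come from a roll-up over the sorted list of dirty buffers. A simplified standalone version of that kind of scan looks roughly like the following; the BufTag layout, the segment-size constant, and the function name are stand-ins rather than the patch's actual code:

#include <stdio.h>
#include <stddef.h>

#define RELSEG_BLOCKS 131072    /* 1GB segment / 8kB block size */

/* Simplified buffer tag: enough to identify relation, fork, and block. */
typedef struct BufTag
{
    unsigned    rel;
    unsigned    fork;
    unsigned    block;
} BufTag;

/*
 * Walk a list of dirty-buffer tags already sorted by (rel, fork, block)
 * and emit one count per relation.segment_fork, like the log lines above.
 */
static void
summarize_dirty(const BufTag *tags, size_t n)
{
    size_t      i = 0;

    while (i < n)
    {
        unsigned    rel = tags[i].rel;
        unsigned    fork = tags[i].fork;
        unsigned    seg = tags[i].block / RELSEG_BLOCKS;
        size_t      count = 0;

        while (i < n && tags[i].rel == rel && tags[i].fork == fork &&
               tags[i].block / RELSEG_BLOCKS == seg)
        {
            count++;
            i++;
        }
        printf("BufferSync %zu dirty blocks in relation.segment_fork %u.%u_%u\n",
               count, rel, seg, fork);
    }
}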

-The estimate for "expected to need sync" computed as a by-product of the checkpoint sorting is not completely accurate either. This particular one has a fairly large error in it, percentage-wise: 8 segment files were expected, but 11 were actually synced, so it's off by 3 out of 11. Presumably the extras are absorbed fsync requests that were already queued up before the checkpoint even started. So any time estimate I derive from this count is only going to be approximate.

-The order in which the sync phase processes files is unrelated to the order in which they are written out. Note that 17216.10 here, the biggest victim (cause?) of the I/O spike, isn't even listed among the checkpoint writes!

The fuzziness here is a bit disconcerting, and I'll keep digging into why it happens. But I don't see any reason not to continue forward using this rough count to derive a nap time from, which I can then feed into the "useful leftovers" patch that Robert already refactored here. I can always sharpen up that estimate later; next I need to get some solid results I can share on what the delay time does to the throughput/latency pattern.
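To be concrete about what I mean by a nap time, the arithmetic is nothing fancier than spreading a sync-phase time budget over the expected number of files. Something along these lines, where the names and the fixed budget are purely for illustration:

#include <stdio.h>

/*
 * Split a sync-phase time budget evenly across the files we expect to
 * fsync.  expected_syncs is the rough count gathered while sorting the
 * checkpoint writes, so the result is only ever an approximation.
 */
static int
sync_nap_ms(int expected_syncs, int sync_budget_ms)
{
    if (expected_syncs <= 1)
        return 0;               /* nothing to spread out */
    return sync_budget_ms / expected_syncs;
}

int
main(void)
{
    /* 8 expected syncs over a 30 second window gives 3750 ms between them */
    printf("nap = %d ms\n", sync_nap_ms(8, 30 * 1000));
    return 0;
}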

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


