As already mentioned in the broader discussion at http://archives.postgresql.org/message-id/4d4c4610.1030...@2ndquadrant.com , I'm seeing no solid performance swing in the checkpoint sorting code itself. Better sometimes, worse others, but never by a large amount.

Here's what the statistics part derived from the sorted data looks like on a real checkpoint spike:

2011-02-04 07:02:51 EST: LOG:  checkpoint starting: xlog
2011-02-04 07:02:51 EST: DEBUG: BufferSync 10 dirty blocks in relation.segment_fork 17216.0_2
2011-02-04 07:02:51 EST: DEBUG: BufferSync 159 dirty blocks in relation.segment_fork 17216.0_1
2011-02-04 07:02:51 EST: DEBUG: BufferSync 10 dirty blocks in relation.segment_fork 17216.3_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 548 dirty blocks in relation.segment_fork 17216.4_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 808 dirty blocks in relation.segment_fork 17216.5_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 799 dirty blocks in relation.segment_fork 17216.6_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 807 dirty blocks in relation.segment_fork 17216.7_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 716 dirty blocks in relation.segment_fork 17216.8_0
2011-02-04 07:02:51 EST: DEBUG: BufferSync 3857 buffers to write, 8 total dirty segment file(s) expected to need sync
2011-02-04 07:03:31 EST: DEBUG: checkpoint sync: number=1 file=base/16384/17216.5 time=1324.614 msec
2011-02-04 07:03:31 EST: DEBUG: checkpoint sync: number=2 file=base/16384/17216.4 time=0.002 msec
2011-02-04 07:03:31 EST: DEBUG: checkpoint sync: number=3 file=base/16384/17216_fsm time=0.001 msec
2011-02-04 07:03:47 EST: DEBUG: checkpoint sync: number=4 file=base/16384/17216.10 time=16446.753 msec
2011-02-04 07:03:53 EST: DEBUG: checkpoint sync: number=5 file=base/16384/17216.8 time=5804.252 msec
2011-02-04 07:03:53 EST: DEBUG: checkpoint sync: number=6 file=base/16384/17216.7 time=0.001 msec
2011-02-04 07:03:54 EST: DEBUG: compacted fsync request queue from 32768 entries to 2 entries
2011-02-04 07:03:54 EST: CONTEXT: writing block 1642223 of relation base/16384/17216
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=7 file=base/16384/17216.11 time=6350.577 msec
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=8 file=base/16384/17216.9 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=9 file=base/16384/17216.6 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=10 file=base/16384/17216.3 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG: checkpoint sync: number=11 file=base/16384/17216_vm time=0.001 msec
2011-02-04 07:04:00 EST: LOG: checkpoint complete: wrote 3813 buffers (11.6%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=39.073 s, sync=29.926 s, total=69.003 s; sync files=11, longest=16.446 s, average=2.720 s

You can see that it ran out of fsync absorption space in the middle of the sync phase, which is usually when compaction is needed, but the recent patch to fix that kicked in and did its thing.
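For anyone who hasn't read that compaction patch, the core idea is just deduplication of the pending requests so that at most one entry per segment file survives. Here's a minimal standalone sketch of that idea; the struct layout and names are made up for illustration, and the real code works against the shared-memory request queue rather than a plain array:

/*
 * Toy sketch of compacting an fsync request queue: drop duplicate
 * requests so at most one entry per segment file remains.  The tag
 * layout and names here are illustrative, not the real structures.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct FsyncTag
{
    unsigned    rel;            /* relation file node */
    unsigned    seg;            /* 1GB segment number */
    unsigned    fork;           /* main, fsm, or vm fork */
} FsyncTag;

static bool
tag_equal(const FsyncTag *a, const FsyncTag *b)
{
    return a->rel == b->rel && a->seg == b->seg && a->fork == b->fork;
}

/* Compact queue[] in place and return the new entry count. */
static size_t
compact_fsync_queue(FsyncTag *queue, size_t n)
{
    size_t      keep = 0;

    for (size_t i = 0; i < n; i++)
    {
        bool        dup = false;

        for (size_t j = 0; j < keep; j++)
        {
            if (tag_equal(&queue[i], &queue[j]))
            {
                dup = true;
                break;
            }
        }
        if (!dup)
            queue[keep++] = queue[i];
    }
    return keep;
}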

Couple of observations:

-The total number of buffers I'm computing while sorting the checkpoint writes is not a perfect match for the number reported by the "checkpoint complete" status line. Sometimes they are the same, sometimes not. Not sure why yet.
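For reference, the per-segment counts in the log above come from a roll-up over the sorted list of dirty buffers. A simplified standalone version of that kind of scan looks roughly like the following; the BufTag layout, the segment-size constant, and the function name are stand-ins rather than the patch's actual code:

#include <stdio.h>
#include <stddef.h>

#define RELSEG_BLOCKS 131072    /* 1GB segment / 8kB block size */

/* Simplified buffer tag: enough to identify relation, fork, and block. */
typedef struct BufTag
{
    unsigned    rel;
    unsigned    fork;
    unsigned    block;
} BufTag;

/*
 * Walk a list of dirty-buffer tags already sorted by (rel, fork, block)
 * and emit one count per relation.segment_fork, like the log lines above.
 */
static void
summarize_dirty(const BufTag *tags, size_t n)
{
    size_t      i = 0;

    while (i < n)
    {
        unsigned    rel = tags[i].rel;
        unsigned    fork = tags[i].fork;
        unsigned    seg = tags[i].block / RELSEG_BLOCKS;
        size_t      count = 0;

        while (i < n && tags[i].rel == rel && tags[i].fork == fork &&
               tags[i].block / RELSEG_BLOCKS == seg)
        {
            count++;
            i++;
        }
        printf("BufferSync %zu dirty blocks in relation.segment_fork %u.%u_%u\n",
               count, rel, seg, fork);
    }
}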

-The estimate for "expected to need sync" computed as a by-product of the checkpoint sorting is not completely accurate either. This particular one has a fairly large error in it, percentage-wise: 8 segment files were expected, but 11 were actually synced, so it's off by 3 out of 11. Presumably the extras are absorbed fsync requests that were already queued up before the checkpoint even started. So any time estimate I derive from this count is only going to be approximate.

-The order in which the sync phase processes files is unrelated to the order in which they are written out. Note that 17216.10 here, the biggest victim (cause?) of the I/O spike, isn't even listed among the checkpoint writes!

The fuzziness here is a bit disconcerting, and I'll keep digging into why it happens. But I don't see any reason not to continue forward using this rough count to derive a nap time from, which I can then feed into the "useful leftovers" patch that Robert already refactored here. I can always sharpen up that estimate later; next I need to get some solid results I can share on what the delay time does to the throughput/latency pattern.
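To be concrete about what I mean by a nap time, the arithmetic is nothing fancier than spreading a sync-phase time budget over the expected number of files. Something along these lines, where the names and the fixed budget are purely for illustration:

#include <stdio.h>

/*
 * Split a sync-phase time budget evenly across the files we expect to
 * fsync.  expected_syncs is the rough count gathered while sorting the
 * checkpoint writes, so the result is only ever an approximation.
 */
static int
sync_nap_ms(int expected_syncs, int sync_budget_ms)
{
    if (expected_syncs <= 1)
        return 0;               /* nothing to spread out */
    return sync_budget_ms / expected_syncs;
}

int
main(void)
{
    /* 8 expected syncs over a 30 second window gives 3750 ms between them */
    printf("nap = %d ms\n", sync_nap_ms(8, 30 * 1000));
    return 0;
}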

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


