2010/11/30 Greg Smith <g...@2ndquadrant.com>: > Jeff Janes wrote: >> >> For the individual file sync times emitted under debug1, it would be >> very handy if the file being synced was identified, for example >> "relation base/16384/16523". Rather than being numbered sequentially >> within a given checkpoint. >> > > I was numbering them sequentially so that it's straightforward to graph the > sync times in an external analysis tool, but the relation data is helpful > too. New patch reflecting all upthread suggestions is attached. The output > looks like this now at DEBUG1: > > LOG: checkpoint starting: xlog > DEBUG: checkpoint sync: number=1 file=base/16424/11645 time=11589.549000 > msec > DEBUG: checkpoint sync: number=2 file=base/16424/16438 time=16.148000 msec > DEBUG: checkpoint sync: number=3 file=base/16424/16437 time=53.530000 msec > DEBUG: checkpoint sync: number=4 file=base/16424/16447 time=10.214000 msec > DEBUG: checkpoint sync: number=5 file=base/16424/11607 time=1.499000 msec > DEBUG: checkpoint sync: number=6 file=base/16424/16425_fsm time=2.921000 > msec > DEBUG: checkpoint sync: number=7 file=base/16424/16437.1 time=4.237000 msec > DEBUG: checkpoint sync: number=8 file=base/16424/16428_fsm time=1.654000 > msec > DEBUG: checkpoint sync: number=9 file=base/16424/16442 time=7.920000 msec > DEBUG: checkpoint sync: number=10 file=base/16424/16428_vm time=2.613000 > msec > DEBUG: checkpoint sync: number=11 file=base/16424/11618 time=1.468000 msec > DEBUG: checkpoint sync: number=12 file=base/16424/16437_fsm time=2.638000 > msec > DEBUG: checkpoint sync: number=13 file=base/16424/16428 time=2.883000 msec > DEBUG: checkpoint sync: number=14 file=base/16424/16425 time=3.369000 msec > DEBUG: checkpoint sync: number=15 file=base/16424/16437_vm time=8.686000 > msec > DEBUG: checkpoint sync: number=16 file=base/16424/16425_vm time=5.984000 > msec > LOG: checkpoint complete: wrote 2074 buffers (50.6%); 0 transaction log > file(s) added, 0 removed, 3 recycled; write=0.617 s, sync=11.715 s, > total=22.167 s; sync files=16, longest=11.589 s, average=0.724 s > > I kept the units for the DEBUG level ones in msec because that's a better > scale for the common really short syncs during good behavior. But the > summary info in seconds now appears at the end of the existing "checkpoint > complete" message, so only one line to parse for those looking to analyze > the gross checkpoint data. That looks to work well enough for finding > situations like the big ext3 spikes. You can easily see one in this example > by the fact that "longest=11.589 s" is almost the entirety of "sync=11.715 > s". That's the really key thing there's currently no visibility into, > that's made obvious with this patch.
wonderfull. > > This might be ready for some proper review now. I know there's at least one > blatant bug still in here I haven't found yet, related to how the averages > are computed. I saw this once: > > LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) > added, 0 removed, 1 recycled; write=0.000 s, sync=0.000 s, total=0.001 s; > sync files=0, longest=0.000 s, average=-9223372036854775808.-2147483 s > > After an immediate checkpoint, so at least one path not quite right yet. > > -- > Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD > > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers