I have a WIP patch that adds the main detail I have found I need to
properly tune checkpoint and background writer activity. I think it's
almost ready to submit (you can see the current patch against 8.2 at
http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt )
after making it a bit more human-readable. But I've realized that along
with that, I need some guidance in regards to what log level is
appropriate for this information.
An example works better than explaining what the patch does:
2007-02-19 21:53:24.602 EST - DEBUG: checkpoint required (wrote
checkpoint_segments)
2007-02-19 21:53:24.685 EST - DEBUG: checkpoint starting
2007-02-19 21:53:24.705 EST - DEBUG: checkpoint flushing buffer pool
2007-02-19 21:53:24.985 EST - DEBUG: checkpoint database fsync starting
2007-02-19 21:53:42.725 EST - DEBUG: checkpoint database fsync complete
2007-02-19 21:53:42.726 EST - DEBUG: checkpoint buffer flush dirty=8034
write=279956 us sync=17739974 us
Remember that "Load distributed checkpoint" discussion back in December?
You can see exactly how bad the problem is on your system with this log
style (this is from a pgbench run where it's postively awful--it really
does take over 17 seconds for the fsync to execute, and there are clients
that are hung the whole time waiting for it).
I also instrumented the background writer. You get messages like this:
2007-02-19 21:58:54.328 EST - DEBUG: BGWriter Scan All - Written = 5/5
Unscanned = 23/54
This shows that we wrote (5) the maximum pages we were allowed to write
(5) while failing to scan almost half (23) of the buffers we meant to look
at (54). By taking a look at this output while the system is under load,
I found I was able to do bgwriter optimization that used to take me days
of frustrating testing in hours. I've been waiting for a good guide to
bgwriter tuning since 8.1 came out. Once you have this, combined with
knowing how many buffers were dirty at checkpoint time because the
bgwriter didn't get to them in time (the number you want to minimize), the
tuning guide practically writes itself.
So my question is...what log level should all this go at? Right now, I
have the background writer stuff adjusting its level dynamically based on
what happened; it logs at DEBUG2 if it hits the write limit (which should
be a rare event once you're tuned properly), DEBUG3 for writes that
scanned everything they were supposed to, and DEBUG4 if it scanned but
didn't find anything to write. The source of checkpoint information logs
at DEBUG1, the fsync/write info at DEBUG2.
I'd like to move some of these up. On my system, I even have many of the
checkpoint messages logged at INFO (the source of the checkpoint and the
total write time line). It's a bit chatty, but when you get some weird
system pause issue it makes it easy to figure out if checkpoints were to
blame. Given how useful I feel some of these messages are to system
tuning, and to explaining what currently appears as inexplicable pauses, I
don't want them to be buried at DEBUG levels where people are unlikely to
ever see them (I think some people may be concerned about turning on
things labeled DEBUG at all). I am aware that I am too deep into this to
have an unbiased opinion at this point though, which is why I ask for
feedback on how to proceed here.
--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate