I have a WIP patch that adds the main detail I have found I need to properly tune checkpoint and background writer activity. I think it's almost ready to submit (you can see the current patch against 8.2 at http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt ) after making it a bit more human-readable. But I've realized that along with that, I need some guidance in regards to what log level is appropriate for this information.

An example works better than explaining what the patch does:

2007-02-19 21:53:24.602 EST - DEBUG: checkpoint required (wrote checkpoint_segments)
2007-02-19 21:53:24.685 EST - DEBUG:  checkpoint starting
2007-02-19 21:53:24.705 EST - DEBUG:  checkpoint flushing buffer pool
2007-02-19 21:53:24.985 EST - DEBUG:  checkpoint database fsync starting
2007-02-19 21:53:42.725 EST - DEBUG:  checkpoint database fsync complete
2007-02-19 21:53:42.726 EST - DEBUG: checkpoint buffer flush dirty=8034 write=279956 us sync=17739974 us

Remember that "Load distributed checkpoint" discussion back in December? You can see exactly how bad the problem is on your system with this log style (this is from a pgbench run where it's postively awful--it really does take over 17 seconds for the fsync to execute, and there are clients that are hung the whole time waiting for it).

I also instrumented the background writer.  You get messages like this:

2007-02-19 21:58:54.328 EST - DEBUG: BGWriter Scan All - Written = 5/5 Unscanned = 23/54

This shows that we wrote (5) the maximum pages we were allowed to write (5) while failing to scan almost half (23) of the buffers we meant to look at (54). By taking a look at this output while the system is under load, I found I was able to do bgwriter optimization that used to take me days of frustrating testing in hours. I've been waiting for a good guide to bgwriter tuning since 8.1 came out. Once you have this, combined with knowing how many buffers were dirty at checkpoint time because the bgwriter didn't get to them in time (the number you want to minimize), the tuning guide practically writes itself.

So my question is...what log level should all this go at? Right now, I have the background writer stuff adjusting its level dynamically based on what happened; it logs at DEBUG2 if it hits the write limit (which should be a rare event once you're tuned properly), DEBUG3 for writes that scanned everything they were supposed to, and DEBUG4 if it scanned but didn't find anything to write. The source of checkpoint information logs at DEBUG1, the fsync/write info at DEBUG2.

I'd like to move some of these up. On my system, I even have many of the checkpoint messages logged at INFO (the source of the checkpoint and the total write time line). It's a bit chatty, but when you get some weird system pause issue it makes it easy to figure out if checkpoints were to blame. Given how useful I feel some of these messages are to system tuning, and to explaining what currently appears as inexplicable pauses, I don't want them to be buried at DEBUG levels where people are unlikely to ever see them (I think some people may be concerned about turning on things labeled DEBUG at all). I am aware that I am too deep into this to have an unbiased opinion at this point though, which is why I ask for feedback on how to proceed here.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

               http://www.postgresql.org/about/donate

Reply via email to