On Thu, 5 Apr 2007, Heikki Linnakangas wrote:

> The purpose of the bgwriter_all_* settings is to shorten the duration of the eventual checkpoint. The reason to shorten the checkpoint duration is to limit the damage it causes to other I/O activity. My thinking is that, assuming the LDC patch is effective (agreed, needs more testing) at smoothing the checkpoint, the duration doesn't matter anymore. Do you want to argue there are other reasons to shorten the checkpoint duration?

My testing results suggest that LDC doesn't usefully smooth the checkpoint under a high load (>30 clients here), because (on Linux at least) the way the OS caches writes clashes badly with how buffers end up being evicted if the buffer pool fills back up before the checkpoint is done. In that context, anything that stretches out the checkpoint makes the problem worse rather than better, because it makes it more likely that the tail end of the checkpoint will have to fight with the clients for write bandwidth, at which point they both suffer. If you just get the checkpoint done fast, the clients can't fill the pool as fast as BufferSync is writing it out, and things are as happy as they can be without a major rewrite of all this code. I can get a tiny improvement in some respects by delaying 2-5 seconds between finishing the writes and calling fsync, because that gives Linux a moment to usefully spool some of the data to the disk controller's cache; beyond that, any additional delay is a problem.
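
To make that concrete, here is a minimal sketch of where such a pause would sit; the function name is mine and this only approximates the checkpoint write path, it is not code from the patch:

/*
 * Sketch only, not from the LDC patch: pause briefly between handing the
 * dirty buffers to the OS and forcing them to disk, so Linux gets a moment
 * to start draining its write cache toward the controller's cache.
 */
void
FlushBufferPoolWithPause(void)
{
    BufferSync();               /* write out every dirty shared buffer */
    pg_usleep(3 * 1000000L);    /* 2-5 seconds helped in my tests; 3 shown */
    smgrsync();                 /* now fsync the files we just wrote */
}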

Since it's only the high-load cases I'm having trouble dealing with, this basically makes it a non-starter for me. The patch's focus on checkpoint_timeout while ignoring checkpoint_segments is also a big issue for me. At the same time, I recognize that the approach taken in LDC is probably a big improvement for many systems; it's just a step backwards for my highest-throughput one. I'd really enjoy hearing some results from someone else.

> The number of buffers evicted by normal backends in a bgwriter_delay period is simple to keep track of: just increase a counter in StrategyGetBuffer and reset it when bgwriter wakes up.
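
For concreteness, here is roughly what I understand that to mean; the names and placement are mine, not from any posted patch:

/*
 * Sketch only: a backend that has to claim a buffer for itself bumps a
 * counter, and the bgwriter reads and resets it once per bgwriter_delay
 * to see how much eviction it failed to get ahead of.
 */
static int StrategyNumBackendEvictions = 0;

/* called from StrategyGetBuffer() when a backend takes a buffer itself */
static void
CountBackendEviction(void)
{
    StrategyNumBackendEvictions++;
}

/* called from the bgwriter loop at the start of each cycle */
static int
ConsumeBackendEvictions(void)
{
    int     evicted = StrategyNumBackendEvictions;

    StrategyNumBackendEvictions = 0;
    return evicted;
}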

I see you've already found the other helpful Itagaki patch in this area. I would like to see his code for tracking evictions committed, and then I'd like that to be added as another counter in pg_stat_bgwriter (I mentioned that to Magnus in passing when he was setting up the stats, but didn't press it because of the patch dependency). Ideally, and this idea was also in Itagaki's patch with the writtenByBgWriter/ByBackEnds debug hook, I think it's important that you know how every buffer written to disk got there: was it written by the background writer, a checkpoint, or a backend eviction? Track all of those and you can really learn something about your write performance, data that's impossible to collect right now.
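
Something along these lines is what I have in mind; the field names are just placeholders for whatever the stats collector would actually expose:

/*
 * Sketch only: placeholder names, not a committed catalog change.  The
 * point is that every buffer write gets attributed to exactly one source,
 * and the totals end up visible through pg_stat_bgwriter.
 */
typedef struct BufferWriteStats
{
    long    buffers_checkpoint; /* written by BufferSync during a checkpoint */
    long    buffers_clean;      /* written by the bgwriter scans */
    long    buffers_backend;    /* written by a backend evicting for itself */
} BufferWriteStats;

Each of the three write paths would bump exactly one of those counters, and the collector would fold the totals into the view.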

However, as Itagaki himself points out, doing something useful with bgwriter_lru_maxpages is only one piece of automatically tuning the background writer. I hate to join in on chopping his patches up, but without some additional work I don't think the exact auto-tuning logic he then applies will work in all cases, which could make it more of a problem than the current crude yet predictable method. The way bgwriter_lru_maxpages and num_to_clean play off each other in his code currently has a number of failure modes I'm concerned about. I'm not sure whether a rewrite using a moving-average approach (as I did in my auto-tuning writer prototype and as Tom just suggested here) will be sufficient to fix all of them. That was already on my to-do list to investigate further.
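
As a standalone illustration of the moving-average idea (nothing here is from either patch; the numbers and smoothing factor are made up), the point is just that a smoothed estimate of recent evictions keeps one noisy interval from whipsawing how many buffers get cleaned the next cycle:

#include <stdio.h>

int
main(void)
{
    /* pretend these are backend evictions counted per bgwriter_delay */
    int     evictions[] = {10, 12, 400, 15, 11, 9, 380, 14};
    double  smoothed = 0.0;
    double  alpha = 0.3;        /* smoothing factor, assumed */
    int     num_to_clean;
    int     i;

    for (i = 0; i < 8; i++)
    {
        /* exponentially weighted moving average of recent demand */
        smoothed = alpha * evictions[i] + (1.0 - alpha) * smoothed;

        /* aim to clean a bit more than the smoothed demand, capped */
        num_to_clean = (int) (smoothed * 1.1);
        if (num_to_clean > 1000)    /* stand-in for bgwriter_lru_maxpages */
            num_to_clean = 1000;

        printf("cycle %d: evicted=%3d smoothed=%6.1f clean next=%d\n",
               i, evictions[i], smoothed, num_to_clean);
    }
    return 0;
}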

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
