On Thu, 5 Apr 2007, Heikki Linnakangas wrote:

> The purpose of the bgwriter_all_* settings is to shorten the duration of the eventual checkpoint. The reason to shorten the checkpoint duration is to limit the damage it causes to other I/O activity. My thinking is that, assuming the LDC patch is effective (agreed, needs more testing) at smoothing the checkpoint, the duration doesn't matter anymore. Do you want to argue there are other reasons to shorten the checkpoint duration?

My testing results suggest that LDC doesn't usefully smooth the checkpoint under a high load (>30 clients here), because (on Linux at least) the way the OS caches writes clashes badly with how buffers end up being evicted if the buffer pool fills back up before the checkpoint is done. In that context, anything that stretches out the checkpoint makes the problem worse rather than better, because it makes it more likely that the tail end of the checkpoint will have to fight with the clients for write bandwidth, at which point they both suffer. If you just get the checkpoint done fast, the clients can't fill the pool as fast as BufferSync is writing it out, and things are as happy as they can be without a major rewrite of all this code. I can get a tiny improvement in some respects by delaying 2-5 seconds between finishing the writes and calling fsync, because that gives Linux a moment to usefully spool some of the data to the disk controller's cache; beyond that, any additional delay is a problem.
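
To make that concrete, here is a minimal sketch of where such a pause would sit; the function name is mine and this only approximates the checkpoint write path, it is not code from the patch:

/*
 * Sketch only, not from the LDC patch: pause briefly between handing the
 * dirty buffers to the OS and forcing them to disk, so Linux gets a moment
 * to start draining its write cache toward the controller's cache.
 */
void
FlushBufferPoolWithPause(void)
{
    BufferSync();               /* write out every dirty shared buffer */
    pg_usleep(3 * 1000000L);    /* 2-5 seconds helped in my tests; 3 shown */
    smgrsync();                 /* now fsync the files we just wrote */
}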

Since it's only the high-load cases I'm having trouble dealing with, this basically makes it a non-starter for me. The patch's focus on checkpoint_timeout while ignoring checkpoint_segments is also a big issue for me. At the same time, I recognize that the approach taken in LDC is probably a big improvement for many systems; it's just a step backwards for my highest-throughput one. I'd really enjoy hearing some results from someone else.

> The number of buffers evicted by normal backends in a bgwriter_delay period is simple to keep track of: just increase a counter in StrategyGetBuffer and reset it when bgwriter wakes up.
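
For concreteness, here is roughly what I understand that to mean; the names and placement are mine, not from any posted patch:

/*
 * Sketch only: a backend that has to claim a buffer for itself bumps a
 * counter, and the bgwriter reads and resets it once per bgwriter_delay
 * to see how much eviction it failed to get ahead of.
 */
static int StrategyNumBackendEvictions = 0;

/* called from StrategyGetBuffer() when a backend takes a buffer itself */
static void
CountBackendEviction(void)
{
    StrategyNumBackendEvictions++;
}

/* called from the bgwriter loop at the start of each cycle */
static int
ConsumeBackendEvictions(void)
{
    int     evicted = StrategyNumBackendEvictions;

    StrategyNumBackendEvictions = 0;
    return evicted;
}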

I see you've already found the other helpful Itagaki patch in this area. I would like to see his code for tracking evictions committed, and then I'd like that to be added as another counter in pg_stat_bgwriter (I mentioned that to Magnus in passing when he was setting up the stats, but didn't press it because of the patch dependency). Ideally, and this idea was also in Itagaki's patch with the writtenByBgWriter/ByBackEnds debug hook, I think it's important that you know how every buffer written to disk got there: was it written by the background writer, a checkpoint, or a backend eviction? Track all of those and you can really learn something about your write performance, data that's impossible to collect right now.
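
Something along these lines is what I have in mind; the field names are just placeholders for whatever the stats collector would actually expose:

/*
 * Sketch only: placeholder names, not a committed catalog change.  The
 * point is that every buffer write gets attributed to exactly one source,
 * and the totals end up visible through pg_stat_bgwriter.
 */
typedef struct BufferWriteStats
{
    long    buffers_checkpoint; /* written by BufferSync during a checkpoint */
    long    buffers_clean;      /* written by the bgwriter scans */
    long    buffers_backend;    /* written by a backend evicting for itself */
} BufferWriteStats;

Each of the three write paths would bump exactly one of those counters, and the collector would fold the totals into the view.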

However, as Itagaki himself points out, doing something useful with bgwriter_lru_maxpages is only one piece of automatically tuning the background writer. I hate to join in on chopping his patches up, but without some additional work I don't think the exact auto-tuning logic he then applies will work in all cases, which could make it more of a problem than the current crude yet predictable method. The way bgwriter_lru_maxpages and num_to_clean play off each other in his code currently has a number of failure modes I'm concerned about. I'm not sure whether a rewrite using a moving-average approach (as I did in my auto-tuning writer prototype and as Tom just suggested here) will be sufficient to fix all of them. That was already on my to-do list to investigate further.
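
As a standalone illustration of the moving-average idea (nothing here is from either patch; the numbers and smoothing factor are made up), the point is just that a smoothed estimate of recent evictions keeps one noisy interval from whipsawing how many buffers get cleaned the next cycle:

#include <stdio.h>

int
main(void)
{
    /* pretend these are backend evictions counted per bgwriter_delay */
    int     evictions[] = {10, 12, 400, 15, 11, 9, 380, 14};
    double  smoothed = 0.0;
    double  alpha = 0.3;        /* smoothing factor, assumed */
    int     num_to_clean;
    int     i;

    for (i = 0; i < 8; i++)
    {
        /* exponentially weighted moving average of recent demand */
        smoothed = alpha * evictions[i] + (1.0 - alpha) * smoothed;

        /* aim to clean a bit more than the smoothed demand, capped */
        num_to_clean = (int) (smoothed * 1.1);
        if (num_to_clean > 1000)    /* stand-in for bgwriter_lru_maxpages */
            num_to_clean = 1000;

        printf("cycle %d: evicted=%3d smoothed=%6.1f clean next=%d\n",
               i, evictions[i], smoothed, num_to_clean);
    }
    return 0;
}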

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
