In the interest of closing work on what's officially titled the "Automatic adjustment of bgwriter_lru_maxpages" patch, I wanted to summarize where I think this is at, what I'm working on right now, and see whether feedback changes how I submit my final attempt at a useful patch in this area this week. Hopefully there are enough free eyes to stare at this now to wrap up a plan that makes sense and still fits the 8.3 schedule. I'd hate to see this pushed off to 8.4 without making some forward progress after the amount of work already done, particularly since the odds aren't good that I'll still be working with this code by then.

Let me start with a summary of the conclusions I've reached based on my own tests and the set that Heikki did last month (last results at http://community.enterprisedb.com/bgwriter/ ); Heikki will hopefully chime in if he disagrees with how I'm characterizing things:

1) In the current configuration, if you have a large setting for bgwriter_lru_percent and/or a small setting for bgwriter_delay, that can be extremely wasteful because the background writer will consume CPU/locking resources scanning the buffer pool needlessly. This problem should go away.

2) Having backends write their own buffers out does not significantly degrade performance, as those turn into cached OS writes which generally execute fast enough to not be a large drag on the backend.

3) Any attempt to scan significantly ahead of the current strategy point will result in some amount of premature writes, which decreases overall efficiency in cases where the buffer is touched again before it gets re-used. The further ahead you go, the worse this inefficiency gets. For many workloads the most efficient approach is to just let the backends do all the writes.

4) Tom observed that there's no reason to ever scan the same section of the pool more than once, because anything that changes a buffer's status will always make it un-reusable until the strategy point has passed over it. But because of (3), this does not mean that one should drive forward constantly trying to lap the buffer pool and catch up with the strategy point.

5) There hasn't been any definitive proof that the background writer is helpful at all in the context of 8.3. However, yanking it out altogether may be premature, as there are some theorized ways it may be helpful in real-world situations with more intermittent workloads than are generally encountered in a benchmarking situation. I personally feel there is some potential for the BGW to become more useful in the context of the 8.4 release if it starts doing things like adding pages it expects to be recycled soon onto the free list, which could improve backend efficiency quite a bit compared to the current situation where each backend normally runs its own scan. But that's a bit too big to fit into 8.3, I think.

What I'm aiming for here is to have the BGW do as little work as possible, as efficiently as possible, but not remove it altogether. (2) suggests this approach won't decrease performance compared to the current 8.2 situation, where I've seen evidence that some people are over-tuning a very aggressive BGW to scan an enormous amount of the pool each time because they have resources to burn. Keeping a generally self-tuning background writer that errs on the lazy side in the codebase satisfies (5). Here is what the patch I'm testing right now does to try and balance all this out:

A) Counters are added to pg_stat_bgwriter that show how many buffers were written by the backends, how many by the background writer, how many times bgwriter_lru_maxpages was hit, and the total number of buffers allocated. This at least allows monitoring what's going on as people run their own experiments. Heikki's results included data using the earlier version of this patch I assembled (which now conflicts with HEAD; I have an updated one).
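To give a concrete picture of the bookkeeping involved, here's a minimal standalone C sketch of those counters and the sort of ratio an admin might watch; the struct, field names, and sample numbers are all invented for this example rather than copied from the patch:

/* Illustrative sketch only: counters of this sort get accumulated locally
 * and reported to the stats collector once per cycle.  Names and numbers
 * here are placeholders for the example, not the patch's identifiers. */
#include <stdio.h>

typedef struct BgWriterCounters
{
    long    buffers_backend;    /* writes done by ordinary backends */
    long    buffers_clean;      /* writes done by the LRU cleaner */
    long    maxwritten_clean;   /* cycles stopped by bgwriter_lru_maxpages */
    long    buffers_alloc;      /* total buffer allocations */
} BgWriterCounters;

int
main(void)
{
    /* Placeholder values, just to show the kind of derived number an
     * admin could compute while experimenting. */
    BgWriterCounters c = {120, 880, 3, 1000};

    printf("fraction of writes done by backends: %.1f%%\n",
           100.0 * c.buffers_backend / (c.buffers_backend + c.buffers_clean));
    return 0;
}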

B) bgwriter_lru_percent is removed as a tunable. This eliminates (1). The idea of scanning a fixed percentage doesn't ever make sense given the observations above; we scan until we accomplish the cleaning mission instead.

C) bgwriter_lru_maxpages stays as an absolute maximum number of pages that can be written in one sweep each bgwriter_delay. This allows easily turning the writer off altogether by setting it to 0, or limiting how active it tries to be in situations where (3) is a concern. Admins can monitor how often the maximum is hit in pg_stat_bgwriter and consider raising it (or lowering the delay) if it proves too limiting. I think the default needs to be bumped to something more like 100 rather than the current tiny value before the stock configuration can be considered "self-tuning" at all.

D) The strategy code gets a "passes" count added to it that serves as a sort of high-order int for how many times the buffer cache has been looked over in its entirety.

E) When the background writer starts the LRU cleaner, it checks whether the strategy point has passed where it last cleaned up to, using the passes+buf_id "pointer". If so, it just starts cleaning from the strategy point as it always has. But if it's still ahead, it continues from where it left off, thus implementing the core of (4)'s insight. It estimates how many buffers are probably clean in the space between the strategy point and where it's starting, based on how far ahead it is combined with historical data about how many buffers are scanned on average per reusable buffer found (the exact computation of this number is the main thing I'm still fiddling with).
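Since (D) and (E) are the least obvious pieces, here's a rough standalone sketch of what that passes+buf_id pointer and the gap estimate might look like. Every name is invented for this example, and the estimate shown (gap divided by the historical scans-per-reusable-buffer ratio) is just one plausible computation, not necessarily what the final patch will do:

/* Rough sketch, not the patch: a scan position built from a complete-passes
 * count plus a buffer index, and an estimate of how many reusable buffers
 * probably sit in the gap between the strategy point and where the cleaner
 * stopped last time. */
#include <stdbool.h>
#include <stdio.h>

typedef struct ScanPosition
{
    long    passes;     /* full laps of the buffer pool completed */
    int     buf_id;     /* slot within the pool, 0 .. NBuffers-1 */
} ScanPosition;

/* True if position a is still ahead of position b in scan order. */
static bool
ahead_of(ScanPosition a, ScanPosition b)
{
    return (a.passes > b.passes) ||
           (a.passes == b.passes && a.buf_id > b.buf_id);
}

/* Number of buffers between b (behind) and a (ahead). */
static long
distance(ScanPosition a, ScanPosition b, int NBuffers)
{
    return (a.passes - b.passes) * (long) NBuffers + (a.buf_id - b.buf_id);
}

int
main(void)
{
    int         NBuffers = 1024;            /* pool size, illustrative */
    ScanPosition strategy = {5, 100};       /* where backends are reusing */
    ScanPosition cleaner = {5, 400};        /* where the cleaner stopped */
    double      scans_per_reusable = 2.5;   /* historical average */

    if (ahead_of(cleaner, strategy))
    {
        long    gap = distance(cleaner, strategy, NBuffers);

        /* Credit for buffers already scanned in the gap that are probably
         * still clean and reusable. */
        long    est_clean = (long) (gap / scans_per_reusable);

        printf("gap=%ld buffers, estimated clean among them=%ld\n",
               gap, est_clean);
    }
    else
        printf("strategy point has lapped the cleaner; restart from it\n");
    return 0;
}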

F) A moving average of buffer allocations is used to predict how many clean buffers are expected to be needed in the next delay cycle. The original patch from Itagaki doubled the recent allocations to pad this out; (3) suggests that's too much.
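As a concrete illustration, here's one way such a smoothed prediction could look. The exponential smoothing factor and the decision not to pad the estimate are assumptions made for this sketch, not necessarily what the submitted patch does:

/* Rough sketch only: keep a moving average of per-cycle buffer allocations
 * and use it as the estimate of how many clean buffers the next delay
 * cycle will need. */
#include <stdio.h>

static double smoothed_alloc = 0.0;     /* running average of allocations */

/* Called once per bgwriter_delay cycle with the allocations seen since the
 * previous call; returns the predicted need for the next cycle. */
static int
predict_allocations(int recent_alloc)
{
    const double smoothing = 0.16;      /* weight given to the newest sample */

    smoothed_alloc += smoothing * (recent_alloc - smoothed_alloc);
    return (int) (smoothed_alloc + 0.5);
}

int
main(void)
{
    int     samples[] = {200, 180, 260, 240, 50, 220};
    int     i;

    for (i = 0; i < 6; i++)
        printf("cycle %d: recent=%d predicted=%d\n",
               i, samples[i], predict_allocations(samples[i]));
    return 0;
}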

G) Scan the buffer pool until one of the following happens (a rough sketch of this loop follows the list):
  --Enough reusable buffers have been located or written out to fill the upcoming allocation need, taking into account the estimate from (E); this is the normal, expected way for the scan to terminate.
  --We've written bgwriter_lru_maxpages buffers.
  --We "lap" the pool and catch the strategy point.

In addition to removing a tunable and making the remaining two less critical, one of my hopes here is that the more efficient way this scheme operates will allow using much smaller values for bgwriter_delay than have been practical in the current codebase, which may ultimately have its own value.

That's what I've got working here now; it still needs some more tweaking and testing before I'm done with the code, but there's not much left. The main problem I foresee is that this approach is moderately complicated, adding a lot of new code and regular+static variables, for something that's not really proven to be valuable. I will not be surprised if my patch is rejected on that basis. That's why I wanted to get the big picture painted in this message while I finish up the work necessary to submit it, because if the whole idea is doomed anyway I might as well stop now.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
