Greg Smith wrote:
> On Thu, 5 Jul 2007, Heikki Linnakangas wrote:

>> It looks like Tom's idea is not a winner; it leads to more writes than necessary.

> What I came away with as the core of Tom's idea is that the cleaning/LRU writer shouldn't ever scan the same section of the buffer cache twice, because anything that resulted in a new dirty buffer will be unwritable by it until the clock sweep passes over it. I never took that to mean that idea necessarily had to be implemented as "trying to aggressively keep all pages with usage_count=0 clean".

> I've been making slow progress on this myself, and the question I've been trying to answer is whether this fundamental idea really matters or not. One clear benefit of that alternate implementation is that it should allow setting a lower value for the interval without being as concerned that you're wasting resources by doing so, which I've found to be a problem with the current implementation--it will consume a lot of CPU scanning the same section right now if you lower that too much.

Yes. In fact, ignoring the CPU overhead of scanning the same section over and over again, Tom's proposal is equivalent to setting both bgwriter_lru_* settings all the way up to the maximum. I ran a DBT-2 test like that as well, and the number of writes was indeed the same, just with higher CPU usage. It's clear that scanning the same section over and over again has been a waste of time in previous releases.

As a further data point, I constructed a smaller test case that performs random DELETEs on a table using an index. I varied shared_buffers, and ran the test with the bgwriter either disabled or tuned all the way up to the maximum. Here are the results:

 shared_buffers | writes_off | writes_max |   writes_ratio
----------------+------------+------------+-------------------
           2560 |      86936 |      88023 |  1.01250345081439
           5120 |      81207 |      84551 |  1.04117871612053
           7680 |      75367 |      80603 |  1.06947337694216
          10240 |      69772 |      74533 |  1.06823654187926
          12800 |      64281 |      69237 |  1.07709898725907
          15360 |      58515 |      64735 |  1.10629753054772
          17920 |      53231 |      58635 |  1.10151979109917
          20480 |      48128 |      54403 |  1.13038148271277
          23040 |      43087 |      49949 |  1.15925917330053
          25600 |      39062 |      46477 |   1.1898264297783
          28160 |      35391 |      43739 |  1.23587917832217
          30720 |      32713 |      37480 |  1.14572188426619
          33280 |      31634 |      31677 |  1.00135929695897
          35840 |      31668 |      31717 |  1.00154730327144
          38400 |      31696 |      31693 | 0.999905350832913
          40960 |      31685 |      31730 |  1.00142023039293
          43520 |      31694 |      31650 | 0.998611724616647
          46080 |      31661 |      31650 | 0.999652569407157

The writes_off column is the number of writes with the bgwriter disabled, and writes_max is the number with the aggressive bgwriter. The table is 33334 pages, so once shared_buffers is around that size the whole table fits in cache and the bgwriter strategy makes no difference.


> As far as your results, first off I'm really glad to see someone else comparing checkpoint/backend/bgwriter writes the same way I've been doing, so I finally have someone else's results to compare against. I expect that the optimal approach here is a hybrid one that structures scanning the buffer cache the new way Tom suggests, but limits the number of writes to "just enough". I happen to be fond of the "just enough" computation based on a weighted moving average I wrote before, but there's certainly room for multiple implementations of that part of the code to evolve.

We need to get the requirements straight.

One goal of the bgwriter is clearly to keep just enough buffers clean in front of the clock hand that backends don't need to do writes themselves until the next bgwriter iteration, but not any more than that; otherwise we might end up doing more writes than necessary if some of the buffers are redirtied.
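
Roughly like this, just to illustrate the idea (a simplified standalone sketch with a made-up buffer struct and function names, not the real BufferDesc or the actual bgwriter code):

#include <stdbool.h>

/* Made-up, simplified buffer descriptor, for illustration only. */
typedef struct
{
    int  usage_count;
    bool dirty;
} SketchBuffer;

/* Stand-in for flushing a dirty page to disk. */
static void
write_buffer(SketchBuffer *buf)
{
    buf->dirty = false;
}

/*
 * Scan ahead of the clock hand and clean the dirty buffers that the next
 * sweep could hand out to backends, stopping as soon as 'target' reusable
 * (usage_count == 0 and clean) buffers have been seen.  Buffers with a
 * non-zero usage count are skipped; they can't be handed out until the
 * sweep has decremented them, so writing them now would risk a wasted
 * write if they're redirtied in the meantime.
 */
static void
clean_ahead(SketchBuffer *buffers, int nbuffers, int clock_hand, int target)
{
    int reusable = 0;

    for (int scanned = 0; scanned < nbuffers && reusable < target; scanned++)
    {
        SketchBuffer *buf = &buffers[(clock_hand + scanned) % nbuffers];

        if (buf->usage_count > 0)
            continue;

        if (buf->dirty)
            write_buffer(buf);

        reusable++;
    }
}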

To deal with bursty workloads, for example a batch of 2 GB worth of inserts coming in every 10 minutes, it seems we want to keep doing a little bit of cleaning even when the system is idle, to prepare for the next burst. The idea is to smooth out the physical I/O bursts: if we don't clean the dirty buffers left over from the previous burst during the idle period, the I/O system will be bottlenecked during the bursts and sit idle otherwise.

To strike a balance between cleaning buffers ahead of possible future bursts and not doing unnecessary I/O when no such bursts come, I think a reasonable strategy is to write buffers with usage_count=0 at a slow pace when there are no buffer allocations happening.
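
Something along these lines, say (hypothetical names, just to make the idea concrete, not actual bgwriter code):

/*
 * Hypothetical per-round decision: how many usage_count == 0 buffers to
 * try to clean in this bgwriter round.
 */
static int
cleaning_budget(int allocs_since_last_round, int smoothed_target, int idle_trickle)
{
    /* Backends are allocating buffers: keep up with the estimated demand. */
    if (allocs_since_last_round > 0)
        return smoothed_target;

    /*
     * System is idle: still trickle out a small fixed number of dirty
     * usage_count == 0 buffers per round, so the leftovers from the
     * previous burst are flushed before the next burst arrives.
     */
    return idle_trickle;
}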

To smooth out the small variations in a relatively steady workload, the weighted average sounds good.
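
For instance, one possible form of it is an exponentially weighted moving average; in this standalone toy the 0.16 smoothing factor, the 1.1 slack multiplier and the allocation numbers are made up, not taken from Greg's code:

#include <stdio.h>

int
main(void)
{
    /* Buffer allocations seen in each bgwriter round (made-up numbers). */
    int    recent_allocs[] = {120, 130, 125, 0, 0, 2000, 150, 140};
    double smoothed_alloc = 0.0;

    for (int i = 0; i < 8; i++)
    {
        /* Exponentially weighted moving average of the allocation rate. */
        smoothed_alloc += ((double) recent_allocs[i] - smoothed_alloc) * 0.16;

        /* Aim to clean a bit more than the estimate to leave some slack. */
        int target = (int) (smoothed_alloc * 1.1) + 1;

        printf("round %d: allocs = %4d, smoothed = %7.2f, target = %d\n",
               i, recent_allocs[i], smoothed_alloc, target);
    }
    return 0;
}

A single spike like the 2000 above only moves the estimate up partially, and the estimate then decays back over the following rounds, which is the kind of smoothing of small variations we're after.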




--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

