On 12/12/2004 5:08 PM, Simon Riggs wrote:

On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
Simon Riggs wrote:
> If the bgwriter_percent = 100, then we should actually do the sensible
> thing and prepare the list that we need, i.e. limit
> StrategyDirtyBufferList to finding at most bgwriter_maxpages.

Is the plan to make bgwriter_percent = 100 the default setting?

Hmm...must confess that my only plan is: i) discover the dynamic behaviour of the bgwriter; ii) fix any bugs or weirdness as quickly as possible; iii) try to find a way to set the bgwriter defaults.

I'm worried that we're late in the day for changes, but I'm equally
worried that: a) the bgwriter is very sensitive to tuning; b) we don't
really have much information on how to set the defaults in a meaningful
way for the majority of cases; and c) there are some issues that greatly
reduce the effectiveness of the bgwriter in many circumstances.

The 100pct.patch was my first attempt at getting something acceptable in
the next few days that gives sufficient room for the DBA to perform
tuning.

Doesn't cranking bgwriter_percent up to 100 effectively make the entire shared memory a write-through cache? In other words, with 100% the bgwriter will always write all dirty blocks out, and it becomes unlikely to avoid an IO for subsequent modifications to the same data block.



Jan


On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
I wonder if we even need to retain the bgwriter_percent GUC var. Is there actually a situation in which the combination of bgwriter_maxpages and bgwriter_delay does not give the DBA sufficient flexibility in tuning bgwriter behavior?

Yes, I do now think that only two GUCs are required to tune the behaviour; but you make me think - which two? Right now, bgwriter_delay is useless because the O(N) scan behaviour makes it impossible to set it any lower when you have a large shared_buffers. (I see that as a bug.)

Your question has made me rethink the exact objective of the bgwriter's
actions: The way it is coded now the bgwriter looks for dirty blocks, no
matter where they are in the list. What we are bothered about is the
number of clean buffers at the LRU, which has a direct influence on the
probability that BufferAlloc() will need to call FlushBuffer(), since
StrategyGetBuffer() returns the first unpinned buffer, dirty or not.
After further thought, I would prefer a subtle change in behaviour, so
that the bgwriter ensures that clean blocks are available at the LRUs
when buffer replacement occurs. With that slight change, I'd keep the
bgwriter_percent GUC but make it mean something different.

bgwriter_percent would be the % of shared_buffers that are searched
(from the LRU end) to see if they contain dirty buffers, which are then
written to disk.  That means the number of dirty blocks written to disk
is between 0 and the number of buffers searched, but we're not hugely
bothered what that number is... [This change to StrategyDirtyBufferList
resolves the unusability of the bgwriter with large shared_buffers]

Writing away dirty blocks towards the MRU end is more likely to be
wasted effort. If a block stays near the MRU then it will be dirty again
in the wink of an eye, so you gain nothing at checkpoint time by
cleaning it. Also, since it isn't near the LRU, cleaning it has no
effect on buffer replacement I/O. If a block is at the LRU, then it is
by definition the least likely to be reused, and is a candidate for
replacement anyway. So concentrating on the LRU, rather than the total
number of dirty buffers, seems the better thing to do.

That would then be a much simpler way of setting the defaults. With that
definition, we would set the defaults:

bgwriter_percent = 2 (according to my new suggestion here)
bgwriter_delay = 200
bgwriter_maxpages = -1 (i.e. mostly ignore it, but keep it for fine
tuning)

Thus, for the default shared_buffers=1000 the bgwriter would clear a
space of up to 20 blocks each cycle.
For a config with shared_buffers=60000, the bgwriter default would clear
space for up to 1,200 blocks (2% of 60,000) each cycle - a reasonable setting.

Overall that would need very little specific tuning, because it would
scale upwards as you changed the shared_buffers higher.
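Expressed as a postgresql.conf fragment (values from this proposal; note that bgwriter_percent here carries the redefined meaning suggested above, not its current one):

```
# Proposed defaults under the redefined bgwriter_percent:
bgwriter_percent  = 2     # scan 2% of shared_buffers from the LRU end each cycle
bgwriter_delay    = 200   # milliseconds between bgwriter rounds
bgwriter_maxpages = -1    # no explicit cap; keep only for fine tuning
```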

So, that interpretation of bgwriter_percent gives these advantages:
- we bound the StrategyDirtyBufferList scan to a small % of the whole
list, rather than the whole list...so we could realistically set the
bgwriter_delay lower if required
- we can set a default that scales, so would not often need to change it
- the parameter is defined in terms of the thing we really care about:
sufficient clean blocks at the LRU of the buffer lists
- these changes are very isolated and actually minor - just a different
way of specifying which buffers the bgwriter will clean

Patch attached...again for discussion and to help understanding of this
proposal. Will submit to patches if we agree it seems like the best way
to allow the bgwriter defaults to be sensibly set.

[...and yes, everybody, I do know where we are in the release cycle]



------------------------------------------------------------------------

Index: buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.182
diff -d -c -r1.182 bufmgr.c
*** buffer/bufmgr.c 24 Nov 2004 02:56:17 -0000 1.182
--- buffer/bufmgr.c 12 Dec 2004 21:53:10 -0000
***************
*** 681,686 ****
--- 681,687 ----
{
BufferDesc **dirty_buffers;
BufferTag *buftags;
+ int maxdirty;
int num_buffer_dirty;
int i;
***************
*** 688,717 ****
if (percent == 0 || maxpages == 0)
return 0;
/*
* Get a list of all currently dirty buffers and how many there are.
* We do not flush buffers that get dirtied after we started. They
* have to wait until the next checkpoint.
*/
! dirty_buffers = (BufferDesc **) palloc(NBuffers * sizeof(BufferDesc *));
! buftags = (BufferTag *) palloc(NBuffers * sizeof(BufferTag));
LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
- num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
- NBuffers);
! /*
! * If called by the background writer, we are usually asked to only
! * write out some portion of dirty buffers now, to prevent the IO
! * storm at checkpoint time.
! */
! if (percent > 0)
! {
! Assert(percent <= 100);
! num_buffer_dirty = (num_buffer_dirty * percent + 99) / 100;
! }
! if (maxpages > 0 && num_buffer_dirty > maxpages)
! num_buffer_dirty = maxpages;
/* Make sure we can handle the pin inside the loop */
ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
--- 689,720 ----
if (percent == 0 || maxpages == 0)
return 0;
+ /*
+  * Set the number of buffers we will clean at the LRUs of the buffer
+  * lists.  If no limits are set, then clean the whole of shared_buffers.
+  */
+ if (maxpages > 0)
+ maxdirty = maxpages;
+ else {
+ if (percent > 0) {
+ Assert(percent <= 100);
+ maxdirty = (NBuffers * percent + 99) / 100;
+ }
+ else
+ maxdirty = NBuffers;
+ }
+ /*
* Get a list of all currently dirty buffers and how many there are.
* We do not flush buffers that get dirtied after we started. They
* have to wait until the next checkpoint.
*/
! dirty_buffers = (BufferDesc **) palloc(maxdirty * sizeof(BufferDesc *));
! buftags = (BufferTag *) palloc(maxdirty * sizeof(BufferTag));
LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
! num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
! maxdirty);
/* Make sure we can handle the pin inside the loop */
ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
Index: buffer/freelist.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/freelist.c,v
retrieving revision 1.48
diff -d -c -r1.48 freelist.c
*** buffer/freelist.c 16 Sep 2004 16:58:31 -0000 1.48
--- buffer/freelist.c 12 Dec 2004 21:53:11 -0000
***************
*** 735,741 ****
* StrategyDirtyBufferList
*
* Returns a list of dirty buffers, in priority order for writing.
- * Note that the caller may choose not to write them all.
*
* The caller must beware of the possibility that a buffer is no longer dirty,
* or even contains a different page, by the time he reaches it. If it no
--- 735,740 ----
***************
*** 755,760 ****
--- 754,760 ----
int cdb_id_t2;
int buf_id;
BufferDesc *buf;
+ int i;
/*
* Traverse the T1 and T2 list LRU to MRU in "parallel" and add all
***************
*** 765,771 ****
cdb_id_t1 = StrategyControl->listHead[STRAT_LIST_T1];
cdb_id_t2 = StrategyControl->listHead[STRAT_LIST_T2];
! while (cdb_id_t1 >= 0 || cdb_id_t2 >= 0)
{
if (cdb_id_t1 >= 0)
{
--- 765,771 ----
cdb_id_t1 = StrategyControl->listHead[STRAT_LIST_T1];
cdb_id_t2 = StrategyControl->listHead[STRAT_LIST_T2];
! for (i = 0; i < max_buffers; i++)
{
if (cdb_id_t1 >= 0)
{
***************
*** 779,786 ****
buffers[num_buffer_dirty] = buf;
buftags[num_buffer_dirty] = buf->tag;
num_buffer_dirty++;
- if (num_buffer_dirty >= max_buffers)
- break;
}
}
--- 779,784 ----
***************
*** 799,806 ****
buffers[num_buffer_dirty] = buf;
buftags[num_buffer_dirty] = buf->tag;
num_buffer_dirty++;
- if (num_buffer_dirty >= max_buffers)
- break;
}
}
--- 797,802 ----



------------------------------------------------------------------------


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #

