Tom Lane wrote:
I've already pointed out a couple reasons why I don't have any
confidence in its correctness.

Well, you've suggested that I should try and reduce the API churn caused by the patch. As I said on -patches, I don't really see this as an issue if we just apply the patch to REL8_0_STABLE. You never replied to my -patches mail -- AFAIK that's where the issue stands.


I think the biggest area of concern with the LRU patch is how it changes bgwriter behavior. To review, the bgwriter currently does the following each time it is invoked:

(1) acquire BufMgrLock
(2) sweep through the entire buffer pool in LRU order, adding dirty pages to a list
(3) examine a subset of the computed list according to bgwriter_maxpages and bgwriter_percent
(4) for each element in the subset:
(a) drop BufMgrLock
(b) flush page to disk
(c) reacquire BufMgrLock


This is tricky to implement with LRU; we only record the recency of last access for unpinned pages, so we can't get the list in #2 (we can either get all dirty pages, as required for checkpoint, or all the unpinned dirty pages in LRU order). Besides which, the ARC behavior has some issues that were raised prior to the 8.0 release: scanning the whole buffer pool with the BufMgrLock held is bad.

So the LRU patch changes the bgwriter to:

(1) acquire the BufMgrLock
(2) sweep through the free list (the list of unpinned buffers) in LRU order, adding up to bgwriter_maxpages dirty pages to the list
3) for each element in the list, write it out as in #4 above


That means we don't hold the BufMgrLock for as long and the bgwriter doesn't consider flushing pinned pages (both a good idea IMHO), but it also means there's no consideration of bgwriter_percent. I'm not at all sure that this is the best compromise, however, so some comments would be welcome.

Perhaps it would be best to keep bgwriter_percent, but redefine it to mean "the % of the buffer pool scanned by the bgwriter, at most." (I think this was Simon's idea originally.)

This may be the right path to go for
8.0.* ... but we must NOT suppose that we can just push it out without
a full beta test cycle.

Yeah, I think a beta period would be a good idea (not nearly as long as the 8.0 beta period was, though).


I think it would be better to have a few weeks of beta prior to 8.0.2 and resolve the problem here and now, rather than crippling the 8.1 cycle (as the no-initdb policy would) or waiting for the problem to *really* become serious (as the "no action needed now" policy would).

-Neil

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
     subscribe-nomail command to [EMAIL PROTECTED] so that your
     message can get through to the mailing list cleanly

Reply via email to