Re: [HACKERS] Seq scans roadmap

Heikki Linnakangas Tue, 15 May 2007 02:35:26 -0700

Just to keep you guys informed, I've been busy testing and pondering over different buffer ring strategies for vacuum, seqscans and copy. Here's what I'm going to do:

Use a fixed-size ring. Fixed as in the size doesn't change after the ring is initialized; different kinds of scans use differently sized rings, however.

I said earlier that it'd be an invasive change to see if a buffer needs a WAL flush and choose another victim if that's the case. I looked at it again and found a pretty clean way of doing that, so I took that approach for seq scans.

1. For VACUUM, use a ring of 32 buffers. 32 buffers is small enough to give the L2 cache benefits and keep cache pollution low, but at the same time it's large enough that it keeps the need to WAL flush reasonable (1/32 of what we do now).

2. For sequential scans, also use a ring of 32 buffers, but whenever a buffer in the ring would need a WAL flush to recycle, we throw it out of the buffer ring instead. On read-only scans (and scans that only update hint bits) this gives the L2 cache benefits and doesn't pollute the buffer cache. On bulk updates, it's effectively the current behavior. On scans that do some updates, it's something in between. In all cases it should be no worse than what we have now. 32 buffers should be large enough to leave a "cache trail" for Jeff's synchronized scans to work.

3. For COPY that doesn't write WAL, use the same strategy as for sequential scans. This keeps the cache pollution low and gives the L2 cache benefits.

4. For COPY that writes WAL, use a large ring of 2048-4096 buffers. We want to use a ring that can accommodate 1 WAL segment worth of data, to avoid having to do any extra WAL flushes, and the WAL segment size is 2048 pages in the default configuration.
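To make the four cases above concrete, here is a toy model in Python. It is only a sketch and not the patch: the names ring_size, Buffer and BufferRing are made up for illustration, the page and WAL segment sizes are the stock defaults (8 kB and 16 MB), and it assumes that one WAL flush makes every buffer currently in the ring recyclable, which is what the 1/32 estimate implies.

```python
from dataclasses import dataclass
from typing import List, Optional

BLOCK_SIZE = 8192                      # default page size, in bytes
WAL_SEGMENT_BYTES = 16 * 1024 * 1024   # default WAL segment size, in bytes


def ring_size(kind: str) -> int:
    """Ring size in buffers per kind of scan (the kind names are hypothetical)."""
    if kind in ("vacuum", "seqscan", "copy_no_wal"):
        return 32
    if kind == "copy_wal":
        # One WAL segment worth of pages: the low end of the 2048-4096 range.
        return WAL_SEGMENT_BYTES // BLOCK_SIZE
    raise ValueError(f"unknown scan kind: {kind}")


@dataclass
class Buffer:
    page: int
    needs_wal_flush: bool = False   # dirtied, and its WAL is not flushed yet


class BufferRing:
    """Toy fixed-size ring; the size never changes after construction."""

    def __init__(self, size: int, drop_if_wal_flush_needed: bool):
        self.slots: List[Optional[Buffer]] = [None] * size
        self.next = 0
        self.drop = drop_if_wal_flush_needed
        self.wal_flushes = 0   # WAL flushes forced by recycling
        self.dropped = 0       # buffers thrown out of the ring

    def get_buffer(self, page: int) -> Buffer:
        """Return a buffer for `page`, reusing the next ring slot."""
        idx = self.next
        self.next = (self.next + 1) % len(self.slots)
        old = self.slots[idx]
        if old is not None and old.needs_wal_flush:
            if self.drop:
                # Seq scan / non-WAL COPY: throw the buffer out of the ring
                # and use a fresh victim instead of flushing WAL.
                self.dropped += 1
            else:
                # VACUUM: flush WAL so the buffer can be recycled.
                # Assumption: the flush covers every buffer now in the ring.
                self.wal_flushes += 1
                for b in self.slots:
                    if b is not None:
                        b.needs_wal_flush = False
        buf = Buffer(page)
        self.slots[idx] = buf
        return buf


def run_scan(kind: str, pages: int, dirty_every_page: bool) -> BufferRing:
    """Scan `pages` pages, optionally dirtying each one, and return the ring."""
    ring = BufferRing(ring_size(kind), drop_if_wal_flush_needed=(kind != "vacuum"))
    for page in range(pages):
        ring.get_buffer(page).needs_wal_flush = dirty_every_page
    return ring
```

In this model, run_scan("vacuum", 320, True) forces a WAL flush once per trip around the ring (9 flushes for 320 dirtied pages) rather than once per recycled buffer. A read-only run_scan("seqscan", 320, False) never drops a buffer, and a bulk-update run_scan("seqscan", 320, True) drops every buffer it would have recycled, which is the "effectively the current behavior" case.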


Some alternatives I considered but rejected:

* Instead of throwing away dirtied buffers in seq scans, accumulate them in another fixed-size list. When the list gets full, do a WAL flush and put them on the shared freelist or a backend-private freelist. That would eliminate the cache pollution of bulk DELETEs and bulk UPDATEs, and it could be used for vacuum as well. I think this would be the optimal algorithm but I don't feel like inventing something that complicated at this stage anymore. Maybe for 8.4.

* Using a different sized ring for 1st and 2nd vacuum phase. Decided that it's not worth the trouble; the above is already an order of magnitude better than the current behavior.
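For reference, the first rejected alternative could look roughly like the following. This is a hypothetical sketch only; nothing like it is in the patch, and the DirtyBatch class and its fields are invented for illustration.

```python
from typing import List


class DirtyBatch:
    """Hypothetical fixed-size list of dirtied buffers awaiting one WAL flush."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending: List[int] = []    # dirtied buffer ids, WAL not flushed yet
        self.freelist: List[int] = []   # buffer ids that are safe to reuse
        self.wal_flushes = 0

    def add_dirty(self, buf_id: int) -> None:
        """Remember a dirtied buffer; flush WAL once when the list is full."""
        self.pending.append(buf_id)
        if len(self.pending) >= self.capacity:
            self.wal_flushes += 1                # one flush covers the whole batch
            self.freelist.extend(self.pending)   # all of them become reusable
            self.pending.clear()
```

With a capacity of 32, dirtying 100 buffers costs 3 WAL flushes in this model and leaves 4 buffers still pending. The bookkeeping for the pending list and the choice between the shared and a backend-private freelist is the extra complexity mentioned above.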

I'm going to rerun the performance tests I ran earlier with the new patch, tidy it up a bit, and submit it in the next few days. This turned out to be an even more laborious patch to review than I thought. While the patch is short and in the end turned out to be very close to Simon's original patch, there are many different usage scenarios that need to be catered for and tested.

I still need to check the interaction with Jeff's patch. This is close enough to Simon's original patch that I believe the results of the tests Jeff ran earlier are still valid.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

