On 4/1/13 4:55 PM, Merlin Moncure wrote:
On Mon, Apr 1, 2013 at 4:09 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>On 2013-04-01 08:28:13 -0500, Merlin Moncure wrote:
>>On Sun, Mar 31, 2013 at 1:27 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> >On Friday, March 22, 2013, Ants Aasma wrote:
>> >>
>> >>On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> >> >Well, if you do a non-locking test first you could at least avoid some
>> >> >cases (and if you get the answer wrong, so what?) by jumping to the
>> >> >next buffer immediately.  If the non-locking test comes back good, only
>> >> >then do you do a hardware TAS.
>> >> >
>> >> >You could in fact go further and dispense with all locking in front of
>> >> >usage_count, on the premise that it's only advisory and not a real
>> >> >refcount.  So you only lock if/when it's time to select a candidate
>> >> >buffer, and even then only after a non-locking test first.  This would
>> >> >of course require some amusing adjustments to various logical checks
>> >> >(usage_count <= 0, heh).
>> >>
>> >>Moreover, if the buffer happens to miss a decrement due to a data
>> >>race, there's a good chance that the buffer is heavily used and
>> >>wouldn't need to be evicted soon anyway. (If you arrange it to be a
>> >>read-test-inc/dec-store operation then you will never go out of
>> >>bounds.) However, the clock sweep and usage_count maintenance are not
>> >>what is causing contention, because that workload is distributed. The
>> >>issue is pinning and unpinning.
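
A minimal C sketch of the unlocked "read-test-dec-store" described above
(illustrative only, not PostgreSQL source; BufDesc and the field layout are
invented here).  Because every value stored is derived from a value just read
and tested, a lost race can only drop a decrement; the counter itself can
never go out of bounds:

    #include <stdatomic.h>
    #include <stdbool.h>

    #define BM_MAX_USAGE_COUNT 5    /* PostgreSQL's usage_count cap */

    typedef struct BufDesc
    {
        _Atomic int usage_count;    /* advisory hint, not a real refcount */
    } BufDesc;

    /*
     * Unlocked read-test-dec-store: returns true if the buffer is (or has
     * just become) a zero-count eviction candidate.  A racing backend may
     * overwrite our store, losing one decrement, but every value written
     * is in [0, BM_MAX_USAGE_COUNT], so the counter never goes out of
     * bounds.
     */
    static bool
    sweep_touch(BufDesc *buf)
    {
        int cur = atomic_load_explicit(&buf->usage_count,
                                       memory_order_relaxed);

        if (cur <= 0)
            return true;    /* non-locking test: candidate, no TAS needed */

        atomic_store_explicit(&buf->usage_count, cur - 1,
                              memory_order_relaxed);
        return cur == 1;    /* we just took it to zero */
    }
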
>> >
>> >
>> >That is one of multiple issues.  Contention on the BufFreelistLock is
>> >another one.  I agree that usage_count maintenance is unlikely to become a
>> >bottleneck unless one or both of those are fixed first (and maybe not even
>> >then).
>>
>>usage_count manipulation is not a bottleneck, but that is irrelevant.
>>It can be affected by other page contention, which can lead to priority
>>inversion.  I don't believe there is any reasonable argument that
>>sitting and spinning while holding the BufFreelistLock is a good idea.
>
>In my experience the mere fact of (unlockedly, but still) accessing all the
>buffer headers can cause noticeable slowdowns in write-only/mostly workloads
>with large amounts of shmem.
>Due to the write-only nature, large numbers of buffers end up with similar
>usage counts (since they are infrequently touched after the initial insertion)
>and there are no free ones around, so the search for a buffer frequently runs
>through *all* buffer headers multiple times until it has decremented every
>usage count to 0. Then comes a period where free buffers are found easily
>(since all usage counts from the current sweep point onwards are zero).
>After that it starts all over.
>I have now seen that scenario multiple times :(
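
To make that failure mode concrete, here is a sketch of the naive sweep
(again illustrative, reusing BufDesc and sweep_touch from the sketch above;
pinning, header locks, and dirty-page writeback are all omitted).  With all
NBuffers headers at the same usage_count u and an empty freelist, this loop
touches on the order of u * NBuffers headers before the first victim appears,
after which victims stay cheap until the counts climb back up:

    /*
     * Naive clock sweep over the whole buffer header array.  With
     * uniformly high usage counts and no freelist it spins through the
     * array several times, decrementing as it goes, before anything
     * reaches zero.
     */
    static int
    clock_sweep(BufDesc *buffers, int nbuffers, int *sweep_pos)
    {
        for (;;)
        {
            BufDesc *buf = &buffers[*sweep_pos];

            *sweep_pos = (*sweep_pos + 1) % nbuffers;

            if (atomic_load_explicit(&buf->usage_count,
                                     memory_order_relaxed) <= 0)
                return (int) (buf - buffers);   /* victim found */

            (void) sweep_touch(buf);    /* decrement, keep scanning */
        }
    }
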
Interesting -- I was thinking about that too, but it's a separate
problem with a different trigger.  Maybe a bailout should be in there
so that after X usage_count adjustments the sweeper summarily does an
eviction, or maybe the "max" declines from 5 once per hundred buffers
inspected or some such.
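
A sketch of that bailout variant (hypothetical; SWEEP_BAILOUT_LIMIT stands in
for the unspecified "X" and would need tuning).  The sweep counts its
usage_count adjustments and, past the limit, simply takes the next buffer
regardless of its count; the declining-"max" variant would instead lower the
eviction threshold as the scan drags on:

    #define SWEEP_BAILOUT_LIMIT 2000    /* "X": illustrative value only */

    /*
     * Like clock_sweep() above, but gives up after too many usage_count
     * adjustments and evicts summarily.
     */
    static int
    clock_sweep_bailout(BufDesc *buffers, int nbuffers, int *sweep_pos)
    {
        int adjustments = 0;

        for (;;)
        {
            BufDesc *buf = &buffers[*sweep_pos];

            *sweep_pos = (*sweep_pos + 1) % nbuffers;

            if (adjustments >= SWEEP_BAILOUT_LIMIT ||
                atomic_load_explicit(&buf->usage_count,
                                     memory_order_relaxed) <= 0)
                return (int) (buf - buffers);   /* victim, possibly forced */

            (void) sweep_touch(buf);
            adjustments++;
        }
    }
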

What's the potential downside of that, though? I.e., what happens if this scheme 
suddenly evicts the root page of a heavily used index? You'll have a ton of 
stuff blocked waiting for that page to come back in.

This is a use case that I think would benefit greatly from a background process 
that keeps pages in the free list.
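
Roughly like this (entirely hypothetical; the FreeList type, water marks, and
worker loop are invented for illustration, and the consumer side plus its
synchronization are elided).  A background worker runs the sweep ahead of
demand and parks zero-count buffers on a freelist so foreground backends
almost never have to sweep themselves:

    #include <unistd.h>

    /* Toy freelist: a stack of buffer ids; assumes high_water <= 1024. */
    typedef struct FreeList
    {
        _Atomic int len;
        int         ids[1024];
    } FreeList;

    static void
    bgfreelist_main(BufDesc *buffers, int nbuffers, FreeList *fl,
                    int low_water, int high_water)
    {
        int sweep_pos = 0;

        for (;;)
        {
            /* Refill only once the list drains below the low-water mark. */
            if (atomic_load(&fl->len) < low_water)
            {
                while (atomic_load(&fl->len) < high_water)
                {
                    int victim = clock_sweep(buffers, nbuffers, &sweep_pos);

                    fl->ids[atomic_fetch_add(&fl->len, 1)] = victim;
                }
            }
            usleep(100 * 1000);     /* a real design would wake on demand */
        }
    }
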

That said, I now suspect that your "frightened turtle" approach would be of higher value 
than "bgfreelist"... but I expect we'll ultimately want both of them for different 
reasons.
--
Jim C. Nasby, Data Architect                   j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

