Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

Jim Nasby Mon, 14 Apr 2014 21:56:35 -0700

On 4/14/14, 7:43 PM, Stephen Frost wrote:

* Jim Nasby ([email protected]) wrote:

I think it's important to mention that OS implementations (at least all I know of) 
have multiple page pools, each of which has its own clock. IIRC one of the 
arguments for us supporting a count>1 was we could get the benefits of multiple 
page pools without the overhead. In reality I believe that argument is false, 
because the clocks for each page pool in an OS *run at different rates* based on 
system demands.


They're also maintained in *parallel*, no?  That's something that I've
been talking over with a few folks at various conferences- that we
should consider breaking up shared buffers and then have new backend
processes which work through each pool independently and in parallel.


I suspect that varies based on the OS, but it certainly happens in a separate 
process from user processes. The expectation is that there should always be 
pages on the free list so requests for memory can happen quickly.
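
A rough sketch of that free-list idea, purely as an illustration and not FreeBSD code: the names PagePool, free_target and page_daemon are made up here, and the "inactive" queue stands in for whatever clean pages the OS considers reusable. The allocator only ever pops from the free list, and a separate background step refills it when it drops below a target.

```python
from collections import deque


class PagePool:
    """Toy model: allocation is a free-list pop; refilling happens elsewhere."""

    def __init__(self, n_pages, free_target):
        self.free = deque(range(n_pages))  # pages ready to hand out
        self.inactive = deque()            # clean pages eligible for reuse (assumed)
        self.free_target = free_target     # hypothetical low-water mark

    def alloc(self):
        # Fast path: take a free page, or fail immediately with None.
        return self.free.popleft() if self.free else None

    def release_to_inactive(self, page):
        # A page that is no longer in active use becomes a reuse candidate.
        self.inactive.append(page)

    def page_daemon(self):
        # Background step: does nothing unless the free list is below target.
        moved = 0
        while len(self.free) < self.free_target and self.inactive:
            self.free.append(self.inactive.popleft())
            moved += 1
        return moved
```

With four pages and a target of two, page_daemon() is a no-op until allocations drain the free list; after that it moves just enough inactive pages back to reach the target.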

http://www.freebsd.org/doc/en/articles/vm-design/freeing-pages.html contains a 
good overview of what FreeBSD does. See 
http://www.freebsd.org/doc/en/articles/vm-design/allen-briggs-qa.html#idp62990256
as well.

I don't know if multiple buffer pools would be good or bad for Postgres, but I 
do think it's important to remember this difference any time we look at what 
OSes do.


It's my suspicion that the one-big-pool is exactly why we see many cases
where PG performs worse when the pool is more than a few gigs.  Of
course, this is all speculation and proper testing needs to be done.


I think there are some critical take-aways from FreeBSD that apply here (in no 
particular order):

1: The system is driven by memory pressure. No pressure means no processing.
2: It sounds like the active list is LFU, not LRU. The cache list is LRU.
3: *The use counter is maintained by a clock.* Because the clock only runs so 
often, there is no runaway incrementing like we see in Postgres.
4: Once a page is determined to not be active it goes onto a separate list 
depending on whether it's clean or dirty.
5: Dirty pages are only written to maintain a certain clean/dirty ratio and 
again, only when there's actual memory pressure.
6: The system maintains a list of free pages to serve memory requests quickly. 
In fact, lower level functions (e.g. 
http://www.leidinger.net/FreeBSD/dox/vm/html/d4/d65/vm__phys_8c_source.html#l00862) 
simply return NULL if they can't find pages on the free list.
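
To make point 3 concrete, here is a minimal sketch contrasting the two ways of maintaining a use counter. It is a simplification under stated assumptions, not the actual FreeBSD or Postgres code: the Page class and the function names are hypothetical, the clock adjusts the count by one step per pass, and MAX_COUNT is set to 5 to mirror Postgres's BM_MAX_USAGE_COUNT.

```python
from dataclasses import dataclass

MAX_COUNT = 5  # assumed cap, mirroring Postgres's BM_MAX_USAGE_COUNT


@dataclass
class Page:
    referenced: bool = False  # set on access, cleared by the clock
    use_count: int = 0


def access_per_hit(page):
    # Per-access style: every access bumps the counter directly.
    page.use_count = min(page.use_count + 1, MAX_COUNT)


def access_ref_bit(page):
    # Clock-maintained style: an access only sets the referenced bit.
    page.referenced = True


def clock_tick(page):
    # One clock pass: the referenced bit is worth at most one count step;
    # an unreferenced page decays by one step instead.
    if page.referenced:
        page.use_count = min(page.use_count + 1, MAX_COUNT)
        page.referenced = False
    elif page.use_count > 0:
        page.use_count -= 1


burst = Page()
for _ in range(100):
    access_per_hit(burst)    # counter saturates within the first 5 accesses

clocked = Page()
for _ in range(100):
    access_ref_bit(clocked)  # same burst, but only the bit is set
clock_tick(clocked)          # the clock turns the whole burst into one step
```

In the per-access version a short burst is enough to pin the counter at its maximum, whereas in the clock-maintained version the counter can only grow as fast as the clock runs.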
--
Jim C. Nasby, Data Architect                       [email protected]
512.569.9461 (cell)                         http://jim.nasby.net

