On Mon, Mar 21, 2011 at 5:24 AM, Greg Stark <gsst...@mit.edu> wrote:
> On Fri, Mar 18, 2011 at 11:55 PM, Josh Berkus <j...@agliodbs.com> wrote:
>>> To take the opposite approach... has anyone looked at having the OS just 
>>> manage all caching for us? Something like MMAPed shared buffers? Even if we 
>>> find the issue with large shared buffers, we still can't dedicate serious 
>>> amounts of memory to them because of work_mem issues. Granted, that's 
>>> something else on the TODO list, but it really seems like we're 
>>> re-inventing the wheels that the OS has already created here...
>
> A lot of people have talked about it. You can find references to mmap
> going at least as far back as 2001 or so. The problem is that it would
> depend on the OS implementing things in a certain way and guaranteeing
> things we don't think can be portably assumed. We would need to mlock
> large amounts of address space which most OS's don't allow, and we
> would need to at least mlock and munlock lots of small bits of memory
> all over the place which would create lots and lots of mappings which
> the kernel and hardware implementations would generally not
> appreciate.
>
>> As far as I know, no OS has a more sophisticated approach to eviction
>> than LRU.  And clock-sweep is a significant performance improvement
>> over LRU for frequently accessed database objects ... plus our
>> optimizations around not overwriting the whole cache for things like VACUUM.
>
> The clock-sweep algorithm was standard OS design before you or I knew
> how to type. I would expect any half-decent OS to have something at
> least as good -- perhaps better because it can rely on hardware
> features to handle things.
>
> However, the second point is the crux of the issue and of all similar
> issues on where to draw the line between the OS and Postgres. The OS
> knows better about the hardware characteristics and can better
> optimize the overall system behaviour, but Postgres understands its
> own access patterns better and can better optimize its behaviour,
> whereas the OS is stuck reverse-engineering what Postgres needs,
> usually from simple heuristics.
>
>>
>> 2-level caches work well for a variety of applications.
>
> I think a 2-level cache with simple heuristics like "pin all the
> indexes" is unlikely to be helpful. At least it won't optimize the
> average case, and I think that's been proven. It might help with the
> worst case, which would reduce the standard deviation. Perhaps we're
> at the point now where that matters.
>
> Where it might be helpful is as a more refined version of the
> "sequential scans use limited set of buffers" patch. Instead of having
> each sequential scan use a hard-coded number of buffers, perhaps all
> sequential scans should share a fraction of the global buffer pool,
> managed separately from the main pool. Though in my thought
> experiments I don't see any real win here. In the current scheme, if
> there's any sign the buffer is useful, it gets removed from the
> sequential scan's set of reusable buffers anyway.
>
>> Now, what would be *really* useful is some way to avoid all the data
>> copying we do between shared_buffers and the FS cache.
>>
>
> Well, the two options are mmap/mlock and direct I/O. The former might
> be a fun experiment, but I expect any OS to fall over pretty quickly
> when faced with thousands (or millions) of 8kB mappings. The latter
> would need Postgres to do async I/O and, ideally, keep a global view
> of its I/O access patterns so it could do prefetching in a lot more
> cases.

Can't you just make one large mapping and lock it in 8k regions? I
thought the problems with mmap were not being able to detect changes
made by other processes
(http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html),
compatibility issues (possibly obsolete), etc.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
