On Fri, Apr 18, 2014 at 11:46 AM, Greg Stark <st...@mit.edu> wrote:
> On Fri, Apr 18, 2014 at 4:14 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> I am a bit confused by this remark.  In *any* circumstance when you
>> evict you're incurring precisely one page fault I/O when the page is
>> read back in.  That doesn't mean that the choice of which page to
>> evict is irrelevant.
>
> But you might be evicting a page that will be needed soon or one that
> won't be needed for a while. If it's not needed for a while you might
> be able to avoid many page evictions by caching a page that will be
> used several times.
Sure.

> If all the pages currently in RAM are hot -- meaning they're hot
> enough that they'll be needed again before the page you're reading in
> -- then they're all equally bad to evict.

Also true.  But the problem is that it is very rarely, if ever, the
case that all pages are *equally* hot.  On a pgbench workload, for
example, I'm very confident that while there's not really any cold
data, the btree roots and visibility map pages are a whole lot hotter
than a randomly-selected heap page.  If you evict a heap page, you're
going to need it back pretty quick, because it won't be long until the
random-number generator again chooses a key that happens to be located
on that page.  But if you evict the root of the btree index, you're
going to need it back *immediately*, because the very next query, no
matter what key it's looking for, is going to need that page.  I'm
pretty sure that's a significant difference.

> I'm trying to push us away from the gut instinct that frequently used
> pages are important to cache and towards actually counting how many
> i/os we're saving. In the extreme it's possible to simulate any cache
> algorithm on a recorded list of page requests and count how many page
> misses it generates to compare it with an optimal cache algorithm.

There's another issue, which Simon clued me into a few years back:
evicting the wrong page can cause system-wide stalls.  In the pgbench
case, evicting a heap page will force the next process that chooses a
random number that maps to a tuple on that page to wait for the page
to be faulted back in.  That's sad, but unless the scale factor is
small compared to the number of backends, there will probably be only
ONE process waiting.  On the other hand, if we evict the btree root,
within a fraction of a second, EVERY process that isn't already
waiting on some other I/O will be waiting for that I/O to complete.
The impact on throughput is much bigger in that case.
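For what it's worth, the trace-driven comparison Greg describes is easy
to sketch.  The toy Python below (not PostgreSQL code; the trace, cache
size, and page names are made-up illustrative values) replays a recorded
list of page requests through LRU and through Belady's optimal MIN
policy -- evict the page whose next use is farthest in the future -- and
counts the misses each incurs:

```python
from collections import OrderedDict

def lru_misses(trace, size):
    """Replay trace through an LRU cache of the given size; return miss count."""
    cache = OrderedDict()  # keys kept in recency order, oldest first
    misses = 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)        # mark as most recently used
        else:
            misses += 1
            if len(cache) >= size:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses

def optimal_misses(trace, size):
    """Same replay under Belady's MIN: evict the page referenced farthest ahead."""
    cache = set()
    misses = 0
    for i, page in enumerate(trace):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= size:
            def next_use(p):
                try:
                    return trace.index(p, i + 1)   # position of next reference
                except ValueError:
                    return float("inf")            # never referenced again
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses

# A cyclic trace whose working set (3 pages) exceeds the cache (2 pages):
# LRU misses on every request, while MIN avoids a third of the misses.
trace = [1, 2, 3] * 3
print(lru_misses(trace, 2), optimal_misses(trace, 2))   # -> 9 6
```

Note this counts misses only; it says nothing about the stall-width
effect above, where one miss on a btree root blocks every backend at
once while a miss on a heap page blocks just one.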
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers