Re: [PERFORM] B-Heaps

Greg Smith Mon, 14 Jun 2010 23:41:40 -0700

Eliot Gable wrote:

Just curious if this would apply to PostgreSQL: http://queue.acm.org/detail.cfm?id=1814327

It's hard to take this seriously at all when it's so ignorant of actual research in this area. Take a look at http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/BFJ01.pdf for a second, specifically page 9. See the "van Emde Boas" layout? That's basically the same as what this article is calling a B-heap, and the idea goes back to at least 1977. As you can see from that paper, the idea of using it to optimize for multi-level caches goes back to at least 2001. Based on the performance number, it seems a particularly helpful optimization for the type of in-memory caching that his Varnish tool is good at, so kudos for reinventing the right wheel. But that's an environment with one level of cache: you're delivering something from memory, or not. You can't extrapolate from what works for that very far.

So, how does PostgreSQL deal with the different latencies involved in accessing data on disk for searches / sorts vs. accessing data in memory? Is it allocated in a similar way as described in the article such that disk access is reduced to a minimum?

PostgreSQL is modeling a much more complicated situation where there are many levels of caches, from CPU to disk. When executing a query, the database tries to manage that by estimating the relative costs for CPU operations, row operations, sequential disk reads, and random disk reads. Those fundamental operations are then added up to build more complicated machinery like sorting. To minimize query execution cost, various query plans are considered, the cost computed for each one, and the cheapest one gets executed. This has to take into account a wide variety of subtle tradeoffs related to whether memory should be used for things that would otherwise happen on disk. There are three primary ways to search for a row, three main ways to do a join, two for how to sort, and they all need to have cost estimates made for them that balance CPU time, memory, and disk access.
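
To make the "add up the fundamental costs and pick the cheapest plan" idea concrete, here is a rough sketch in Python. The constants are the default values of the planner settings seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost and cpu_operator_cost. Everything else is an assumption made for illustration: the Table class, the function names, and especially the two cost formulas, which are far simpler than what the real planner computes.

```python
from dataclasses import dataclass

# Default planner cost settings, in arbitrary units where one
# sequential page read costs 1.0.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025


@dataclass
class Table:
    """Hypothetical table statistics; not a PostgreSQL structure."""
    pages: int
    rows: int


def seq_scan_cost(table: Table) -> float:
    # Read every page sequentially, then process every row and
    # evaluate one filter operator on it.
    io = table.pages * SEQ_PAGE_COST
    cpu = table.rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)
    return io + cpu


def index_scan_cost(table: Table, selectivity: float) -> float:
    # Simplifying assumption: every matching row costs one random
    # page read, plus CPU work for the index entry and the row.
    matched = table.rows * selectivity
    return matched * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)


def cheapest_plan(table: Table, selectivity: float) -> tuple[str, float]:
    # Cost every candidate plan and keep the cheapest one.
    plans = {
        "seq_scan": seq_scan_cost(table),
        "index_scan": index_scan_cost(table, selectivity),
    }
    name = min(plans, key=plans.get)
    return name, plans[name]


if __name__ == "__main__":
    t = Table(pages=1000, rows=100_000)
    for sel in (0.001, 0.5):
        print(sel, cheapest_plan(t, sel))
```

With these toy formulas, a filter matching 0.1% of the rows favors the index scan, while one matching half the table favors the sequential scan, because the random page reads add up faster than reading the whole table in order.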

The problem Varnish is solving is most like how PostgreSQL decides what disk pages to keep in memory, specifically the shared_buffers structure. Even there the problem the database is trying to solve is quite a bit more complicated than what an HTTP cache has to deal with. For details about what the database does there, see "Inside the PostgreSQL Buffer Cache" at http://projects.2ndquadrant.com/talks


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
[email protected]   www.2ndQuadrant.us


