On Thu, May 31, 2012 at 1:50 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov <kopo...@ast.cam.ac.uk> wrote:
>>> On Thu, 31 May 2012, Robert Haas wrote:
>>>
>>>> Oh, ho. So from this we can see that the problem is that we're
>>>> getting huge amounts of spinlock contention when pinning and
>>>> unpinning index pages.
>>>>
>>>> It would be nice to have a self-contained reproducible test case
>>>> for this, so that we could experiment with it on other systems.
>>>
>>> I have created it a few days ago:
>>> http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php
>>>
>>> It is still valid, and it is exactly what I'm using to test. The
>>> only thing to change is to create a two-column index and drop the
>>> other index. The scripts are precisely the ones I'm using now.
>>>
>>> The problem is that in order to see a really big slowdown (10 times
>>> slower than a single thread) I had to raise the buffers to 48g, but
>>> it was slow for smaller shared_buffers settings as well.
>>>
>>> But I'm not sure how sensitive the test is to the hardware.
>>
>> It's not: high contention on spinlocks is going to suck no matter
>> what hardware you have. I think the problem is pretty obvious now:
>> any case where multiple backends are scanning the same sequence of
>> buffers in a very tight loop is going to display this behavior. It
>> doesn't come up that often: it takes a pretty unusual sequence of
>> events to get a bunch of backends hitting the same buffer like that.
>>
>> Hm, I wonder if you could alleviate the symptoms by making
>> Pin/UnpinBuffer smarter, so that frequently pinned buffers could
>> stay pinned longer -- kind of as if your private ref count were
>> hacked to be higher in that case. It would be a complex fix for a
>> narrow issue, though.
>
> This test case is unusual because it hits a whole series of buffers
> very hard.
> However, there are other cases where this happens on a single buffer
> that is just very, very hot, like the root block of a btree index,
> where the pin/unpin overhead hurts us. I've been thinking about this
> problem for a while, but it hasn't made it up to the top of my
> priority list, because workloads where pin/unpin is the dominant cost
> are still relatively uncommon. I expect them to get more common as we
> fix other problems.
>
> Anyhow, I do have some vague thoughts on how to fix this. Buffer
> pins are a lot like weak relation locks, in that they are a type of
> lock that is taken frequently but rarely conflicts. And the
> fast-path locking in 9.2 provides a demonstration of how to handle
> this kind of problem efficiently: making the weak, rarely-conflicting
> locks cheaper, at the cost of some additional expense when a
> conflicting lock (in this case, a buffer cleanup lock) is taken. In
> particular, each backend has its own area to record weak relation
> locks, and a strong relation lock must scan all of those areas and
> migrate any locks found there to the main lock table. I don't think
> it would be feasible to adopt exactly this solution for buffer pins,
> because page eviction and buffer cleanup locks, while not exactly
> common, are common enough that we can't require a scan of N
> per-backend areas every time one of those operations occurs.
>
> But maybe we could have a system of this type that only applies to
> the very hottest buffers. Suppose we introduce two new buffer flags,
> BUF_NAILED and BUF_NAIL_REMOVAL. When we detect excessive contention
> on the buffer header spinlock, we set BUF_NAILED. Once we do that,
> the buffer can't be evicted until that flag is removed, and backends
> are permitted to record pins in a per-backend area protected by a
> per-backend spinlock or lwlock, rather than in the buffer header.
> When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL.
Hm, a couple of questions: how do you determine if/when to un-nail a
buffer, and who makes that decision (bgwriter)? Is there a limit to
how many buffers you are allowed to nail? It seems like a much
stronger idea, but one downside I see vs. the 'pin for longer' idea I
was kicking around is how to deal with stale nailed buffers and keep
their number from growing uncontrollably, so that you don't end up
having to either stop nailing or forcibly evict them.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers