On Fri, Jun 1, 2012 at 3:40 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jun 1, 2012 at 3:16 PM, Florian Pflug <f...@phlo.org> wrote:
>> Ok, now you've lost me. If the read() blocks, how on earth can a few
>> additional pins/unpins ever account for any meaningful execution time?
>>
>> It seems to me that once read() blocks we're talking about a delay in the
>> order of the scheduling granularity (i.e., milliseconds, in the best case),
>> while even in the worst case pinning a buffer shouldn't take more than
>> 1000 cycles (for comparison, I think a cache miss across all layers costs
>> a few hundred cycles). So there's at the very least three orders of magnitude
>> between those two...
>
> I'm not sure what you want me to say here.  s_lock shows up in the
> profile, and some of that is from PinBuffer.  I think any detectable
> number of calls to s_lock is a sign of Bad Things (TM).  I can't
> reproduce anything as severe as what the OP is seeing, but what does
> that prove?  In a couple years we'll have systems with 128 cores
> floating around, and things that are minor problems at 32 or even 64
> cores will be crippling at 128 cores.  IME, spinlock contention has a
> very sharp tipping point.  It's only a minor annoyance and then you
> hit some threshold number of cores and, bam, you're spending 70-90% of
> your time across all cores fighting over that one spinlock.
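
For anyone skimming: the sharp tipping point falls straight out of the
spin loop itself.  A minimal test-and-set sketch -- this is not the real
s_lock/TAS machinery, just the shape of it -- would look something like
the below; every failed attempt still writes the lock's cache line, so
once enough cores are spinning the line ping-pongs between them and
useful throughput collapses:

/* Minimal test-and-set spin loop, illustration only; the real s_lock/TAS
 * code is platform-specific and smarter about back-off. */
#include <stdatomic.h>
#include <sched.h>

typedef atomic_flag slock_sketch;

static void
spin_lock_sketch(slock_sketch *lock)
{
    int spins = 0;

    /* test_and_set dirties the lock's cache line even when it fails, so
     * N spinning cores keep stealing the line from each other and the
     * cost grows much faster than linearly with N. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
    {
        if (++spins > 1000)
        {
            sched_yield();      /* crude back-off once we've spun a while */
            spins = 0;
        }
    }
}

static void
spin_unlock_sketch(slock_sketch *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}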

I think your approach, nailing buffers, is really the way to go.  It
nails buffers based on detected contention, which is very desirable --
uncontended spinlocks aren't broken and don't need to be fixed.  It
also doesn't add overhead in the general case, whereas a side-by-side
backend queue does.
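
Just so we're picturing the same shape of thing, here is a self-contained
toy model of "nail on detected contention" -- the names (nailed,
pin_collisions, NAIL_THRESHOLD) are invented here for illustration and
are surely not what your patch actually uses:

/* Toy model of contention-triggered nailing; not the actual patch. */
#include <stdatomic.h>
#include <stdbool.h>

#define NAIL_THRESHOLD 16       /* arbitrary, for illustration */

typedef struct
{
    atomic_flag hdr_lock;       /* stand-in for the buffer header spinlock */
    atomic_bool nailed;         /* once set, the buffer is never evicted */
    int         refcount;       /* protected by hdr_lock */
    int         usage_count;    /* protected by hdr_lock */
    int         pin_collisions; /* protected by hdr_lock */
} BufferSketch;

static void
pin_buffer_sketch(BufferSketch *buf)
{
    bool        collided = false;

    /* Fast path: a nailed buffer can't be evicted, so there is nothing to
     * protect -- no refcount bump, no spinlock, no cache-line fight. */
    if (atomic_load_explicit(&buf->nailed, memory_order_acquire))
        return;

    /* Slow path: an ordinary pin under the header lock, but remember
     * whether we collided with someone else on the way in. */
    while (atomic_flag_test_and_set_explicit(&buf->hdr_lock,
                                             memory_order_acquire))
        collided = true;

    buf->refcount++;
    if (buf->usage_count < 5)
        buf->usage_count++;

    /* Only contended buffers ever get nailed: an uncontended spinlock
     * isn't broken and doesn't get "fixed". */
    if (collided && ++buf->pin_collisions > NAIL_THRESHOLD)
        atomic_store_explicit(&buf->nailed, true, memory_order_release);

    atomic_flag_clear_explicit(&buf->hdr_lock, memory_order_release);
}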

Another nice aspect is that you're not changing the lifetime of the
pin as the backend sees it but storing the important state (the
interplay with usage_count is a nice touch) on the buffer itself --
you want to keep as little as possible on the backend-private side,
and your patch does that; it's more amenable to third-party
intervention (flush your buffers right now!) than extended pins.  It
exploits the fact that pins can overlap and that the reference count
is useless if the buffer is always in memory anyway.  It immediately
self-corrects when the first backend gripes, whereas a per-backend
solution will grind down as each backend independently determines it's
got a problem -- not pleasant if your workload is 'walking' a set of
buffers.
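
And because the nail lives on the shared buffer header rather than in any
backend's private memory, the "flush your buffers right now" case is one
function touching one place, not a negotiation with every backend.
Continuing the toy model above (the refcount resynchronization is exactly
the hard part a real patch has to get right; I'm hand-waving it here):

static void
unnail_buffer_sketch(BufferSketch *buf)
{
    while (atomic_flag_test_and_set_explicit(&buf->hdr_lock,
                                             memory_order_acquire))
        ;                       /* should be rare/uncontended in this path */

    atomic_store_explicit(&buf->nailed, false, memory_order_release);
    buf->pin_collisions = 0;

    /* Pins taken while the buffer was nailed never touched refcount, so
     * before the buffer can become evictable again the real code has to
     * re-establish a trustworthy reference count (or wait until it can
     * prove no such pins remain).  Hand-waved here. */

    atomic_flag_clear_explicit(&buf->hdr_lock, memory_order_release);
}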

Buffer pins aren't a cache: with a cache you are trying to mask a slow
operation (like a disk I/O) with a faster one so that the number of
slow operations is minimized.  Buffer pins, however, are very
different in that we only care about contention on the reference count
(the buffer itself is not locked!), which makes me suspicious that
caching-type algorithms are the wrong place to be looking.  I think it
comes down to picking between your relatively complex but general
lock-displacement approach and a specific strategy based on known
bottlenecks.

merlin
