Re: [HACKERS] Wait free LW_SHARED acquisition

Heikki Linnakangas Fri, 27 Sep 2013 01:13:08 -0700

On 27.09.2013 10:21, Andres Freund wrote:

Hi,


On 2013-09-27 10:14:46 +0300, Heikki Linnakangas wrote:

On 27.09.2013 01:55, Andres Freund wrote:

We have had several customers running postgres on bigger machines report
problems on busy systems. Most recently one where a fully cached
workload completely stalled in s_lock()s due to the *shared* lwlock
acquisition in BufferAlloc() around the buffer partition lock.

Increasing the padding to a full cacheline helps making the partitioning
of the partition space actually effective (before it's essentially
halved on a read-mostly workload), but that still leaves one with very
hot spinlocks.

So the goal is to have LWLockAcquire(LW_SHARED) never block unless
somebody else holds an exclusive lock. To produce enough appetite for
reading the rest of the long mail, here's some numbers on a
pgbench -j 90 -c 90 -T 60 -S (-i -s 10) on a 4xE5-4620

master + padding: tps = 146904.451764
master + padding + lwlock: tps = 590445.927065


How does that compare with simply increasing NUM_BUFFER_PARTITIONS?


Heaps better. In the case causing this investigation lots of the pages
with hot spinlocks were simply the same ones over and over again,
partitioning the lockspace won't help much there.
That's not exactly an uncommon scenario since often enough there's a
small amount of data hit very frequently and lots more that is accessed
only infrequently. E.g. recently inserted data and such tends to be very hot.

I see. So if only a few buffers are really hot, I'm assuming the problem
isn't just the buffer partition lock, but also the spinlock on the
buffer header, and the buffer content lwlock. Yeah, improving LWLocks
would be a nice wholesale solution to that. I don't see any fundamental
flaw in your algorithm. Nevertheless, I'm going to throw in a couple of
other ideas:

* Keep a small 4-5 entry cache of buffer lookups in each backend of most
recently accessed buffers. Before searching for a buffer in the
SharedBufHash, check the local cache.

* To pin a buffer, use an atomic fetch-and-add instruction to increase
the refcount. PinBuffer() also increases usage_count, but you could do
that without holding a lock; it doesn't need to be accurate.
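
A rough sketch of how the two ideas above could fit together, written
with C11 <stdatomic.h> purely for illustration. Every name here
(SketchBuffer, local_cache_lookup, sketch_pin, ...) is hypothetical and
the struct is a simplified stand-in, not the real buffer header. The tag
recheck after pinning is an assumption about how a stale cache entry
could be detected; a real implementation would also have to coordinate
with buffer eviction, which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOCAL_CACHE_SIZE 4      /* "small 4-5 entry cache" */
#define SKETCH_MAX_USAGE 5      /* assumed cap on usage_count */

/* Hypothetical, simplified stand-in for a shared buffer header. */
typedef struct SketchBuffer
{
    _Atomic uint32_t tag;       /* stand-in for the real buffer tag */
    atomic_uint refcount;       /* number of pins, must be exact */
    atomic_uint usage_count;    /* replacement hint, may be inexact */
} SketchBuffer;

typedef struct LocalCacheEntry
{
    uint32_t      tag;
    SketchBuffer *buf;          /* NULL means the slot is empty */
} LocalCacheEntry;

/* Backend-private, so no locking is needed to read or update it. */
static LocalCacheEntry local_cache[LOCAL_CACHE_SIZE];
static unsigned next_victim;

static void
sketch_buffer_init(SketchBuffer *buf, uint32_t tag)
{
    atomic_init(&buf->tag, tag);
    atomic_init(&buf->refcount, 0);
    atomic_init(&buf->usage_count, 0);
}

/* Check the local cache before going to the shared hash table. */
static SketchBuffer *
local_cache_lookup(uint32_t tag)
{
    for (int i = 0; i < LOCAL_CACHE_SIZE; i++)
        if (local_cache[i].buf != NULL && local_cache[i].tag == tag)
            return local_cache[i].buf;
    return NULL;
}

/* Remember a lookup result, overwriting slots round-robin. */
static void
local_cache_remember(uint32_t tag, SketchBuffer *buf)
{
    local_cache[next_victim].tag = tag;
    local_cache[next_victim].buf = buf;
    next_victim = (next_victim + 1) % LOCAL_CACHE_SIZE;
}

/* Pin with fetch-and-add; returns 0 if the buffer no longer holds tag. */
static int
sketch_pin(SketchBuffer *buf, uint32_t expected_tag)
{
    unsigned usage;

    atomic_fetch_add(&buf->refcount, 1);
    if (atomic_load(&buf->tag) != expected_tag)
    {
        /* Stale cache entry: undo the pin and let the caller retry. */
        atomic_fetch_sub(&buf->refcount, 1);
        return 0;
    }

    /* Racy load/store on purpose: a lost increment is harmless here. */
    usage = atomic_load_explicit(&buf->usage_count, memory_order_relaxed);
    if (usage < SKETCH_MAX_USAGE)
        atomic_store_explicit(&buf->usage_count, usage + 1,
                              memory_order_relaxed);
    return 1;
}

static void
sketch_unpin(SketchBuffer *buf)
{
    unsigned old = atomic_fetch_sub(&buf->refcount, 1);

    assert(old > 0);            /* unpin without a matching pin */
    (void) old;
}
```

A caller would try local_cache_lookup() first, fall back to the shared
hash table on a miss or a failed sketch_pin(), and then call
local_cache_remember() with whatever it found.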

One problem with your patch is going to be to make it also work without
the CAS and fetch-and-add instructions. Those are probably present in
all the architectures we support, but it'll take some effort to get the
architecture-specific code done. Until it's all done, it would be good
to be able to fall back to plain spinlocks, which we already have. Also,
when someone ports PostgreSQL to a new architecture in the future, it
would be helpful if you wouldn't need to write all the
architecture-specific code immediately to get it to compile.
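
One way such a fallback could be structured, as a sketch only: a thin
wrapper type whose operations map to native atomics where they exist and
to a lock-protected emulation where they don't. The sketch_atomic_*
names and the SKETCH_FORCE_LOCK_FALLBACK macro are made up for this
example, C11 <stdatomic.h> stands in for the architecture-specific code,
and a pthread mutex stands in for the existing spinlock so that the
example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if !defined(__STDC_NO_ATOMICS__) && !defined(SKETCH_FORCE_LOCK_FALLBACK)

/* Native path: the compiler provides fetch-and-add and CAS. */
#include <stdatomic.h>

typedef struct { _Atomic uint32_t value; } sketch_atomic_u32;

static inline void
sketch_atomic_init(sketch_atomic_u32 *a, uint32_t v)
{
    atomic_init(&a->value, v);
}

static inline uint32_t
sketch_atomic_fetch_add(sketch_atomic_u32 *a, uint32_t n)
{
    return atomic_fetch_add(&a->value, n);
}

static inline bool
sketch_atomic_compare_exchange(sketch_atomic_u32 *a,
                               uint32_t *expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(&a->value, expected, desired);
}

#else

/* Fallback path: emulate both operations under a per-variable lock. */
#include <pthread.h>

typedef struct
{
    pthread_mutex_t lock;       /* stand-in for the existing spinlock */
    uint32_t        value;
} sketch_atomic_u32;

static inline void
sketch_atomic_init(sketch_atomic_u32 *a, uint32_t v)
{
    pthread_mutex_init(&a->lock, NULL);
    a->value = v;
}

static inline uint32_t
sketch_atomic_fetch_add(sketch_atomic_u32 *a, uint32_t n)
{
    uint32_t old;

    pthread_mutex_lock(&a->lock);
    old = a->value;
    a->value = old + n;
    pthread_mutex_unlock(&a->lock);
    return old;
}

static inline bool
sketch_atomic_compare_exchange(sketch_atomic_u32 *a,
                               uint32_t *expected, uint32_t desired)
{
    bool swapped;

    pthread_mutex_lock(&a->lock);
    swapped = (a->value == *expected);
    if (swapped)
        a->value = desired;
    else
        *expected = a->value;   /* report the value actually seen */
    pthread_mutex_unlock(&a->lock);
    return swapped;
}

#endif

/* Callers look the same whichever implementation is underneath. */
static void
sketch_atomic_demo(void)
{
    sketch_atomic_u32 counter;
    uint32_t expected = 1;
    uint32_t old;
    bool     swapped;

    sketch_atomic_init(&counter, 0);
    old = sketch_atomic_fetch_add(&counter, 1);
    swapped = sketch_atomic_compare_exchange(&counter, &expected, 7);
    assert(old == 0 && swapped);
    (void) old;
    (void) swapped;
}
```

With this shape, a new port would compile with the emulation first and
gain the native path later without touching any caller.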

Did you benchmark your patch against the compare-and-swap patch I posted
earlier? (http://www.postgresql.org/message-id/[email protected]). Just
on a theoretical level, I would assume your patch to scale better since
it uses fetch-and-add instead of compare-and-swap for acquiring a shared
lock. But in practice it might be a wash.
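
To make the theoretical difference concrete, here is a generic
illustration of the two styles of shared acquisition on a single lock
word. It is not taken from either patch: the SketchLock layout (top bit
for exclusive, low bits for the shared-holder count) and all function
names are assumptions for this example, and queuing of waiters is left
out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: top bit = exclusive holder, low bits = shared count. */
#define SKETCH_EXCLUSIVE ((uint32_t) 1 << 31)

typedef struct { _Atomic uint32_t state; } SketchLock;

/*
 * Fetch-and-add style: always exactly one atomic operation on success.
 * If an exclusive holder is present, back the increment out and fail
 * (a real lock would queue the caller here).
 */
static bool
shared_try_fetch_add(SketchLock *lock)
{
    uint32_t old = atomic_fetch_add(&lock->state, 1);

    if (old & SKETCH_EXCLUSIVE)
    {
        atomic_fetch_sub(&lock->state, 1);
        return false;
    }
    return true;
}

/*
 * Compare-and-swap style: may loop when other backends change the lock
 * word between the load and the swap, even if they are only readers.
 */
static bool
shared_try_cas(SketchLock *lock)
{
    uint32_t old = atomic_load(&lock->state);

    do
    {
        if (old & SKETCH_EXCLUSIVE)
            return false;
        /* On failure, old is refreshed with the current lock word. */
    } while (!atomic_compare_exchange_weak(&lock->state, &old, old + 1));

    return true;
}

static void
shared_release(SketchLock *lock)
{
    uint32_t old = atomic_fetch_sub(&lock->state, 1);

    assert((old & ~SKETCH_EXCLUSIVE) > 0);  /* must hold a shared lock */
    (void) old;
}

/* Exclusive succeeds only if nobody holds the lock in any mode. */
static bool
exclusive_try(SketchLock *lock)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(&lock->state, &expected,
                                          SKETCH_EXCLUSIVE);
}

/*
 * Clear only the exclusive bit: a failed fetch-and-add attempt may have
 * a transient increment in the low bits that it will undo itself.
 */
static void
exclusive_release(SketchLock *lock)
{
    atomic_fetch_and(&lock->state, ~SKETCH_EXCLUSIVE);
}
```

With many concurrent readers the fetch-and-add variant does a fixed
amount of work per acquisition, while the CAS variant can retry each
time another reader gets in first. Whether that shows up in practice is
exactly what a benchmark would have to answer.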


- Heikki


