ReadRecentBuffer() doesn't scale well

Andres Freund Mon, 26 Jun 2023 19:06:05 -0700

Hi,

As mentioned nearby [1], Thomas brought up [2] the idea of using
ReadRecentBuffer() _bt_getroot().  I couldn't resist and prototyped it.


Unfortunately it scaled way worse at first. This is not an inherent issue, but
due to an implementation choice in ReadRecentBuffer().  Whereas the normal
BufferAlloc() path uses PinBuffer(), ReadRecentBuffer() first does
LockBufHdr(), checks if the buffer ID is the same and then uses
PinBuffer_Locked().

The problem with that is that PinBuffer() takes care to not hold the buffer
header spinlock, it uses compare_exchange to atomically acquire the pin, while
guaranteing nobody holds the lock.  When holding the buffer header spinlock,
there obviously is the risk of being scheduled out (or even just not have
exclusive access to the cacheline).

ReadRecentBuffer() scales worse even if LockBufHdr() is immediately followed
by PinBuffer_Locked(), so it's really just holding the lock that is the issue.


The fairly obvious solution to this is to just use PinBuffer() and just unpin
the buffer if its identity was changed concurrently. There could be an
unlocked pre-check as well.  However, there's the following comment in
ReadRecentBuffer():
                         * It's now safe to pin the buffer.  We can't pin first 
and ask
                         * questions later, because it might confuse code paths 
like
                         * InvalidateBuffer() if we pinned a random 
non-matching buffer.
                         */

But I'm not sure I buy that - there's plenty other things that can briefly
acquire a buffer pin (e.g. checkpointer, reclaiming the buffer for other
contents, etc).



Another difference between using PinBuffer() and PinBuffer_locked() is that
the latter does not adjust a buffer's usagecount.

Leaving the scalability issue aside, isn't it somewhat odd that optimizing a
codepath to use ReadRecentBuffer() instead of ReadBuffer() leads to not
increasing usagecount anymore?


FWIW, once that's fixed, using ReadRecentBuffer() for _bt_getroot(), caching
the root page's buffer id in RelationData, seems a noticeable win. About 7% in
a concurrent, read-only pgbench that utilizes batches of 10. And it should be
easy to get much bigger wins, e.g. with a index nested loop with a relatively
small index on the inner side.


Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20230627013458.axge7iylw7llyvww%40awork3.anarazel.de
[2] https://twitter.com/MengTangmu/status/1673439083518115840

ReadRecentBuffer() doesn't scale well

Reply via email to