On Sat, Feb 28, 2026 at 7:16 PM Andres Freund <[email protected]> wrote:
>
> > +BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive_locked)
> > +{
> > +     uint32          buf_state = pg_atomic_read_u64(&bufdesc->state);
> > +     Buffer          buffer;
> > +     char       *page;
> > +     XLogRecPtr      lsn;
> > +
> > +     /*
> > +      * Unlogged buffers can't need WAL flush. See FlushBuffer() for more
> > +      * details on unlogged relations with LSNs.
> > +      */
> > +     if (!(buf_state & BM_PERMANENT))
> > +             return false;
> > +
> > +     buffer = BufferDescriptorGetBuffer(bufdesc);
> > +     page = BufferGetPage(buffer);
> > +
> > +     if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer) || exclusive_locked)
> > +             lsn = PageGetLSN(page);
> > +     else
> > +     {
> > +             /* Buffer is either share locked or not locked */
> > +             LockBufHdr(bufdesc);
> > +             lsn = PageGetLSN(page);
> > +             UnlockBufHdr(bufdesc);
> > +     }
>
> I buy the exclusive_locked branch, but the rest seems a bit dubious:
>
> - BufferIsLocal(buffer) is impossible, the buffer would not be permanent, and
>   I don't think we use strategy

Changed it to an assert.

> - XLogHintBitIsNeeded() isn't *that* cheap and if we dirtied the buffer, the
>   relative cost of locking the buffer header isn't going to matter.

BufferGetLSNAtomic() checks XLogHintBitIsNeeded() as part of what it
calls a "fast path". I suppose in the case of BufferNeedsWALFlush() we
expect it only to be called on dirty buffers, so perhaps we don't need
to worry about being "fast". But why would XLogHintBitIsNeeded() be
slower than the buffer header lock? Is accessing the ControlFile
variable expensive or is it just the external function call?

> > diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
>
> > @@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
> >               strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
> >               return false;
> >
> > +     Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf)));
> > +     Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
> > +
> > +     if (!BufferNeedsWALFlush(buf, false))
> > +             return false;
>
> It's a bit unfortunate that now you have an external function call from
> bufmgr.c (for StrategyRejectBuffer()) and then an external call back to
> bufmgr.c, just to then return back to freelist.c and then to bufmgr.c.  And
> you need to re-read the buf_state in BufferNeedsWALFlush() again.
>
> I'd probably just call BufferNeedsWALFlush() in GetVictimBuffer() and pass it
> the old buf_state, to avoid having to re-fetch it.

My idea was to avoid taking the buffer header lock when the strategy
is not bulkread -- which you can't do if you call
BufferNeedsWALFlush() from bufmgr.c. You have to check
BufferNeedsWALFlush() before StrategyRejectBuffer() because
StrategyRejectBuffer() sets the ring slot to InvalidBuffer.
So you think that the external function call will be more expensive
than taking the buffer header lock?

And I find the current StrategyRejectBuffer() kind of silly and hard
to understand -- its comment mentions needing WAL flush, but of
course it doesn't actually check that. I've changed it as you
suggested, but I really did prefer it the other way from a
code-readability perspective.

- Melanie
From 7db7aa69f5c56aa2ae419d99b3308a6209f6f82a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 7 Mar 2026 11:38:14 -0500
Subject: [PATCH v2] Avoid WAL-flush checks for unlogged buffers in
 GetVictimBuffer()

GetVictimBuffer() rejects a victim buffer if it is from a bulkread
strategy ring and reusing it would require flushing WAL. Unlogged table
buffers can have fake LSNs (e.g. unlogged GiST pages) and calling
XLogNeedsFlush() on a fake LSN is meaningless.

This isn't an issue currently because the bulkread strategy is not used
for relations with fake LSNs. However, fixing it avoids future issues
and avoids acquiring the buffer header lock for unlogged table buffers.
While we're at it, avoid taking the buffer header lock for buffers not
from the strategy ring.

Author: Melanie Plageman <[email protected]>
Reported-by: Andres Freund <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/fmkqmyeyy7bdpvcgkheb6yaqewemkik3ls6aaveyi5ibmvtxnd%40nu2kvy5rq3a6
---
 src/backend/access/transam/xlog.c     |  3 ++
 src/backend/storage/buffer/bufmgr.c   | 68 +++++++++++++++++++++------
 src/backend/storage/buffer/freelist.c |  4 +-
 3 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b9b678f3722..f846f3814b4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3182,6 +3182,9 @@ XLogNeedsFlush(XLogRecPtr record)
 			return true;
 	}
 
+	/* Protect against callers using fake LSNs */
+	Assert(record <= GetXLogInsertRecPtr());
+
 	/* Quick exit if already known flushed */
 	if (record <= LogwrtResult.Flush)
 		return false;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 5f3d083e938..6d16f872674 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -625,6 +625,8 @@ static bool PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy,
 static void PinBuffer_Locked(BufferDesc *buf);
 static void UnpinBuffer(BufferDesc *buf);
 static void UnpinBufferNoOwner(BufferDesc *buf);
+static bool BufferNeedsWALFlush(BufferDesc *bufdesc, uint64 buf_state,
+								bool exclusive_locked);
 static void BufferSync(int flags);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
@@ -2522,22 +2524,13 @@ again:
 		 * lock to inspect the page LSN, so this can't be done inside
 		 * StrategyGetBuffer.
 		 */
-		if (strategy != NULL)
+		if (strategy && from_ring &&
+			BufferNeedsWALFlush(buf_hdr, buf_state, false) &&
+			StrategyRejectBuffer(strategy, buf_hdr, from_ring))
 		{
-			XLogRecPtr	lsn;
-
-			/* Read the LSN while holding buffer header lock */
-			buf_state = LockBufHdr(buf_hdr);
-			lsn = BufferGetLSN(buf_hdr);
-			UnlockBufHdr(buf_hdr);
-
-			if (XLogNeedsFlush(lsn)
-				&& StrategyRejectBuffer(strategy, buf_hdr, from_ring))
-			{
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-				UnpinBuffer(buf_hdr);
-				goto again;
-			}
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+			UnpinBuffer(buf_hdr);
+			goto again;
 		}
 
 		/* OK, do the I/O */
@@ -3437,6 +3430,51 @@ TrackNewBufferPin(Buffer buf)
 							  BLCKSZ);
 }
 
+/*
+ * Returns true if the buffer needs WAL flushed before it can be written out.
+ *
+ * The caller must pass a buf_state value read from bufdesc->state.
+ *
+ * If the result is required to be correct, the caller must hold a buffer
+ * content lock. If they only hold a shared content lock, we'll need to
+ * acquire the buffer header spinlock, so they must not already hold it.
+ */
+static bool
+BufferNeedsWALFlush(BufferDesc *bufdesc, uint64 buf_state,
+					bool exclusive_locked)
+{
+	Buffer		buffer;
+	char	   *page;
+	XLogRecPtr	lsn;
+
+	/*
+	 * Unlogged buffers can't need WAL flush. See FlushBuffer() for more
+	 * details on unlogged relations with LSNs.
+	 */
+	if (!(buf_state & BM_PERMANENT))
+		return false;
+
+	buffer = BufferDescriptorGetBuffer(bufdesc);
+	Assert(!BufferIsLocal(buffer));
+	page = BufferGetPage(buffer);
+
+	if (exclusive_locked)
+	{
+		Assert(BufferIsLockedByMeInMode(buffer, BUFFER_LOCK_EXCLUSIVE));
+		lsn = PageGetLSN(page);
+	}
+	else
+	{
+		/* Buffer is either share locked or not locked. */
+		LockBufHdr(bufdesc);
+		lsn = PageGetLSN(page);
+		UnlockBufHdr(bufdesc);
+	}
+
+	return XLogNeedsFlush(lsn);
+}
+
+
 #define ST_SORT sort_checkpoint_bufferids
 #define ST_ELEMENT_TYPE CkptSortItem
 #define ST_COMPARE(a, b) ckpt_buforder_comparator(a, b)
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index b7687836188..6f31cd22289 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -786,6 +786,8 @@ IOContextForStrategy(BufferAccessStrategy strategy)
 bool
 StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
 {
+	Assert(strategy);
+
 	/* We only do this in bulkread mode */
 	if (strategy->btype != BAS_BULKREAD)
 		return false;
@@ -796,7 +798,7 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
 		return false;
 
 	/*
-	 * Remove the dirty buffer from the ring; necessary to prevent infinite
+	 * Remove the dirty buffer from the ring; necessary to prevent an infinite
 	 * loop if all ring members are dirty.
 	 */
 	strategy->buffers[strategy->current] = InvalidBuffer;
-- 
2.43.0
