Hi,

On 2026-02-15 11:52:39 -0800, Noah Misch wrote:
> On Sat, Feb 07, 2026 at 12:44:25PM +0200, Heikki Linnakangas wrote:
> > On 03/02/2026 00:33, Andres Freund wrote:
> > > - The way MarkBufferDirtyHint() operates was copied into
> > >    heap_inplace_update_and_unlock(). Now that MarkBufferDirtyHint() won't 
> > > work
> > >    that way anymore, it seems better to go with the alternative approach 
> > > the
> > >    comments already outlined, namely to only delay updating of the buffer
> > >    contents.
> > > 
> > >    I've done this in a prerequisite commit, as it doesn't actually depend 
> > > on any
> > >    of the other changes.  Noah, any chance you could take a look at this?
> 
> v12-0001-heapam-Don-t-mimic-MarkBufferDirtyHint-in-inplac.patch looks good.

> > How about this:
> > 
> >      * We avoid that by using a temporary copy of the buffer to hide our
> >      * change from other backends until it's been WAL-logged. We apply our
> >      * change to the temporary copy and WAL-log it before modifying the real
> >      * page. That way any action a reader of the in-place-updated value 
> > takes
> >      * will be WAL logged after this change.
> 
> Either v12 or v12 w/ this edit is fine with me.  I find this proposed text
> redundant with nearby comment "register block matching what buffer will look
> like after changes", so I mildly prefer v12.

Thanks for the review!


I pushed this and many of the later patches in the series.  Here are updated
versions of the remaining changes.  The last two were previously one commit
with "WIP" in the title. The first one has, I think, not had a lot of review -
but it's also not a complicated change.


With 0002+0003 I see decent performance improvements for a fully s_b resident,
pipelined pgbench -S: ~7-8% on an older small two-socket machine.

The improvement is just from reducing the number of atomic operations on
contended cachelines (i.e. inner btree pages).

Without pipelining the difference is smaller (1-2%), because context switches
are the bigger bottleneck.
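
For anyone wanting to reproduce something similar: pipelined -S-style runs can
be driven with a custom pgbench script using the \startpipeline/\endpipeline
meta-commands (available since v14). This is just a sketch of that kind of
script, not the exact one used for the numbers above:

```
\set aid1 random(1, 100000 * :scale)
\set aid2 random(1, 100000 * :scale)
\startpipeline
SELECT abalance FROM pgbench_accounts WHERE aid = :aid1;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
\endpipeline
```

Run with e.g. `pgbench -n -c 16 -j 16 -T 60 -f pipeline.sql`, with
shared_buffers sized so the dataset is fully resident.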


More extreme workloads involving an index nested loop join benefit
more. E.g., the setup and query from
https://anarazel.de/talks/2024-05-29-pgconf-dev-c2c/postgres-perf-c2c.pdf
slide 23 show a 25% improvement on the same two-socket machine.


We could probably do something similar for the similarly common combination of
PinBuffer() + LockBuffer(), but I think it'd be a fair bit more complicated,
and would require new APIs, rather than just using existing APIs more widely.

Greetings,

Andres Freund
>From 66ef53cdd3d5a99539c984488e11dd330ecbdf50 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Tue, 13 Jan 2026 20:10:32 -0500
Subject: [PATCH v13 1/3] bufmgr: Don't copy pages while writing out

After the series of preceding commits introducing and using
BufferBeginSetHintBits()/BufferSetHintBits16(), hint bits are no longer set
while IO is in progress. Therefore we no longer need to copy pages while
writing them out.

For the same reason XLogSaveBufferForHint() no longer needs to operate on a
copy of the page, but can instead use the normal XLogRegisterBuffer()
mechanism. For that, the assertions and comments in XLogRegisterBuffer() had
to be updated to allow registering share-exclusive-locked buffers.

Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
---
 src/include/storage/bufpage.h           |  3 +-
 src/backend/access/hash/hashpage.c      |  2 +-
 src/backend/access/transam/xloginsert.c | 45 +++++------------------
 src/backend/storage/buffer/bufmgr.c     | 11 ++----
 src/backend/storage/buffer/localbuf.c   |  2 +-
 src/backend/storage/page/bufpage.c      | 48 ++++---------------------
 src/backend/storage/smgr/bulk_write.c   |  2 +-
 src/test/modules/test_aio/test_aio.c    |  2 +-
 8 files changed, 23 insertions(+), 92 deletions(-)

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 3de58ba4312..e5267b93fe6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -537,7 +537,6 @@ extern void PageIndexMultiDelete(Page page, OffsetNumber *itemnos, int nitems);
 extern void PageIndexTupleDeleteNoCompact(Page page, OffsetNumber offnum);
 extern bool PageIndexTupleOverwrite(Page page, OffsetNumber offnum,
 									const void *newtup, Size newsize);
-extern char *PageSetChecksumCopy(Page page, BlockNumber blkno);
-extern void PageSetChecksumInplace(Page page, BlockNumber blkno);
+extern void PageSetChecksum(Page page, BlockNumber blkno);
 
 #endif							/* BUFPAGE_H */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 8e220a3ae16..52c20208c66 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -1029,7 +1029,7 @@ _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
 					zerobuf.data,
 					true);
 
-	PageSetChecksumInplace(page, lastblock);
+	PageSetChecksum(page, lastblock);
 	smgrextend(RelationGetSmgr(rel), MAIN_FORKNUM, lastblock, zerobuf.data,
 			   false);
 
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index ac3c1a78396..baa12cfb3ef 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -251,9 +251,9 @@ XLogRegisterBuffer(uint8 block_id, Buffer buffer, uint8 flags)
 	Assert(begininsert_called);
 
 	/*
-	 * Ordinarily, buffer should be exclusive-locked and marked dirty before
-	 * we get here, otherwise we could end up violating one of the rules in
-	 * access/transam/README.
+	 * Ordinarily, the buffer should be exclusive-locked (or share-exclusive
+	 * in case of hint bits) and marked dirty before we get here, otherwise we
+	 * could end up violating one of the rules in access/transam/README.
 	 *
 	 * Some callers intentionally register a clean page and never update that
 	 * page's LSN; in that case they can pass the flag REGBUF_NO_CHANGE to
@@ -261,8 +261,11 @@ XLogRegisterBuffer(uint8 block_id, Buffer buffer, uint8 flags)
 	 */
 #ifdef USE_ASSERT_CHECKING
 	if (!(flags & REGBUF_NO_CHANGE))
-		Assert(BufferIsLockedByMeInMode(buffer, BUFFER_LOCK_EXCLUSIVE) &&
-			   BufferIsDirty(buffer));
+	{
+		Assert(BufferIsDirty(buffer));
+		Assert(BufferIsLockedByMeInMode(buffer, BUFFER_LOCK_EXCLUSIVE) ||
+			   BufferIsLockedByMeInMode(buffer, BUFFER_LOCK_SHARE_EXCLUSIVE));
+	}
 #endif
 
 	if (block_id >= max_registered_block_id)
@@ -1071,12 +1074,6 @@ XLogCheckBufferNeedsBackup(Buffer buffer)
  * buffer. The buffer already needs to have been marked dirty by
  * MarkBufferDirtyHint().
  *
- * We can't use the plain backup block mechanism since that relies on the
- * Buffer being exclusively locked. Since some modifications (setting LSN, hint
- * bits) are allowed in a sharelocked buffer that can lead to wal checksum
- * failures. So instead we copy the page and insert the copied data as normal
- * record data.
- *
  * We only need to do something if page has not yet been full page written in
  * this checkpoint round. The LSN of the inserted wal record is returned if we
  * had to write, InvalidXLogRecPtr otherwise.
@@ -1112,37 +1109,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 	if (lsn <= RedoRecPtr)
 	{
 		int			flags = 0;
-		PGAlignedBlock copied_buffer;
-		char	   *origdata = (char *) BufferGetBlock(buffer);
-		RelFileLocator rlocator;
-		ForkNumber	forkno;
-		BlockNumber blkno;
-
-		/*
-		 * Copy buffer so we don't have to worry about concurrent hint bit or
-		 * lsn updates. We assume pd_lower/upper cannot be changed without an
-		 * exclusive lock, so the contents bkp are not racy.
-		 */
-		if (buffer_std)
-		{
-			/* Assume we can omit data between pd_lower and pd_upper */
-			Page		page = BufferGetPage(buffer);
-			uint16		lower = ((PageHeader) page)->pd_lower;
-			uint16		upper = ((PageHeader) page)->pd_upper;
-
-			memcpy(copied_buffer.data, origdata, lower);
-			memcpy(copied_buffer.data + upper, origdata + upper, BLCKSZ - upper);
-		}
-		else
-			memcpy(copied_buffer.data, origdata, BLCKSZ);
 
 		XLogBeginInsert();
 
 		if (buffer_std)
 			flags |= REGBUF_STANDARD;
 
-		BufferGetTag(buffer, &rlocator, &forkno, &blkno);
-		XLogRegisterBlock(0, &rlocator, forkno, blkno, copied_buffer.data, flags);
+		XLogRegisterBuffer(0, buffer, flags);
 
 		recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI_FOR_HINT);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6ded968e163..e5c06882395 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4416,7 +4416,6 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	ErrorContextCallback errcallback;
 	instr_time	io_start;
 	Block		bufBlock;
-	char	   *bufToWrite;
 
 	Assert(BufferLockHeldByMeInMode(buf, BUFFER_LOCK_EXCLUSIVE) ||
 		   BufferLockHeldByMeInMode(buf, BUFFER_LOCK_SHARE_EXCLUSIVE));
@@ -4479,12 +4478,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 */
 	bufBlock = BufHdrGetBlock(buf);
 
-	/*
-	 * Update page checksum if desired.  Since we have only shared lock on the
-	 * buffer, other processes might be updating hint bits in it, so we must
-	 * copy the page to private storage if we do checksumming.
-	 */
-	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
+	/* Update page checksum if desired. */
+	PageSetChecksum((Page) bufBlock, buf->tag.blockNum);
 
 	io_start = pgstat_prepare_io_time(track_io_timing);
 
@@ -4494,7 +4489,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	smgrwrite(reln,
 			  BufTagGetForkNum(&buf->tag),
 			  buf->tag.blockNum,
-			  bufToWrite,
+			  bufBlock,
 			  false);
 
 	/*
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 404c6bccbdd..b69398c6375 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -199,7 +199,7 @@ FlushLocalBuffer(BufferDesc *bufHdr, SMgrRelation reln)
 		reln = smgropen(BufTagGetRelFileLocator(&bufHdr->tag),
 						MyProcNumber);
 
-	PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
+	PageSetChecksum(localpage, bufHdr->tag.blockNum);
 
 	io_start = pgstat_prepare_io_time(track_io_timing);
 
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index de85911e3ac..5cc92e68079 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -1494,51 +1494,15 @@ PageIndexTupleOverwrite(Page page, OffsetNumber offnum,
 /*
  * Set checksum for a page in shared buffers.
  *
- * If checksums are disabled, or if the page is not initialized, just return
- * the input.  Otherwise, we must make a copy of the page before calculating
- * the checksum, to prevent concurrent modifications (e.g. setting hint bits)
- * from making the final checksum invalid.  It doesn't matter if we include or
- * exclude hints during the copy, as long as we write a valid page and
- * associated checksum.
+ * If checksums are disabled, or if the page is not initialized, just
+ * return. Otherwise compute and set the checksum.
  *
- * Returns a pointer to the block-sized data that needs to be written. Uses
- * statically-allocated memory, so the caller must immediately write the
- * returned page and not refer to it again.
- */
-char *
-PageSetChecksumCopy(Page page, BlockNumber blkno)
-{
-	static char *pageCopy = NULL;
-
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return page;
-
-	/*
-	 * We allocate the copy space once and use it over on each subsequent
-	 * call.  The point of palloc'ing here, rather than having a static char
-	 * array, is first to ensure adequate alignment for the checksumming code
-	 * and second to avoid wasting space in processes that never call this.
-	 */
-	if (pageCopy == NULL)
-		pageCopy = MemoryContextAllocAligned(TopMemoryContext,
-											 BLCKSZ,
-											 PG_IO_ALIGN_SIZE,
-											 0);
-
-	memcpy(pageCopy, page, BLCKSZ);
-	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
-	return pageCopy;
-}
-
-/*
- * Set checksum for a page in private memory.
- *
- * This must only be used when we know that no other process can be modifying
- * the page buffer.
+ * In the past this needed to be done on a copy of the page, due to the
+ * possibility of e.g., hint bits being set concurrently. However, this is not
+ * necessary anymore as hint bits won't be set while IO is going on.
  */
 void
-PageSetChecksumInplace(Page page, BlockNumber blkno)
+PageSetChecksum(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
 	if (PageIsNew(page) || !DataChecksumsEnabled())
diff --git a/src/backend/storage/smgr/bulk_write.c b/src/backend/storage/smgr/bulk_write.c
index 36b28824ec8..f3c24082a69 100644
--- a/src/backend/storage/smgr/bulk_write.c
+++ b/src/backend/storage/smgr/bulk_write.c
@@ -279,7 +279,7 @@ smgr_bulk_flush(BulkWriteState *bulkstate)
 		BlockNumber blkno = pending_writes[i].blkno;
 		Page		page = pending_writes[i].buf->data;
 
-		PageSetChecksumInplace(page, blkno);
+		PageSetChecksum(page, blkno);
 
 		if (blkno >= bulkstate->relsize)
 		{
diff --git a/src/test/modules/test_aio/test_aio.c b/src/test/modules/test_aio/test_aio.c
index b1aa8af9ec0..2ae4a559fab 100644
--- a/src/test/modules/test_aio/test_aio.c
+++ b/src/test/modules/test_aio/test_aio.c
@@ -288,7 +288,7 @@ modify_rel_block(PG_FUNCTION_ARGS)
 	}
 	else
 	{
-		PageSetChecksumInplace(page, blkno);
+		PageSetChecksum(page, blkno);
 	}
 
 	smgrwrite(RelationGetSmgr(rel),
-- 
2.53.0.1.gb2826b52eb

>From abf51df70c27ad0fd255ffe5877a25b556709baa Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Wed, 11 Mar 2026 15:12:53 -0400
Subject: [PATCH v13 2/3] Use UnlockReleaseBuffer() in more places

An upcoming commit will make UnlockReleaseBuffer() considerably faster and
more scalable than doing LockBuffer(BUFFER_LOCK_UNLOCK) followed by
ReleaseBuffer(). Even as-is there is a small performance benefit.

Most of the callsites changed in this patch are not performance sensitive,
however some, like the nbtree ones, are in critical paths.

This patch changes all the easily convertible places over to
UnlockReleaseBuffer() mainly because I needed to check all of them anyway, and
reducing cases where the operations are done separately makes the checking
easier.

Discussion: https://postgr.es/m/
---
 src/backend/access/heap/heapam.c          |  6 ++--
 src/backend/access/heap/hio.c             |  7 +++--
 src/backend/access/nbtree/nbtpage.c       | 36 +++++++++++++++++++----
 src/backend/storage/buffer/bufmgr.c       |  3 +-
 src/backend/storage/freespace/freespace.c |  3 +-
 contrib/amcheck/verify_gin.c              |  6 ++--
 contrib/pageinspect/rawpage.c             |  3 +-
 contrib/pgstattuple/pgstatindex.c         |  3 +-
 src/test/modules/test_aio/test_aio.c      |  7 ++---
 9 files changed, 44 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8f1c11a9350..7b988ff36df 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1687,8 +1687,7 @@ heap_fetch(Relation relation,
 	offnum = ItemPointerGetOffsetNumber(tid);
 	if (offnum < FirstOffsetNumber || offnum > PageGetMaxOffsetNumber(page))
 	{
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-		ReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
 		*userbuf = InvalidBuffer;
 		tuple->t_data = NULL;
 		return false;
@@ -1704,8 +1703,7 @@ heap_fetch(Relation relation,
 	 */
 	if (!ItemIdIsNormal(lp))
 	{
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-		ReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
 		*userbuf = InvalidBuffer;
 		tuple->t_data = NULL;
 		return false;
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index d26ceacd38c..1097f44a74e 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -711,14 +711,15 @@ loop:
 		 * unlock the two buffers in, so this can be slightly simpler than the
 		 * code above.
 		 */
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 		if (otherBuffer == InvalidBuffer)
-			ReleaseBuffer(buffer);
+			UnlockReleaseBuffer(buffer);
 		else if (otherBlock != targetBlock)
 		{
+			UnlockReleaseBuffer(buffer);
 			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
-			ReleaseBuffer(buffer);
 		}
+		else
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 		/* Is there an ongoing bulk extension? */
 		if (bistate && bistate->next_free != InvalidBlockNumber)
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 4125c185e8b..acefe68a382 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1007,24 +1007,48 @@ _bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
 
 	Assert(BlockNumberIsValid(blkno));
 	if (BufferIsValid(obuf))
-		_bt_unlockbuf(rel, obuf);
-	buf = ReleaseAndReadBuffer(obuf, rel, blkno);
+	{
+		if (BufferGetBlockNumber(obuf) == blkno)
+		{
+			/* trade in old lock mode for new lock */
+			_bt_unlockbuf(rel, obuf);
+			buf = obuf;
+		}
+		else
+		{
+			/* release lock and pin at once, that's a bit more efficient */
+			_bt_relbuf(rel, obuf);
+			buf = ReadBuffer(rel, blkno);
+		}
+	}
+	else
+		buf = ReadBuffer(rel, blkno);
+
 	_bt_lockbuf(rel, buf, access);
-
 	_bt_checkpage(rel, buf);
+
 	return buf;
 }
 
 /*
  *	_bt_relbuf() -- release a locked buffer.
  *
- * Lock and pin (refcount) are both dropped.
+ * Lock and pin (refcount) are both dropped. This is a bit more efficient than
+ * doing the two operations separately.
  */
 void
 _bt_relbuf(Relation rel, Buffer buf)
 {
-	_bt_unlockbuf(rel, buf);
-	ReleaseBuffer(buf);
+	/*
+	 * Buffer is pinned and locked, which means that it is expected to be
+	 * defined and addressable.  Check that proactively.
+	 */
+	VALGRIND_CHECK_MEM_IS_DEFINED(BufferGetPage(buf), BLCKSZ);
+
+	UnlockReleaseBuffer(buf);
+
+	if (!RelationUsesLocalBuffers(rel))
+		VALGRIND_MAKE_MEM_NOACCESS(BufferGetPage(buf), BLCKSZ);
 }
 
 /*
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e5c06882395..53b327200d7 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2531,8 +2531,7 @@ again:
 			XLogNeedsFlush(BufferGetLSN(buf_hdr)) &&
 			StrategyRejectBuffer(strategy, buf_hdr, from_ring))
 		{
-			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-			UnpinBuffer(buf_hdr);
+			UnlockReleaseBuffer(buf);
 			goto again;
 		}
 
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index b9a8f368a63..40d67a96178 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -915,9 +915,8 @@ fsm_vacuum_page(Relation rel, FSMAddress addr,
 		((FSMPage) PageGetContents(page))->fp_next_slot = 0;
 		BufferFinishSetHintBits(buf, false, false);
 	}
-	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-	ReleaseBuffer(buf);
+	UnlockReleaseBuffer(buf);
 
 	return max_avail;
 }
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index f7a15a21467..abfad07d5e4 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -368,8 +368,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
 				stack->next = ptr;
 			}
 		}
-		LockBuffer(buffer, GIN_UNLOCK);
-		ReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
 
 		/* Step to next item in the queue */
 		stack_next = stack->next;
@@ -642,8 +641,7 @@ gin_check_parent_keys_consistency(Relation rel,
 			prev_attnum = current_attnum;
 		}
 
-		LockBuffer(buffer, GIN_UNLOCK);
-		ReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
 
 		/* Step to next item in the queue */
 		stack_next = stack->next;
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 86fe245cac5..f3996dc39fc 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -193,8 +193,7 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
 
 	memcpy(raw_page_data, BufferGetPage(buf), BLCKSZ);
 
-	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-	ReleaseBuffer(buf);
+	UnlockReleaseBuffer(buf);
 
 	relation_close(rel, AccessShareLock);
 
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index ef723af1f19..a4716bd1b36 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -323,8 +323,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 			indexStat.internal_pages++;
 
 		/* Unlock and release buffer */
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-		ReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
 	}
 
 	relation_close(rel, AccessShareLock);
diff --git a/src/test/modules/test_aio/test_aio.c b/src/test/modules/test_aio/test_aio.c
index 2ae4a559fab..138e1259dfd 100644
--- a/src/test/modules/test_aio/test_aio.c
+++ b/src/test/modules/test_aio/test_aio.c
@@ -221,9 +221,7 @@ modify_rel_block(PG_FUNCTION_ARGS)
 	 */
 	memcpy(page, BufferGetPage(buf), BLCKSZ);
 
-	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-
-	ReleaseBuffer(buf);
+	UnlockReleaseBuffer(buf);
 
 	/*
 	 * Don't want to have a buffer in-memory that's marked valid where the
@@ -496,8 +494,7 @@ invalidate_rel_block(PG_FUNCTION_ARGS)
 				else
 					FlushOneBuffer(buf);
 			}
-			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-			ReleaseBuffer(buf);
+			UnlockReleaseBuffer(buf);
 
 			if (BufferIsLocal(buf))
 				InvalidateLocalBuffer(GetLocalBufferDescriptor(-buf - 1), true);
-- 
2.53.0.1.gb2826b52eb

>From 80eee071308c2548cb0dfc3f2dbdf04bd4b151bf Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Tue, 3 Feb 2026 11:01:58 -0500
Subject: [PATCH v13 3/3] bufmgr: Make UnlockReleaseBuffer() more efficient

Now that the buffer content lock is implemented as part of BufferDesc.state,
releasing the lock and unpinning the buffer can be implemented as a single
atomic operation.

This substantially improves workloads that have heavy contention on a small
number of buffers; e.g., I see a ~20% improvement for pipelined read-only
pgbench on an older two-socket machine.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/storage/buffer/bufmgr.c | 55 +++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 53b327200d7..9147e4c127b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -5508,13 +5508,62 @@ ReleaseBuffer(Buffer buffer)
 /*
  * UnlockReleaseBuffer -- release the content lock and pin on a buffer
  *
- * This is just a shorthand for a common combination.
+ * This is just a more efficient shorthand for a common combination.
  */
 void
 UnlockReleaseBuffer(Buffer buffer)
 {
-	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-	ReleaseBuffer(buffer);
+	int			mode;
+	BufferDesc *buf;
+	PrivateRefCountEntry *ref;
+	uint64		sub;
+	uint64		lockstate;
+
+	Assert(BufferIsPinned(buffer));
+
+	if (BufferIsLocal(buffer))
+	{
+		UnpinLocalBuffer(buffer);
+		return;
+	}
+
+	ResourceOwnerForgetBuffer(CurrentResourceOwner, buffer);
+
+	buf = GetBufferDescriptor(buffer - 1);
+
+	mode = BufferLockDisownInternal(buffer, buf);
+
+	/* compute state modification for lock release */
+	sub = BufferLockReleaseSub(mode);
+
+	/* compute state modification for pin release */
+	ref = GetPrivateRefCountEntry(buffer, false);
+	Assert(ref != NULL);
+	Assert(ref->data.refcount > 0);
+	ref->data.refcount--;
+
+	/* no more backend local pins, reduce shared pin count */
+	if (likely(ref->data.refcount == 0))
+	{
+		sub |= BUF_REFCOUNT_ONE;
+		ForgetPrivateRefCountEntry(ref);
+	}
+
+	/* perform the lock and pin release in one atomic op */
+	lockstate = pg_atomic_sub_fetch_u64(&buf->state, sub);
+
+	/* wake up waiters for the lock */
+	BufferLockProcessRelease(buf, mode, lockstate);
+
+	/* wake up waiter for the pin release */
+	if (lockstate & BM_PIN_COUNT_WAITER)
+		WakePinCountWaiter(buf);
+
+	/*
+	 * Now it is okay to allow cancel/die interrupts again; they were held
+	 * since the lock was acquired.
+	 */
+	RESUME_INTERRUPTS();
 }
 
 /*
-- 
2.53.0.1.gb2826b52eb
