Re: Deadlock detector fails to activate on a hot standby replica

Vitaly Davydov Tue, 09 Jun 2026 09:19:13 -0700

On 6/5/26 14:58, Fujii Masao wrote:
> I've updated the patch based on these comments.
> Attached is the latest version.
>
> I removed the TAP test from this patch for now. I'll consider adding
> a test for this separately later.


Thank you again for your assistance. I agree with the code changes you've
proposed. I've updated 031_recovery_conflict.pl to include the deadlock
test, instead of creating a separate TAP test. The attached patch contains
these changes in its second commit. I've kept the patch version as v3 since
the original fix remains unchanged; this update only adds new tests.

With best regards,
Vitaly

From a2287174d5c595752f7e5c447807af219c78582e Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <[email protected]>
Date: Wed, 20 May 2026 11:48:34 +0300
Subject: [PATCH 1/2] Fix deadlock detector activation in a recovery conflict

When the startup process in a deadlock with a backend, it sends the
signal to the backend to trigger the deadlock detector when
the deadlock timeout is elapsed (deadlock_timeout guc). Due to some
optimization in timeout.c, when spontaneous SIGALRM signals are
possible, which doesn't relate to any enabled timeout, the function
ResolveRecoveryConflictWithBufferPin can never send the signal to the
conflicting backend, becase the deadlock timeout will never be
triggered.

The patch fixes ResolveRecoveryConflictWithBufferPin by ignoring
spontaneous SIGALRM signals, that are possible in the current
implementation of timeout.c functionality.
---
 src/backend/storage/buffer/bufmgr.c | 23 ++++++++
 src/backend/storage/ipc/standby.c   | 81 ++++++++++++++++++++---------
 src/include/storage/bufmgr.h        |  1 +
 3 files changed, 80 insertions(+), 25 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index cc398db124d..7502b461786 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4749,6 +4749,29 @@ BufferGetLSNAtomic(Buffer buffer)
 #endif
 }
 
+/*
+ * BufferGetRefCount
+ *		Return the current reference counter for a shared buffer.
+ */
+uint32
+BufferGetRefCount(Buffer buffer)
+{
+	BufferDesc *bufHdr;
+	uint64		buf_state;
+	uint32		buf_refcount;
+
+	Assert(BufferIsValid(buffer));
+	Assert(!BufferIsLocal(buffer));
+
+	bufHdr = GetBufferDescriptor(buffer - 1);
+
+	buf_state = LockBufHdr(bufHdr);
+	buf_refcount = BUF_STATE_GET_REFCOUNT(buf_state);
+	UnlockBufHdr(bufHdr);
+
+	return buf_refcount;
+}
+
 /* ---------------------------------------------------------------------
  *		DropRelationBuffers
  *
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index de9092fdf5b..8cea201df5f 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -790,14 +790,21 @@ cleanup:
  * Deadlocks are extremely rare, and relatively expensive to check for,
  * so we don't do a deadlock check right away ... only if we have had to wait
  * at least deadlock_timeout.
+ *
+ * The current process should be the waiter process and should have
+ * published the waited buffer via SetStartupBufferPinWaitBufId().
  */
 void
 ResolveRecoveryConflictWithBufferPin(void)
 {
 	TimestampTz ltime;
+	int			bufid;
 
 	Assert(InHotStandby);
 
+	bufid = GetStartupBufferPinWaitBufId();
+	Assert(bufid >= 0);
+
 	ltime = GetStandbyLimitTime();
 
 	if (GetCurrentTimestamp() >= ltime && ltime != 0)
@@ -833,35 +840,59 @@ ResolveRecoveryConflictWithBufferPin(void)
 		enable_timeouts(timeouts, cnt);
 	}
 
-	/*
-	 * Wait to be signaled by UnpinBuffer() or for the wait to be interrupted
-	 * by one of the timeouts established above.
-	 *
-	 * We assume that only UnpinBuffer() and the timeout requests established
-	 * above can wake us up here. WakeupRecovery() called by walreceiver or
-	 * SIGHUP signal handler, etc cannot do that because it uses the different
-	 * latch from that ProcWaitForSignal() waits on.
-	 */
-	ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
-
-	if (got_standby_delay_timeout)
-		SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN);
-	else if (got_standby_deadlock_timeout)
+	for (;;)
 	{
+		uint32		refcount;
+
 		/*
-		 * Send out a request for hot-standby backends to check themselves for
-		 * deadlocks.
+		 * Wait to be signaled by UnpinBuffer() or for the wait to be
+		 * interrupted by one of the timeouts established above.
 		 *
-		 * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
-		 * to be signaled by UnpinBuffer() again and send a request for
-		 * deadlocks check if deadlock_timeout happens. This causes the
-		 * request to continue to be sent every deadlock_timeout until the
-		 * buffer is unpinned or ltime is reached. This would increase the
-		 * workload in the startup process and backends. In practice it may
-		 * not be so harmful because the period that the buffer is kept pinned
-		 * is basically no so long. But we should fix this?
+		 * ProcWaitForSignal() can also wake up for unrelated reasons, so
+		 * recheck that the buffer is pinned by the current waiter process
+		 * only (reference counter should be 1). Continue waiting, if no
+		 * registered timeout is fired or the buffer is still pinned by other
+		 * processes as well.
+		 */
+		ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
+
+		/*
+		 * Once the reference count is 1, the waiter process itself is the
+		 * only backend pinning the buffer at the moment. There is a chance to
+		 * lock the buffer exclusively.
 		 */
-		SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+		refcount = BufferGetRefCount(bufid + 1);
+		Assert(refcount > 0);
+
+		if (refcount == 0)
+			elog(ERROR,
+				 "buffer refcount dropped to zero while waiting for cleanup lock");
+		if (refcount == 1)
+			break;
+
+		if (got_standby_delay_timeout)
+		{
+			SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN);
+			break;
+		}
+		else if (got_standby_deadlock_timeout)
+		{
+			/*
+			 * Send out a request for hot-standby backends to check themselves
+			 * for deadlocks.
+			 *
+			 * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will
+			 * wait to be signaled by UnpinBuffer() again and send a request
+			 * for deadlocks check if deadlock_timeout happens. This causes
+			 * the request to continue to be sent every deadlock_timeout until
+			 * the buffer is unpinned or ltime is reached. This would increase
+			 * the workload in the startup process and backends. In practice
+			 * it may not be so harmful because the period that the buffer is
+			 * kept pinned is basically no so long. But we should fix this?
+			 */
+			SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+			break;
+		}
 	}
 
 	/*
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 6837b35fc6d..c38b620a3a1 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -311,6 +311,7 @@ extern void DropDatabaseBuffers(Oid dbid);
 
 extern bool BufferIsPermanent(Buffer buffer);
 extern XLogRecPtr BufferGetLSNAtomic(Buffer buffer);
+extern uint32 BufferGetRefCount(Buffer buffer);
 extern void BufferGetTag(Buffer buffer, RelFileLocator *rlocator,
 						 ForkNumber *forknum, BlockNumber *blknum);
 
-- 
2.43.0

From dfdb8e0e37ecbce515bbff52bf742e47e0d063a6 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <[email protected]>
Date: Tue, 9 Jun 2026 18:23:28 +0300
Subject: [PATCH 2/2] Add new deadlock conflict test in
 031_recovery_conflict.pl

It checks that the deadlock detector is triggered on the hot standby's
client backend process when a deadlock with the startup process occurs
in the scenario of an interference of log_startup_process_interval and
deadlock_timeout timeouts.

An infinite deadlock occurs between the standby's startup process and a
backend process when handling an XLOG_PRUNE_PAGE record. The issue arises
because the startup process fails to trigger the deadlock detector in the
conflicting backend.
---
 src/test/recovery/t/031_recovery_conflict.pl | 100 +++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
index 7a740f69806..cac78443d0c 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -307,6 +307,106 @@ $psql_standby->quit;
 $node_standby->stop();
 $node_primary->stop();
 
+## RECOVERY CONFLICT 6: Startup and backend infinite deadlock on hot standby
+#
+# An infinite deadlock occurs between the standby's startup process and a
+# backend process when handling an XLOG_PRUNE_PAGE record. The issue arises
+# because the startup process fails to trigger the deadlock detector in the
+# conflicting backend.
+#
+# The root cause is interference between two timeouts: deadlock_timeout and
+# log_startup_progress_interval. Due to this interference, the
+# deadlock_timeout event never fires for the startup process.
+#
+# This behavior is a side effect of the timeout optimization logic implemented
+# in timeout.c. While this optimization improves performance by coalescing
+# timer events, it introduces a requirement that all timeout handlers must
+# verify if their specific timeout was indeed triggered before proceeding. In
+# this case, the handler responsible for invoking the deadlock detector does
+# not correctly account for this interference, leading to the failure.
+#
+# Discussion: https://www.postgresql.org/message-id/flat/44c24dcf-5710-410f-b1b6-d10b315f3d51%40postgrespro.ru
+#
+
+$test_db = 'postgres';
+$table1 = "test_startup_backend_deadlock_table_1";
+$table2 = "test_startup_backend_deadlock_table_2";
+
+$node_primary->append_conf('postgresql.conf', qq[
+	max_standby_streaming_delay = -1
+	log_startup_progress_interval = 2000
+	deadlock_timeout = 3000
+	autovacuum = off
+]);
+
+$node_primary->start();
+
+$node_standby->append_conf('postgresql.conf', qq[
+	max_standby_streaming_delay = -1
+	log_startup_progress_interval = 2000
+	deadlock_timeout = 3000
+	autovacuum = off
+]);
+
+$node_standby->start();
+
+$log_location = -s $node_standby->logfile;
+
+$node_primary->safe_psql('postgres', qq[
+	CREATE TABLE $table1(a int, b int);
+	CREATE TABLE $table2(a int, b int);
+	INSERT INTO $table1 VALUES (1);
+]);
+
+# Generate a few dead rows, to later be cleaned up by vacuum. Then acquire a
+# lock on another relation in a prepared xact, so it's held continuously by
+# the startup process. The standby psql will block acquiring that lock while
+# holding a pin that vacuum needs, triggering the deadlock.
+$node_primary->safe_psql($test_db, qq[
+	BEGIN;
+	INSERT INTO $table1(a) SELECT generate_series(1, 100) i;
+	ROLLBACK;
+]);
+
+$node_primary->safe_psql($test_db, qq[
+	BEGIN;
+	LOCK TABLE $table2;
+	PREPARE TRANSACTION 'lock';
+	INSERT INTO $table1(a) VALUES (170);
+	SELECT txid_current();
+]);
+
+$node_primary->wait_for_catchup($node_standby, 'replay', $node_primary->lsn('write'));
+
+$psql_standby = $node_standby->background_psql($test_db, on_error_stop => 0);
+
+$psql_standby->query_until(qr/^1$/m, qq[
+	BEGIN;
+	-- hold pin
+	DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT a FROM $table1;
+	FETCH FORWARD FROM test_recovery_conflict_cursor;
+	-- wait for lock held by prepared transaction
+	SELECT * FROM $table2;
+]);
+
+ok(1, "cursor holding conflicting pin, also waiting for lock, established");
+
+# VACUUM will prune away rows, causing a buffer pin conflict, while standby
+# psql is waiting on lock.
+$node_primary->safe_psql($test_db, qq[VACUUM $table1;]);
+
+# Wait and check that the deadlock detector was triggered and found a deadlock
+# in the backend process (not in startup process).
+note("Waiting for deadlock detector to launch...");
+check_conflict_log("User transaction caused buffer deadlock with recovery.");
+
+# clean up for next tests
+$node_primary->safe_psql($test_db, qq[ROLLBACK PREPARED 'lock';]);
+
+$psql_standby->quit();
+
+$node_standby->stop();
+$node_primary->stop();
 
 done_testing();
 
-- 
2.43.0

Re: Deadlock detector fails to activate on a hot standby replica

Reply via email to