I discussed this elsewhere [0], but I thought it deserved its own thread.

After setting wal_retrieve_retry_interval to 1ms in the tests, I noticed
that some of the archiving tests began consistently failing on Windows.  I
believe the problem is that WaitForWALToBecomeAvailable() depends on the
call to WaitLatch() for wal_retrieve_retry_interval to ensure that signals
are dispatched (i.e., pgwin32_dispatch_queued_signals()).  With a low retry
interval, WaitForWALToBecomeAvailable() might skip the call to WaitLatch(),
and the signals are never processed.

The attached patch fixes this by always calling WaitLatch(), even if
wal_retrieve_retry_interval milliseconds have already elapsed and the
timeout is 0.

[0] https://postgr.es/m/20221231235019.GA1223171%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d08f01156ce1ce1290bad440da97852c2aa6c24b Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Sat, 31 Dec 2022 15:16:54 -0800
Subject: [PATCH v1 1/1] ensure signals are dispatched in startup process on
 Windows

---
 src/backend/access/transam/xlogrecovery.c | 36 ++++++++++++++---------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index bc3c3eb3e7..b1a4c4ab6d 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3466,6 +3466,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		 */
 		if (lastSourceFailed)
 		{
+			long		wait_time;
+
 			/*
 			 * Don't allow any retry loops to occur during nonblocking
 			 * readahead.  Let the caller process everything that has been
@@ -3556,33 +3558,39 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 * and retry from the archive, but if it hasn't been long
 					 * since last attempt, sleep wal_retrieve_retry_interval
 					 * milliseconds to avoid busy-waiting.
+					 *
+					 * NB: Even if it's already been wal_retrieve_rety_interval
+					 * milliseconds since the last attempt, we still call
+					 * WaitLatch() with the timeout set to 0 to make sure any
+					 * queued signals are dispatched on Windows builds.
 					 */
 					now = GetCurrentTimestamp();
 					if (!TimestampDifferenceExceeds(last_fail_time, now,
 													wal_retrieve_retry_interval))
 					{
-						long		wait_time;
-
 						wait_time = wal_retrieve_retry_interval -
 							TimestampDifferenceMilliseconds(last_fail_time, now);
 
 						elog(LOG, "waiting for WAL to become available at %X/%X",
 							 LSN_FORMAT_ARGS(RecPtr));
+					}
+					else
+						wait_time = 0;
 
-						/* Do background tasks that might benefit us later. */
-						KnownAssignedTransactionIdsIdleMaintenance();
+					/* Do background tasks that might benefit us later. */
+					KnownAssignedTransactionIdsIdleMaintenance();
 
-						(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
-										 WL_LATCH_SET | WL_TIMEOUT |
-										 WL_EXIT_ON_PM_DEATH,
-										 wait_time,
-										 WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
-						ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
-						now = GetCurrentTimestamp();
+					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
+									 WL_LATCH_SET | WL_TIMEOUT |
+									 WL_EXIT_ON_PM_DEATH,
+									 wait_time,
+									 WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
+					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
+					now = GetCurrentTimestamp();
+
+					/* Handle interrupt signals of startup process */
+					HandleStartupProcInterrupts();
 
-						/* Handle interrupt signals of startup process */
-						HandleStartupProcInterrupts();
-					}
 					last_fail_time = now;
 					currentSource = XLOG_FROM_ARCHIVE;
 					break;
-- 
2.25.1

Reply via email to