I discussed this elsewhere [0], but I thought it deserved its own thread. After setting wal_retrieve_retry_interval to 1ms in the tests, I noticed that some of the archiving tests began consistently failing on Windows. I believe the problem is that WaitForWALToBecomeAvailable() depends on the call to WaitLatch() for wal_retrieve_retry_interval to ensure that signals are dispatched (i.e., pgwin32_dispatch_queued_signals()). With a low retry interval, WaitForWALToBecomeAvailable() might skip the call to WaitLatch(), and the signals are never processed.
The attached patch fixes this by always calling WaitLatch(), even if wal_retrieve_retry_interval milliseconds have already elapsed and the timeout is 0. [0] https://postgr.es/m/20221231235019.GA1223171%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
>From d08f01156ce1ce1290bad440da97852c2aa6c24b Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nathandboss...@gmail.com> Date: Sat, 31 Dec 2022 15:16:54 -0800 Subject: [PATCH v1 1/1] ensure signals are dispatched in startup process on Windows --- src/backend/access/transam/xlogrecovery.c | 36 ++++++++++++++--------- 1 file changed, 22 insertions(+), 14 deletions(-) diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index bc3c3eb3e7..b1a4c4ab6d 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -3466,6 +3466,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, */ if (lastSourceFailed) { + long wait_time; + /* * Don't allow any retry loops to occur during nonblocking * readahead. Let the caller process everything that has been @@ -3556,33 +3558,39 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, * and retry from the archive, but if it hasn't been long * since last attempt, sleep wal_retrieve_retry_interval * milliseconds to avoid busy-waiting. + * + * NB: Even if it's already been wal_retrieve_rety_interval + * milliseconds since the last attempt, we still call + * WaitLatch() with the timeout set to 0 to make sure any + * queued signals are dispatched on Windows builds. */ now = GetCurrentTimestamp(); if (!TimestampDifferenceExceeds(last_fail_time, now, wal_retrieve_retry_interval)) { - long wait_time; - wait_time = wal_retrieve_retry_interval - TimestampDifferenceMilliseconds(last_fail_time, now); elog(LOG, "waiting for WAL to become available at %X/%X", LSN_FORMAT_ARGS(RecPtr)); + } + else + wait_time = 0; - /* Do background tasks that might benefit us later. */ - KnownAssignedTransactionIdsIdleMaintenance(); + /* Do background tasks that might benefit us later. */ + KnownAssignedTransactionIdsIdleMaintenance(); - (void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch, - WL_LATCH_SET | WL_TIMEOUT | - WL_EXIT_ON_PM_DEATH, - wait_time, - WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL); - ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch); - now = GetCurrentTimestamp(); + (void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch, + WL_LATCH_SET | WL_TIMEOUT | + WL_EXIT_ON_PM_DEATH, + wait_time, + WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL); + ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch); + now = GetCurrentTimestamp(); + + /* Handle interrupt signals of startup process */ + HandleStartupProcInterrupts(); - /* Handle interrupt signals of startup process */ - HandleStartupProcInterrupts(); - } last_fail_time = now; currentSource = XLOG_FROM_ARCHIVE; break; -- 2.25.1