The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: tested, failed Spec compliant: not tested Documentation: not tested
Hi, I looked at v2, focusing on the latest question about log_recovery_conflict_waits behavior. The patch applies cleanly on current master at db5ed03217b9c238703df8b4b286115d6e940488. git diff --check reports no issues. I built both master and v2 locally with: ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu make -s -j8 make -s install Both builds passed. I tried to check the case Zsolt described. In my local repro, the final cancellation still happens at about max_standby_streaming_delay in both master and v2. The visible difference is that v2 delays the first log_recovery_conflict_waits "recovery still waiting" message. I used a small primary/standby setup based on the buffer-pin conflict pattern from src/test/recovery/t/031_recovery_conflict.pl: log_recovery_conflict_waits = on deadlock_timeout = 100ms max_standby_streaming_delay = 5s The scenario is: 1. Create a small table. 2. Generate an aborted tuple on the primary and replay it on the standby. 3. Open a cursor on the standby and fetch one row, leaving the transaction idle with the cursor open. 4. Run VACUUM FREEZE on the primary. On current master, the standby log shows: recovery still waiting after 101.060 ms: recovery conflict on buffer pin terminating connection due to conflict with recovery recovery finished waiting after 5001.324 ms: recovery conflict on buffer pin With v2, the standby log shows: recovery still waiting after 5001.064 ms: recovery conflict on buffer pin terminating connection due to conflict with recovery recovery finished waiting after 5001.535 ms: recovery conflict on buffer pin So in this repro the early "still waiting" log at deadlock_timeout is lost, and the first waiting log is emitted only when the wait is already being resolved at about max_standby_streaming_delay. This seems to come from the new second ProcWaitForSignal() inside ResolveRecoveryConflictWithBufferPin(). After the deadlock timeout fires, v2 sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and waits again internally. For a non-deadlock buffer pin holder, LockBufferForCleanup() does not get control back after deadlock_timeout, so it cannot emit the early recovery conflict log message. This looks similar to the reason ResolveRecoveryConflictWithLock() has the logging_conflict argument: if the conflict still needs to be logged, it returns after sending the deadlock-check signal, and only avoids the repeated wait on the later call after the conflict has been logged. Should ResolveRecoveryConflictWithBufferPin() preserve the same behavior, for example by skipping the new second wait until LockBufferForCleanup() has had a chance to emit the first log_recovery_conflict_waits message? I have not checked the repeated SIGUSR1 count with strace in my local macOS environment, and I have not reviewed the backpatching question. Regards, Ilmar Yunusov The new status of this patch is: Waiting on Author
