Hi,

On Tue, Jan 6, 2026 at 7:54 PM Alexander Korotkov <[email protected]> wrote:
>
> On Tue, Jan 6, 2026 at 9:29 AM Xuneng Zhou <[email protected]> wrote:
> > On Tue, Jan 6, 2026 at 1:43 PM Thomas Munro <[email protected]> wrote:
> > > Could this be causing the recent flapping failures on CI/macOS in
> > > recovery/031_recovery_conflict? I didn't have time to dig personally
> > > but f30848cb looks relevant:
> > >
> > > Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on
> > > primary
> > > error running SQL: 'psql:<stdin>:1: ERROR: canceling statement due to
> > > conflict with recovery
> > > DETAIL: User was or might have been using tablespace that must be
> > > dropped.'
> > > while running 'psql --no-psqlrc --no-align --tuples-only --quiet
> > > --dbname port=25195
> > > host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
> > > dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
> > > FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
> > > no_throw);' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> > > line 2300.
> > >
> > > https://cirrus-ci.com/task/5771274900733952
> > >
> > > The master branch in time-descending order, macOS tasks only:
> > >
> > >      task_id      | substring |  status
> > > ------------------+-----------+-----------
> > >  6460882231754752 | c970bdc0  | FAILED
> > >  5771274900733952 | 6ca8506e  | FAILED
> > >  6217757068361728 | 63ed3bc7  | FAILED
> > >  5980650261446656 | ae283736  | FAILED
> > >  6585898394976256 | 5f13999a  | COMPLETED
> > >  4527474786172928 | 7f9acc9b  | COMPLETED
> > >  4826100842364928 | e8d4e94a  | COMPLETED
> > >  4540563027918848 | b9ee5f2d  | FAILED
> > >  6358528648019968 | c5af141c  | FAILED
> > >  5998005284765696 | e212a0f8  | COMPLETED
> > >  6488580526178304 | b85d5dc0  | FAILED
> > >  5034091344560128 | 7dc95cc3  | ABORTED
> > >  5688692477526016 | bb048e31  | COMPLETED
> > >  5481187977723904 | d351063e  | COMPLETED
> > >  5101831568752640 | f30848cb  | COMPLETED <-- the change
> > >  6395317408497664 | 3f33b63d  | COMPLETED
> > >  6741325208354816 | 877ae5db  | COMPLETED
> > >  4594007789010944 | de746e0d  | COMPLETED
> > >  6497208998035456 | 461b8cc9  | COMPLETED
> >
> > Thanks for raising this issue. I think it is related to f30848cb after
> > some analysis. I'll prepare a follow-up patch to fix it.
>
> Sorry, I've mistakenly referenced this report from commit [1]. I
> thought it was related, but it appears to be not. [1] is related to
> the report I've got from Ruikai Peng off-list.
>
> Regarding the present failure, could it happen before ExecWaitStmt()
> calls PopActiveSnapshot() and InvalidateCatalogSnapshot()? If so, we
> should do preliminary efforts to release these snapshots.
>
> 1.
> https://git.postgresql.org/pg/commitdiff/bf308639bfcfa38541e24733e074184153a8ab7f
>
I agree that moving PopActiveSnapshot() and InvalidateCatalogSnapshot()
to the very beginning of ExecWaitStmt() appears to be a sensible
optimization. However, in this particular failure scenario, it may not
address the issue.

For tablespace conflicts, recovery conflict resolution uses
GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid), which
returns all active backends, regardless of their snapshot state. As a
result, even if all snapshots are released at the start of
ExecWaitStmt(), the session would still be canceled during replay of
DROP TABLESPACE.

Given this, I am considering handling this conflict class explicitly:
if the WAIT FOR statement is terminated and the error indicates a
recovery conflict, we fall back to the existing polling-based approach.

 * Ask everybody to cancel their queries immediately so we can ensure no
 * temp files remain and we can remove the tablespace. Nuke the entire
 * site from orbit, it's the only way to be sure.
 *
 * XXX: We could work out the pids of active backends using this
 * tablespace by examining the temp filenames in the directory. We would
 * then convert the pids into VirtualXIDs before attempting to cancel
 * them.

I am also wondering whether this optimization would be helpful.

--
Best,
Xuneng
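
For illustration only, the fallback idea described above (try WAIT FOR
first, and switch to polling if the session is canceled by a recovery
conflict) could be sketched roughly as below. This is not the attached
patch, which presumably changes the Perl test harness; it is a Python
sketch of the control flow under stated assumptions. The names
RecoveryConflict, wait_for_catchup, wait_for_lsn and poll_replay_reached
are hypothetical and are not PostgreSQL APIs.

```python
import time
from typing import Callable


class RecoveryConflict(Exception):
    """Hypothetical stand-in for the error
    'canceling statement due to conflict with recovery'."""


def wait_for_catchup(
    wait_for_lsn: Callable[[], bool],
    poll_replay_reached: Callable[[], bool],
    timeout_s: float = 180.0,
    poll_interval_s: float = 0.1,
) -> bool:
    """Return True once the standby has replayed past the target LSN.

    wait_for_lsn is assumed to issue the WAIT FOR LSN statement and to
    raise RecoveryConflict if the session is canceled during replay.
    poll_replay_reached is assumed to run a short query that checks
    whether the replay position has passed the target.
    """
    # One deadline covers both the fast path and the fallback.
    deadline = time.monotonic() + timeout_s

    try:
        # Fast path: block inside the server until the LSN is replayed.
        return wait_for_lsn()
    except RecoveryConflict:
        # Only this conflict class triggers the fallback; any other
        # error still propagates to the caller.
        pass

    # Fallback: short polling queries, each of which can simply be
    # retried if it is interrupted.
    while time.monotonic() < deadline:
        if poll_replay_reached():
            return True
        time.sleep(poll_interval_s)
    return False
```

The point of the sketch is only that the recovery-conflict error is
treated as "retry by other means" rather than as a test failure, while
the overall timeout stays the same.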
Attachment: v1-0001-Fix-wait_for_catchup-failure-when-standby-session.patch
