I wrote:
>> It's been kind of hidden by other buildfarm noise, but
>> 031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].

> After digging around in the code, I think this is almost certainly
> some manifestation of the previously-complained-of problem [1] that
> RecoveryConflictInterrupt is not safe to call in a signal handler,
> leading the conflicting backend to sometimes decide that it's not
> the problem.
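(To spell out what "not safe to call in a signal handler" means here:
the handler can fire at an arbitrary point in the backend's execution,
so it must not examine lock tables or other backend state.  The usual
cure is for the handler to do nothing but set a volatile sig_atomic_t
flag, deferring the real work to a point where we know what state
we're in.  A minimal standalone sketch of that pattern follows; the
names are made up for illustration, not the actual backend code:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* sig_atomic_t is the only type guaranteed safe to write from a
 * signal handler. */
static volatile sig_atomic_t conflict_pending = 0;

static void
conflict_handler(int signo)
{
    /* Async-signal-safe: record the event and nothing else. */
    conflict_pending = 1;
}

static void
process_pending_conflict(void)
{
    if (conflict_pending)
    {
        conflict_pending = 0;
        /* Safe here: we're not in signal context, so we may take
         * locks, allocate memory, inspect backend state, etc. */
        printf("resolving recovery conflict\n");
    }
}

int
main(void)
{
    struct sigaction sa;

    sa.sa_handler = conflict_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    for (;;)
    {
        process_pending_conflict(); /* CHECK_FOR_INTERRUPTS analogue */
        pause();                    /* sleep until the next signal */
    }
}

In the backend, the analogous safe point is CHECK_FOR_INTERRUPTS()
calling ProcessInterrupts().)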
I happened to notice that while skink continues to fail off-and-on
in 031_recovery_conflict.pl, the symptoms have changed!  What we're
getting now typically looks like [1]:

[10:45:11.475](0.023s) ok 14 - startup deadlock: lock acquisition is waiting
Waiting for replication conn standby's replay_lsn to pass 0/33FB8B0 on primary
done
timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 367.

where absolutely nothing happens in the standby log, until we time out:

2022-07-24 10:45:11.452 UTC [1468367][client backend][2/4:0] LOG:  statement: SELECT * FROM test_recovery_conflict_table2;
2022-07-24 10:45:11.472 UTC [1468547][client backend][3/2:0] LOG:  statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
2022-07-24 10:48:15.860 UTC [1468362][walreceiver][:0] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly

So this is not a case of RecoveryConflictInterrupt doing the wrong
thing: the startup process hasn't detected the buffer conflict in
the first place.  Don't know what to make of that, but I vaguely
suspect a test timing problem.

gull has shown this once as well, although at a different step in
the script [2].

			regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-07-24%2007%3A00%3A29
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&dt=2022-07-23%2009%3A34%3A54