Hi Alexander, On Thu, Jun 18, 2026 at 4:00 AM Alexander Lakhin <[email protected]> wrote: > > Hello hackers, > > A recent failure produced by skink [1] apparently materializes an > interesting race condition: > 398/398 recovery - postgresql:recovery/048_vacuum_horizon_floor TIMEOUT > 15000.51s killed by signal 15 SIGTERM > > regress_log_048_vacuum_horizon_floor contains: > [05:01:16.990](0.000s) ok 3 - Cursor query returned 7 from second fetch. > Expected value 7. > [09:08:43.280](14846.290s) # die: death by signal at > /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm > line 181. > [09:08:43.281](0.001s) 1..3 > death by signal at > /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm > line 181. > > 048_vacuum_horizon_floor_primary.log contains: > 2026-06-14 05:00:51.700 CEST [655012][client backend][1/9:0] LOG: statement: > VACUUM (VERBOSE, FREEZE, PARALLEL 0) > vac_horizon_floor_table; > 2026-06-14 05:00:51.887 CEST [655012][client backend][1/10:0] INFO: > aggressively vacuuming > "test_db.public.vac_horizon_floor_table" > 2026-06-14 05:00:53.494 CEST [669376][client backend][5/2:0] LOG: statement: > SELECT count(*) >= 1 FROM pg_stat_activity > WHERE pid = 655012 > AND wait_event = 'BufferCleanup'; > 2026-06-14 05:01:07.789 CEST [676703][walsender][26/0:0] LOG: received > replication command: IDENTIFY_SYSTEM > 2026-06-14 05:01:07.789 CEST [676703][walsender][26/0:0] STATEMENT: > IDENTIFY_SYSTEM > 2026-06-14 05:01:07.992 CEST [676703][walsender][26/0:0] LOG: received > replication command: START_REPLICATION 0/03000000 > TIMELINE 1 > 2026-06-14 05:01:07.992 CEST [676703][walsender][26/0:0] STATEMENT: > START_REPLICATION 0/03000000 TIMELINE 1 > 2026-06-14 05:01:11.486 CEST [678834][client backend][6/2:0] LOG: statement: > SELECT EXISTS (SELECT * FROM > pg_stat_replication); > 2026-06-14 05:01:15.472 CEST [656147][client backend][2/2:0] LOG: statement: > FETCH vac_horizon_floor_cursor1 > ; > 2026-06-14 05:01:19.594 CEST [681642][client backend][7/2:0] LOG: statement: > SELECT index_vacuum_count > 0 > FROM pg_stat_progress_vacuum > WHERE datname='test_db' AND relid::regclass = > 'vac_horizon_floor_table'::regclass; > 2026-06-14 05:01:24.183 CEST [656147][client backend][2/2:0] LOG: statement: > COMMIT; > 2026-06-14 05:01:26.473 CEST [684424][client backend][8/2:0] LOG: statement: > SELECT vacuum_count > 0 > FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table'; > ... > 2026-06-14 09:08:40.354 CEST [3155195][client backend][7/2090:0] LOG: > statement: SELECT vacuum_count > 0 > FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table'; > 2026-06-14 09:08:42.036 CEST [3156594][client backend][8/2090:0] LOG: > statement: SELECT vacuum_count > 0 > FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table'; > 2026-06-14 09:08:43.285 CEST [636342][postmaster][:0] LOG: received > immediate shutdown request > > > The next run of the test [2] took about 2 minutes: > 53/398 recovery - postgresql:recovery/048_vacuum_horizon_floor OK > 120.90s 3 subtests passed > > There was a similar failure in the past year [3]: > 38/267 postgresql:recovery / recovery/043_vacuum_horizon_floor > ERROR 335.83s exit status 29 > > regress_log_043_vacuum_horizon_floor > [16:59:38.183](0.003s) ok 3 - Cursor query returned 1 from second fetch. > Expected value 1. > IPC::Run: timeout on timer #2 at /usr/share/perl5/IPC/Run.pm line 3025. > # Postmaster PID for node "primary" is 2872485 > > But that time it looked more like an ordinary timeout. > > I've managed to reproduce such a failure locally, on a slowed down VM, > under Valgrind: > ITERATION 23 > Sun Jun 14 01:02:30 PM UTC 2026 > # +++ tap check in src/test/recovery +++ > t/048_vacuum_horizon_floor.pl .. ok > All tests successful. > Files=1, Tests=3, 131 wallclock secs ( 0.01 usr 0.00 sys + 46.21 cusr 3.12 > csys = 49.34 CPU) > Result: PASS > ITERATION 24 > Sun Jun 14 01:04:41 PM UTC 2026 > # +++ tap check in src/test/recovery +++ > t/048_vacuum_horizon_floor.pl .. 3/? # die: IPC::Run: timeout on timer #2 at > /usr/share/perl5/IPC/Run.pm line 3361. > # Looks like your test exited with 29 just after 3. > t/048_vacuum_horizon_floor.pl .. Dubious, test returned 29 (wstat 7424, > 0x1d00) > > With this diagnostic addition: > --- a/src/backend/storage/buffer/bufmgr.c > +++ b/src/backend/storage/buffer/bufmgr.c > @@ -6743,9 +6743,12 @@ LockBufferForCleanup(Buffer buffer) > } > bufHdr->wait_backend_pgprocno = MyProcNumber; > PinCountWaitBuf = bufHdr; > - UnlockBufHdrExt(bufHdr, buf_state, > - BM_PIN_COUNT_WAITER, 0, > - 0); > +for (volatile int i = 0; i < 10000000; i++); > + buf_state = UnlockBufHdrExt(bufHdr, buf_state, > + BM_PIN_COUNT_WAITER, 0, > + 0); > + if (BUF_STATE_GET_REFCOUNT(buf_state) == 1) > + elog(LOG, "!!!LockBufferForCleanup| wakeup lost"); > LockBuffer(buffer, BUFFER_LOCK_UNLOCK); > > /* Wait to be signaled by UnpinBuffer() */ > @@ -6794,7 +6797,11 @@ LockBufferForCleanup(Buffer buffer) > SetStartupBufferPinWaitBufId(-1); > } > else > +{ > +elog(LOG, "!!!LockBufferForCleanup]| before ProcWaitForSignal"); > ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP); > +elog(LOG, "!!!LockBufferForCleanup]| after ProcWaitForSignal"); > +} > > /* > * Remove flag marking us as waiter. Normally this will not > be set > > (The delay before UnlockBufHdrExt() helps reproducing for me --- the test > fails within 10 iterations.) > > I could see: > # +++ tap check in src/test/recovery +++ > t/048_vacuum_horizon_floor.pl .. 3/? # die: IPC::Run: timeout on timer #2 at > /usr/share/perl5/IPC/Run.pm line 3156. > # Looks like your test exited with 29 just after 3. > t/048_vacuum_horizon_floor.pl .. Dubious, test returned 29 (wstat 7424, > 0x1d00) > All 3 subtests passed > > Test Summary Report > ------------------- > t/048_vacuum_horizon_floor.pl (Wstat: 7424 (exited 29) Tests: 3 Failed: 0) > Non-zero exit status: 29 > Files=1, Tests=3, 318 wallclock secs ( 0.01 usr 0.00 sys + 93.12 cusr 4.73 > csys = 97.86 CPU) > > 048_vacuum_horizon_floor_primary.log contains: > 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > LOG: !!!LockBufferForCleanup| wakeup lost > 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > CONTEXT: while scanning block 26 of relation > "public.vac_horizon_floor_table" > 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > STATEMENT: VACUUM (VERBOSE, FREEZE, PARALLEL > 0) vac_horizon_floor_table; > 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > LOG: !!!LockBufferForCleanup]| before > ProcWaitForSignal > 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > CONTEXT: while scanning block 26 of relation > "public.vac_horizon_floor_table" > 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > STATEMENT: VACUUM (VERBOSE, FREEZE, PARALLEL > 0) vac_horizon_floor_table; > ... > 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > LOG: !!!LockBufferForCleanup]| after > ProcWaitForSignal > 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > CONTEXT: while scanning block 26 of relation > "public.vac_horizon_floor_table" > 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] > STATEMENT: VACUUM (VERBOSE, FREEZE, PARALLEL > 0) vac_horizon_floor_table; > > > [1] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-06-14%2001%3A21%3A38 > [2] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-06-14%2007%3A22%3A09 > [3] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-05%2015%3A43%3A53
Thanks for reporting this issue. I can reproduce it with the delay. This looks like a real lost-wakeup race in LockBufferForCleanup(). The relevant sequence in failed test 048 is: - Session B opens a cursor and fetches one heap tuple, leaving a heap buffer pinned. - Session A starts VACUUM (FREEZE). - VACUUM reaches that page and waits in LockBufferForCleanup(). - Session B later advances/closes the cursor, releasing the pin. - VACUUM is expected to wake up and finish. In the failure, the tap test had already passed its three sql assertions, but then it timed out waiting for VACUUM completion: ok 3 - Cursor query returned 7 from second fetch poll_query_until timed out: SELECT vacuum_count > 0 ... last actual query output: f Looks like your test exited with 29 just after 3. The diagnostic log shows the actual race: LOG: !!!LockBufferForCleanup| wakeup lost CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table" LOG: !!!LockBufferForCleanup]| before ProcWaitForSignal CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table" There was no later "after ProcWaitForSignal" before the shutdown, which implies that VACUUM published itself as a waiter, entered ProcWaitForSignal(), and not been woken up later. The direct regression appears to be 5310fac6e0f. It allows this interleaving: W: LockBufferForCleanup() holds buffer header lock W: observes refcount > 1 P: releases the last competing pin with atomic fetch_sub P: old state does not contain BM_PIN_COUNT_WAITER, so no wakeup W: publishes BM_PIN_COUNT_WAITER W: sleeps in ProcWaitForSignal() At this point the condition W wanted is already true: refcount is 1, meaning only W's own pin remains. So W could sleep indefinitely as no future unpin to wake it. We can fix this with the state returned by UnlockBufHdrExt() when publishing BM_PIN_COUNT_WAITER. If the wait refcount is 1, do not enter the wait path. Instead, fall through to the existing waiter-bit cleanup and retry the loop to acquire the cleanup lock normally. The reproducer test passed after applying the patch. -- Regards, Xuneng Zhou HighGo Software Co., Ltd.
v1-0001-Fix-lost-wakeup-in-LockBufferForCleanup.patch
Description: Binary data
