Hi Alexander,

On Thu, Jun 18, 2026 at 4:00 AM Alexander Lakhin <[email protected]> wrote:
>
> Hello hackers,
>
> A recent failure produced by skink [1] apparently materializes an
> interesting race condition:
> 398/398 recovery - postgresql:recovery/048_vacuum_horizon_floor TIMEOUT       
>  15000.51s   killed by signal 15 SIGTERM
>
> regress_log_048_vacuum_horizon_floor contains:
> [05:01:16.990](0.000s) ok 3 - Cursor query returned 7 from second fetch. 
> Expected value 7.
> [09:08:43.280](14846.290s) # die: death by signal at
> /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
>  line 181.
> [09:08:43.281](0.001s) 1..3
> death by signal at 
> /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
>  line 181.
>
> 048_vacuum_horizon_floor_primary.log contains:
> 2026-06-14 05:00:51.700 CEST [655012][client backend][1/9:0] LOG: statement: 
> VACUUM (VERBOSE, FREEZE, PARALLEL 0)
> vac_horizon_floor_table;
> 2026-06-14 05:00:51.887 CEST [655012][client backend][1/10:0] INFO: 
> aggressively vacuuming
> "test_db.public.vac_horizon_floor_table"
> 2026-06-14 05:00:53.494 CEST [669376][client backend][5/2:0] LOG: statement: 
> SELECT count(*) >= 1 FROM pg_stat_activity
>              WHERE pid = 655012
>              AND wait_event = 'BufferCleanup';
> 2026-06-14 05:01:07.789 CEST [676703][walsender][26/0:0] LOG: received 
> replication command: IDENTIFY_SYSTEM
> 2026-06-14 05:01:07.789 CEST [676703][walsender][26/0:0] STATEMENT: 
> IDENTIFY_SYSTEM
> 2026-06-14 05:01:07.992 CEST [676703][walsender][26/0:0] LOG: received 
> replication command: START_REPLICATION 0/03000000
> TIMELINE 1
> 2026-06-14 05:01:07.992 CEST [676703][walsender][26/0:0] STATEMENT: 
> START_REPLICATION 0/03000000 TIMELINE 1
> 2026-06-14 05:01:11.486 CEST [678834][client backend][6/2:0] LOG: statement: 
> SELECT EXISTS (SELECT * FROM
> pg_stat_replication);
> 2026-06-14 05:01:15.472 CEST [656147][client backend][2/2:0] LOG: statement: 
> FETCH vac_horizon_floor_cursor1
>      ;
> 2026-06-14 05:01:19.594 CEST [681642][client backend][7/2:0] LOG: statement: 
> SELECT index_vacuum_count > 0
>          FROM pg_stat_progress_vacuum
>          WHERE datname='test_db' AND relid::regclass = 
> 'vac_horizon_floor_table'::regclass;
> 2026-06-14 05:01:24.183 CEST [656147][client backend][2/2:0] LOG: statement: 
> COMMIT;
> 2026-06-14 05:01:26.473 CEST [684424][client backend][8/2:0] LOG: statement: 
> SELECT vacuum_count > 0
>          FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table';
> ...
> 2026-06-14 09:08:40.354 CEST [3155195][client backend][7/2090:0] LOG:  
> statement: SELECT vacuum_count > 0
>          FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table';
> 2026-06-14 09:08:42.036 CEST [3156594][client backend][8/2090:0] LOG:  
> statement: SELECT vacuum_count > 0
>          FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table';
> 2026-06-14 09:08:43.285 CEST [636342][postmaster][:0] LOG:  received 
> immediate shutdown request
>
>
> The next run of the test [2] took about 2 minutes:
>   53/398 recovery - postgresql:recovery/048_vacuum_horizon_floor OK           
>     120.90s   3 subtests passed
>
> There was a similar failure in the past year [3]:
>   38/267 postgresql:recovery / recovery/043_vacuum_horizon_floor              
>      ERROR 335.83s   exit status 29
>
> regress_log_043_vacuum_horizon_floor
> [16:59:38.183](0.003s) ok 3 - Cursor query returned 1 from second fetch. 
> Expected value 1.
> IPC::Run: timeout on timer #2 at /usr/share/perl5/IPC/Run.pm line 3025.
> # Postmaster PID for node "primary" is 2872485
>
> But that time it looked more like an ordinary timeout.
>
> I've managed to reproduce such a failure locally, on a slowed down VM,
> under Valgrind:
> ITERATION 23
> Sun Jun 14 01:02:30 PM UTC 2026
> # +++ tap check in src/test/recovery +++
> t/048_vacuum_horizon_floor.pl .. ok
> All tests successful.
> Files=1, Tests=3, 131 wallclock secs ( 0.01 usr  0.00 sys + 46.21 cusr  3.12 
> csys = 49.34 CPU)
> Result: PASS
> ITERATION 24
> Sun Jun 14 01:04:41 PM UTC 2026
> # +++ tap check in src/test/recovery +++
> t/048_vacuum_horizon_floor.pl .. 3/? # die: IPC::Run: timeout on timer #2 at 
> /usr/share/perl5/IPC/Run.pm line 3361.
> # Looks like your test exited with 29 just after 3.
> t/048_vacuum_horizon_floor.pl .. Dubious, test returned 29 (wstat 7424, 
> 0x1d00)
>
> With this diagnostic addition:
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -6743,9 +6743,12 @@ LockBufferForCleanup(Buffer buffer)
>                  }
>                  bufHdr->wait_backend_pgprocno = MyProcNumber;
>                  PinCountWaitBuf = bufHdr;
> -               UnlockBufHdrExt(bufHdr, buf_state,
> -                                               BM_PIN_COUNT_WAITER, 0,
> -                                               0);
> +for (volatile int i = 0; i < 10000000; i++);
> +               buf_state = UnlockBufHdrExt(bufHdr, buf_state,
> + BM_PIN_COUNT_WAITER, 0,
> +                                                               0);
> +               if (BUF_STATE_GET_REFCOUNT(buf_state) == 1)
> +                       elog(LOG, "!!!LockBufferForCleanup| wakeup lost");
>                  LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>
>                  /* Wait to be signaled by UnpinBuffer() */
> @@ -6794,7 +6797,11 @@ LockBufferForCleanup(Buffer buffer)
>                          SetStartupBufferPinWaitBufId(-1);
>                  }
>                  else
> +{
> +elog(LOG, "!!!LockBufferForCleanup]| before ProcWaitForSignal");
> ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
> +elog(LOG, "!!!LockBufferForCleanup]| after ProcWaitForSignal");
> +}
>
>                  /*
>                   * Remove flag marking us as waiter. Normally this will not 
> be set
>
> (The delay before UnlockBufHdrExt() helps reproducing for me --- the test
> fails within 10 iterations.)
>
> I could see:
> # +++ tap check in src/test/recovery +++
> t/048_vacuum_horizon_floor.pl .. 3/? # die: IPC::Run: timeout on timer #2 at 
> /usr/share/perl5/IPC/Run.pm line 3156.
> # Looks like your test exited with 29 just after 3.
> t/048_vacuum_horizon_floor.pl .. Dubious, test returned 29 (wstat 7424, 
> 0x1d00)
> All 3 subtests passed
>
> Test Summary Report
> -------------------
> t/048_vacuum_horizon_floor.pl (Wstat: 7424 (exited 29) Tests: 3 Failed: 0)
>    Non-zero exit status: 29
> Files=1, Tests=3, 318 wallclock secs ( 0.01 usr  0.00 sys + 93.12 cusr  4.73 
> csys = 97.86 CPU)
>
> 048_vacuum_horizon_floor_primary.log contains:
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> LOG:  !!!LockBufferForCleanup| wakeup lost
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> CONTEXT:  while scanning block 26 of relation
> "public.vac_horizon_floor_table"
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> STATEMENT:  VACUUM (VERBOSE, FREEZE, PARALLEL
> 0) vac_horizon_floor_table;
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> LOG:  !!!LockBufferForCleanup]| before
> ProcWaitForSignal
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> CONTEXT:  while scanning block 26 of relation
> "public.vac_horizon_floor_table"
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> STATEMENT:  VACUUM (VERBOSE, FREEZE, PARALLEL
> 0) vac_horizon_floor_table;
> ...
> 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> LOG:  !!!LockBufferForCleanup]| after
> ProcWaitForSignal
> 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> CONTEXT:  while scanning block 26 of relation
> "public.vac_horizon_floor_table"
> 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] 
> STATEMENT:  VACUUM (VERBOSE, FREEZE, PARALLEL
> 0) vac_horizon_floor_table;
>
>
> [1] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-06-14%2001%3A21%3A38
> [2] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-06-14%2007%3A22%3A09
> [3] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-05%2015%3A43%3A53

Thanks for reporting this issue. I can reproduce it with the delay.
This looks like a real lost-wakeup race in LockBufferForCleanup().

The relevant sequence in failed test 048 is:

- Session B opens a cursor and fetches one heap tuple, leaving a heap
buffer pinned.
- Session A starts VACUUM (FREEZE).
- VACUUM reaches that page and waits in LockBufferForCleanup().
- Session B later advances/closes the cursor, releasing the pin.
- VACUUM is expected to wake up and finish.

In the failure, the tap test had already passed its three sql
assertions, but then it timed out waiting for VACUUM completion:

ok 3 - Cursor query returned 7 from second fetch
poll_query_until timed out:
  SELECT vacuum_count > 0 ...
last actual query output: f
Looks like your test exited with 29 just after 3.

The diagnostic log shows the actual race:

LOG: !!!LockBufferForCleanup| wakeup lost
CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table"

LOG: !!!LockBufferForCleanup]| before ProcWaitForSignal
CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table"

There was no later "after ProcWaitForSignal" before the shutdown,
which implies that VACUUM published itself as a waiter, entered
ProcWaitForSignal(), and not been woken up later.

The direct regression appears to be 5310fac6e0f. It allows this interleaving:

W: LockBufferForCleanup() holds buffer header lock
W: observes refcount > 1
P: releases the last competing pin with atomic fetch_sub
P: old state does not contain BM_PIN_COUNT_WAITER, so no wakeup
W: publishes BM_PIN_COUNT_WAITER
W: sleeps in ProcWaitForSignal()

At this point the condition W wanted is already true: refcount is 1,
meaning only W's own pin remains. So W could sleep indefinitely as no
future unpin to wake it.

We can fix this with the state returned by UnlockBufHdrExt() when
publishing BM_PIN_COUNT_WAITER. If the wait refcount is 1, do not
enter the wait path. Instead, fall through to the existing waiter-bit
cleanup and retry the loop to acquire the cleanup lock normally. The
reproducer test passed after applying the patch.

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

Attachment: v1-0001-Fix-lost-wakeup-in-LockBufferForCleanup.patch
Description: Binary data

Reply via email to