Hi, On 2026-06-22 16:14:08 +0800, Xuneng Zhou wrote: > This looks like a real lost-wakeup race in LockBufferForCleanup().
I don't think any wakeup is lost. The problem is that we're waiting for a wakeup when it is not guaranteed that one will arrive. > The relevant sequence in failed test 048 is: > > - Session B opens a cursor and fetches one heap tuple, leaving a heap > buffer pinned. > - Session A starts VACUUM (FREEZE). > - VACUUM reaches that page and waits in LockBufferForCleanup(). > - Session B later advances/closes the cursor, releasing the pin. > - VACUUM is expected to wake up and finish. > > In the failure, the tap test had already passed its three sql > assertions, but then it timed out waiting for VACUUM completion: > > ok 3 - Cursor query returned 7 from second fetch > poll_query_until timed out: > SELECT vacuum_count > 0 ... > last actual query output: f > Looks like your test exited with 29 just after 3. > > The diagnostic log shows the actual race: > > LOG: !!!LockBufferForCleanup| wakeup lost > CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table" > LOG: !!!LockBufferForCleanup]| before ProcWaitForSignal > CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table" I don't think this is meaningful output, as written upthread. > There was no later "after ProcWaitForSignal" before the shutdown, > which implies that VACUUM published itself as a waiter, entered > ProcWaitForSignal(), and not been woken up later. > The direct regression appears to be 5310fac6e0f. It allows this interleaving: > > W: LockBufferForCleanup() holds buffer header lock > W: observes refcount > 1 > P: releases the last competing pin with atomic fetch_sub > P: old state does not contain BM_PIN_COUNT_WAITER, so no wakeup > W: publishes BM_PIN_COUNT_WAITER > W: sleeps in ProcWaitForSignal() > > At this point the condition W wanted is already true: refcount is 1, > meaning only W's own pin remains. So W could sleep indefinitely as no > future unpin to wake it. However, this indeed seems to be the problem. > We can fix this with the state returned by UnlockBufHdrExt() when > publishing BM_PIN_COUNT_WAITER. If the wait refcount is 1, do not > enter the wait path. Instead, fall through to the existing waiter-bit > cleanup and retry the loop to acquire the cleanup lock normally. The > reproducer test passed after applying the patch. Why are we retrying, when we already achieved everything we wanted? Seems we should just unset BM_PIN_COUNT_WAITER / wait_backend_pgprocno. I think the easiest way to do this would be to not unlock the buffer header lock, but just atomically-or-in BM_PIN_COUNT_WAITER and then recheck if we already are done, while still holding the header lock. Greetings, Andres Freund
