On Mon, Jun 22, 2026 at 4:14 AM Xuneng Zhou <[email protected]> wrote:
>
> The direct regression appears to be 5310fac6e0f. It allows this interleaving:
>
> W: LockBufferForCleanup() holds buffer header lock
> W: observes refcount > 1
> P: releases the last competing pin with atomic fetch_sub
> P: old state does not contain BM_PIN_COUNT_WAITER, so no wakeup
> W: publishes BM_PIN_COUNT_WAITER
> W: sleeps in ProcWaitForSignal()
>
> At this point the condition W wanted is already true: refcount is 1,
> meaning only W's own pin remains. So W could sleep indefinitely, as no
> future unpin will wake it.
>
> We can fix this by checking the state returned by UnlockBufHdrExt() when
> publishing BM_PIN_COUNT_WAITER. If the refcount in that state is 1, do not
> enter the wait path. Instead, fall through to the existing waiter-bit
> cleanup and retry the loop to acquire the cleanup lock normally. The
> reproducer test passed after applying the patch.
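
To make sure I am reading the proposed check correctly, here is a minimal
standalone model of it using C11 atomics. This is only a sketch, not the
bufmgr.c code: ModelBuf, the MODEL_* constants, model_unpin() and
model_publish_waiter() are all hypothetical names, the bit layout is made up,
and a single atomic fetch_or stands in for publishing the flag via
UnlockBufHdrExt(). The point it illustrates is that the waiter decides
whether to sleep from the state returned by the publishing operation rather
than from the refcount it observed earlier.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration; not PostgreSQL's real state word. */
#define MODEL_REFCOUNT_MASK     0x0003FFFFu
#define MODEL_PIN_COUNT_WAITER  (1u << 30)

typedef struct ModelBuf
{
    _Atomic uint32_t state;     /* refcount in low bits, flags above */
} ModelBuf;

/*
 * P's side: drop one pin. Returns true if the state seen before the
 * decrement carried the waiter flag, i.e. the caller owes a wakeup.
 */
static bool
model_unpin(ModelBuf *buf)
{
    uint32_t old = atomic_fetch_sub(&buf->state, 1);

    return (old & MODEL_PIN_COUNT_WAITER) != 0;
}

/*
 * W's side: publish the waiter flag and inspect the state that the
 * publishing operation itself produced. Returns true if W should sleep.
 */
static bool
model_publish_waiter(ModelBuf *buf)
{
    uint32_t old = atomic_fetch_or(&buf->state, MODEL_PIN_COUNT_WAITER);
    uint32_t published = old | MODEL_PIN_COUNT_WAITER;

    if ((published & MODEL_REFCOUNT_MASK) == 1)
    {
        /*
         * Only W's own pin remains, so no future unpin would wake us.
         * Clear the flag again and let the caller retry the loop.
         */
        atomic_fetch_and(&buf->state, ~MODEL_PIN_COUNT_WAITER);
        return false;
    }
    return true;
}
```

In this model the two atomic operations are totally ordered: either the
unpin comes first and model_publish_waiter() sees refcount 1 and does not
sleep, or the publish comes first and model_unpin() sees the flag and
reports that a wakeup is needed.
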
Thanks for investigating! Does the reproducer pass prior to 5310fac6e0f?

- Melanie
