Hi Andres,

On Tue, Jun 23, 2026 at 4:04 AM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> On 2026-06-17 23:00:00 +0300, Alexander Lakhin wrote:
> > Hello hackers,
> > With this diagnostic addition:
> > --- a/src/backend/storage/buffer/bufmgr.c
> > +++ b/src/backend/storage/buffer/bufmgr.c
> > @@ -6743,9 +6743,12 @@ LockBufferForCleanup(Buffer buffer)
> >                 }
> >                 bufHdr->wait_backend_pgprocno = MyProcNumber;
> >                 PinCountWaitBuf = bufHdr;
> > -               UnlockBufHdrExt(bufHdr, buf_state,
> > -                                               BM_PIN_COUNT_WAITER, 0,
> > -                                               0);
> > +for (volatile int i = 0; i < 10000000; i++);
> > +               buf_state = UnlockBufHdrExt(bufHdr, buf_state,
> > + BM_PIN_COUNT_WAITER, 0,
> > +                                                               0);
> > +               if (BUF_STATE_GET_REFCOUNT(buf_state) == 1)
> > +                       elog(LOG, "!!!LockBufferForCleanup| wakeup lost");
> >                 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> >
>
> I don't think seeing BUF_STATE_GET_REFCOUNT(buf_state) == 1 guarantees that a
> wakeup was lost in any sort of way. Nothing prevents another backend from
> acquiring a pin on the buffer while we are waiting for a pincount, so
> occasionally seeing BUF_STATE_GET_REFCOUNT() == 1 here is to be expected.

Yeah, the log is misleading as observing
BUF_STATE_GET_REFCOUNT(buf_state) == 1 could be temporary, not
necessarily leading to lost wake-up.

A better wording seems to be:

elog(LOG, "buffer refcount is 1 after publishing cleanup-lock waiter");


> You also can't expect repeatedly doing
> UnlockBufHdrExt(set_bits=BM_PIN_COUNT_WAITER) to really make sense, because
> BM_PIN_COUNT_WAITER only works if it was set *before* another backend releases
> its pin, because otherwise the other backend obviously won't notice that the
> flag was set if it only was set after the pg_atomic_fetch_sub_u64() in
> UnpinBufferNoOwner().

This also makes sense to me. However the reproducer seems not to
repeatedly do UnlockBufHdrExt(set_bits=BM_PIN_COUNT_WAITER).  It uses
a for loop as a delay before setting the BM_PIN_COUNT_WAITER, which
seems ok to me. Am I missing sth here?


>
> I don't think the problem is that we are loosing a signal, the problem is that
> we need to recheck if BUF_STATE_GET_REFCOUNT(buf_state) == 1 after setting
> BM_PIN_COUNT_WAITER.  I think this is a bug in (depending on your perspective)
> either
>
> commit 5310fac6e0f
> Author: Andres Freund <[email protected]>
> Date:   2025-11-06 16:43:16 -0500
>
>     bufmgr: Use atomic sub for unpinning buffers
>
> or
>
> commit c75ebc657ffce8dab76471da31aafb79fbe3fda2
> Author: Andres Freund <[email protected]>
> Date:   2025-11-06 16:42:10 -0500
>
>     bufmgr: Allow some buffer state modifications while holding header lock
>
>
> Because before those commits, the buffer could not be unpinned while we held
> the buffer header lock. With them, this race opens up.

I agree with this as the root cause of the issue. Thanks for your analysis!

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.


Reply via email to