Hi Andres, On Tue, Jun 23, 2026 at 4:04 AM Andres Freund <[email protected]> wrote: > > Hi, > > On 2026-06-17 23:00:00 +0300, Alexander Lakhin wrote: > > Hello hackers, > > With this diagnostic addition: > > --- a/src/backend/storage/buffer/bufmgr.c > > +++ b/src/backend/storage/buffer/bufmgr.c > > @@ -6743,9 +6743,12 @@ LockBufferForCleanup(Buffer buffer) > > } > > bufHdr->wait_backend_pgprocno = MyProcNumber; > > PinCountWaitBuf = bufHdr; > > - UnlockBufHdrExt(bufHdr, buf_state, > > - BM_PIN_COUNT_WAITER, 0, > > - 0); > > +for (volatile int i = 0; i < 10000000; i++); > > + buf_state = UnlockBufHdrExt(bufHdr, buf_state, > > + BM_PIN_COUNT_WAITER, 0, > > + 0); > > + if (BUF_STATE_GET_REFCOUNT(buf_state) == 1) > > + elog(LOG, "!!!LockBufferForCleanup| wakeup lost"); > > LockBuffer(buffer, BUFFER_LOCK_UNLOCK); > > > > I don't think seeing BUF_STATE_GET_REFCOUNT(buf_state) == 1 guarantees that a > wakeup was lost in any sort of way. Nothing prevents another backend from > acquiring a pin on the buffer while we are waiting for a pincount, so > occasionally seeing BUF_STATE_GET_REFCOUNT() == 1 here is to be expected.
Yeah, the log is misleading as observing BUF_STATE_GET_REFCOUNT(buf_state) == 1 could be temporary, not necessarily leading to lost wake-up. A better wording seems to be: elog(LOG, "buffer refcount is 1 after publishing cleanup-lock waiter"); > You also can't expect repeatedly doing > UnlockBufHdrExt(set_bits=BM_PIN_COUNT_WAITER) to really make sense, because > BM_PIN_COUNT_WAITER only works if it was set *before* another backend releases > its pin, because otherwise the other backend obviously won't notice that the > flag was set if it only was set after the pg_atomic_fetch_sub_u64() in > UnpinBufferNoOwner(). This also makes sense to me. However the reproducer seems not to repeatedly do UnlockBufHdrExt(set_bits=BM_PIN_COUNT_WAITER). It uses a for loop as a delay before setting the BM_PIN_COUNT_WAITER, which seems ok to me. Am I missing sth here? > > I don't think the problem is that we are loosing a signal, the problem is that > we need to recheck if BUF_STATE_GET_REFCOUNT(buf_state) == 1 after setting > BM_PIN_COUNT_WAITER. I think this is a bug in (depending on your perspective) > either > > commit 5310fac6e0f > Author: Andres Freund <[email protected]> > Date: 2025-11-06 16:43:16 -0500 > > bufmgr: Use atomic sub for unpinning buffers > > or > > commit c75ebc657ffce8dab76471da31aafb79fbe3fda2 > Author: Andres Freund <[email protected]> > Date: 2025-11-06 16:42:10 -0500 > > bufmgr: Allow some buffer state modifications while holding header lock > > > Because before those commits, the buffer could not be unpinned while we held > the buffer header lock. With them, this race opens up. I agree with this as the root cause of the issue. Thanks for your analysis! -- Regards, Xuneng Zhou HighGo Software Co., Ltd.
