On Fri, 25/02/2022 at 09:01 -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-02-25 12:51:22 +0300, Yura Sokolov wrote:
> > > > +        * The usage_count starts out at 1 so that the buffer can survive one
> > > > +        * clock-sweep pass.
> > > > +        *
> > > > +        * We use direct atomic OR instead of Lock+Unlock since no other backend
> > > > +        * could be interested in the buffer. But StrategyGetBuffer,
> > > > +        * Flush*Buffers, Drop*Buffers are scanning all buffers and locks them to
> > > > +        * compare tag, and UnlockBufHdr does raw write to state. So we have to
> > > > +        * spin if we found buffer locked.
> > > 
> > > So basically the first half of the paragraph is wrong, because no, we
> > > can't?
> > 
> > Logically, there are no backends that could be interested in the buffer.
> > Physically, they do LockBufHdr/UnlockBufHdr just to check that they are not
> > interested.
> 
> Yea, but that's still being interested in the buffer...
> 
> 
> > > > +        * Note that we write tag unlocked. It is also safe since there is always
> > > > +        * check for BM_VALID when tag is compared.
> > > >          */
> > > >         buf->tag = newTag;
> > > > -       buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
> > > > -                                  BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
> > > > -                                  BUF_USAGECOUNT_MASK);
> > > >         if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
> > > > -               buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
> > > > +               new_bits = BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
> > > >         else
> > > > -               buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;
> > > > -
> > > > -       UnlockBufHdr(buf, buf_state);
> > > > +               new_bits = BM_TAG_VALID | BUF_USAGECOUNT_ONE;
> > > >  
> > > > -       if (oldPartitionLock != NULL)
> > > > +       buf_state = pg_atomic_fetch_or_u32(&buf->state, new_bits);
> > > > +       while (unlikely(buf_state & BM_LOCKED))
> > > 
> > > I don't think it's safe to atomically OR in arbitrary bits. If somebody else has
> > > locked the buffer header at that moment, it'll lead to completely bogus
> > > results, because unlocking overwrites concurrently written contents (of which
> > > there shouldn't be any, but here there are)...
> > 
> > That is why there is a safety loop for the case where buf->state was locked just
> > after the first optimistic atomic_fetch_or. 99.999% of the time this loop will
> > have nothing to do. But if another backend did lock buf->state, the loop waits
> > until the lock is released and retries the atomic_fetch_or.
> > > And or'ing contents in also doesn't make sense because it doesn't work to
> > > actually unset any contents?
> > 
> > Sorry, I didn't understand that sentence :((
> 
> You're OR'ing multiple bits into buf->state. LockBufHdr() only ORs in
> BM_LOCKED. ORing BM_LOCKED is fine:
> Either the buffer is not already locked, in which case it just sets the
> BM_LOCKED bit, acquiring the lock. Or it doesn't change anything, because
> BM_LOCKED already was set.
> 
> But OR'ing in multiple bits is *not* fine, because it'll actually change the
> contents of ->state while the buffer header is locked.

First, both states are valid: the one before the atomic OR and the one after.
Second, no code re-reads buffer->state while the buffer header is locked:
all LockBufHdr callers work with the value returned by LockBufHdr. (I just checked that.)
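To spell out the pattern I mean, here is a minimal standalone sketch using C11
atomics. It is only an analogue of the hunk quoted above: the names
(or_bits_optimistically, BM_LOCKED_BIT) are made up, and it is not the actual
bufmgr code.

#include <stdatomic.h>
#include <stdint.h>

#define BM_LOCKED_BIT (1u << 31)    /* stands in for BM_LOCKED */

/* Set new_bits in *state without taking the header lock first. */
static void
or_bits_optimistically(_Atomic uint32_t *state, uint32_t new_bits)
{
    uint32_t old;

    /* Optimistic path: a single atomic OR. */
    old = atomic_fetch_or_explicit(state, new_bits, memory_order_acq_rel);

    /*
     * If the header was locked, the holder may overwrite *state with its own
     * copy when it unlocks, wiping our bits.  Wait for the lock bit to clear
     * and re-apply the OR until it lands on an unlocked word.
     */
    while (old & BM_LOCKED_BIT)
    {
        while (atomic_load_explicit(state, memory_order_acquire) & BM_LOCKED_BIT)
            ;   /* spin until the holder releases the lock */
        old = atomic_fetch_or_explicit(state, new_bits, memory_order_acq_rel);
    }
}

In the common case the function costs one atomic OR; the loop only runs if the
first OR observed the lock bit.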

> > > Why don't you just use LockBufHdr/UnlockBufHdr?
> > 
> > This pair makes two atomic writes to memory. Two writes are heavier than
> > one write in this version (if the optimistic case succeeds).
> 
> UnlockBufHdr doesn't use a locked atomic op. It uses a write barrier and an
> unlocked write.

A write barrier is not free on any platform.
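To make the comparison concrete, here is a standalone C11 analogue of the two
variants being weighed. The names are illustrative; this is not the real
LockBufHdr/UnlockBufHdr, which also has spin-delay handling.

#include <stdatomic.h>
#include <stdint.h>

#define LOCKED_BIT (1u << 31)

/* Variant A: what the patch tried - a single atomic read-modify-write. */
static uint32_t
set_bits_once(_Atomic uint32_t *state, uint32_t bits)
{
    return atomic_fetch_or_explicit(state, bits, memory_order_acq_rel);
}

/* Variant B: lock + unlock pair in the style of LockBufHdr/UnlockBufHdr. */
static void
set_bits_locked(_Atomic uint32_t *state, uint32_t bits)
{
    uint32_t old;

    /* Acquire the lock bit: one locked RMW, spinning while contended. */
    while ((old = atomic_fetch_or_explicit(state, LOCKED_BIT,
                                           memory_order_acquire)) & LOCKED_BIT)
        ;

    old = (old | bits) & ~LOCKED_BIT;

    /* Release: barrier plus unlocked write, i.e. a store with release ordering. */
    atomic_store_explicit(state, old, memory_order_release);
}

Even uncontended, variant B is a locked RMW plus a release store, while
variant A is a single locked RMW.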

Well, while I don't see a problem with modifying buffer->state, there is a problem
with modifying buffer->tag: I missed that Drop*Buffers doesn't check the BM_TAG_VALID
flag. Therefore I either have to add that check to those places, or go back to the
LockBufHdr+UnlockBufHdr pair.
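For reference, the check I mean would look roughly like this in each Drop*Buffers
scan. This is hypothetical and untested, simplified (the real scans also compare
fork and block numbers); only the BM_TAG_VALID test is new:

    buf_state = LockBufHdr(bufHdr);
    if ((buf_state & BM_TAG_VALID) &&       /* the new part */
        RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
        InvalidateBuffer(bufHdr);   /* releases the header spinlock */
    else
        UnlockBufHdr(bufHdr, buf_state);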

For patch simplicity I'll go back to the LockBufHdr+UnlockBufHdr pair. But it has a
measurable impact at low connection counts on many-socket machines.

> 
> Greetings,
> 
> Andres Freund


