Hi,

On 2024-04-10 13:03:05 -0400, Tom Lane wrote:
> After thinking about this some more, it is fairly clear that that *is*
> a mistake that can cause a thundering-herd problem.
> Assume we have two or more backends waiting in perform_spin_delay, and for
> whatever reason the scheduler wakes them up simultaneously.

That's not really possible, at least not repeatably. Multiple processes
obviously can't be scheduled concurrently on one CPU, and scheduling something
on another core entails either interrupting that CPU with an inter-processor
interrupt or that other CPU scheduling on its own, without coordination.

That obviously isn't a reason not to fix the delay logic in lwlock.c. It looks
like the wrong logic was introduced by me in

commit 008608b9d51061b1f598c197477b3dc7be9c4a64
Author: Andres Freund <and...@anarazel.de>
Date:   2016-04-10 20:12:32 -0700

    Avoid the use of a separate spinlock to protect a LWLock's wait queue.

likely because I was trying to avoid the overhead of init_local_spin_delay(),
without duplicating the few lines needed to acquire the "spinlock".

> So I think we need something like the attached.

LGTM.

I think it might be worth breaking LWLockWaitListLock() into two pieces, a
fastpath to be inlined into a caller and a slowpath, but that's separate work
from a bugfix.

I looked around, and the other uses of init_local_spin_delay() look correct
from this angle. However, LockBufHdr() is more expensive than it needs to be,
because it always initializes SpinDelayStatus. IIRC I've seen that show up in
profiles before, but never got around to writing a nice-enough patch. But
that's also something separate from a bugfix.

Greetings,

Andres Freund