I wrote:
> I don't think it's correct to re-initialize the SpinDelayStatus each
> time around the outer loop. That state should persist through the
> entire acquire operation, as it does in a regular spinlock acquire.
> As this stands, it resets the delay to minimum each time around the
> outer loop, and I bet it is that behavior not the RNG that's to blame
> for what he's seeing.
After thinking about this some more, it is fairly clear that that *is*
a mistake that can cause a thundering-herd problem. Assume we have
two or more backends waiting in perform_spin_delay, and for whatever
reason the scheduler wakes them up simultaneously. They see the
LW_FLAG_LOCKED bit clear (because whoever had the lock when they went
to sleep is long gone) and iterate the outer loop. One gets the lock;
the rest go back to sleep. And they will all sleep exactly
MIN_DELAY_USEC, because they all reset their SpinDelayStatus.
Lather, rinse, repeat. If the s_lock code were being used as
intended, they would soon acquire different cur_delay settings;
but that never happens. That is not the RNG's fault.
So I think we need something like the attached.
regards, tom lane
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index b1e388dc7c..54c84b9d67 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -857,46 +857,49 @@ static void
 LWLockWaitListLock(LWLock *lock)
 {
 	uint32		old_state;
-#ifdef LWLOCK_STATS
-	lwlock_stats *lwstats;
-	uint32		delays = 0;
-
-	lwstats = get_lwlock_stats_entry(lock);
-#endif
+	/*
+	 * Try once to acquire the lock, before we go to the trouble of setting up
+	 * state for the spin loop.
+	 */
+	old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
+	if (!(old_state & LW_FLAG_LOCKED))
+		return;					/* got lock */
 
-	while (true)
 	{
-		/* always try once to acquire lock directly */
-		old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
-		if (!(old_state & LW_FLAG_LOCKED))
-			break;				/* got lock */
+		SpinDelayStatus delayStatus;
+#ifdef LWLOCK_STATS
+		lwlock_stats *lwstats = get_lwlock_stats_entry(lock);
+#endif
 
-		/* and then spin without atomic operations until lock is released */
-		{
-			SpinDelayStatus delayStatus;
+		init_local_spin_delay(&delayStatus);
 
-			init_local_spin_delay(&delayStatus);
+		while (true)
+		{
+			/* try again to acquire lock directly */
+			old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
+			if (!(old_state & LW_FLAG_LOCKED))
+				break;			/* got lock */
+
+			/* spin without atomic operations until lock is released */
 			while (old_state & LW_FLAG_LOCKED)
 			{
 				perform_spin_delay(&delayStatus);
 				old_state = pg_atomic_read_u32(&lock->state);
 			}
-#ifdef LWLOCK_STATS
-			delays += delayStatus.delays;
-#endif
-			finish_spin_delay(&delayStatus);
+
+			/*
+			 * Retry. The lock might obviously already be re-acquired by the
+			 * time we're attempting to get it again.
+			 */
 		}
 
-		/*
-		 * Retry. The lock might obviously already be re-acquired by the time
-		 * we're attempting to get it again.
-		 */
-	}
+		finish_spin_delay(&delayStatus);
 #ifdef LWLOCK_STATS
-	lwstats->spin_delay_count += delays;
+		lwstats->spin_delay_count += delayStatus.delays;
 #endif
+	}
 }
 
 /*