Hi Matthias and Robert, Matthias van de Meent <boekewurm+postg...@gmail.com> writes:
> On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1...@163.com> wrote: >> >> My question is if someone doesn't obey the rule by mistake (everyone >> can make mistake), shall we PANIC on a production environment? IMO I >> think it can be a WARNING on a production environment and be a stuck >> when 'ifdef USE_ASSERT_CHECKING'. >> [...] >> I think a experienced engineer like Thomas can make this mistake and the >> patch was reviewed by 3 peoples, the bug is still there. It is not easy >> to say just don't do it. >> >> the attached code show the prototype in my mind. Any feedback is welcome. > > While I understand your point and could maybe agree with the change > itself (a crash isn't great), It's great that both of you agree that the crash is not great. > I don't think it's an appropriate fix > for the problem of holding a spinlock while waiting for a LwLock, as > spin.h specifically mentions the following (and you quoted the same): > > """ > Keep in mind the coding rule that spinlocks must not be held for more > than a few instructions. > """ Yes, I agree that the known [LW]LockAcquire after holding a Spin lock should be fixed at the first chance rather than pander to it with my previous patch. My previous patch just take care of the *unknown* cases (and I cced thomas in the hope that he knows the bug). I also agree that the special case about [LW]LockAcquire should be detected more effective as you suggested below. So v2 comes and commit 2 is for this suggestion. > > I suspect that we'd be better off with stronger protections against > waiting for LwLocks while we hold any spin lock. More specifically, > I'm thinking about something like tracking how many spin locks we > hold, and Assert()-ing that we don't hold any such locks when we start > to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with > potential manual contextual overrides in specific areas of code where > specific care has been taken to make it safe to hold spin locks while > doing those operations - although I consider their existence unlikely > I can't rule them out as I've yet to go through all lock-touching > code). This would probably work in a similar manner as > START_CRIT_SECTION/END_CRIT_SECTION. > > Kind regards, > > Matthias van de Meent > Neon (https://neon.tech) -- Best Regards Andy Fan
>From 7d7fd0f0e9b13a290bfffaec0ad40773191155f2 Mon Sep 17 00:00:00 2001 From: "yizhi.fzh" <yizhi....@alibaba-inc.com> Date: Thu, 4 Jan 2024 14:33:37 +0800 Subject: [PATCH v2 1/2] Don't call s_lock_stuck in production environment In the current implementation, if a spin lock is misused, the s_lock_stuck in perform_spin_delay can cause the entire system to PANIC. In order to balance fault tolerance and the ability to detect incorrect usage, we can use WARNING to replace s_lock_stuck in the production environment, but still use s_lock_stuck in builds with USE_ASSERT_CHECKING. --- src/backend/storage/lmgr/s_lock.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c index 327ac64f7c..9446605122 100644 --- a/src/backend/storage/lmgr/s_lock.c +++ b/src/backend/storage/lmgr/s_lock.c @@ -67,6 +67,7 @@ slock_t dummy_spinlock; static int spins_per_delay = DEFAULT_SPINS_PER_DELAY; +#ifdef USE_ASSERT_CHECKING /* * s_lock_stuck() - complain about a stuck spinlock */ @@ -85,6 +86,7 @@ s_lock_stuck(const char *file, int line, const char *func) func, file, line); #endif } +#endif /* * s_lock(lock) - platform-independent portion of waiting for a spinlock. @@ -132,7 +134,14 @@ perform_spin_delay(SpinDelayStatus *status) if (++(status->spins) >= spins_per_delay) { if (++(status->delays) > NUM_DELAYS) + { +#ifdef USE_ASSERT_CHECKING s_lock_stuck(status->file, status->line, status->func); +#else + if (status->delays % NUM_DELAYS == 0) + elog(WARNING, "perform spin lock on %s %d times", status->func, status->delays); +#endif + } if (status->cur_delay == 0) /* first time to delay? */ status->cur_delay = MIN_DELAY_USEC; -- 2.34.1
>From fae6241960a2c0df8df5d83001ce298228f4cbf2 Mon Sep 17 00:00:00 2001 From: "yizhi.fzh" <yizhi....@alibaba-inc.com> Date: Thu, 4 Jan 2024 22:19:50 +0800 Subject: [PATCH v2 2/2] Disallow [LW]LockAcquire when holding a spin lock-file-mode Spinlocks are intended for *very* short-term locks, but sometimes it is misused, this commit guards no [LW]LockAcquire should be called when holding a spin lock. --- src/backend/storage/buffer/bufmgr.c | 1 + src/backend/storage/lmgr/lock.c | 2 ++ src/backend/storage/lmgr/lwlock.c | 2 ++ src/backend/utils/init/globals.c | 1 + src/include/storage/buf_internals.h | 1 + src/include/storage/spin.h | 24 ++++++++++++++++++++++-- 6 files changed, 29 insertions(+), 2 deletions(-) diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 9f9d3f24ac..d2534ee9aa 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -5419,6 +5419,7 @@ LockBufHdr(BufferDesc *desc) perform_spin_delay(&delayStatus); } finish_spin_delay(&delayStatus); + START_SPIN_LOCK(); return old_buf_state | BM_LOCKED; } diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index b8c57b3e16..0710adbef8 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -776,6 +776,8 @@ LockAcquireExtended(const LOCKTAG *locktag, bool found_conflict; bool log_lock = false; + Assert(SpinLockCount == 0); + if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods)) elog(ERROR, "unrecognized lock method: %d", lockmethodid); lockMethodTable = LockMethods[lockmethodid]; diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 315a78cda9..5709cadcd5 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -1205,6 +1205,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE); + Assert(SpinLockCount == 0); + PRINT_LWDEBUG("LWLockAcquire", lock, mode); #ifdef LWLOCK_STATS diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c index 60bc1217fb..125dc53375 100644 --- a/src/backend/utils/init/globals.c +++ b/src/backend/utils/init/globals.c @@ -40,6 +40,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false; volatile uint32 InterruptHoldoffCount = 0; volatile uint32 QueryCancelHoldoffCount = 0; volatile uint32 CritSectionCount = 0; +volatile uint32 SpinLockCount = 0; int MyProcPid; pg_time_t MyStartTime; diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h index 2c4fd92e39..be2d090308 100644 --- a/src/include/storage/buf_internals.h +++ b/src/include/storage/buf_internals.h @@ -363,6 +363,7 @@ UnlockBufHdr(BufferDesc *desc, uint32 buf_state) { pg_write_barrier(); pg_atomic_write_u32(&desc->state, buf_state & (~BM_LOCKED)); + END_SPIN_LOCK(); } /* in bufmgr.c */ diff --git a/src/include/storage/spin.h b/src/include/storage/spin.h index 5d809cc980..c1b96a047f 100644 --- a/src/include/storage/spin.h +++ b/src/include/storage/spin.h @@ -56,12 +56,32 @@ #include "storage/pg_sema.h" #endif +extern PGDLLIMPORT volatile uint32 SpinLockCount; + +#define START_SPIN_LOCK() (SpinLockCount++) +#define END_SPIN_LOCK() \ +do { \ + Assert(SpinLockCount > 0); \ + SpinLockCount--; \ +} while(0) #define SpinLockInit(lock) S_INIT_LOCK(lock) -#define SpinLockAcquire(lock) S_LOCK(lock) +#define SpinLockAcquire(lock) \ + do \ + { \ + S_LOCK(lock); \ + START_SPIN_LOCK(); \ + } while (false); + + +#define SpinLockRelease(lock) \ + do \ + { \ + S_UNLOCK(lock); \ + END_SPIN_LOCK(); \ + } while (false); -#define SpinLockRelease(lock) S_UNLOCK(lock) #define SpinLockFree(lock) S_LOCK_FREE(lock) -- 2.34.1