Hi Matthias and Robert,

Matthias van de Meent <boekewurm+postg...@gmail.com> writes:

> On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1...@163.com> wrote:
>>
>> My question is if someone doesn't obey the rule by mistake (everyone
>> can make mistake), shall we PANIC on a production environment? IMO I
>> think it can be a WARNING on a production environment and be a stuck
>> when 'ifdef USE_ASSERT_CHECKING'.
>> [...]
>> I think a experienced engineer like Thomas can make this mistake and the
>> patch was reviewed by 3 peoples, the bug is still there. It is not easy
>> to say just don't do it.
>>
>> the attached code show the prototype in my mind. Any feedback is welcome.
>
> While I understand your point and could maybe agree with the change
> itself (a crash isn't great),

It's great that both of you agree that the crash is not great. 

> I don't think it's an appropriate fix
> for the problem of holding a spinlock while waiting for a LwLock, as
> spin.h specifically mentions the following (and you quoted the same):
>
> """
> Keep in mind the coding rule that spinlocks must not be held for more
> than a few instructions.
> """

Yes, I agree that the known [LW]LockAcquire after holding a Spin lock
should be fixed at the first chance rather than pander to it with my
previous patch. My previous patch just take care of the *unknown*
cases (and I cced thomas in the hope that he knows the bug). I also
agree that the special case about [LW]LockAcquire should be detected
more effective as you suggested below. So v2 comes and commit 2 is for
this suggestion. 

>
> I suspect that we'd be better off with stronger protections against
> waiting for LwLocks while we hold any spin lock. More specifically,
> I'm thinking about something like tracking how many spin locks we
> hold, and Assert()-ing that we don't hold any such locks when we start
> to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with
> potential manual contextual overrides in specific areas of code where
> specific care has been taken to make it safe to hold spin locks while
> doing those operations - although I consider their existence unlikely
> I can't rule them out as I've yet to go through all lock-touching
> code). This would probably work in a similar manner as
> START_CRIT_SECTION/END_CRIT_SECTION.
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)


-- 
Best Regards
Andy Fan

>From 7d7fd0f0e9b13a290bfffaec0ad40773191155f2 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" <yizhi....@alibaba-inc.com>
Date: Thu, 4 Jan 2024 14:33:37 +0800
Subject: [PATCH v2 1/2] Don't call s_lock_stuck in production environment

In the current implementation, if a spin lock is misused, the
s_lock_stuck in perform_spin_delay can cause the entire system to
PANIC. In order to balance fault tolerance and the ability to detect
incorrect usage, we can use WARNING to replace s_lock_stuck in the
production environment, but still use s_lock_stuck in builds with
USE_ASSERT_CHECKING.
---
 src/backend/storage/lmgr/s_lock.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index 327ac64f7c..9446605122 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -67,6 +67,7 @@ slock_t		dummy_spinlock;
 static int	spins_per_delay = DEFAULT_SPINS_PER_DELAY;
 
 
+#ifdef USE_ASSERT_CHECKING
 /*
  * s_lock_stuck() - complain about a stuck spinlock
  */
@@ -85,6 +86,7 @@ s_lock_stuck(const char *file, int line, const char *func)
 		 func, file, line);
 #endif
 }
+#endif
 
 /*
  * s_lock(lock) - platform-independent portion of waiting for a spinlock.
@@ -132,7 +134,14 @@ perform_spin_delay(SpinDelayStatus *status)
 	if (++(status->spins) >= spins_per_delay)
 	{
 		if (++(status->delays) > NUM_DELAYS)
+		{
+#ifdef USE_ASSERT_CHECKING
 			s_lock_stuck(status->file, status->line, status->func);
+#else
+			if (status->delays % NUM_DELAYS == 0)
+				elog(WARNING, "perform spin lock on %s %d times", status->func, status->delays);
+#endif
+		}
 
 		if (status->cur_delay == 0) /* first time to delay? */
 			status->cur_delay = MIN_DELAY_USEC;
-- 
2.34.1

>From fae6241960a2c0df8df5d83001ce298228f4cbf2 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" <yizhi....@alibaba-inc.com>
Date: Thu, 4 Jan 2024 22:19:50 +0800
Subject: [PATCH v2 2/2] Disallow [LW]LockAcquire when holding a spin
 lock-file-mode

Spinlocks are intended for *very* short-term locks, but sometimes it is
misused, this commit guards no [LW]LockAcquire should be called when
holding a spin lock.
---
 src/backend/storage/buffer/bufmgr.c |  1 +
 src/backend/storage/lmgr/lock.c     |  2 ++
 src/backend/storage/lmgr/lwlock.c   |  2 ++
 src/backend/utils/init/globals.c    |  1 +
 src/include/storage/buf_internals.h |  1 +
 src/include/storage/spin.h          | 24 ++++++++++++++++++++++--
 6 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 9f9d3f24ac..d2534ee9aa 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -5419,6 +5419,7 @@ LockBufHdr(BufferDesc *desc)
 		perform_spin_delay(&delayStatus);
 	}
 	finish_spin_delay(&delayStatus);
+	START_SPIN_LOCK();
 	return old_buf_state | BM_LOCKED;
 }
 
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index b8c57b3e16..0710adbef8 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -776,6 +776,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	bool		found_conflict;
 	bool		log_lock = false;
 
+	Assert(SpinLockCount == 0);
+
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
 	lockMethodTable = LockMethods[lockmethodid];
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..5709cadcd5 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1205,6 +1205,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 
 	Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);
 
+	Assert(SpinLockCount == 0);
+
 	PRINT_LWDEBUG("LWLockAcquire", lock, mode);
 
 #ifdef LWLOCK_STATS
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..125dc53375 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -40,6 +40,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
 volatile uint32 InterruptHoldoffCount = 0;
 volatile uint32 QueryCancelHoldoffCount = 0;
 volatile uint32 CritSectionCount = 0;
+volatile uint32 SpinLockCount = 0;
 
 int			MyProcPid;
 pg_time_t	MyStartTime;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 2c4fd92e39..be2d090308 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -363,6 +363,7 @@ UnlockBufHdr(BufferDesc *desc, uint32 buf_state)
 {
 	pg_write_barrier();
 	pg_atomic_write_u32(&desc->state, buf_state & (~BM_LOCKED));
+	END_SPIN_LOCK();
 }
 
 /* in bufmgr.c */
diff --git a/src/include/storage/spin.h b/src/include/storage/spin.h
index 5d809cc980..c1b96a047f 100644
--- a/src/include/storage/spin.h
+++ b/src/include/storage/spin.h
@@ -56,12 +56,32 @@
 #include "storage/pg_sema.h"
 #endif
 
+extern PGDLLIMPORT volatile uint32 SpinLockCount;
+
+#define START_SPIN_LOCK() (SpinLockCount++)
+#define END_SPIN_LOCK() \
+do { \
+	Assert(SpinLockCount > 0); \
+	SpinLockCount--; \
+} while(0)
 
 #define SpinLockInit(lock)	S_INIT_LOCK(lock)
 
-#define SpinLockAcquire(lock) S_LOCK(lock)
+#define SpinLockAcquire(lock) \
+	do \
+	{ \
+		S_LOCK(lock); \
+		START_SPIN_LOCK(); \
+	} while (false);
+
+
+#define SpinLockRelease(lock) \
+	do \
+	{ \
+		S_UNLOCK(lock); \
+		END_SPIN_LOCK(); \
+	} while (false);
 
-#define SpinLockRelease(lock) S_UNLOCK(lock)
 
 #define SpinLockFree(lock)	S_LOCK_FREE(lock)
 
-- 
2.34.1

Reply via email to