Re: Issue with the PRNG used by Postgres

Andres Freund Fri, 12 Apr 2024 12:33:18 -0700

Hi,

On 2024-04-12 11:33:17 -0700, Andres Freund wrote:
> I wonder if, for easy backpatching, the easiest solution is to just reset
> errno before calling pg_usleep(), and only increment status->delays if
> errno != EINTR. Given our pg_usleep() implementations, that'd preven the stuck
> spinlock logic from triggering too fast.


Here's a patch implementing this approach. I confirmed that before we trigger
the stuck spinlock logic very quickly and after we don't. However, if most
sleeps are interrupted, it can delay the stuck spinlock detection a good
bit. But that seems much better than triggering it too quickly.

Greetings,

Andres Freund

>From 7a2e2aff29157e6e1b64dee60456f14a1e294237 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 12 Apr 2024 10:39:46 -0700
Subject: [PATCH v1 1/2] meson: Add target for s_lock_test

s_lock_test can't currently be used for automated tests as
a) it takes a while
b) it doesn't report failures in a usable way
but it is still useful to be able to build and run it manually.

The addition of PGDLLIMPORT to my_wait_event info is necessary to get link the
program correctly when targeting windows. Apparently that hadn't been tried before.

TODO: Should we build this by default, so it doesn't accidentally break, like
it has in the past?
---
 src/backend/storage/lmgr/meson.build | 13 +++++++++++++
 src/backend/storage/lmgr/s_lock.c    |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/lmgr/meson.build b/src/backend/storage/lmgr/meson.build
index 05ac41e809a..26463d4e8a2 100644
--- a/src/backend/storage/lmgr/meson.build
+++ b/src/backend/storage/lmgr/meson.build
@@ -11,3 +11,16 @@ backend_sources += files(
   's_lock.c',
   'spin.c',
 )
+
+# Test binary for spinlocks. This isn't run as a test as
+# a) it takes a while
+# b) it doesn't report failures in a usable way
+# but it is still useful to be able to build and run it manually.
+s_lock_test = executable('s_lock_test',
+  files('s_lock.c'),
+  dependencies: [backend_code],
+  link_with: [common_static, pgport_static],
+  kwargs: default_bin_args + {'install': false},
+  c_args: ['-DS_LOCK_TEST=1'],
+  build_by_default: false,
+)
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index 9b389d99512..f725d026317 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -67,7 +67,7 @@
  * s_lock_test.
  */
 static uint32 local_my_wait_event_info;
-uint32	   *my_wait_event_info = &local_my_wait_event_info;
+PGDLLIMPORT uint32 *my_wait_event_info = &local_my_wait_event_info;
 #endif
 
 static int	spins_per_delay = DEFAULT_SPINS_PER_DELAY;
-- 
2.44.0.279.g3bd955d269

>From 67dbf92aceaa506746c66276354ab430a85f06c7 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 12 Apr 2024 12:10:20 -0700
Subject: [PATCH v1 2/2] WIP: Avoid spuriously triggering the stuck spinlock
 detection

When signals are delivered, the sleep in perform_spin_delay() is
interrupted. Until now, perform_spin_delay() did not take that into account,
and behaved as if the sleep was uninterrupted. In some workloads signals can
be frequent enough to trigger the old logic almost immediately.

As this needs to be fixed in the backbranches, we are somewhat constrained by
ABI compatibility. Otherwise we e.g. could track the actual elapsed time in
SpinDelayStatus.

Instead, check if errno == EINTR after pg_usleep(), and only increase
SpinDelayStatus->delays if the sleep was not interrupted. Both the normal and
the windows implementation of pg_usleep() set errno == EINTR when interrupted.

This can substantially increase the amount of time until a stuck spinlock is
detected. While not great, it's much better than spuriously crash-restarting.
---
 src/port/pgsleep.c                |  3 +++
 src/backend/storage/lmgr/s_lock.c | 12 +++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/port/pgsleep.c b/src/port/pgsleep.c
index 1284458bfce..1f87ef03490 100644
--- a/src/port/pgsleep.c
+++ b/src/port/pgsleep.c
@@ -36,6 +36,8 @@
  * might happen to run before the sleep begins, allowing the full delay.
  * Better practice is to use WaitLatch() with a timeout, so that backends
  * respond to latches and signals promptly.
+ *
+ * Some callers rely on errno being set to EINTR when interrupted.
  */
 void
 pg_usleep(long microsec)
@@ -49,6 +51,7 @@ pg_usleep(long microsec)
 		delay.tv_nsec = (microsec % 1000000L) * 1000;
 		(void) nanosleep(&delay, NULL);
 #else
+		/* will not be interrupted due to alertable = FALSE */
 		SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE);
 #endif
 	}
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index f725d026317..e173131e5b4 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -137,7 +137,7 @@ perform_spin_delay(SpinDelayStatus *status)
 	/* Block the process every spins_per_delay tries */
 	if (++(status->spins) >= spins_per_delay)
 	{
-		if (++(status->delays) > NUM_DELAYS)
+		if (status->delays > NUM_DELAYS)
 			s_lock_stuck(status->file, status->line, status->func);
 
 		if (status->cur_delay == 0) /* first time to delay? */
@@ -152,9 +152,19 @@ perform_spin_delay(SpinDelayStatus *status)
 		 * this is better than nothing.
 		 */
 		pgstat_report_wait_start(WAIT_EVENT_SPIN_DELAY);
+		errno = 0;				/* see below */
 		pg_usleep(status->cur_delay);
 		pgstat_report_wait_end();
 
+		/*
+		 * Only count the sleep above as a delay, if not interrupted.
+		 * Otherwise a process getting signaled rapidly can trigger the stuck
+		 * spinlock logic in a fraction of the expected time. This isn't a
+		 * great fix for that problem, but doesn't require an API / ABI break.
+		 */
+		if (errno != EINTR)
+			status->delays++;
+
 #if defined(S_LOCK_TEST)
 		fprintf(stdout, "*");
 		fflush(stdout);
-- 
2.44.0.279.g3bd955d269

Re: Issue with the PRNG used by Postgres

Reply via email to