On Mon, Jan 22, 2024 at 03:38:15PM -0600, Nathan Bossart wrote:
> On Mon, Jan 22, 2024 at 01:24:54PM -0800, Andres Freund wrote:
>> On 2024-01-22 15:19:36 -0600, Nathan Bossart wrote:
>>> I think this is because the autoprewarm state was moved to a DSM segment,
>>> and DSM segments are detached before the on_shmem_exit callbacks are called
>>> during process exit.  Moving apw_detach_shmem to the before_shmem_exit list
>>> seems to resolve the crashes.
>> 
>> That seems plausible. Would still be nice to have at least this test ensure
>> that the shutdown code works. Perhaps just a check of the control file after
>> shutdown, ensuring that the state is "shutdowned" vs crashed?
> 
> I'll give that a try.  I'll also expand the comment above the
> before_shmem_exit() call.

Here is a patch.

This might be a topic for another thread, but I do wonder whether we could
put a generic pg_controldata check in node->stop that would at least make
sure that the state is some sort of expected shut-down state.  My first
thought is that it could be a tad expensive, but... maybe it wouldn't be
too bad.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From b339a84802b76d42f4863e52e6d91ab28873e00a Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Mon, 22 Jan 2024 16:22:57 -0600
Subject: [PATCH v1 1/1] fix autoprewarm core dump

---
 contrib/pg_prewarm/autoprewarm.c  | 10 ++++++++--
 contrib/pg_prewarm/t/001_basic.pl |  6 ++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 9ea6c2252a..06ee21d496 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -164,8 +164,14 @@ autoprewarm_main(Datum main_arg)
 	if (apw_init_shmem())
 		first_time = false;
 
-	/* Set on-detach hook so that our PID will be cleared on exit. */
-	on_shmem_exit(apw_detach_shmem, 0);
+	/*
+	 * Set on-detach hook so that our PID will be cleared on exit.
+	 *
+	 * NB: Autoprewarm's state is stored in a DSM segment, and DSM segments
+	 * are detached before calling the on_shmem_exit callbacks, so we must put
+	 * apw_detach_shmem in the before_shmem_exit callback list.
+	 */
+	before_shmem_exit(apw_detach_shmem, 0);
 
 	/*
 	 * Store our PID in the shared memory area --- unless there's already
diff --git a/contrib/pg_prewarm/t/001_basic.pl b/contrib/pg_prewarm/t/001_basic.pl
index bcd23a6914..825d3448ee 100644
--- a/contrib/pg_prewarm/t/001_basic.pl
+++ b/contrib/pg_prewarm/t/001_basic.pl
@@ -55,4 +55,10 @@ $node->wait_for_log(
 
 $node->stop;
 
+# control file should indicate normal shut down
+command_like(
+	[ 'pg_controldata', $node->data_dir() ],
+	qr/Database cluster state:\s*shut down/,
+	'cluster shut down normally');
+
 done_testing();
-- 
2.25.1

Reply via email to