On Mon, Jan 22, 2024 at 03:38:15PM -0600, Nathan Bossart wrote: > On Mon, Jan 22, 2024 at 01:24:54PM -0800, Andres Freund wrote: >> On 2024-01-22 15:19:36 -0600, Nathan Bossart wrote: >>> I think this is because the autoprewarm state was moved to a DSM segment, >>> and DSM segments are detached before the on_shmem_exit callbacks are called >>> during process exit. Moving apw_detach_shmem to the before_shmem_exit list >>> seems to resolve the crashes. >> >> That seems plausible. Would still be nice to have at least this test ensure >> that the shutdown code works. Perhaps just a check of the control file after >> shutdown, ensuring that the state is "shutdowned" vs crashed? > > I'll give that a try. I'll also expand the comment above the > before_shmem_exit() call.
Here is a patch. This might be a topic for another thread, but I do wonder whether we could put a generic pg_controldata check in node->stop that would at least make sure that the state is some sort of expected shut-down state. My first thought is that it could be a tad expensive, but... maybe it wouldn't be too bad. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
>From b339a84802b76d42f4863e52e6d91ab28873e00a Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nat...@postgresql.org> Date: Mon, 22 Jan 2024 16:22:57 -0600 Subject: [PATCH v1 1/1] fix autoprewarm core dump --- contrib/pg_prewarm/autoprewarm.c | 10 ++++++++-- contrib/pg_prewarm/t/001_basic.pl | 6 ++++++ 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c index 9ea6c2252a..06ee21d496 100644 --- a/contrib/pg_prewarm/autoprewarm.c +++ b/contrib/pg_prewarm/autoprewarm.c @@ -164,8 +164,14 @@ autoprewarm_main(Datum main_arg) if (apw_init_shmem()) first_time = false; - /* Set on-detach hook so that our PID will be cleared on exit. */ - on_shmem_exit(apw_detach_shmem, 0); + /* + * Set on-detach hook so that our PID will be cleared on exit. + * + * NB: Autoprewarm's state is stored in a DSM segment, and DSM segments + * are detached before calling the on_shmem_exit callbacks, so we must put + * apw_detach_shmem in the before_shmem_exit callback list. + */ + before_shmem_exit(apw_detach_shmem, 0); /* * Store our PID in the shared memory area --- unless there's already diff --git a/contrib/pg_prewarm/t/001_basic.pl b/contrib/pg_prewarm/t/001_basic.pl index bcd23a6914..825d3448ee 100644 --- a/contrib/pg_prewarm/t/001_basic.pl +++ b/contrib/pg_prewarm/t/001_basic.pl @@ -55,4 +55,10 @@ $node->wait_for_log( $node->stop; +# control file should indicate normal shut down +command_like( + [ 'pg_controldata', $node->data_dir() ], + qr/Database cluster state:\s*shut down/, + 'cluster shut down normally'); + done_testing(); -- 2.25.1