On Thu, Feb 02, 2023 at 11:44:22PM -0800, Andres Freund wrote:
> On 2023-02-03 02:24:03 -0500, Tom Lane wrote:
>> Andres Freund <and...@anarazel.de> writes:
>> > A workaround for the back branches could be to have a test in
>> > StartupProcShutdownHandler() that tests if MyProcPid == getpid(), and
>> > not do the proc_exit() if they don't match. We probably should just do
>> > an _exit() in that case.
>> 
>> Might work.
> 
> I wonder if we should add code complaining loudly about such a mismatch
> to proc_exit(), in addition to handling it more silently in
> StartupProcShutdownHandler().   Also, an assertion in
> [Auxiliary]ProcKill that proc->xid == MyProcPid == getpid() seems like a
> good idea.

>From the discussion, it sounds like we don't want to depend on the child
process receiving/handling the signal, so we can't get rid of the
break-out-of-system() behavior (at least not in back-branches).  I've put
together some work-in-progress patches for the stopgap/back-branch fix.

0001 is just v1-0001 from upthread.  This moves Pre/PostRestoreCommand to
surround only the call to system().  I think this should get us closer to
pre-v15 behavior.

0002 adds the getpid() check mentioned above to
StartupProcShutdownHandler(), and it adds assertions to proc_exit() and
[Auxiliary]ProcKill().

0003 adds checks for shutdown requests before and after the call to
shell_restore().  IMO the Pre/PostRestoreCommand stuff is an implementation
detail for restore_command, so I think it behooves us to have some
additional shutdown checks that apply even for recovery modules.  This
patch could probably be moved to the recovery modules thread.

Is this somewhat close to what folks had in mind?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 773cf2aa72d21d05b01c6c5f9308eab287826168 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Wed, 1 Feb 2023 14:32:02 -0800
Subject: [PATCH v4 1/3] stopgap fix for restore_command

---
 src/backend/access/transam/shell_restore.c | 22 ++++++++++++++++++++--
 src/backend/access/transam/xlogarchive.c   |  7 -------
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/shell_restore.c b/src/backend/access/transam/shell_restore.c
index 8458209f49..abec023c1a 100644
--- a/src/backend/access/transam/shell_restore.c
+++ b/src/backend/access/transam/shell_restore.c
@@ -21,6 +21,7 @@
 #include "access/xlogarchive.h"
 #include "access/xlogrecovery.h"
 #include "common/percentrepl.h"
+#include "postmaster/startup.h"
 #include "storage/ipc.h"
 #include "utils/wait_event.h"
 
@@ -124,8 +125,7 @@ shell_recovery_end(const char *lastRestartPointFileName)
  * human-readable name describing the command emitted in the logs. If
  * 'failOnSignal' is true and the command is killed by a signal, a FATAL
  * error is thrown. Otherwise, 'fail_elevel' is used for the log message.
- * If 'exitOnSigterm' is true and the command is killed by SIGTERM, we exit
- * immediately.
+ * If 'exitOnSigterm' is true and SIGTERM is received, we exit immediately.
  *
  * Returns whether the command succeeded.
  */
@@ -146,7 +146,25 @@ ExecuteRecoveryCommand(const char *command, const char *commandName,
 	 */
 	fflush(NULL);
 	pgstat_report_wait_start(wait_event_info);
+
+	/*
+	 * PreRestoreCommand() is used to tell the SIGTERM handler for the startup
+	 * process that it is okay to proc_exit() right away on SIGTERM.  This is
+	 * done for the duration of the system() call because there isn't a good
+	 * way to break out while it is executing.  Since we might call proc_exit()
+	 * in a signal handler here, it is extremely important that nothing but the
+	 * system() call happens between the calls to PreRestoreCommand() and
+	 * PostRestoreCommand().  Any additional code must go before or after this
+	 * section.
+	 */
+	if (exitOnSigterm)
+		PreRestoreCommand();
+
 	rc = system(command);
+
+	if (exitOnSigterm)
+		PostRestoreCommand();
+
 	pgstat_report_wait_end();
 
 	if (rc != 0)
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 4b89addf97..66312c816b 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -147,18 +147,11 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	else
 		XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size);
 
-	/*
-	 * Check signals before restore command and reset afterwards.
-	 */
-	PreRestoreCommand();
-
 	/*
 	 * Copy xlog from archival storage to XLOGDIR
 	 */
 	ret = shell_restore(xlogfname, xlogpath, lastRestartPointFname);
 
-	PostRestoreCommand();
-
 	if (ret)
 	{
 		/*
-- 
2.25.1

>From c16c46aac4af029d807d7060d3fb09a7b707ee4d Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Fri, 3 Feb 2023 10:24:14 -0800
Subject: [PATCH v4 2/3] do not call proc_exit in process forked for
 restore_command

---
 src/backend/postmaster/startup.c | 14 +++++++++++++-
 src/backend/storage/ipc/ipc.c    |  3 +++
 src/backend/storage/lmgr/proc.c  |  2 ++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index bcd23542f1..503eb1a5a6 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -19,6 +19,8 @@
  */
 #include "postgres.h"
 
+#include <unistd.h>
+
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
@@ -121,7 +123,17 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
 	int			save_errno = errno;
 
 	if (in_restore_command)
-		proc_exit(1);
+	{
+		/*
+		 * If we are in a child process (e.g., forked by system() in
+		 * shell_restore()), we don't want to call any exit callbacks.  The
+		 * parent will take care of that.
+		 */
+		if (MyProcPid == (int) getpid())
+			proc_exit(1);
+		else
+			_exit(1);
+	}
 	else
 		shutdown_requested = true;
 	WakeupRecovery();
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 1904d21795..6796cabc3e 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -103,6 +103,9 @@ static int	on_proc_exit_index,
 void
 proc_exit(int code)
 {
+	/* proc_exit() is not safe in forked processes from system(), etc. */
+	Assert(MyProcPid == getpid());
+
 	/* Clean up everything that must be cleaned up */
 	proc_exit_prepare(code);
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..5bc53f34a4 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
 	dlist_head *procgloballist;
 
 	Assert(MyProc != NULL);
+	Assert(MyProcPid == getpid());	/* not safe if forked by system(), etc. */
 
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
@@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
 	PGPROC	   *proc;
 
 	Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
+	Assert(MyProcPid == getpid());  /* not safe if forked by system(), etc. */
 
 	auxproc = &AuxiliaryProcs[proctype];
 
-- 
2.25.1

>From 4e43cda2281d53c6fd4702a9d80553f0f9bd7458 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Fri, 3 Feb 2023 10:34:05 -0800
Subject: [PATCH v4 3/3] handle shutdown requests before and after restoring a
 file

---
 src/backend/access/transam/xlogarchive.c |  8 ++++++++
 src/backend/postmaster/startup.c         | 16 ++++++++++++----
 src/include/postmaster/startup.h         |  1 +
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 66312c816b..18cf5b032b 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -147,11 +147,19 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	else
 		XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size);
 
+	/*
+	 * Check for pending shutdown requests before and after restoring the file,
+	 * and exit if there is one.
+	 */
+	HandleStartupProcShutdownRequest();
+
 	/*
 	 * Copy xlog from archival storage to XLOGDIR
 	 */
 	ret = shell_restore(xlogfname, xlogpath, lastRestartPointFname);
 
+	HandleStartupProcShutdownRequest();
+
 	if (ret)
 	{
 		/*
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 503eb1a5a6..499dc72404 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -195,8 +195,7 @@ HandleStartupProcInterrupts(void)
 	/*
 	 * Check if we were requested to exit without finishing recovery.
 	 */
-	if (shutdown_requested)
-		proc_exit(1);
+	HandleStartupProcShutdownRequest();
 
 	/*
 	 * Emergency bailout if postmaster has died.  This is to avoid the
@@ -220,6 +219,16 @@ HandleStartupProcInterrupts(void)
 		ProcessLogMemoryContextInterrupt();
 }
 
+/*
+ * Exit if there is a pending shutdown request.
+ */
+void
+HandleStartupProcShutdownRequest(void)
+{
+	if (shutdown_requested)
+		proc_exit(1);
+}
+
 
 /* --------------------------------
  *		signal handler routines
@@ -295,8 +304,7 @@ PreRestoreCommand(void)
 	 * shutdown request received just before this.
 	 */
 	in_restore_command = true;
-	if (shutdown_requested)
-		proc_exit(1);
+	HandleStartupProcShutdownRequest();
 }
 
 void
diff --git a/src/include/postmaster/startup.h b/src/include/postmaster/startup.h
index dd957f9291..8eea0490ff 100644
--- a/src/include/postmaster/startup.h
+++ b/src/include/postmaster/startup.h
@@ -29,6 +29,7 @@ extern void HandleStartupProcInterrupts(void);
 extern void StartupProcessMain(void) pg_attribute_noreturn();
 extern void PreRestoreCommand(void);
 extern void PostRestoreCommand(void);
+extern void HandleStartupProcShutdownRequest(void);
 extern bool IsPromoteSignaled(void);
 extern void ResetPromoteSignaled(void);
 
-- 
2.25.1

Reply via email to