Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

Andres Freund Fri, 02 Jun 2017 17:21:07 -0700

On 2017-06-01 17:29:12 -0700, Andres Freund wrote:
> On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
> > On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <and...@anarazel.de> wrote:
> > > I'm a unhappy how this is reusing SIGINT for WalSndLastCycleHandler.
> > > Normally INT is used cancel interrupts, and since walsender is now also
> > > working as a normal backend, this overlap is bad.  Even for plain
> > > walsender backend this seems bad, because now non-superusers replication
> > > users will terminate replication connections when they do
> > > pg_cancel_backend().  For replication=dbname users it's especially bad
> > > because there can be completely legitimate uses of pg_cancel_backend().
> >
> > Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
> > for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
> > for a non-am_walsender backend. Am I missing something?
> 
> Yes, but nothing in those observeration actually addresses my point?
> 
> Some points:
> 
> 1) 086221cf6b1727c2baed4703c582f657b7c5350e changes things so walsender
>    backends use SIGINT for WalSndLastCycleHandler(), which is now
>    triggerable by pg_cancel_backend(). Especially for logical rep
>    walsenders it's not absurd sending that.
> 2) Walsenders now can run normal queries.
> 3) Afaict 086221cf6b1727c2baed4703c582f657b7c5350e doesn't really
>    address the PANIC problem for database connected walsenders at all,
>    because it doesn't even cancel non-replication commands.  I.e. an
>    already running query can just continue to run.  Which afaict just
>    entirely breaks shutdown.  If the connection is idle, or running a
>    query, we'll just wait forever.
> 4) the whole logic introduced in the above commit doesn't actually
>    appear to handle logical decoding senders properly - wasn't the whole
>    issue at hand that those can write WAL in some case?  But
>    nevertheless WalSndWaitForWal() does a
>    WalSndSetState(WALSNDSTATE_STOPPING); *and then continues decoding
>    and waiting* - which seems to obviate the entire point of that commit.
> 
> I'm working on a patch rejiggering things so:
> 
> a) upon shutdown checkpointer (so we can use procsignal), not
>    postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
>    we don't have to use up a normal signal handler)
> b) Upon reception walsenders immediately exit if !replication_active,
>    otherwise sets got_STOPPING
> c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
>    currently done, moving to WALSNDSTATE_STOPPING.  Not yet quite sure
>    how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
> d) Once all remaining walsenders are in stopping state, postmaster sends
>    SIGUSR2 to trigger shutdown (basically as before)
> 
> Does that seem to make sense?


Attached is a *preliminary* patch series implementing this.  I've first
reverted the previous patch, as otherwise backpatchable versions of the
necessary patches would get too complicated, due to the signals used and
such.

This also fixes several of the issues from the somewhat related thread at
http://archives.postgresql.org/message-id/20170421014030.fdzvvvbrz4nckrow%40alap3.anarazel.de

I'm not perfectly happy with the use of XLogBackgroundFlush() but we
don't currently expose anything else to flush all pending WAL afaics -
it's not too bad either.  Without that we can end up waiting forever
while if the last XLogInserts are done by an asynchronously committing
backend or the relevant backends exited before getting to flush out
their records, because walwriter has already been shut down at that
point.

Comments?

- Andres

>From 187f99cdf98886be954cd1edda275c51b83da5ef Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 2 Jun 2017 14:14:34 -0700
Subject: [PATCH 1/4] Revert "Prevent panic during shutdown checkpoint"

This reverts commit 086221cf6b1727c2baed4703c582f657b7c5350e, which
was made to master only.

The approach implemented in the above commit has some issues.  While
this could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.

Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxp...@alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml                |   5 -
 src/backend/access/transam/xlog.c           |   6 --
 src/backend/postmaster/postmaster.c         |   7 +-
 src/backend/replication/walsender.c         | 143 ++++------------------------
 src/include/replication/walsender.h         |   1 -
 src/include/replication/walsender_private.h |   3 +-
 6 files changed, 24 insertions(+), 141 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 79ca45a156..5640c0d84a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,11 +1690,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
            <literal>backup</>: This WAL sender is sending a backup.
           </para>
          </listitem>
-         <listitem>
-          <para>
-           <literal>stopping</>: This WAL sender is stopping.
-          </para>
-         </listitem>
        </itemizedlist>
      </entry>
     </row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 399822d3fe..35ee7d1cb6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8324,12 +8324,6 @@ ShutdownXLOG(int code, Datum arg)
 	ereport(IsPostmasterEnvironment ? LOG : NOTICE,
 			(errmsg("shutting down")));
 
-	/*
-	 * Wait for WAL senders to be in stopping state.  This prevents commands
-	 * from writing new WAL.
-	 */
-	WalSndWaitStopping();
-
 	if (RecoveryInProgress())
 		CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
 	else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 35b4ec88d3..5c79b1e40d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2918,7 +2918,7 @@ reaper(SIGNAL_ARGS)
 				 * Waken walsenders for the last time. No regular backends
 				 * should be around anymore.
 				 */
-				SignalChildren(SIGINT);
+				SignalChildren(SIGUSR2);
 
 				pmState = PM_SHUTDOWN_2;
 
@@ -3656,9 +3656,7 @@ PostmasterStateMachine(void)
 				/*
 				 * If we get here, we are proceeding with normal shutdown. All
 				 * the regular children are gone, and it's time to tell the
-				 * checkpointer to do a shutdown checkpoint. All WAL senders
-				 * are told to switch to a stopping state so that the shutdown
-				 * checkpoint can go ahead.
+				 * checkpointer to do a shutdown checkpoint.
 				 */
 				Assert(Shutdown > NoShutdown);
 				/* Start the checkpointer if not running */
@@ -3667,7 +3665,6 @@ PostmasterStateMachine(void)
 				/* And tell it to shut down */
 				if (CheckpointerPID != 0)
 				{
-					SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
 					signal_child(CheckpointerPID, SIGUSR2);
 					pmState = PM_SHUTDOWN;
 				}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 49cce38880..aa705e5b35 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,14 +24,11 @@
  * are treated as not a crash but approximately normal termination;
  * the walsender will exit quickly without sending any more XLOG records.
  *
- * If the server is shut down, postmaster sends us SIGUSR2 after all regular
- * backends have exited. This causes the walsender to switch to the "stopping"
- * state. In this state, the walsender will reject any replication command
- * that may generate WAL activity. The checkpointer begins the shutdown
- * checkpoint once all walsenders are confirmed as stopping. When the shutdown
- * checkpoint finishes, the postmaster sends us SIGINT. This instructs
- * walsender to send any outstanding WAL, including the shutdown checkpoint
- * record, wait for it to be replicated to the standby, and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all
+ * regular backends have exited and the shutdown checkpoint has been written.
+ * This instructs walsender to send any outstanding WAL, including the
+ * shutdown checkpoint record, wait for it to be replicated to the standby,
+ * and then exit.
  *
  *
  * Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -180,14 +177,13 @@ static bool WalSndCaughtUp = false;
 
 /* Flags set by signal handlers for later service in main loop */
 static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t got_SIGINT = false;
-static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t walsender_ready_to_stop = false;
 
 /*
- * This is set while we are streaming. When not set, SIGINT signal will be
+ * This is set while we are streaming. When not set, SIGUSR2 signal will be
  * handled like SIGTERM. When set, the main loop is responsible for checking
- * got_SIGINT and terminating when it's set (after streaming any remaining
- * WAL).
+ * walsender_ready_to_stop and terminating when it's set (after streaming any
+ * remaining WAL).
  */
 static volatile sig_atomic_t replication_active = false;
 
@@ -217,7 +213,6 @@ static struct
 /* Signal handlers */
 static void WalSndSigHupHandler(SIGNAL_ARGS);
 static void WalSndXLogSendHandler(SIGNAL_ARGS);
-static void WalSndSwitchStopping(SIGNAL_ARGS);
 static void WalSndLastCycleHandler(SIGNAL_ARGS);
 
 /* Prototypes for private functions */
@@ -306,14 +301,11 @@ WalSndErrorCleanup(void)
 	ReplicationSlotCleanup();
 
 	replication_active = false;
-	if (got_SIGINT)
+	if (walsender_ready_to_stop)
 		proc_exit(0);
 
 	/* Revert back to startup state */
 	WalSndSetState(WALSNDSTATE_STARTUP);
-
-	if (got_SIGUSR2)
-		WalSndSetState(WALSNDSTATE_STOPPING);
 }
 
 /*
@@ -686,7 +678,7 @@ StartReplication(StartReplicationCmd *cmd)
 		WalSndLoop(XLogSendPhysical);
 
 		replication_active = false;
-		if (got_SIGINT)
+		if (walsender_ready_to_stop)
 			proc_exit(0);
 		WalSndSetState(WALSNDSTATE_STARTUP);
 
@@ -1064,7 +1056,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	{
 		ereport(LOG,
 				(errmsg("terminating walsender process after promotion")));
-		got_SIGINT = true;
+		walsender_ready_to_stop = true;
 	}
 
 	WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1115,7 +1107,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	ReplicationSlotRelease();
 
 	replication_active = false;
-	if (got_SIGINT)
+	if (walsender_ready_to_stop)
 		proc_exit(0);
 	WalSndSetState(WALSNDSTATE_STARTUP);
 
@@ -1327,14 +1319,6 @@ WalSndWaitForWal(XLogRecPtr loc)
 			RecentFlushPtr = GetXLogReplayRecPtr(NULL);
 
 		/*
-		 * If postmaster asked us to switch to the stopping state, do so.
-		 * Shutdown is in progress and this will allow the checkpointer to
-		 * move on with the shutdown checkpoint.
-		 */
-		if (got_SIGUSR2)
-			WalSndSetState(WALSNDSTATE_STOPPING);
-
-		/*
 		 * If postmaster asked us to stop, don't wait here anymore. This will
 		 * cause the xlogreader to return without reading a full record, which
 		 * is the fastest way to reach the mainloop which then can quit.
@@ -1343,7 +1327,7 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * RecentFlushPtr, so we can send all remaining data before shutting
 		 * down.
 		 */
-		if (got_SIGINT)
+		if (walsender_ready_to_stop)
 			break;
 
 		/*
@@ -1418,22 +1402,6 @@ exec_replication_command(const char *cmd_string)
 	MemoryContext old_context;
 
 	/*
-	 * If WAL sender has been told that shutdown is getting close, switch its
-	 * status accordingly to handle the next replication commands correctly.
-	 */
-	if (got_SIGUSR2)
-		WalSndSetState(WALSNDSTATE_STOPPING);
-
-	/*
-	 * Throw error if in stopping mode.  We need prevent commands that could
-	 * generate WAL while the shutdown checkpoint is being written.  To be
-	 * safe, we just prohibit all new commands.
-	 */
-	if (MyWalSnd->state == WALSNDSTATE_STOPPING)
-		ereport(ERROR,
-				(errmsg("cannot execute new commands while WAL sender is in stopping mode")));
-
-	/*
 	 * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
 	 * command arrives. Clean up the old stuff if there's anything.
 	 */
@@ -2155,20 +2123,13 @@ WalSndLoop(WalSndSendDataCallback send_data)
 			}
 
 			/*
-			 * At the reception of SIGUSR2, switch the WAL sender to the
-			 * stopping state.
-			 */
-			if (got_SIGUSR2)
-				WalSndSetState(WALSNDSTATE_STOPPING);
-
-			/*
-			 * When SIGINT arrives, we send any outstanding logs up to the
+			 * When SIGUSR2 arrives, we send any outstanding logs up to the
 			 * shutdown checkpoint record (i.e., the latest record), wait for
 			 * them to be replicated to the standby, and exit. This may be a
 			 * normal termination at shutdown, or a promotion, the walsender
 			 * is not sure which.
 			 */
-			if (got_SIGINT)
+			if (walsender_ready_to_stop)
 				WalSndDone(send_data);
 		}
 
@@ -2907,23 +2868,7 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
 	errno = save_errno;
 }
 
-/* SIGUSR2: set flag to switch to stopping state */
-static void
-WalSndSwitchStopping(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	got_SIGUSR2 = true;
-	SetLatch(MyLatch);
-
-	errno = save_errno;
-}
-
-/*
- * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
- * sender should already have been switched to WALSNDSTATE_STOPPING at
- * this point.
- */
+/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
 static void
 WalSndLastCycleHandler(SIGNAL_ARGS)
 {
@@ -2938,7 +2883,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
 	if (!replication_active)
 		kill(MyProcPid, SIGTERM);
 
-	got_SIGINT = true;
+	walsender_ready_to_stop = true;
 	SetLatch(MyLatch);
 
 	errno = save_errno;
@@ -2951,14 +2896,14 @@ WalSndSignals(void)
 	/* Set up signal handlers */
 	pqsignal(SIGHUP, WalSndSigHupHandler);		/* set flag to read config
 												 * file */
-	pqsignal(SIGINT, WalSndLastCycleHandler);	/* request a last cycle and
-												 * shutdown */
+	pqsignal(SIGINT, SIG_IGN);	/* not used */
 	pqsignal(SIGTERM, die);		/* request shutdown */
 	pqsignal(SIGQUIT, quickdie);	/* hard crash time */
 	InitializeTimeouts();		/* establishes SIGALRM handler */
 	pqsignal(SIGPIPE, SIG_IGN);
 	pqsignal(SIGUSR1, WalSndXLogSendHandler);	/* request WAL sending */
-	pqsignal(SIGUSR2, WalSndSwitchStopping);	/* switch to stopping state */
+	pqsignal(SIGUSR2, WalSndLastCycleHandler);	/* request a last cycle and
+												 * shutdown */
 
 	/* Reset some signals that are accepted by postmaster but not here */
 	pqsignal(SIGCHLD, SIG_DFL);
@@ -3036,50 +2981,6 @@ WalSndWakeup(void)
 	}
 }
 
-/*
- * Wait that all the WAL senders have reached the stopping state. This is
- * used by the checkpointer to control when shutdown checkpoints can
- * safely begin.
- */
-void
-WalSndWaitStopping(void)
-{
-	for (;;)
-	{
-		int			i;
-		bool		all_stopped = true;
-
-		for (i = 0; i < max_wal_senders; i++)
-		{
-			WalSndState state;
-			WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
-
-			SpinLockAcquire(&walsnd->mutex);
-
-			if (walsnd->pid == 0)
-			{
-				SpinLockRelease(&walsnd->mutex);
-				continue;
-			}
-
-			state = walsnd->state;
-			SpinLockRelease(&walsnd->mutex);
-
-			if (state != WALSNDSTATE_STOPPING)
-			{
-				all_stopped = false;
-				break;
-			}
-		}
-
-		/* safe to leave if confirmation is done for all WAL senders */
-		if (all_stopped)
-			return;
-
-		pg_usleep(10000L);		/* wait for 10 msec */
-	}
-}
-
 /* Set state for current walsender (only called in walsender) */
 void
 WalSndSetState(WalSndState state)
@@ -3113,8 +3014,6 @@ WalSndGetStateString(WalSndState state)
 			return "catchup";
 		case WALSNDSTATE_STREAMING:
 			return "streaming";
-		case WALSNDSTATE_STOPPING:
-			return "stopping";
 	}
 	return "UNKNOWN";
 }
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 99f12377e0..2ca903872e 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,7 +44,6 @@ extern void WalSndSignals(void);
 extern Size WalSndShmemSize(void);
 extern void WalSndShmemInit(void);
 extern void WalSndWakeup(void);
-extern void WalSndWaitStopping(void);
 extern void WalSndRqstFileReload(void);
 
 /*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 36311e124c..2c59056cef 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,8 +24,7 @@ typedef enum WalSndState
 	WALSNDSTATE_STARTUP = 0,
 	WALSNDSTATE_BACKUP,
 	WALSNDSTATE_CATCHUP,
-	WALSNDSTATE_STREAMING,
-	WALSNDSTATE_STOPPING
+	WALSNDSTATE_STREAMING
 } WalSndState;
 
 /*
-- 
2.12.0.264.gd6db3f2165.dirty

>From fc7a0a3227fbd2ad71e07ee22fa3e9f2de861d3e Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 2 Jun 2017 16:54:51 -0700
Subject: [PATCH 2/4] Have walsenders participate in procsignal infrastructure.

The non-participation in procsignal was a problem for both changes in
master, e.g. parallelism not working for normal statements run in
walsender backends, and older branches, e.g. recovery conflicts and
catchup interrupts not working for logical decoding walsenders.

This commit thus replaces the previous WalSndXLogSendHandler with
procsignal_sigusr1_handler.  In branches since db0f6cad48 that can
lead to additional SetLatch calls, but that only rarely seems to make
a difference.

Author: Andres Freund
Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nck...@alap3.anarazel.de
Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
---
 src/backend/replication/walsender.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index aa705e5b35..27aa3e6bc7 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -212,7 +212,6 @@ static struct
 
 /* Signal handlers */
 static void WalSndSigHupHandler(SIGNAL_ARGS);
-static void WalSndXLogSendHandler(SIGNAL_ARGS);
 static void WalSndLastCycleHandler(SIGNAL_ARGS);
 
 /* Prototypes for private functions */
@@ -2857,17 +2856,6 @@ WalSndSigHupHandler(SIGNAL_ARGS)
 	errno = save_errno;
 }
 
-/* SIGUSR1: set flag to send WAL records */
-static void
-WalSndXLogSendHandler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGUSR2: set flag to do a last cycle and shut down afterwards */
 static void
 WalSndLastCycleHandler(SIGNAL_ARGS)
@@ -2901,7 +2889,7 @@ WalSndSignals(void)
 	pqsignal(SIGQUIT, quickdie);	/* hard crash time */
 	InitializeTimeouts();		/* establishes SIGALRM handler */
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, WalSndXLogSendHandler);	/* request WAL sending */
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, WalSndLastCycleHandler);	/* request a last cycle and
 												 * shutdown */
 
-- 
2.12.0.264.gd6db3f2165.dirty

>From f8c2a9b96778dca5e6c94af68316f608651d79e2 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 2 Jun 2017 14:15:34 -0700
Subject: [PATCH 3/4] Prevent possibility of panics during shutdown checkpoint.

When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and
throws a PANIC if so.  At that point, only walsenders are still
active, so one might think this could not happen, but walsenders can
also generate WAL, for instance in BASE_BACKUP and logical decoding
related commands (e.g. via hint bits).  So they can trigger this panic
if such a command is run while the shutdown checkpoint is being
written.

To fix this, divide the walsender shutdown into two phases.  First,
checkpointer, itself triggered by postmaster, sends a
PROCSIG_WALSND_INIT_STOPPING signal to all walsenders.  If the backend
is idle or runs an SQL query this causes the backend to shutdown, if
logical replication is in progress all existing WAL records are
processed followed by a shutdown.  Otherwise this causes the walsender
to switch to the "stopping" state. In this state, the walsender will
reject any further replication commands. The checkpointer begins the
shutdown checkpoint once all walsenders are confirmed as
stopping. When the shutdown checkpoint finishes, the postmaster sends
us SIGUSR2. This instructs walsender to send any outstanding WAL,
including the shutdown checkpoint record, wait for it to be replicated
to the standby, and then exit.

Author: Andres Freund, based on an earlier patch by Michael Paquier
Reported-By: Fujii Masao, Andres Freund
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxp...@alap3.anarazel.de
Backpatch: 9.4, where logical decoding was introduced
---
 doc/src/sgml/monitoring.sgml                |   5 +
 src/backend/access/transam/xlog.c           |  11 ++
 src/backend/replication/walsender.c         | 188 ++++++++++++++++++++++++----
 src/backend/storage/ipc/procsignal.c        |   4 +
 src/include/replication/walsender.h         |   4 +
 src/include/replication/walsender_private.h |   3 +-
 src/include/storage/procsignal.h            |   1 +
 7 files changed, 188 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5640c0d84a..79ca45a156 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,6 +1690,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
            <literal>backup</>: This WAL sender is sending a backup.
           </para>
          </listitem>
+         <listitem>
+          <para>
+           <literal>stopping</>: This WAL sender is stopping.
+          </para>
+         </listitem>
        </itemizedlist>
      </entry>
     </row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 35ee7d1cb6..70d2570dc2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8324,6 +8324,17 @@ ShutdownXLOG(int code, Datum arg)
 	ereport(IsPostmasterEnvironment ? LOG : NOTICE,
 			(errmsg("shutting down")));
 
+	/*
+	 * Signal walsenders to move to stopping state.
+	 */
+	WalSndInitStopping();
+
+	/*
+	 * Wait for WAL senders to be in stopping state.  This prevents commands
+	 * from writing new WAL.
+	 */
+	WalSndWaitStopping();
+
 	if (RecoveryInProgress())
 		CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
 	else
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 27aa3e6bc7..ff5aff496d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,17 @@
  * are treated as not a crash but approximately normal termination;
  * the walsender will exit quickly without sending any more XLOG records.
  *
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, checkpointer sends us
+ * PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited.  If
+ * the backend is idle or runs an SQL query this causes the backend to
+ * shutdown, if logical replication is in progress all existing WAL records
+ * are processed followed by a shutdown.  Otherwise this causes the walsender
+ * to switch to the "stopping" state. In this state, the walsender will reject
+ * any further replication commands. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
  *
  *
  * Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -177,13 +183,14 @@ static bool WalSndCaughtUp = false;
 
 /* Flags set by signal handlers for later service in main loop */
 static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t got_STOPPING = false;
 
 /*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
- * handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * This is set while we are streaming. When not set
+ * PROCSIG_WALSND_INIT_STOPPING signal will be handled like SIGTERM. When set,
+ * the main loop is responsible for checking got_STOPPING and terminating when
+ * it's set (after streaming any remaining WAL).
  */
 static volatile sig_atomic_t replication_active = false;
 
@@ -300,7 +307,8 @@ WalSndErrorCleanup(void)
 	ReplicationSlotCleanup();
 
 	replication_active = false;
-	if (walsender_ready_to_stop)
+
+	if (got_STOPPING || got_SIGUSR2)
 		proc_exit(0);
 
 	/* Revert back to startup state */
@@ -677,7 +685,7 @@ StartReplication(StartReplicationCmd *cmd)
 		WalSndLoop(XLogSendPhysical);
 
 		replication_active = false;
-		if (walsender_ready_to_stop)
+		if (got_STOPPING)
 			proc_exit(0);
 		WalSndSetState(WALSNDSTATE_STARTUP);
 
@@ -1055,7 +1063,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	{
 		ereport(LOG,
 				(errmsg("terminating walsender process after promotion")));
-		walsender_ready_to_stop = true;
+		got_STOPPING = true;
 	}
 
 	WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1106,7 +1114,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	ReplicationSlotRelease();
 
 	replication_active = false;
-	if (walsender_ready_to_stop)
+	if (got_STOPPING)
 		proc_exit(0);
 	WalSndSetState(WALSNDSTATE_STARTUP);
 
@@ -1311,6 +1319,14 @@ WalSndWaitForWal(XLogRecPtr loc)
 		/* Check for input from the client */
 		ProcessRepliesIfAny();
 
+		/*
+		 * If we're shutting down, trigger pending WAL to be written out,
+		 * otherwise we'd possibly end up waiting for WAL that never gets
+		 * written, because walwriter has shut down already.
+		 */
+		if (got_STOPPING)
+			XLogBackgroundFlush();
+
 		/* Update our idea of the currently flushed position. */
 		if (!RecoveryInProgress())
 			RecentFlushPtr = GetFlushRecPtr();
@@ -1326,7 +1342,7 @@ WalSndWaitForWal(XLogRecPtr loc)
 		 * RecentFlushPtr, so we can send all remaining data before shutting
 		 * down.
 		 */
-		if (walsender_ready_to_stop)
+		if (got_STOPPING)
 			break;
 
 		/*
@@ -1401,6 +1417,22 @@ exec_replication_command(const char *cmd_string)
 	MemoryContext old_context;
 
 	/*
+	 * If WAL sender has been told that shutdown is getting close, switch its
+	 * status accordingly to handle the next replication commands correctly.
+	 */
+	if (got_STOPPING)
+		WalSndSetState(WALSNDSTATE_STOPPING);
+
+	/*
+	 * Throw error if in stopping mode.  We need prevent commands that could
+	 * generate WAL while the shutdown checkpoint is being written.  To be
+	 * safe, we just prohibit all new commands.
+	 */
+	if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+		ereport(ERROR,
+				(errmsg("cannot execute new commands while WAL sender is in stopping mode")));
+
+	/*
 	 * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
 	 * command arrives. Clean up the old stuff if there's anything.
 	 */
@@ -2128,7 +2160,7 @@ WalSndLoop(WalSndSendDataCallback send_data)
 			 * normal termination at shutdown, or a promotion, the walsender
 			 * is not sure which.
 			 */
-			if (walsender_ready_to_stop)
+			if (got_SIGUSR2)
 				WalSndDone(send_data);
 		}
 
@@ -2443,6 +2475,10 @@ XLogSendPhysical(void)
 	XLogRecPtr	endptr;
 	Size		nbytes;
 
+	/* If requested switch the WAL sender to the stopping state. */
+	if (got_STOPPING)
+		WalSndSetState(WALSNDSTATE_STOPPING);
+
 	if (streamingDoneSending)
 	{
 		WalSndCaughtUp = true;
@@ -2733,7 +2769,16 @@ XLogSendLogical(void)
 		 * point, then we're caught up.
 		 */
 		if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
+		{
 			WalSndCaughtUp = true;
+
+			/*
+			 * Have WalSndLoop() terminate the connection in an orderly
+			 * manner, after writing out all the pending data.
+			 */
+			if (got_STOPPING)
+				got_SIGUSR2 = true;
+		}
 	}
 
 	/* Update shared memory status */
@@ -2843,6 +2888,26 @@ WalSndRqstFileReload(void)
 	}
 }
 
+/*
+ * Handle PROCSIG_WALSND_INIT_STOPPING signal.
+ */
+void
+HandleWalSndInitStopping(void)
+{
+	Assert(am_walsender);
+
+	/*
+	 * If replication has not yet started, die like with SIGTERM. If
+	 * replication is active, only set a flag and wake up the main loop. It
+	 * will send any outstanding WAL, wait for it to be replicated to the
+	 * standby, and then exit gracefully.
+	 */
+	if (!replication_active)
+		kill(MyProcPid, SIGTERM);
+	else
+		got_STOPPING = true;
+}
+
 /* SIGHUP: set flag to re-read config file at next convenient time */
 static void
 WalSndSigHupHandler(SIGNAL_ARGS)
@@ -2856,22 +2921,17 @@ WalSndSigHupHandler(SIGNAL_ARGS)
 	errno = save_errno;
 }
 
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/*
+ * SIGUSR2: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
 static void
 WalSndLastCycleHandler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
-	/*
-	 * If replication has not yet started, die like with SIGTERM. If
-	 * replication is active, only set a flag and wake up the main loop. It
-	 * will send any outstanding WAL, wait for it to be replicated to the
-	 * standby, and then exit gracefully.
-	 */
-	if (!replication_active)
-		kill(MyProcPid, SIGTERM);
-
-	walsender_ready_to_stop = true;
+	got_SIGUSR2 = true;
 	SetLatch(MyLatch);
 
 	errno = save_errno;
@@ -2969,6 +3029,78 @@ WalSndWakeup(void)
 	}
 }
 
+/*
+ * Signal all walsenders to move to stopping state.
+ *
+ * This will trigger walsenders to send the remaining WAL, prevent them from
+ * accepting further commands. After that they'll wait till the last WAL is
+ * written.
+ */
+void
+WalSndInitStopping(void)
+{
+	int			i;
+
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
+		pid_t		pid;
+
+		SpinLockAcquire(&walsnd->mutex);
+		pid = walsnd->pid;
+		SpinLockRelease(&walsnd->mutex);
+
+		if (pid == 0)
+			continue;
+
+		SendProcSignal(pid, PROCSIG_WALSND_INIT_STOPPING, InvalidBackendId);
+	}
+}
+
+/*
+ * Wait that all the WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+	for (;;)
+	{
+		int			i;
+		bool		all_stopped = true;
+
+		for (i = 0; i < max_wal_senders; i++)
+		{
+			WalSndState state;
+			WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
+
+			SpinLockAcquire(&walsnd->mutex);
+
+			if (walsnd->pid == 0)
+			{
+				SpinLockRelease(&walsnd->mutex);
+				continue;
+			}
+
+			state = walsnd->state;
+			SpinLockRelease(&walsnd->mutex);
+
+			if (state != WALSNDSTATE_STOPPING)
+			{
+				all_stopped = false;
+				break;
+			}
+		}
+
+		/* safe to leave if confirmation is done for all WAL senders */
+		if (all_stopped)
+			return;
+
+		pg_usleep(10000L);		/* wait for 10 msec */
+	}
+}
+
 /* Set state for current walsender (only called in walsender) */
 void
 WalSndSetState(WalSndState state)
@@ -3002,6 +3134,8 @@ WalSndGetStateString(WalSndState state)
 			return "catchup";
 		case WALSNDSTATE_STREAMING:
 			return "streaming";
+		case WALSNDSTATE_STOPPING:
+			return "stopping";
 	}
 	return "UNKNOWN";
 }
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4a21d5512d..b9302ac630 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -20,6 +20,7 @@
 #include "access/parallel.h"
 #include "commands/async.h"
 #include "miscadmin.h"
+#include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/ipc.h"
 #include "storage/proc.h"
@@ -270,6 +271,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_PARALLEL_MESSAGE))
 		HandleParallelMessageInterrupt();
 
+	if (CheckProcSignal(PROCSIG_WALSND_INIT_STOPPING))
+		HandleWalSndInitStopping();
+
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2ca903872e..edc497f91c 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,6 +44,10 @@ extern void WalSndSignals(void);
 extern Size WalSndShmemSize(void);
 extern void WalSndShmemInit(void);
 extern void WalSndWakeup(void);
+extern void WalSndInitStopping(void);
+extern void WalSndWaitStopping(void);
+extern void HandleWalSndInitStopping(void);
+extern void WalSndRqstStopping(void);
 extern void WalSndRqstFileReload(void);
 
 /*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 2c59056cef..36311e124c 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
 	WALSNDSTATE_STARTUP = 0,
 	WALSNDSTATE_BACKUP,
 	WALSNDSTATE_CATCHUP,
-	WALSNDSTATE_STREAMING
+	WALSNDSTATE_STREAMING,
+	WALSNDSTATE_STOPPING
 } WalSndState;
 
 /*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index d068dde5d7..553f0f43f7 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -32,6 +32,7 @@ typedef enum
 	PROCSIG_CATCHUP_INTERRUPT,	/* sinval catchup interrupt */
 	PROCSIG_NOTIFY_INTERRUPT,	/* listen/notify interrupt */
 	PROCSIG_PARALLEL_MESSAGE,	/* message from cooperating parallel backend */
+	PROCSIG_WALSND_INIT_STOPPING,	/* ask walsenders to prepare for shutdown  */
 
 	/* Recovery conflict reasons */
 	PROCSIG_RECOVERY_CONFLICT_DATABASE,
-- 
2.12.0.264.gd6db3f2165.dirty

>From d5364e4503e43bfb919298132817ec608fa61519 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 2 Jun 2017 16:07:08 -0700
Subject: [PATCH 4/4] Wire up query cancel interrupt for walsender backends.

This allows to cancel commands run over replication connections. While
it might have some use before v10, it has become important now that
normal SQL commands are allowed in database connected walsender
connections.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/7966f454-7cd7-2b0c-8b70-cdca9d5a8...@2ndquadrant.com
---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index ff5aff496d..94a0b34389 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2944,7 +2944,7 @@ WalSndSignals(void)
 	/* Set up signal handlers */
 	pqsignal(SIGHUP, WalSndSigHupHandler);		/* set flag to read config
 												 * file */
-	pqsignal(SIGINT, SIG_IGN);	/* not used */
+	pqsignal(SIGINT, StatementCancelHandler);	/* query cancel */
 	pqsignal(SIGTERM, die);		/* request shutdown */
 	pqsignal(SIGQUIT, quickdie);	/* hard crash time */
 	InitializeTimeouts();		/* establishes SIGALRM handler */
-- 
2.12.0.264.gd6db3f2165.dirty

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

Reply via email to