On 03/06/17 18:53, Peter Eisentraut wrote:
> On 6/2/17 14:52, Peter Eisentraut wrote:
>> On 5/24/17 15:14, Petr Jelinek wrote:
>>> All the locking works just fine the way it is in master. The issue with
>>> deadlock with apply comes from the wrong handling of the SIGTERM in the
>>> apply (we didn't set InterruptPending). I changed the SIGTERM handler in
>>> patch 0001 just to die which is actually the correct behavior for apply
>>> workers. I also moved the connection cleanup code to the
>>> before_shmem_exit callback (similar to walreceiver) and now that part
>>> works correctly.
>>
>> I have committed this, in two separate parts.  This should fix the
>> originally reported issue.
>>
>> I will continue to work through your other patches.  I notice there is
>> still a bit of discussion about another patch, so please let me know if
>> there is anything else I should be looking for.
> 
> I have committed the remaining two patches.  I believe this fixes the
> originally reported issue.
> 

So the fact that we moved workers to standard interrupt handling broke
launcher in subtle ways because it still uses it's own SIGTERM handling
but some function it calls are using CHECK_FOR_INTERRUPTS (they are used
by worker as well). I think we need to move launcher to standard
interrupt handling as well. It's not same as other processes though as
it's allowed to be terminated any time (just like autovacuum launcher)
so we just proc_exit(0) instead of FATALing out.

This is related to the nightjar failures btw.

As a side note, we are starting to have several IsSomeTypeOfProcess
functions for these kind of things. I wonder if bgworker infrastructure
should somehow provide this type of stuff (the proposed bgw_type might
help there) as well as maybe being able to register interrupt handler
for the worker (it's really hard to do it via custom SIGTERM handler as
visible on worker, launcher and walsender issues we are fixing).
Obviously this is PG11 related thought.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 3fb7ed57d669beba3feba894a11c9dcff8e60414 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Tue, 6 Jun 2017 19:26:12 +0200
Subject: [PATCH] Use standard interrupt handling in logical replication
 launcher

---
 src/backend/replication/logical/launcher.c | 41 ++++++++++++------------------
 src/backend/tcop/postgres.c                |  9 +++++++
 src/include/replication/logicallauncher.h  |  2 ++
 3 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 5aaf24b..52169f1 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -81,7 +81,6 @@ static void logicalrep_worker_cleanup(LogicalRepWorker *worker);
 
 /* Flags set by signal handlers */
 static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t got_SIGTERM = false;
 
 static bool on_commit_launcher_wakeup = false;
 
@@ -623,20 +622,6 @@ logicalrep_worker_onexit(int code, Datum arg)
 	ApplyLauncherWakeup();
 }
 
-/* SIGTERM: set flag to exit at next convenient time */
-static void
-logicalrep_launcher_sigterm(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	got_SIGTERM = true;
-
-	/* Waken anything waiting on the process latch */
-	SetLatch(MyLatch);
-
-	errno = save_errno;
-}
-
 /* SIGHUP: set flag to reload configuration at next convenient time */
 static void
 logicalrep_launcher_sighup(SIGNAL_ARGS)
@@ -798,13 +783,14 @@ ApplyLauncherMain(Datum main_arg)
 
 	before_shmem_exit(logicalrep_launcher_onexit, (Datum) 0);
 
+	Assert(LogicalRepCtx->launcher_pid == 0);
+	LogicalRepCtx->launcher_pid = MyProcPid;
+
 	/* Establish signal handlers. */
 	pqsignal(SIGHUP, logicalrep_launcher_sighup);
-	pqsignal(SIGTERM, logicalrep_launcher_sigterm);
+	pqsignal(SIGTERM, die);
 	BackgroundWorkerUnblockSignals();
 
-	LogicalRepCtx->launcher_pid = MyProcPid;
-
 	/*
 	 * Establish connection to nailed catalogs (we only ever access
 	 * pg_subscription).
@@ -812,7 +798,7 @@ ApplyLauncherMain(Datum main_arg)
 	BackgroundWorkerInitializeConnection(NULL, NULL);
 
 	/* Enter main loop */
-	while (!got_SIGTERM)
+	for (;;)
 	{
 		int			rc;
 		List	   *sublist;
@@ -822,6 +808,8 @@ ApplyLauncherMain(Datum main_arg)
 		TimestampTz now;
 		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
 
+		CHECK_FOR_INTERRUPTS();
+
 		now = GetCurrentTimestamp();
 
 		/* Limit the start retry to once a wal_retrieve_retry_interval */
@@ -894,13 +882,16 @@ ApplyLauncherMain(Datum main_arg)
 		ResetLatch(&MyProc->procLatch);
 	}
 
-	LogicalRepCtx->launcher_pid = 0;
-
-	/* ... and if it returns, we're done */
-	ereport(DEBUG1,
-			(errmsg("logical replication launcher shutting down")));
+	/* Not reachable */
+}
 
-	proc_exit(0);
+/*
+ * Is current process the logical replication launcher?
+ */
+bool
+IsLogicalLauncher(void)
+{
+	return LogicalRepCtx->launcher_pid == MyProcPid;
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1c60b43..91ca8df 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -55,6 +55,7 @@
 #include "pg_getopt.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/postmaster.h"
+#include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
 #include "replication/slot.h"
 #include "replication/walsender.h"
@@ -2848,6 +2849,14 @@ ProcessInterrupts(void)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
 					 errmsg("terminating logical replication worker due to administrator command")));
+		else if (IsLogicalLauncher())
+		{
+			ereport(DEBUG1,
+					(errmsg("logical replication launcher shutting down")));
+
+			/* The logical replication launcher can be stopped at any time. */
+			proc_exit(0);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index d202a23..4f3e89e 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -24,4 +24,6 @@ extern void ApplyLauncherShmemInit(void);
 extern void ApplyLauncherWakeupAtCommit(void);
 extern void AtEOXact_ApplyLauncher(bool isCommit);
 
+extern bool IsLogicalLauncher(void);
+
 #endif   /* LOGICALLAUNCHER_H */
-- 
2.7.4

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to