On Fri, Oct 3, 2014 at 1:09 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Further debugging reveals that sigusr1_handler() gets called
> repeatedly, to start autovacuum workers, and it keeps waking up and
> starting them.  But that doesn't cause the background workers to get
> started either, because although sigusr1_handler() contains a call to
> maybe_start_bgworker, it only does that if start_bgworker = true,
> which only happens if
> CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE) is true,
> which on these calls it isn't.
> And I think this might also be the missing ingredient answering
> Andres's question from before: why doesn't the 60-second
> select()-timeout cause the background worker to eventually start even
> if the SELECT doesn't get interrupted?  There seems to be a SIGUSR1
> arriving about every 3 seconds, and I bet that's resetting the timer
> every time.

For now I propose to address this by committing the attached patch,
which gets rid of the separate start_bgworker flag inside
sigusr1_handler() and instead uses the same StartWorkerNeeded ||
HaveCrashedWorker test that we use inside ServerLoop() to decide
whether to call maybe_start_bgworker().  Either more signals will
arrive (in which case the signal handler will launch an additional
background worker every time a signal arrives) or they won't (in which
case the 60-second timeout will eventually expire, and ServerLoop()
will kick into high gear and satisfy all outstanding requests).  This
isn't really right, because there might still be a quite noticeable
delay waiting for workers to get launched, but at least the delay
would be bounded to at most 60 seconds rather than, as at present,
potentially infinite.

A totally correct fix will require a bit more thought.  A search of
the git history reveals that the problem of a signal restarting the
timeout is not new: Tom fixed a similar problem back in 2007 by making
the autovacuum launcher sleep for at most a second at a time.  Such a
fix isn't ideal here, because we really don't want an up-to-1-second
delay launching a newly-registered background worker if there's a way
to avoid that -- it's probably OK for launching daemons, but it's not
so hot for parallel query.  However, we could:

(1) Use the self-pipe trick. We could not directly use a latch, at
least not without a new API, because we might be waiting on more than
one socket.

(2) Have the postmaster not set SA_RESTART for the sigusr1 handler.  I
don't know how platform-independent this approach would be.

(3) Have sigusr1_handler repeatedly call maybe_start_bgworker() until
StartWorkerNeeded becomes false, instead of just calling it once.
ServerLoop() is carefully coded to call maybe_start_bgworker() just
once per iteration, presumably to make sure the server stays
responsive even if the bgworker-starting machinery is quite busy;
looping inside the signal handler would give up that nice property
unless we had some way to break out of the loop if there's activity on
the socket.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ce920ab..6220a8e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4752,7 +4752,6 @@ static void
 sigusr1_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
-	bool		start_bgworker = false;
 
 	PG_SETMASK(&BlockSig);
 
@@ -4760,7 +4759,7 @@ sigusr1_handler(SIGNAL_ARGS)
 	if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
 	{
 		BackgroundWorkerStateChange();
-		start_bgworker = true;
+		StartWorkerNeeded = true;
 	}
 
 	/*
@@ -4801,10 +4800,10 @@ sigusr1_handler(SIGNAL_ARGS)
 
 		pmState = PM_HOT_STANDBY;
 		/* Some workers may be scheduled to start now */
-		start_bgworker = true;
+		StartWorkerNeeded = true;
 	}
 
-	if (start_bgworker)
+	if (StartWorkerNeeded || HaveCrashedWorker)
 		maybe_start_bgworker();
 
 	if (CheckPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) &&
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to