On Fri, Oct 3, 2014 at 1:09 PM, Robert Haas <[email protected]> wrote:
> Further debugging reveals that sigusr1_handler() gets called
> repeatedly, to start autovacuum workers, and it keeps waking up and
> starting them. But that doesn't cause the background workers to get
> started either, because although sigusr1_handler() contains a call to
> maybe_start_bgworker, it only does that if start_bgworker = true,
> which only happens if
> CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE) is true,
> which on these calls it isn't.
> And I think this might also be the missing ingredient answering
> Andres's question from before: why doesn't the 60-second
> select()-timeout cause the background worker to eventually start even
> if the SELECT doesn't get interrupted? There seems to be a SIGUSR1
> arriving about every 3 seconds, and I bet that's resetting the timer
> every time.
For now I propose to address this by committing the attached patch,
which gets rid of the separate start_bgworker flag inside
sigusr1_handler() and instead uses the same StartWorkerNeeded ||
HaveCrashedWorker test that we use inside ServerLoop() to decide
whether to call maybe_start_bgworker(). Either more signals will
arrive (in which case the signal handler will launch an additional
background worker every time a signal arrives) or they won't (in which
case the 60-second timeout will eventually expire, and ServerLoop()
will kick into high gear and satisfy all outstanding requests). This
isn't really right, because there might still be a quite noticeable
delay waiting for workers to get launched, but at least the delay
would be bounded to at most 60 seconds rather than, as at present,
potentially infinite.
A totally correct fix will require a bit more thought. A search of
the git history reveals that the problem of a signal restarting the
timeout is not new: Tom fixed a similar problem back in 2007 by making
the autovacuum launcher sleep for at most a second at a time. Such a
fix isn't ideal here, because we really don't want an up-to-1-second
delay launching a newly-registered background worker if there's a way
to avoid that -- it's probably OK for launching daemons, but it's not
so hot for parallel query. However, we could:
(1) Use the self-pipe trick. We could not directly use a latch, at
least not without a new API, because we might be waiting on more than
one socket.
(2) Have the postmaster not set SA_RESTART for the sigusr1 handler. I
don't know how platform-independent this approach would be.
(3) Have sigusr1_handler repeatedly call maybe_start_bgworker() until
StartWorkerNeeded becomes false, instead of just calling it once.
ServerLoop() is carefully coded to call maybe_start_bgworker() just
once per iteration, presumably to make sure the server stays
responsive even if the bgworker-starting machinery is quite busy;
looping inside the signal handler would give up that nice property
unless we had some way to break out of the loop if there's activity on
the socket.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ce920ab..6220a8e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4752,7 +4752,6 @@ static void
sigusr1_handler(SIGNAL_ARGS)
{
int save_errno = errno;
- bool start_bgworker = false;
PG_SETMASK(&BlockSig);
@@ -4760,7 +4759,7 @@ sigusr1_handler(SIGNAL_ARGS)
if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
{
BackgroundWorkerStateChange();
- start_bgworker = true;
+ StartWorkerNeeded = true;
}
/*
@@ -4801,10 +4800,10 @@ sigusr1_handler(SIGNAL_ARGS)
pmState = PM_HOT_STANDBY;
/* Some workers may be scheduled to start now */
- start_bgworker = true;
+ StartWorkerNeeded = true;
}
- if (start_bgworker)
+ if (StartWorkerNeeded || HaveCrashedWorker)
maybe_start_bgworker();
if (CheckPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) &&
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers