Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> Maybe the postmaster should wait for startup process exit before
>>> deciding to open for business, instead of just a signal.
>> Not a bad idea. Although, there's nothing wrong with the current way
>> either. The startup process does a proc_exit(0) right after sending the
>> signal ATM, so there's no real work left at that point.
> The thing wrong with it is assuming that nothing interesting will happen
> during proc_exit(). We hang enough stuff on on_proc_exit hooks that
> that seems like a pretty shaky assumption.
I can't get too worried, given that proc_exit() is a very well-beaten
code path. Admittedly not so much for an auxiliary process, but that's
just a dumbed down version of what happens with a full-blown backend.
However I started looking into that idea anyway, and figured that it
does simplify the logic in postmaster.c quite a bit, so I think it's
worth doing on those grounds alone. Attached is a patch against CVS HEAD
and also against a snapshot before the recovery infra patch, for easier
reading. I'll give that some more testing and commit if I find no issues.
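
For anyone following along without the postmaster source at hand, here is a minimal standalone sketch of the idea: the child only sets a flag in shared memory (no wakeup signal) and exits, and the parent decides whether startup succeeded only after it has reaped the child. This is not postmaster code. SharedFlags and run_startup_child are hypothetical names made up for the illustration, and an anonymous mmap() stands in for the real shared memory segment.

```c
#define _DEFAULT_SOURCE			/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the shared signal-flag array. */
typedef struct
{
	volatile sig_atomic_t recovery_completed;
} SharedFlags;

/*
 * Fork a child that optionally sets the "completed" flag and then exits
 * with child_status.  The parent looks at the flag only after waitpid()
 * has reaped the child, so any exit-time work in the child is over by
 * the time the decision is made.  Returns true only if the child exited
 * with status 0 and the flag was set.
 */
static bool
run_startup_child(bool set_flag, int child_status)
{
	SharedFlags *flags;
	pid_t		pid;
	int			status;
	bool		ok = false;

	assert(child_status >= 0 && child_status <= 255);

	flags = mmap(NULL, sizeof(SharedFlags), PROT_READ | PROT_WRITE,
				 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (flags == MAP_FAILED)
		return false;
	flags->recovery_completed = 0;

	pid = fork();
	if (pid < 0)
	{
		munmap(flags, sizeof(SharedFlags));
		return false;
	}
	if (pid == 0)
	{
		/* Child: set the flag without waking the parent, then exit. */
		if (set_flag)
			flags->recovery_completed = 1;
		_exit(child_status);
	}

	/* Parent: reap the child first, check the flag second. */
	if (waitpid(pid, &status, 0) == pid)
		ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
			flags->recovery_completed == 1;

	munmap(flags, sizeof(SharedFlags));
	return ok;
}
```

A clean exit without the flag, or a nonzero exit with the flag, both count as failure here. That is roughly the distinction the reaper() changes in the patch draw.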
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 7695,7701 **** StartupProcessMain(void)
BuildFlatFiles(false);
/* Let postmaster know that startup is finished */
! SendPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED);
/* exit normally */
proc_exit(0);
--- 7695,7701 ----
BuildFlatFiles(false);
/* Let postmaster know that startup is finished */
! SetPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED);
/* exit normally */
proc_exit(0);
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 227,240 **** static int Shutdown = NoShutdown;
static bool FatalError = false; /* T if recovering from backend crash */
static bool RecoveryError = false; /* T if recovery failed */
- /* State of WAL redo */
- #define NoRecovery 0
- #define RecoveryStarted 1
- #define RecoveryConsistent 2
- #define RecoveryCompleted 3
-
- static int RecoveryStatus = NoRecovery;
-
/*
* We use a simple state machine to control startup, shutdown, and
* crash recovery (which is rather like shutdown followed by startup).
--- 227,232 ----
***************
*** 253,261 **** static int RecoveryStatus = NoRecovery;
* point, if we had the infrastructure to do that.
*
* When the WAL redo is finished, the startup process signals us the third
! * time, and we switch to PM_RUN state. The startup process can also skip the
! * recovery and consistent recovery phases altogether, as it will during
! * normal startup when there's no recovery to be done, for example.
*
* Normal child backends can only be launched when we are in PM_RUN state.
* (We also allow it in PM_WAIT_BACKUP state, but only for superusers.)
--- 245,256 ----
* point, if we had the infrastructure to do that.
*
* When the WAL redo is finished, the startup process signals us the third
! * time, and exits. We don't process the 3rd signal immediately but when we
! * see that the startup process has exited, we check that we have
! * received the signal. If everything is OK, we then switch to PM_RUN state.
! * The startup process can also skip the recovery and consistent recovery
! * phases altogether, as it will during normal startup when there's no
! * recovery to be done, for example.
*
* Normal child backends can only be launched when we are in PM_RUN state.
* (We also allow it in PM_WAIT_BACKUP state, but only for superusers.)
***************
*** 338,344 **** static void pmdie(SIGNAL_ARGS);
static void reaper(SIGNAL_ARGS);
static void sigusr1_handler(SIGNAL_ARGS);
static void dummy_handler(SIGNAL_ARGS);
- static void CheckRecoverySignals(void);
static void CleanupBackend(int pid, int exitstatus);
static void HandleChildCrash(int pid, int exitstatus, const char *procname);
static void LogChildExit(int lev, const char *procname,
--- 333,338 ----
***************
*** 2019,2025 **** pmdie(SIGNAL_ARGS)
ereport(LOG,
(errmsg("received smart shutdown request")));
! if (pmState == PM_RUN || pmState == PM_RECOVERY || pmState == PM_RECOVERY_CONSISTENT)
{
/* autovacuum workers are told to shut down immediately */
SignalAutovacWorkers(SIGTERM);
--- 2013,2020 ----
ereport(LOG,
(errmsg("received smart shutdown request")));
! if (pmState == PM_RUN || pmState == PM_RECOVERY ||
! pmState == PM_RECOVERY_CONSISTENT)
{
/* autovacuum workers are told to shut down immediately */
SignalAutovacWorkers(SIGTERM);
***************
*** 2159,2181 **** reaper(SIGNAL_ARGS)
*/
if (pid == StartupPID)
{
StartupPID = 0;
/*
! * Check if we've received a signal from the startup process
! * first. This can change pmState. If the startup process sends
! * a signal and exits immediately after that, we might not have
! * processed the signal yet. We need to know if it completed
! * recovery before it exited.
*/
! CheckRecoverySignals();
/*
* Unexpected exit of startup process (including FATAL exit)
* during PM_STARTUP is treated as catastrophic. There is no
! * other processes running yet.
*/
! if (pmState == PM_STARTUP)
{
LogChildExit(LOG, _("startup process"),
pid, exitstatus);
--- 2154,2177 ----
*/
if (pid == StartupPID)
{
+ bool recoveryCompleted;
+
StartupPID = 0;
/*
! * Check if the startup process completed recovery before exiting
*/
! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED))
! recoveryCompleted = true;
! else
! recoveryCompleted = false;
/*
* Unexpected exit of startup process (including FATAL exit)
* during PM_STARTUP is treated as catastrophic. There is no
! * other processes running yet, so we can just exit.
*/
! if (pmState == PM_STARTUP && !recoveryCompleted)
{
LogChildExit(LOG, _("startup process"),
pid, exitstatus);
***************
*** 2196,2212 **** reaper(SIGNAL_ARGS)
continue;
}
/*
* Startup process exited normally, but didn't finish recovery.
* This can happen if someone else than postmaster kills the
* startup process with SIGTERM. Treat it like a crash.
*/
! if (pmState == PM_RECOVERY || pmState == PM_RECOVERY_CONSISTENT)
{
RecoveryError = true;
HandleChildCrash(pid, exitstatus,
_("startup process"));
continue;
}
}
/*
--- 2192,2255 ----
continue;
}
/*
+ * Startup process exited in response to a shutdown request (or
+ * it finished normally regardless of the shutdown request).
+ */
+ if (Shutdown > NoShutdown)
+ {
+ pmState = PM_WAIT_BACKENDS;
+ /* PostmasterStateMachine logic does the rest */
+ continue;
+ }
+ /*
* Startup process exited normally, but didn't finish recovery.
* This can happen if someone else than postmaster kills the
* startup process with SIGTERM. Treat it like a crash.
*/
! if (!recoveryCompleted)
{
RecoveryError = true;
HandleChildCrash(pid, exitstatus,
_("startup process"));
continue;
}
+
+ /*
+ * Startup succeeded, commence normal operations
+ */
+ pmState = PM_RUN;
+
+ /*
+ * Load the flat authorization file into postmaster's cache. The
+ * startup process has recomputed this from the database contents,
+ * so we wait till it finishes before loading it.
+ */
+ load_role();
+
+ /*
+ * Crank up the background writer, if we didn't do that already
+ * when we entered consistent recovery phase. It doesn't matter
+ * if this fails, we'll just try again later.
+ */
+ if (BgWriterPID == 0)
+ BgWriterPID = StartBackgroundWriter();
+
+ /*
+ * Likewise, start other special children as needed. In a restart
+ * situation, some of them may be alive already.
+ */
+ if (WalWriterPID == 0)
+ WalWriterPID = StartWalWriter();
+ if (AutoVacuumingActive() && AutoVacPID == 0)
+ AutoVacPID = StartAutoVacLauncher();
+ if (XLogArchivingActive() && PgArchPID == 0)
+ PgArchPID = pgarch_start();
+ if (PgStatPID == 0)
+ PgStatPID = pgstat_start();
+
+ /* at this point we are really open for business */
+ ereport(LOG,
+ (errmsg("database system is ready to accept connections")));
}
/*
***************
*** 2622,2748 **** LogChildExit(int lev, const char *procname, int pid, int exitstatus)
static void
PostmasterStateMachine(void)
{
- /* Startup states */
-
- if (pmState == PM_STARTUP && RecoveryStatus > NoRecovery)
- {
- /* WAL redo has started. We're out of reinitialization. */
- FatalError = false;
-
- /*
- * Go to shutdown mode if a shutdown request was pending.
- */
- if (Shutdown > NoShutdown)
- {
- pmState = PM_WAIT_BACKENDS;
- /* PostmasterStateMachine logic does the rest */
- }
- else
- {
- /*
- * Crank up the background writer. It doesn't matter if this
- * fails, we'll just try again later.
- */
- Assert(BgWriterPID == 0);
- BgWriterPID = StartBackgroundWriter();
-
- pmState = PM_RECOVERY;
- }
- }
- if (pmState == PM_RECOVERY && RecoveryStatus >= RecoveryConsistent)
- {
- /*
- * Go to shutdown mode if a shutdown request was pending.
- */
- if (Shutdown > NoShutdown)
- {
- pmState = PM_WAIT_BACKENDS;
- /* PostmasterStateMachine logic does the rest */
- }
- else
- {
- /*
- * Startup process has entered recovery. We consider that good
- * enough to reset FatalError.
- */
- pmState = PM_RECOVERY_CONSISTENT;
-
- /*
- * Load the flat authorization file into postmaster's cache. The
- * startup process won't have recomputed this from the database yet,
- * so we it may change following recovery.
- */
- load_role();
-
- /*
- * Likewise, start other special children as needed.
- */
- Assert(PgStatPID == 0);
- PgStatPID = pgstat_start();
-
- /* XXX at this point we could accept read-only connections */
- ereport(DEBUG1,
- (errmsg("database system is in consistent recovery mode")));
- }
- }
- if ((pmState == PM_RECOVERY ||
- pmState == PM_RECOVERY_CONSISTENT ||
- pmState == PM_STARTUP) &&
- RecoveryStatus == RecoveryCompleted)
- {
- /*
- * Startup succeeded.
- *
- * Go to shutdown mode if a shutdown request was pending.
- */
- if (Shutdown > NoShutdown)
- {
- pmState = PM_WAIT_BACKENDS;
- /* PostmasterStateMachine logic does the rest */
- }
- else
- {
- /*
- * Otherwise, commence normal operations.
- */
- pmState = PM_RUN;
-
- /*
- * Load the flat authorization file into postmaster's cache. The
- * startup process has recomputed this from the database contents,
- * so we wait till it finishes before loading it.
- */
- load_role();
-
- /*
- * Crank up the background writer, if we didn't do that already
- * when we entered consistent recovery phase. It doesn't matter
- * if this fails, we'll just try again later.
- */
- if (BgWriterPID == 0)
- BgWriterPID = StartBackgroundWriter();
-
- /*
- * Likewise, start other special children as needed. In a restart
- * situation, some of them may be alive already.
- */
- if (WalWriterPID == 0)
- WalWriterPID = StartWalWriter();
- if (AutoVacuumingActive() && AutoVacPID == 0)
- AutoVacPID = StartAutoVacLauncher();
- if (XLogArchivingActive() && PgArchPID == 0)
- PgArchPID = pgarch_start();
- if (PgStatPID == 0)
- PgStatPID = pgstat_start();
-
- /* at this point we are really open for business */
- ereport(LOG,
- (errmsg("database system is ready to accept connections")));
- }
- }
-
- /* Shutdown states */
-
if (pmState == PM_WAIT_BACKUP)
{
/*
--- 2665,2670 ----
***************
*** 2904,2911 **** PostmasterStateMachine(void)
shmem_exit(1);
reset_shared(PostPortNumber);
- RecoveryStatus = NoRecovery;
-
StartupPID = StartupDataBase();
Assert(StartupPID != 0);
pmState = PM_STARTUP;
--- 2826,2831 ----
***************
*** 4010,4056 **** ExitPostmaster(int status)
}
/*
! * common code used in sigusr1_handler() and reaper() to handle
! * recovery-related signals from startup process
*/
static void
! CheckRecoverySignals(void)
{
! bool changed = false;
! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED))
! {
! Assert(pmState == PM_STARTUP);
! RecoveryStatus = RecoveryStarted;
! changed = true;
! }
! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT))
{
! RecoveryStatus = RecoveryConsistent;
! changed = true;
}
! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED))
{
! RecoveryStatus = RecoveryCompleted;
! changed = true;
! }
!
! if (changed)
! PostmasterStateMachine();
! }
! /*
! * sigusr1_handler - handle signal conditions from child processes
! */
! static void
! sigusr1_handler(SIGNAL_ARGS)
! {
! int save_errno = errno;
! PG_SETMASK(&BlockSig);
! CheckRecoverySignals();
if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE))
{
--- 3930,3987 ----
}
/*
! * sigusr1_handler - handle signal conditions from child processes
*/
static void
! sigusr1_handler(SIGNAL_ARGS)
{
! int save_errno = errno;
! PG_SETMASK(&BlockSig);
! /*
! * RECOVERY_STARTED and RECOVERY_CONSISTENT signals are ignored in
! * unexpected states. If the startup process quickly starts up, completes
! * recovery, exits, we might process the death of the startup process
! * first. We don't want to go back to recovery in that case.
! */
! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
! pmState == PM_STARTUP)
{
! /* WAL redo has started. We're out of reinitialization. */
! FatalError = false;
!
! /*
! * Crank up the background writer. It doesn't matter if this
! * fails, we'll just try again later.
! */
! Assert(BgWriterPID == 0);
! BgWriterPID = StartBackgroundWriter();
!
! pmState = PM_RECOVERY;
}
! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT) &&
! pmState == PM_RECOVERY)
{
! /*
! * Load the flat authorization file into postmaster's cache. The
! * startup process won't have recomputed this from the database yet,
! * so it may change following recovery.
! */
! load_role();
! /*
! * Likewise, start other special children as needed.
! */
! Assert(PgStatPID == 0);
! PgStatPID = pgstat_start();
! /* XXX at this point we could accept read-only connections */
! ereport(DEBUG1,
! (errmsg("database system is in consistent recovery mode")));
! pmState = PM_RECOVERY_CONSISTENT;
! }
if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE))
{
*** a/src/backend/storage/ipc/ipc.c
--- b/src/backend/storage/ipc/ipc.c
***************
*** 76,81 **** static int on_proc_exit_index,
--- 76,83 ----
void
proc_exit(int code)
{
+ elog(LOG, "proc_exit: %d", on_proc_exit_index);
+
/*
* Once we set this flag, we are committed to exit. Any ereport() will
* NOT send control back to the main loop, but right back here.
***************
*** 95,102 **** proc_exit(int code)
InterruptHoldoffCount = 1;
CritSectionCount = 0;
- elog(DEBUG3, "proc_exit(%d)", code);
-
/* do our shared memory exits first */
shmem_exit(code);
--- 97,102 ----
*** a/src/backend/storage/ipc/pmsignal.c
--- b/src/backend/storage/ipc/pmsignal.c
***************
*** 72,77 **** SendPostmasterSignal(PMSignalReason reason)
--- 72,94 ----
}
/*
+ * SetPostmasterSignal - like SendPostmasterSignal, but don't wake up
+ * postmaster
+ *
+ * This is for signals that the postmaster polls with CheckPostmasterSignal()
+ * but isn't interested in processing immediately.
+ */
+ void
+ SetPostmasterSignal(PMSignalReason reason)
+ {
+ /* If called in a standalone backend, do nothing */
+ if (!IsUnderPostmaster)
+ return;
+ /* Atomically set the proper flag */
+ PMSignalFlags[reason] = true;
+ }
+
+ /*
* CheckPostmasterSignal - check to see if a particular reason has been
* signaled, and clear the signal flag. Should be called by postmaster
* after receiving SIGUSR1.
*** a/src/include/storage/pmsignal.h
--- b/src/include/storage/pmsignal.h
***************
*** 39,44 **** typedef enum
--- 39,45 ----
*/
extern void PMSignalInit(void);
extern void SendPostmasterSignal(PMSignalReason reason);
+ extern void SetPostmasterSignal(PMSignalReason reason);
extern bool CheckPostmasterSignal(PMSignalReason reason);
extern bool PostmasterIsAlive(bool amDirectChild);
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 225,235 **** static pid_t StartupPID = 0,
--- 225,257 ----
static int Shutdown = NoShutdown;
static bool FatalError = false; /* T if recovering from backend crash */
+ static bool RecoveryError = false; /* T if recovery failed */
/*
* We use a simple state machine to control startup, shutdown, and
* crash recovery (which is rather like shutdown followed by startup).
*
+ * After doing all the postmaster initialization work, we enter PM_STARTUP
+ * state and the startup process is launched. The startup process begins by
+ * reading the control file and other preliminary initialization steps. When
+ * it's ready to start WAL redo, it signals postmaster, and we switch to
+ * PM_RECOVERY phase. The background writer is launched, while the startup
+ * process continues applying WAL.
+ *
+ * After reaching a consistent point in WAL redo, startup process signals
+ * us again, and we switch to PM_RECOVERY_CONSISTENT phase. There's currently
+ * no difference between PM_RECOVERY and PM_RECOVERY_CONSISTENT, but we
+ * could start accepting connections to perform read-only queries at this
+ * point, if we had the infrastructure to do that.
+ *
+ * When the WAL redo is finished, the startup process signals us the third
+ * time, and exits. We don't process the 3rd signal immediately but when we
+ * see that the startup process has exited, we check that we have
+ * received the signal. If everything is OK, we then switch to PM_RUN state.
+ * The startup process can also skip the recovery and consistent recovery
+ * phases altogether, as it will during normal startup when there's no
+ * recovery to be done, for example.
+ *
* Normal child backends can only be launched when we are in PM_RUN state.
* (We also allow it in PM_WAIT_BACKUP state, but only for superusers.)
* In other states we handle connection requests by launching "dead_end"
***************
*** 245,259 **** static bool FatalError = false; /* T if recovering from backend crash */
*
* Notice that this state variable does not distinguish *why* we entered
* states later than PM_RUN --- Shutdown and FatalError must be consulted
! * to find that out. FatalError is never true in PM_RUN state, nor in
! * PM_SHUTDOWN states (because we don't enter those states when trying to
! * recover from a crash). It can be true in PM_STARTUP state, because we
! * don't clear it until we've successfully recovered.
*/
typedef enum
{
PM_INIT, /* postmaster starting */
PM_STARTUP, /* waiting for startup subprocess */
PM_RUN, /* normal "database is alive" state */
PM_WAIT_BACKUP, /* waiting for online backup mode to end */
PM_WAIT_BACKENDS, /* waiting for live backends to exit */
--- 267,285 ----
*
* Notice that this state variable does not distinguish *why* we entered
* states later than PM_RUN --- Shutdown and FatalError must be consulted
! * to find that out. FatalError is never true in PM_RECOVERY_* or PM_RUN
! * states, nor in PM_SHUTDOWN states (because we don't enter those states
! * when trying to recover from a crash). It can be true in PM_STARTUP state,
! * because we don't clear it until we've successfully started WAL redo.
! * Similarly, RecoveryError means that we have crashed during recovery, and
! * should not try to restart.
*/
typedef enum
{
PM_INIT, /* postmaster starting */
PM_STARTUP, /* waiting for startup subprocess */
+ PM_RECOVERY, /* in recovery mode */
+ PM_RECOVERY_CONSISTENT, /* consistent recovery mode */
PM_RUN, /* normal "database is alive" state */
PM_WAIT_BACKUP, /* waiting for online backup mode to end */
PM_WAIT_BACKENDS, /* waiting for live backends to exit */
***************
*** 1302,1308 **** ServerLoop(void)
* state that prevents it, start one. It doesn't matter if this
* fails, we'll just try again later.
*/
! if (BgWriterPID == 0 && pmState == PM_RUN)
BgWriterPID = StartBackgroundWriter();
/*
--- 1328,1336 ----
* state that prevents it, start one. It doesn't matter if this
* fails, we'll just try again later.
*/
! if (BgWriterPID == 0 &&
! (pmState == PM_RUN || pmState == PM_RECOVERY ||
! pmState == PM_RECOVERY_CONSISTENT))
BgWriterPID = StartBackgroundWriter();
/*
***************
*** 1752,1758 **** canAcceptConnections(void)
return CAC_WAITBACKUP; /* allow superusers only */
if (Shutdown > NoShutdown)
return CAC_SHUTDOWN; /* shutdown is pending */
! if (pmState == PM_STARTUP && !FatalError)
return CAC_STARTUP; /* normal startup */
return CAC_RECOVERY; /* else must be crash recovery */
}
--- 1780,1789 ----
return CAC_WAITBACKUP; /* allow superusers only */
if (Shutdown > NoShutdown)
return CAC_SHUTDOWN; /* shutdown is pending */
! if (!FatalError &&
! (pmState == PM_STARTUP ||
! pmState == PM_RECOVERY ||
! pmState == PM_RECOVERY_CONSISTENT))
return CAC_STARTUP; /* normal startup */
return CAC_RECOVERY; /* else must be crash recovery */
}
***************
*** 1982,1988 **** pmdie(SIGNAL_ARGS)
ereport(LOG,
(errmsg("received smart shutdown request")));
! if (pmState == PM_RUN)
{
/* autovacuum workers are told to shut down immediately */
SignalAutovacWorkers(SIGTERM);
--- 2013,2020 ----
ereport(LOG,
(errmsg("received smart shutdown request")));
! if (pmState == PM_RUN || pmState == PM_RECOVERY ||
! pmState == PM_RECOVERY_CONSISTENT)
{
/* autovacuum workers are told to shut down immediately */
SignalAutovacWorkers(SIGTERM);
***************
*** 2019,2025 **** pmdie(SIGNAL_ARGS)
if (StartupPID != 0)
signal_child(StartupPID, SIGTERM);
! if (pmState == PM_RUN || pmState == PM_WAIT_BACKUP)
{
ereport(LOG,
(errmsg("aborting any active transactions")));
--- 2051,2064 ----
if (StartupPID != 0)
signal_child(StartupPID, SIGTERM);
! if (pmState == PM_RECOVERY)
! {
! /* only bgwriter is active in this state */
! pmState = PM_WAIT_BACKENDS;
! }
! if (pmState == PM_RUN ||
! pmState == PM_WAIT_BACKUP ||
! pmState == PM_RECOVERY_CONSISTENT)
{
ereport(LOG,
(errmsg("aborting any active transactions")));
***************
*** 2115,2125 **** reaper(SIGNAL_ARGS)
*/
if (pid == StartupPID)
{
StartupPID = 0;
- Assert(pmState == PM_STARTUP);
! /* FATAL exit of startup is treated as catastrophic */
! if (!EXIT_STATUS_0(exitstatus))
{
LogChildExit(LOG, _("startup process"),
pid, exitstatus);
--- 2154,2177 ----
*/
if (pid == StartupPID)
{
+ bool recoveryCompleted;
+
StartupPID = 0;
! /*
! * Check if the startup process completed recovery before exiting
! */
! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED))
! recoveryCompleted = true;
! else
! recoveryCompleted = false;
!
! /*
! * Unexpected exit of startup process (including FATAL exit)
! * during PM_STARTUP is treated as catastrophic. There is no
! * other processes running yet, so we can just exit.
! */
! if (pmState == PM_STARTUP && !recoveryCompleted)
{
LogChildExit(LOG, _("startup process"),
pid, exitstatus);
***************
*** 2127,2141 **** reaper(SIGNAL_ARGS)
(errmsg("aborting startup due to startup process failure")));
ExitPostmaster(1);
}
-
/*
! * Startup succeeded - we are done with system startup or
! * recovery.
*/
! FatalError = false;
!
/*
! * Go to shutdown mode if a shutdown request was pending.
*/
if (Shutdown > NoShutdown)
{
--- 2179,2199 ----
(errmsg("aborting startup due to startup process failure")));
ExitPostmaster(1);
}
/*
! * Any unexpected exit (including FATAL exit) of the startup
! * process is treated as a crash, except that we don't want
! * to reinitialize.
*/
! if (!EXIT_STATUS_0(exitstatus))
! {
! RecoveryError = true;
! HandleChildCrash(pid, exitstatus,
! _("startup process"));
! continue;
! }
/*
! * Startup process exited in response to a shutdown request (or
! * it finished normally regardless of the shutdown request).
*/
if (Shutdown > NoShutdown)
{
***************
*** 2143,2151 **** reaper(SIGNAL_ARGS)
/* PostmasterStateMachine logic does the rest */
continue;
}
/*
! * Otherwise, commence normal operations.
*/
pmState = PM_RUN;
--- 2201,2221 ----
/* PostmasterStateMachine logic does the rest */
continue;
}
+ /*
+ * Startup process exited normally, but didn't finish recovery.
+ * This can happen if someone else than postmaster kills the
+ * startup process with SIGTERM. Treat it like a crash.
+ */
+ if (!recoveryCompleted)
+ {
+ RecoveryError = true;
+ HandleChildCrash(pid, exitstatus,
+ _("startup process"));
+ continue;
+ }
/*
! * Startup succeeded, commence normal operations
*/
pmState = PM_RUN;
***************
*** 2157,2167 **** reaper(SIGNAL_ARGS)
load_role();
/*
! * Crank up the background writer. It doesn't matter if this
! * fails, we'll just try again later.
*/
! Assert(BgWriterPID == 0);
! BgWriterPID = StartBackgroundWriter();
/*
* Likewise, start other special children as needed. In a restart
--- 2227,2238 ----
load_role();
/*
! * Crank up the background writer, if we didn't do that already
! * when we entered consistent recovery phase. It doesn't matter
! * if this fails, we'll just try again later.
*/
! if (BgWriterPID == 0)
! BgWriterPID = StartBackgroundWriter();
/*
* Likewise, start other special children as needed. In a restart
***************
*** 2178,2186 **** reaper(SIGNAL_ARGS)
/* at this point we are really open for business */
ereport(LOG,
! (errmsg("database system is ready to accept connections")));
!
! continue;
}
/*
--- 2249,2255 ----
/* at this point we are really open for business */
ereport(LOG,
! (errmsg("database system is ready to accept connections")));
}
/*
***************
*** 2443,2448 **** HandleChildCrash(int pid, int exitstatus, const char *procname)
--- 2512,2529 ----
}
}
+ /* Take care of the startup process too */
+ if (pid == StartupPID)
+ StartupPID = 0;
+ else if (StartupPID != 0 && !FatalError)
+ {
+ ereport(DEBUG2,
+ (errmsg_internal("sending %s to process %d",
+ (SendStop ? "SIGSTOP" : "SIGQUIT"),
+ (int) StartupPID)));
+ signal_child(StartupPID, (SendStop ? SIGSTOP : SIGQUIT));
+ }
+
/* Take care of the bgwriter too */
if (pid == BgWriterPID)
BgWriterPID = 0;
***************
*** 2514,2520 **** HandleChildCrash(int pid, int exitstatus, const char *procname)
FatalError = true;
/* We now transit into a state of waiting for children to die */
! if (pmState == PM_RUN ||
pmState == PM_WAIT_BACKUP ||
pmState == PM_SHUTDOWN)
pmState = PM_WAIT_BACKENDS;
--- 2595,2603 ----
FatalError = true;
/* We now transit into a state of waiting for children to die */
! if (pmState == PM_RECOVERY ||
! pmState == PM_RECOVERY_CONSISTENT ||
! pmState == PM_RUN ||
pmState == PM_WAIT_BACKUP ||
pmState == PM_SHUTDOWN)
pmState = PM_WAIT_BACKENDS;
***************
*** 2723,2728 **** PostmasterStateMachine(void)
--- 2806,2820 ----
}
/*
+ * If recovery failed, wait for all non-syslogger children to exit,
+ * and then exit postmaster. We don't try to reinitialize when recovery
+ * fails, because more than likely it will just fail again and we will
+ * keep trying forever.
+ */
+ if (RecoveryError && pmState == PM_NO_CHILDREN)
+ ExitPostmaster(1);
+
+ /*
* If we need to recover from a crash, wait for all non-syslogger
* children to exit, then reset shmem and StartupDataBase.
*/
***************
*** 3847,3852 **** sigusr1_handler(SIGNAL_ARGS)
--- 3939,3988 ----
PG_SETMASK(&BlockSig);
+ /*
+ * RECOVERY_STARTED and RECOVERY_CONSISTENT signals are ignored in
+ * unexpected states. If the startup process quickly starts up, completes
+ * recovery, exits, we might process the death of the startup process
+ * first. We don't want to go back to recovery in that case.
+ */
+ if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
+ pmState == PM_STARTUP)
+ {
+ /* WAL redo has started. We're out of reinitialization. */
+ FatalError = false;
+
+ /*
+ * Crank up the background writer. It doesn't matter if this
+ * fails, we'll just try again later.
+ */
+ Assert(BgWriterPID == 0);
+ BgWriterPID = StartBackgroundWriter();
+
+ pmState = PM_RECOVERY;
+ }
+ if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT) &&
+ pmState == PM_RECOVERY)
+ {
+ /*
+ * Load the flat authorization file into postmaster's cache. The
+ * startup process won't have recomputed this from the database yet,
+ * so it may change following recovery.
+ */
+ load_role();
+
+ /*
+ * Likewise, start other special children as needed.
+ */
+ Assert(PgStatPID == 0);
+ PgStatPID = pgstat_start();
+
+ /* XXX at this point we could accept read-only connections */
+ ereport(DEBUG1,
+ (errmsg("database system is in consistent recovery mode")));
+
+ pmState = PM_RECOVERY_CONSISTENT;
+ }
+
if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE))
{
/*
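
As a side note on the sigusr1_handler comment above: the guard on pmState is what keeps a late RECOVERY_STARTED or RECOVERY_CONSISTENT signal from dragging the postmaster back into recovery after the startup process has already been reaped. A small sketch of that rule as a pure transition function follows. The Event enum and pm_transition are hypothetical names for illustration only, and the state enum is just a subset of the real one.

```c
#include <assert.h>

/* Subset of the postmaster states touched by the patch. */
typedef enum
{
	PM_STARTUP,
	PM_RECOVERY,
	PM_RECOVERY_CONSISTENT,
	PM_RUN
} PMState;

/* Hypothetical event names; the real code polls signal flags instead. */
typedef enum
{
	EV_RECOVERY_STARTED,
	EV_RECOVERY_CONSISTENT,
	EV_STARTUP_EXITED_OK		/* child reaped, recovery completed */
} Event;

/*
 * Return the next state.  The two recovery signals only take effect in
 * the state where they are expected; in any other state they are
 * silently ignored.
 */
static PMState
pm_transition(PMState state, Event ev)
{
	assert(ev >= EV_RECOVERY_STARTED && ev <= EV_STARTUP_EXITED_OK);

	switch (ev)
	{
		case EV_RECOVERY_STARTED:
			return (state == PM_STARTUP) ? PM_RECOVERY : state;
		case EV_RECOVERY_CONSISTENT:
			return (state == PM_RECOVERY) ? PM_RECOVERY_CONSISTENT : state;
		case EV_STARTUP_EXITED_OK:
			/* reachable from any of the startup/recovery states */
			return PM_RUN;
	}
	return state;
}
```

If the exit is processed first, a stale EV_RECOVERY_STARTED arriving in PM_RUN leaves the state at PM_RUN.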
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)