Hi,
This thread came from another thread about collecting the WAL
stats([1]).
Is it better to make the stats collector shutdown without writing the
stats files
if the immediate shutdown is requested?
There was a related discussion([2]) although it's stopped on 12/1/2016.
IIUC, the thread's consensus are
(1) It's useful to make the stats collector shutdown quickly without
writing the stats files
if the immediate shutdown is requested in some cases because there
is a possibility
that it takes a long time until the failover happens. The possible
reasons are that
disk write speed is slow, the stats files are big, and so on. And
there is no negative impact
on the users because all stats files are removed in a crash recovery
phase now.
(2) As another aspect, it needs to change the behavior removing all
stats files because autovacuum
uses the stats. There are some ideas, for example writing the stats
files every X minutes
(using wal or another mechanism) and even if a crash occurs, the
startup process can restore
the stats with slightly low freshness. But, it needs to find a way
how to handle the stats files
when deleting on PITR rewind or stats collector crash happens.
I agree that the above consensus and I think we can make separate two
patches.
The first one is for (1) and the second one is for (2).
Why don't you apply the patch for (1) first?
I attached the patch based on tsunakawa-san's patch([2]).
(v1-0001-pgstat_avoid_writing_on_sigquit.patch)
At this time, there are no cons points for the users because
the stats files are removed in a crash recovery phase as pointed in the
discussion.
[1]
https://www.postgresql.org/message-id/c616cf24bf4ecd818f7cab0900a40842%40oss.nttdata.com
[2]
https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F5EF25A%40G01JPEXMBYT05
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b1e2d94951..03d191dfe6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -386,6 +386,8 @@ static void pgstat_recv_connstat(PgStat_MsgConn *msg, int len);
static void pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len);
static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
+static void SignalPgstatHandlerForCrashExit(SIGNAL_ARGS);
+
/* ------------------------------------------------------------
* Public functions called from postmaster follow
* ------------------------------------------------------------
@@ -725,6 +727,8 @@ pgstat_reset_remove_files(const char *directory)
snprintf(fname, sizeof(fname), "%s/%s", directory,
entry->d_name);
unlink(fname);
+
+ elog(DEBUG2, "removing stats file \"%s\"", fname);
}
FreeDir(dir);
}
@@ -4821,13 +4825,13 @@ PgstatCollectorMain(int argc, char *argv[])
/*
* Ignore all signals usually bound to some action in the postmaster,
- * except SIGHUP and SIGQUIT. Note we don't need a SIGUSR1 handler to
- * support latch operations, because we only use a local latch.
+ * except SIGHUP, SIGTERM and SIGQUIT. Note we don't need a SIGUSR1
+ * handler to support latch operations, because we only use a local latch.
*/
pqsignal(SIGHUP, SignalHandlerForConfigReload);
pqsignal(SIGINT, SIG_IGN);
- pqsignal(SIGTERM, SIG_IGN);
- pqsignal(SIGQUIT, SignalHandlerForShutdownRequest);
+ pqsignal(SIGTERM, SignalHandlerForShutdownRequest);
+ pqsignal(SIGQUIT, SignalPgstatHandlerForCrashExit);
pqsignal(SIGALRM, SIG_IGN);
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, SIG_IGN);
@@ -4852,8 +4856,8 @@ PgstatCollectorMain(int argc, char *argv[])
AddWaitEventToSet(wes, WL_SOCKET_READABLE, pgStatSock, NULL, NULL);
/*
- * Loop to process messages until we get SIGQUIT or detect ungraceful
- * death of our parent postmaster.
+ * Loop to process messages until we get SIGTERM or SIGQUIT of our parent
+ * postmaster.
*
* For performance reasons, we don't want to do ResetLatch/WaitLatch after
* every message; instead, do that only after a recv() fails to obtain a
@@ -4871,7 +4875,7 @@ PgstatCollectorMain(int argc, char *argv[])
ResetLatch(MyLatch);
/*
- * Quit if we get SIGQUIT from the postmaster.
+ * Quit if we get SIGTERM from the postmaster.
*/
if (ShutdownRequestPending)
break;
@@ -7458,3 +7462,17 @@ pgstat_count_slru_truncate(int slru_idx)
{
slru_entry(slru_idx)->m_truncate += 1;
}
+
+/*
+ * Simple signal handler for handling SIGQUIT to exit quickly without
+ * doing any additional works, for example writing stats files.
+ */
+static void
+SignalPgstatHandlerForCrashExit(SIGNAL_ARGS)
+{
+ /*
+ * We arrange to do _exit(1) because the stats collector doesn't touch
+ * shared memory.
+ */
+ _exit(1);
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6436ae0f48..b9ca39ef95 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3084,7 +3084,7 @@ reaper(SIGNAL_ARGS)
* nothing left for it to do.
*/
if (PgStatPID != 0)
- signal_child(PgStatPID, SIGQUIT);
+ signal_child(PgStatPID, SIGTERM);
}
else
{