On 2021-03-18 13:37, Fujii Masao wrote:
On 2021/03/18 11:59, kuroda.hay...@fujitsu.com wrote:
Dear Ikeda-san,

I confirmed new patch and no problem was found. Thanks.
(I'm not a native English speaker, so I cannot check your comments correctly, sorry)

One user-visible side-effect by this change is; with the patch, the stats is cleared when only the stats collector is killed (with SIGQUIT) accidentally
and restarted by postmaster later.

Thanks for your comments.

As you said, the temporary stats files don't removed if the stats collector is killed with SIGQUIT. So, if the user change the GUC parameter "stats_temp_directory" after immediate shutdown,
temporary stats file can't be removed forever.

But, I think this case is rarely happened and unimportant. Actually, pgstat_write_statsfiles() didn't check error of unlink() and the same problem is occurred if the server is crashed now.
The documentation said following. I think it's enough.

```
For better performance, <varname>stats_temp_directory</varname> can be pointed at a RAM-based file system, decreasing physical I/O requirements. When the server shuts down cleanly, a permanent copy of the statistics data is stored in the <filename>pg_stat</filename> subdirectory, so that
   statistics can be retained across server restarts.  When recovery is
performed at server start (e.g., after immediate shutdown, server crash,
   and point-in-time recovery), all statistics counters are reset.
```


On the other than, currently the stats is
written in that case and subsequently-restarted stats collector can use
that stats file. I'm not sure if we need to keep supporting this
behavior, though.

I don't have any strong opinion this behaivor is useless too.

Since the reinitialized phase is not executed when only the stats collector is crashed (since it didn't touch the shared memory), if the permanent stats file is exists, the
stats collector can use it. But, IIUC, the case is rare.

The case is happened by operation mistake which a operator sends the SIGQUIT signal to the stats collector. Please let me know if there are more important case.

But, if SIGKILL is sent by operator, the stats can't be rescure now because the permanent stats files can't be written before exiting. Since the case which can rescure the stats is rare,
I think it's ok to initialize the stats even if SIGQUIT is sent.

If to support this feature, we need to implement the following first.

(2) As another aspect, it needs to change the behavior removing all stats files because autovacuum uses the stats. There are some ideas, for example writing the stats files every X minutes (using wal or another mechanism) and even if a crash occurs, the startup process can restore the stats with slightly low freshness. But, it needs to find a way how to handle the stats files
   when deleting on PITR rewind or stats collector crash happens.



When only the stats collector exits by SIGQUIT, with the patch
FreeWaitEventSet() is also skipped. Is this ok?

Thanks, I fixed it.


-        * Loop to process messages until we get SIGQUIT or detect ungraceful
-        * death of our parent postmaster.
+ * Loop to process messages until we get SIGTERM or SIGQUIT of our parent
+        * postmaster.

"SIGTERM or SIGQUIT of our parent postmaster" should be
"SIGTERM, SIGQUIT, or detect ungraceful death of our parent postmaster"?

Yes, I fixed it.


+SignalHandlerForUnsafeExit(SIGNAL_ARGS)

I don't think SignalHandlerForUnsafeExit is good name. Because that's not "unsafe" exit. No? Even after this signal handler is triggered, the server is still running normally and a process that exits will be restarted later. What
about SignalHandlerForNonCrashExit or SignalHandlerForNonFatalExit?

OK, I fixed.
I changed to the SignalPgstatHandlerForNonCrashExit() to add FreeWaitEventSet()
in the handler for the stats collector.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index dd9136a942..2cc5a39ec5 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -93,9 +93,8 @@ SignalHandlerForCrashExit(SIGNAL_ARGS)
  * shut down and exit.
  *
  * Typically, this handler would be used for SIGTERM, but some processes use
- * other signals. In particular, the checkpointer exits on SIGUSR2, the
- * stats collector on SIGQUIT, and the WAL writer exits on either SIGINT
- * or SIGTERM.
+ * other signals. In particular, the checkpointer exits on SIGUSR2 and the
+ * WAL writer exits on either SIGINT or SIGTERM.
  *
  * ShutdownRequestPending should be checked at a convenient place within the
  * main loop, or else the main loop should call HandleMainLoopInterrupts.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 208a33692f..7c09a8d5fa 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -190,6 +190,8 @@ static time_t last_pgstat_start_time;
 
 static bool pgStatRunningInCollector = false;
 
+static WaitEventSet *wes;
+
 /*
  * Structures in which backends store per-table info that's waiting to be
  * sent to the collector.
@@ -386,6 +388,8 @@ static void pgstat_recv_connstat(PgStat_MsgConn *msg, int len);
 static void pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
+static void SignalPgstatHandlerForNonCrashExit(SIGNAL_ARGS);
+
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
  * ------------------------------------------------------------
@@ -724,6 +728,7 @@ pgstat_reset_remove_files(const char *directory)
 
 		snprintf(fname, sizeof(fname), "%s/%s", directory,
 				 entry->d_name);
+		elog(DEBUG2, "removing stats file \"%s\"", fname);
 		unlink(fname);
 	}
 	FreeDir(dir);
@@ -4817,17 +4822,16 @@ PgstatCollectorMain(int argc, char *argv[])
 	PgStat_Msg	msg;
 	int			wr;
 	WaitEvent	event;
-	WaitEventSet *wes;
 
 	/*
 	 * Ignore all signals usually bound to some action in the postmaster,
-	 * except SIGHUP and SIGQUIT.  Note we don't need a SIGUSR1 handler to
-	 * support latch operations, because we only use a local latch.
+	 * except SIGHUP, SIGTERM and SIGQUIT.  Note we don't need a SIGUSR1
+	 * handler to support latch operations, because we only use a local latch.
 	 */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
 	pqsignal(SIGINT, SIG_IGN);
-	pqsignal(SIGTERM, SIG_IGN);
-	pqsignal(SIGQUIT, SignalHandlerForShutdownRequest);
+	pqsignal(SIGTERM, SignalHandlerForShutdownRequest);
+	pqsignal(SIGQUIT, SignalPgstatHandlerForNonCrashExit);
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
 	pqsignal(SIGUSR1, SIG_IGN);
@@ -4852,8 +4856,8 @@ PgstatCollectorMain(int argc, char *argv[])
 	AddWaitEventToSet(wes, WL_SOCKET_READABLE, pgStatSock, NULL, NULL);
 
 	/*
-	 * Loop to process messages until we get SIGQUIT or detect ungraceful
-	 * death of our parent postmaster.
+	 * Loop to process messages until we get SIGTERM, SIGQUIT or detect
+	 * ungraceful death of our parent postmaster.
 	 *
 	 * For performance reasons, we don't want to do ResetLatch/WaitLatch after
 	 * every message; instead, do that only after a recv() fails to obtain a
@@ -4871,7 +4875,7 @@ PgstatCollectorMain(int argc, char *argv[])
 		ResetLatch(MyLatch);
 
 		/*
-		 * Quit if we get SIGQUIT from the postmaster.
+		 * Quit if we get SIGTERM from the postmaster.
 		 */
 		if (ShutdownRequestPending)
 			break;
@@ -7462,3 +7466,19 @@ pgstat_count_slru_truncate(int slru_idx)
 {
 	slru_entry(slru_idx)->m_truncate += 1;
 }
+
+/*
+ * Simple signal handler for handling SIGQUIT to exit quickly without
+ * writing stats files.
+ */
+static void
+SignalPgstatHandlerForNonCrashExit(SIGNAL_ARGS)
+{
+	FreeWaitEventSet(wes);
+
+	/*
+	 * We arrange to do _exit(1) because the stats collector doesn't touch
+	 * shared memory.
+	 */
+	_exit(1);
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6436ae0f48..b9ca39ef95 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3084,7 +3084,7 @@ reaper(SIGNAL_ARGS)
 				 * nothing left for it to do.
 				 */
 				if (PgStatPID != 0)
-					signal_child(PgStatPID, SIGQUIT);
+					signal_child(PgStatPID, SIGTERM);
 			}
 			else
 			{

Reply via email to