Hi,

On 2021-03-24 18:36:14 +0900, Fujii Masao wrote:
> Barring any objection, I'm thinking to commit this patch.

To which branches?  I am *strongly* opposed to backpatching this.


>  /*
> - * Simple signal handler for exiting quickly as if due to a crash.
> + * Simple signal handler for processes which have not yet touched or do not
> + * touch shared memory to exit quickly.
>   *
> - * Normally, this would be used for handling SIGQUIT.
> + * Note that if processes already touched shared memory, use
> + * SignalHandlerForCrashExit() instead and force the postmaster into
> + * a system reset cycle because shared memory may be corrupted.
> + */
> +void
> +SignalHandlerForNonCrashExit(SIGNAL_ARGS)
> +{
> +     /*
> +      * Since we don't touch shared memory, we can just pull the plug and 
> exit
> +      * without running any atexit handlers.
> +      */
> +     _exit(1);
> +}

This strikes me as a quite a misleading function name. Outside of very
narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 no
_exit() (as opposed to exit()) seem appropriate. This seems like a bad
idea.

Also, won't this lead to postmaster now starting to complain about
pgstat having crashed in an immediate shutdown? Right now only exit
status 0 is expected:

                if (pid == PgStatPID)
                {
                        PgStatPID = 0;
                        if (!EXIT_STATUS_0(exitstatus))
                                LogChildExit(LOG, _("statistics collector 
process"),
                                                         pid, exitstatus);
                        if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
                                PgStatPID = pgstat_start();
                        continue;
                }



> + * The statistics collector is started by the postmaster as soon as the
> + * startup subprocess finishes, or as soon as the postmaster is ready
> + * to accept read only connections during archive recovery.  It remains
> + * alive until the postmaster commands it to terminate.  Normal
> + * termination is by SIGUSR2 after the checkpointer must exit(0),
> + * which instructs the statistics collector to save the final statistics
> + * to reuse at next startup and then exit(0).

I can't parse "...after the checkpointer must exit(0), which instructs..."


> + * Emergency termination is by SIGQUIT; like any backend, the statistics
> + * collector will exit quickly without saving the final statistics.  It's
> + * ok because the startup process will remove all statistics at next
> + * startup after emergency termination.

Normally there won't be any stats to remove, no? The permanent stats
file has been removed when the stats collector started up.


Greetings,

Andres Freund


Reply via email to