Hi, On 2023-02-01 10:12:26 -0500, Tom Lane wrote: > Andres Freund <and...@anarazel.de> writes: > > On 2023-02-01 16:21:16 +1300, Thomas Munro wrote: > >> It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM > >> handler which is allowed to call that while in_restore_command is > >> true. > > > Ugh, no wonder we're getting crashes. This whole business seems bogus as > > hell. > > Indeed :-( > > > I don't see a choice but to revert the recent changes. They need a > > fairly large rewrite. > > 9a740f81e clearly made things a lot worse, but it wasn't great > before. Can we see a way forward to removing the problem entirely?
Yea, I think we can - we should stop relying on system(). If we instead run the command properly as a subprocess, we don't need to do bad things in the signal handler anymore. > The fundamental issue is that we have no good way to break out > of system(), and I think the original idea was that > in_restore_command would be set *only* for the duration of the > system() call. That's clearly been lost sight of completely, > but maybe as a stopgap we could try to get back to that. We could push the functions setting in_restore_command down into ExecuteRecoveryCommand(). But I don't think that'd end up necessarily being right either - we'd now use the mechanism in places we previously didn't (cleanup/end commands). And there's just plenty other stuff in the 14bdb3f13de 9a740f81eb0 that doesn't look right: - We now have two places open-coding what BuildRestoreCommand did - I'm doubtful that the new shell_* functions are the base for a good API to abstract restoring files - the error message for a failed restore command seems to have gotten worse: could not restore file \"%s\" from archive: %s" -> "%s \"%s\": %s", commandName, command - shell_* imo is not a good namespace for something called from xlog.c, xlogarchive.c. I realize the intention is that shell_archive.c is going to be its own "restore module", but for now it imo looks odd - The comment moved out of RestoreArchivedFile() doesn't seems less useful at its new location - explanation of why we use GetOldestRestartPoint() is halfway lost My name is listed as the first Reviewed-by, but I certainly haven't done any meaningful review of these patches. I just replied to top-level email proposing "recovery modules". Greetings, Andres Freund