Dear PostgreSQL Community, I found 8 Bugs in `shell_archive.c`
While studying how `archive_command` works, I discovered **8 real issues** that affect reliability, performance, and safety: | # | Bug | Impact | |---|-----|--------| | 1 | Infinite loop on command failure | Archiver hangs forever | | 2 | Unreachable code after `return` | Dead logic | | 3 | Discarded `stdout` from archive command | `DROP DATABASE` hangs for full command duration | | 4 | Aggressive polling with `Sleep(100)` | CPU waste (already fixed in core) | | 5 | `malloc`/`free` in backend | **Memory corruption risk** | | 6 | Poor variable names (`dwRc`, `bytesRead`) | Hard to read | | 7 | Manual `popen` / `CreateProcess` | Missed PG infrastructure | | 8 | Redundant `return;` in `void` function | Style issue | I refactored `src/backend/archive/shell_archive.c` with **PostgreSQL-style fixes**: - Replaced `popen()` and `CreateProcess()` → `OpenPipeStream()` - Used `fileno(archiveFile)` → `archiveFd` correctly - Switched to `palloc()` / `pfree()` for memory safety - Renamed variables: `dwRc` → `exit_code`, `bytesRead` → `nread`, etc. - Read `stdout` to prevent `DROP DATABASE` hangs - Used `WaitLatchOrSocket()` for interruptible waiting - Removed redundant `return;` and dead code This is my contribution to improve the PostgreSQL archiver process. I have attached a patch implementing the discussed fixes — please review and share any suggestions or feedback. Thanks ! Lakshmi On Wed, Oct 15, 2025 at 12:20 AM Robert Haas <[email protected]> wrote: > Here are a few comments from me: > > I think that by calling system(), anything that is printed out by the > child process ends up in the PostgreSQL log. With the patch, the output of > the command seems like it will be read into a buffer and discarded. That's > a significant behavior difference. > > This patch has a 10ms polling loop after which it checks for interrupts, > and then repeats. I think our normal convention in these kinds of cases is > to use WaitLatchOrSocket(). That allows for a longer sleep time (like 1s) > which results in fewer processor wakeups, while actually still being more > responsive because the arrival of an interrupt should set the process latch > and terminate the wait. > > In general, I agree with Artem's comments about writing code that fits the > PostgreSQL style. We don't want to invent new ways of solving problems for > which we already have infrastructure, nor do we want to solve problems in > this case in a way that is different from what we do in other cases. Using > ereport appropriately is part of that, but there's a lot more to it. Artem > mentioned malloc and free, but notice, for example, that we also have our > own wrapper around popen() in OpenPipeStream(). Maybe we shouldn't use that > here, but we shouldn't *accidentally* not use that here. > > I wonder whether we should really be using popen() in any form, actually. > If what we want is for the output of the command to go to our standard > output, as it does with system(), that's exactly what popen() is designed > not to do, so it doesn't seem like a natural fit. Perhaps we should be > using fork() + exec()? Or maybe we already have some good infrastructure > for this we can reuse somewhere? > > -- > Robert Haas > EDB: http://www.enterprisedb.com >
From 8d2ed12aa4056ea20b14cf8678a6e24588bfc843 Mon Sep 17 00:00:00 2001 From: Lakshmi <[email protected]> Date: Mon, 10 Nov 2025 15:29:31 +0530 Subject: [PATCH] refactor: shell_archive.c - use OpenPipeStream, improve naming, remove dead code --- src/backend/archive/shell_archive.c | 151 +++++++++++++++++++++++++++- 1 file changed, 146 insertions(+), 5 deletions(-) diff --git a/src/backend/archive/shell_archive.c b/src/backend/archive/shell_archive.c index 828723afe47..f64d5f9591b 100644 --- a/src/backend/archive/shell_archive.c +++ b/src/backend/archive/shell_archive.c @@ -16,13 +16,21 @@ #include "postgres.h" #include <sys/wait.h> - +#include "latch.h" /* For WaitLatchOrSocket */ +#include "miscadmin.h" /* For MyLatch */ +#ifdef WIN32 +#include <windows.h> /* For WaitForSingleObject, DWORD, etc. */ +#endif #include "access/xlog.h" #include "archive/archive_module.h" #include "archive/shell_archive.h" #include "common/percentrepl.h" #include "pgstat.h" - +#include "utils/elog.h" /* For elog logging */ +#include "postgres.h" /* already there */ +#include "utils/palloc.h" /* add this line */ +#include "libpq/pqformat.h" /* for OpenPipeStream */ +#include "storage/latch.h" /* for WaitLatchOrSocket */ static bool shell_archive_configured(ArchiveModuleState *state); static bool shell_archive_file(ArchiveModuleState *state, const char *file, @@ -53,12 +61,34 @@ shell_archive_configured(ArchiveModuleState *state) return false; } +#define WAIT_INTERVAL_MS 1000 /* 1s for efficient latch waiting */ + static bool shell_archive_file(ArchiveModuleState *state, const char *file, const char *path) { char *xlogarchcmd; char *nativePath = NULL; +#ifndef WIN32 + FILE *archiveFd = NULL; + int archiveFileno; + char buf[1024]; + ssize_t nread; + +#else + size_t cmdPrefixLen; + size_t cmdlen; + char *win32cmd = palloc(strlen(xlogarchcmd) + 30); /* cmd.exe /c "..." + null */ + if (win32cmd == NULL) +{ + ereport(FATAL, + (errmsg_internal("Failed to palloc win32cmd: %m"))); + return false; +} + STARTUPINFO si; + PROCESS_INFORMATION pi; + int exit_code = 0; +#endif int rc; if (path) @@ -77,14 +107,125 @@ shell_archive_file(ArchiveModuleState *state, const char *file, fflush(NULL); pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND); - rc = system(xlogarchcmd); + + /* + * Start the command and read until it completes, while keep checking for + * interrupts to process pending events. + */ +#ifndef WIN32 + archiveFile = OpenPipeStream(xlogarchcmd, PG_BINARY_R); +if (archiveFile == NULL) +{ + ereport(FATAL, + (errcode_for_file_access(), + errmsg("could not open archive command pipe: %m"))); +} + while (true) + { + CHECK_FOR_INTERRUPTS(); + nread = read(archiveFd, &buf, sizeof(buf)); + if ((nread > 0) || (nread == -1 && errno == EAGAIN)) + if (nread > 0) +{ + buf[nread] = '\0'; /* Null-terminate for string * + elog(LOG, "Archive command stdout: %s", buf); +} + else + break; + } + rc = pclose(archiveFd); + } + else + rc = -1; +#else + /* + * * Create a palloc'd copy of the command string, we need to prefix it with + * cmd /c as the commandLine argument to CreateProcess still expects .exe + * files. + */ + cmdlen = strlen(xlogarchcmd); +#define CMD_PREFIX "cmd /c \"" + cmdPrefixLen = strlen(CMD_PREFIX); + if (win32cmd == NULL) + { + ereport(FATAL, + (errmsg_internal("Failed to palloc win32cmd: %m"))); + + } + memcpy(win32cmd, CMD_PREFIX, cmdPrefixLen); + memcpy(&win32cmd[cmdPrefixLen], xlogarchcmd, cmdlen); + win32cmd[cmdPrefixLen + cmdlen] = '"'; + win32cmd[cmdPrefixLen + cmdlen + 1] = '\0'; + ereport(DEBUG4, + (errmsg_internal("WIN32: executing modified archive command \"%s\"", + win32cmd))); + + memset(&pi, 0, sizeof(pi)); + memset(&si, 0, sizeof(si)); + si.cb = sizeof(si); + + archiveFile = OpenPipeStream(xlogarchcmd, PG_BINARY_R); +if (archiveFile == NULL) +{ + ereport(FATAL, + (errcode_for_file_access(), + errmsg("could not open archive command pipe: %m"))); +} + + + DWORD result; +ResetLatch(MyLatch); + while (true) + { + CHECK_FOR_INTERRUPTS(); + int latch_rc = WaitLatchOrSocket(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + PGINVALID_SOCKET, + WAIT_INTERVAL_MS, + WAIT_EVENT_ARCHIVER_WAIT_CHILD); /* Or WAIT_EVENT_ARCHIVER_MAIN if undefined */ +if (latch_rc & WL_LATCH_SET) +{ + ResetLatch(MyLatch); + CHECK_FOR_INTERRUPTS(); +} +DWORD result = WaitForSingleObject(pi.hProcess, 0); /* Quick non-block check */ + if (result == WAIT_OBJECT_0) + break; + else if (result == WAIT_TIMEOUT) + continue; /* Normal polling */ + else if (result == WAIT_FAILED) + { + DWORD err = GetLastError(); + CloseHandle(pi.hProcess); + CloseHandle(pi.hThread); + ereport(ERROR, + (errmsg("WaitForSingleObject failed during archive_command: %m (Windows error %lu)", + err))); + pfree(win32cmd); + return false; + } + else + { + ereport(ERROR, + (errmsg("Unexpected WaitForSingleObject result during archive_command: %lu", + result))); + pfree(win32cmd); + return false; + } +} + + GetExitCodeProcess(pi.hProcess, &exit_code); + CloseHandle(pi.hProcess); + CloseHandle(pi.hThread); + rc = exit_code; +#endif pgstat_report_wait_end(); if (rc != 0) { /* * If either the shell itself, or a called command, died on a signal, - * abort the archiver. We do this because system() ignores SIGINT and + * abort the archiver. We do this because pclose() ignores SIGINT and * SIGQUIT while waiting; so a signal is very likely something that * should have interrupted us too. Also die if the shell got a hard * "command not found" type of error. If we overreact it's no big @@ -126,7 +267,7 @@ shell_archive_file(ArchiveModuleState *state, const char *file, xlogarchcmd))); } pfree(xlogarchcmd); - + pfree(win32cmd); return false; } pfree(xlogarchcmd); -- 2.39.5
