Dear PostgreSQL Community,

I found 8 Bugs in `shell_archive.c`


While studying how `archive_command` works, I discovered **8 real issues**
that affect reliability, performance, and safety:

| # | Bug | Impact | |---|-----|--------| | 1 | Infinite loop on command
failure | Archiver hangs forever | | 2 | Unreachable code after `return` |
Dead logic | | 3 | Discarded `stdout` from archive command | `DROP
DATABASE` hangs for full command duration | | 4 | Aggressive polling with
`Sleep(100)` | CPU waste (already fixed in core) | | 5 | `malloc`/`free` in
backend | **Memory corruption risk** | | 6 | Poor variable names (`dwRc`,
`bytesRead`) | Hard to read | | 7 | Manual `popen` / `CreateProcess` |
Missed PG infrastructure | | 8 | Redundant `return;` in `void` function |
Style issue |

I refactored `src/backend/archive/shell_archive.c` with **PostgreSQL-style
fixes**:

- Replaced `popen()` and `CreateProcess()` → `OpenPipeStream()` - Used
`fileno(archiveFile)` → `archiveFd` correctly - Switched to `palloc()` /
`pfree()` for memory safety - Renamed variables: `dwRc` → `exit_code`,
`bytesRead` → `nread`, etc. - Read `stdout` to prevent `DROP DATABASE`
hangs - Used `WaitLatchOrSocket()` for interruptible waiting - Removed
redundant `return;` and dead code This is my contribution to improve the
PostgreSQL archiver process. I have attached a patch implementing the
discussed fixes — please review and share any suggestions or feedback. Thanks
!
Lakshmi


On Wed, Oct 15, 2025 at 12:20 AM Robert Haas <[email protected]> wrote:

> Here are a few comments from me:
>
> I think that by calling system(), anything that is printed out by the
> child process ends up in the PostgreSQL log. With the patch, the output of
> the command seems like it will be read into a buffer and discarded. That's
> a significant behavior difference.
>
> This patch has a 10ms polling loop after which it checks for interrupts,
> and then repeats. I think our normal convention in these kinds of cases is
> to use WaitLatchOrSocket(). That allows for a longer sleep time (like 1s)
> which results in fewer processor wakeups, while actually still being more
> responsive because the arrival of an interrupt should set the process latch
> and terminate the wait.
>
> In general, I agree with Artem's comments about writing code that fits the
> PostgreSQL style. We don't want to invent new ways of solving problems for
> which we already have infrastructure, nor do we want to solve problems in
> this case in a way that is different from what we do in other cases. Using
> ereport appropriately is part of that, but there's a lot more to it. Artem
> mentioned malloc and free, but notice, for example, that we also have our
> own wrapper around popen() in OpenPipeStream(). Maybe we shouldn't use that
> here, but we shouldn't *accidentally* not use that here.
>
> I wonder whether we should really be using popen() in any form, actually.
> If what we want is for the output of the command to go to our standard
> output, as it does with system(), that's exactly what popen() is designed
> not to do, so it doesn't seem like a natural fit. Perhaps we should be
> using fork() + exec()? Or maybe we already have some good infrastructure
> for this we can reuse somewhere?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
From 8d2ed12aa4056ea20b14cf8678a6e24588bfc843 Mon Sep 17 00:00:00 2001
From: Lakshmi <[email protected]>
Date: Mon, 10 Nov 2025 15:29:31 +0530
Subject: [PATCH] refactor: shell_archive.c - use OpenPipeStream, improve
 naming, remove dead code

---
 src/backend/archive/shell_archive.c | 151 +++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 5 deletions(-)

diff --git a/src/backend/archive/shell_archive.c b/src/backend/archive/shell_archive.c
index 828723afe47..f64d5f9591b 100644
--- a/src/backend/archive/shell_archive.c
+++ b/src/backend/archive/shell_archive.c
@@ -16,13 +16,21 @@
 #include "postgres.h"
 
 #include <sys/wait.h>
-
+#include "latch.h"  /* For WaitLatchOrSocket */
+#include "miscadmin.h"  /* For MyLatch */
+#ifdef WIN32
+#include <windows.h>  /* For WaitForSingleObject, DWORD, etc. */
+#endif
 #include "access/xlog.h"
 #include "archive/archive_module.h"
 #include "archive/shell_archive.h"
 #include "common/percentrepl.h"
 #include "pgstat.h"
-
+#include "utils/elog.h"  /* For elog logging */
+#include "postgres.h"      /* already there */
+#include "utils/palloc.h"  /* add this line */
+#include "libpq/pqformat.h"    /* for OpenPipeStream */
+#include "storage/latch.h"     /* for WaitLatchOrSocket */
 static bool shell_archive_configured(ArchiveModuleState *state);
 static bool shell_archive_file(ArchiveModuleState *state,
 							   const char *file,
@@ -53,12 +61,34 @@ shell_archive_configured(ArchiveModuleState *state)
 	return false;
 }
 
+#define WAIT_INTERVAL_MS 1000  /* 1s for efficient latch waiting */
+
 static bool
 shell_archive_file(ArchiveModuleState *state, const char *file,
 				   const char *path)
 {
 	char	   *xlogarchcmd;
 	char	   *nativePath = NULL;
+#ifndef WIN32
+	FILE	   *archiveFd = NULL;
+	int			archiveFileno;
+	char		buf[1024];
+	ssize_t nread;
+
+#else
+	size_t		cmdPrefixLen;
+	size_t		cmdlen;
+	char *win32cmd = palloc(strlen(xlogarchcmd) + 30);  /* cmd.exe /c "..." + null */
+        if (win32cmd == NULL)
+{
+    ereport(FATAL,
+        (errmsg_internal("Failed to palloc win32cmd: %m")));
+    return false;
+}
+	STARTUPINFO si;
+	PROCESS_INFORMATION pi;
+	int exit_code = 0;
+#endif
 	int			rc;
 
 	if (path)
@@ -77,14 +107,125 @@ shell_archive_file(ArchiveModuleState *state, const char *file,
 
 	fflush(NULL);
 	pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND);
-	rc = system(xlogarchcmd);
+
+	/*
+	 * Start the command and read until it completes, while keep checking for
+	 * interrupts to process pending events.
+	 */
+#ifndef WIN32
+	archiveFile = OpenPipeStream(xlogarchcmd, PG_BINARY_R);
+if (archiveFile == NULL)
+{
+    ereport(FATAL,
+            (errcode_for_file_access(),
+             errmsg("could not open archive command pipe: %m")));
+}
+		while (true)
+		{
+			CHECK_FOR_INTERRUPTS();
+			nread = read(archiveFd, &buf, sizeof(buf));
+			if ((nread > 0) || (nread == -1 && errno == EAGAIN))
+				if (nread > 0)
+{
+    buf[nread] = '\0';  /* Null-terminate for string *
+    elog(LOG, "Archive command stdout: %s", buf);
+}
+			else
+				break;
+		}
+		rc = pclose(archiveFd);
+	}
+	else
+		rc = -1;
+#else
+	/*
+	 * * Create a palloc'd copy of the command string, we need to prefix it with
+	 * cmd /c as the commandLine argument to CreateProcess still expects .exe
+	 * files.
+	 */
+	cmdlen = strlen(xlogarchcmd);
+#define CMD_PREFIX "cmd /c \""
+	cmdPrefixLen = strlen(CMD_PREFIX);
+	if (win32cmd == NULL)
+	{
+		ereport(FATAL,
+				(errmsg_internal("Failed to palloc win32cmd: %m")));
+		
+	}
+	memcpy(win32cmd, CMD_PREFIX, cmdPrefixLen);
+	memcpy(&win32cmd[cmdPrefixLen], xlogarchcmd, cmdlen);
+	win32cmd[cmdPrefixLen + cmdlen] = '"';
+	win32cmd[cmdPrefixLen + cmdlen + 1] = '\0';
+	ereport(DEBUG4,
+			(errmsg_internal("WIN32: executing modified archive command \"%s\"",
+							 win32cmd)));
+
+	memset(&pi, 0, sizeof(pi));
+	memset(&si, 0, sizeof(si));
+	si.cb = sizeof(si);
+
+	archiveFile = OpenPipeStream(xlogarchcmd, PG_BINARY_R);
+if (archiveFile == NULL)
+{
+    ereport(FATAL,
+            (errcode_for_file_access(),
+             errmsg("could not open archive command pipe: %m")));
+}
+	
+
+	DWORD result;
+ResetLatch(MyLatch);
+     while (true)
+    { 
+     CHECK_FOR_INTERRUPTS();
+    int latch_rc = WaitLatchOrSocket(MyLatch,
+                                WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+                                PGINVALID_SOCKET,
+                                WAIT_INTERVAL_MS,
+                                WAIT_EVENT_ARCHIVER_WAIT_CHILD);  /* Or WAIT_EVENT_ARCHIVER_MAIN if undefined */
+if (latch_rc & WL_LATCH_SET)
+{
+    ResetLatch(MyLatch);
+    CHECK_FOR_INTERRUPTS();
+}
+DWORD result = WaitForSingleObject(pi.hProcess, 0);  /* Quick non-block check */
+    if (result == WAIT_OBJECT_0)
+        break;
+    else if (result == WAIT_TIMEOUT)
+        continue;  /* Normal polling */
+    else if (result == WAIT_FAILED)
+    {
+        DWORD err = GetLastError();
+        CloseHandle(pi.hProcess);
+        CloseHandle(pi.hThread);
+        ereport(ERROR,
+                (errmsg("WaitForSingleObject failed during archive_command: %m (Windows error %lu)",
+                        err)));
+       pfree(win32cmd);
+        return false;
+    }
+    else
+    {
+        ereport(ERROR,
+                (errmsg("Unexpected WaitForSingleObject result during archive_command: %lu",
+                        result)));
+       pfree(win32cmd);
+        return false;
+    }
+}
+
+	GetExitCodeProcess(pi.hProcess, &exit_code);
+	CloseHandle(pi.hProcess);
+	CloseHandle(pi.hThread);
+	rc = exit_code;
+#endif
 	pgstat_report_wait_end();
 
 	if (rc != 0)
 	{
 		/*
 		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * abort the archiver.  We do this because pclose() ignores SIGINT and
 		 * SIGQUIT while waiting; so a signal is very likely something that
 		 * should have interrupted us too.  Also die if the shell got a hard
 		 * "command not found" type of error.  If we overreact it's no big
@@ -126,7 +267,7 @@ shell_archive_file(ArchiveModuleState *state, const char *file,
 							   xlogarchcmd)));
 		}
 		pfree(xlogarchcmd);
-
+               pfree(win32cmd);
 		return false;
 	}
 	pfree(xlogarchcmd);
-- 
2.39.5

Reply via email to