Hi,

Here's an experimental patch set that adds optional support for more
tar variants (and potentially zip etc.) when compiled --with-libarchive,
but only for pg_waldump, where we expect to meet user-generated
archives.  The recent band-aid applied to pg_waldump/t/001_basic.pl
becomes:

+# If we don't have libarchive, then we tell tar to stick to ustar format that
+# astreamer_tar.c can decode.  Otherwise we should be able to accept anything
+# that any current tar produces.
+@tar_p_flags = tar_portability_options($tar)
+  if !check_pg_config("#define USE_LIBARCHIVE");

I was compelled to try this to avoid being sucked into the rabbit hole
of hacking on tar code, after pg_waldump broke my computer[1].  It
doesn't seem to make much sense to try to speedrun everything that
happened to archiving since 1988 when you're a database project.  I
was encouraged by Robert's prediction[2] that we'd probably want to do
precisely this as soon as we started accepting user-generated
archives.  I postdict the same!

libarchive is really easy to work with, widely used and seems well put
together.  The only thing I was a bit sad about was the lack of an
async-friendly API that would let us push a raw byte stream into it.
So I tried modelling it as a "source only" astreamer that you pump by
calling astreamer_pull() when you want more content to be delivered to
the next streamer.
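
To make the "source only" idea concrete, here is a minimal sketch in
plain C.  Everything named toy_* is a hypothetical stand-in invented for
illustration, not the real astreamer API, and an in-memory buffer plays
the role of libarchive.  The source ignores pushed data and emits at
most one bounded chunk to the next streamer per pull:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; not the real astreamer types. */
typedef struct toy_streamer toy_streamer;
typedef void (*toy_content_fn) (toy_streamer *self, const char *data, size_t len);

struct toy_streamer
{
	toy_content_fn content;
	toy_streamer *next;
};

/* A sink that just counts the bytes and chunks it receives. */
typedef struct toy_sink
{
	toy_streamer base;
	size_t		bytes;
	int			chunks;
} toy_sink;

static void
toy_sink_content(toy_streamer *self, const char *data, size_t len)
{
	toy_sink   *sink = (toy_sink *) self;

	(void) data;
	sink->bytes += len;
	sink->chunks++;
}

/* A source that owns its input and emits at most chunk_size bytes per pull. */
typedef struct toy_source
{
	toy_streamer base;
	const char *input;
	size_t		remaining;
	size_t		chunk_size;
} toy_source;

static void
toy_source_content(toy_streamer *self, const char *data, size_t len)
{
	toy_source *src = (toy_source *) self;
	size_t		n;

	/* A source takes no pushed data; a pull passes NULL and 0. */
	assert(data == NULL && len == 0);

	n = src->remaining < src->chunk_size ? src->remaining : src->chunk_size;
	if (n == 0)
		return;					/* end of input: nothing more to deliver */
	src->base.next->content(src->base.next, src->input, n);
	src->input += n;
	src->remaining -= n;
}

/* Analogue of astreamer_pull(): ask the source for one more chunk. */
static void
toy_pull(toy_streamer *source)
{
	source->content(source, NULL, 0);
}

/* Pump until 'wanted' bytes have arrived or input runs out; return bytes seen. */
static size_t
toy_pump_until(toy_source *src, toy_sink *sink, size_t wanted)
{
	while (sink->bytes < wanted && src->remaining > 0)
		toy_pull(&src->base);
	return sink->bytes;
}
```

Because control returns to the caller after every chunk, the caller can
stop pumping as soon as it has what it needs, which is the property
pg_waldump relies on.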

I don't immediately see why that'd be a problem, but I may lack
imagination.  It's still incremental, can still stop earlier, and we
don't do any multiplexing or AIO in this or any other uses of
astreamers.  It does mean that pg_waldump's read_archive_file() has to
treat this astreamer slightly differently though, which is annoying.
Perhaps that could be fixed if astreamer_file.c provided
"astreamer_file_reader" with the same semantics, so that it could
unconditionally call astreamer_pull(privateInfo->archive_streamer),
instead of doing the read, push-into-stream itself?  Just a thought.
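
A rough sketch of that thought, again in plain C with hypothetical
toy_* names rather than real astreamer code: a file-backed source does
the read-then-push step internally, so the caller drives every backend
through the same pull call.  The stdio-based reader and the tiny buffer
are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical consumer callback: receives each chunk a source produces. */
typedef void (*toy_consume_fn) (void *arg, const char *data, size_t len);

/* Hypothetical source interface: every backend is driven the same way. */
typedef struct toy_source
{
	bool		(*pull) (struct toy_source *self);	/* false at end of input */
	toy_consume_fn consume;
	void	   *consume_arg;
} toy_source;

/* File-backed source: performs the read and the push itself. */
typedef struct toy_file_source
{
	toy_source	base;
	FILE	   *fp;
	char		buf[8];			/* tiny on purpose, to force several pulls */
} toy_file_source;

static bool
toy_file_pull(toy_source *self)
{
	toy_file_source *src = (toy_file_source *) self;
	size_t		n = fread(src->buf, 1, sizeof(src->buf), src->fp);

	if (n == 0)
		return false;
	self->consume(self->consume_arg, src->buf, n);
	return true;
}

static void
toy_file_source_init(toy_file_source *src, FILE *fp,
					 toy_consume_fn consume, void *arg)
{
	src->base.pull = toy_file_pull;
	src->base.consume = consume;
	src->base.consume_arg = arg;
	src->fp = fp;
}

/* Caller side: no knowledge of which backend is in use. */
static bool
toy_read_more(toy_source *source)
{
	return source->pull(source);
}

/* Example consumer: accumulate the total number of bytes seen. */
static void
toy_count_bytes(void *arg, const char *data, size_t len)
{
	(void) data;
	*(size_t *) arg += len;
}

/* Demonstration helper: write 'text' to a temporary file and drain it. */
static size_t
toy_drain_text(const char *text, int *pulls)
{
	toy_file_source src;
	size_t		total = 0;
	FILE	   *fp = tmpfile();

	assert(fp != NULL);
	fputs(text, fp);
	rewind(fp);
	toy_file_source_init(&src, fp, toy_count_bytes, &total);
	*pulls = 0;
	while (toy_read_more(&src.base))
		(*pulls)++;
	fclose(fp);
	return total;
}
```

With that shape, the equivalent of read_archive_file() would reduce to a
single toy_read_more() call, and the #ifdef around the read loop could
go away.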

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGL2dppjO4o28ZY7n_LTWviKLAi-7KZ%3Dtx5w2HGevCEYPA%40mail.gmail.com#0897c3b9c0aa583fef9459a711c7de60
[2] 
https://www.postgresql.org/message-id/CA+TgmoYg0C4ZkuSD=mag+wbq=0ggibm+-k1zm7lhjtdpiol...@mail.gmail.com
From 81237c97cf23efc2f2a364bf5bbf4efd27709733 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Sun, 5 Apr 2026 02:25:55 +1200
Subject: [PATCH 1/4] libarchive: Add configure and meson options.

A follow-up patch will make use of it.

(Proof-of-concept)
---
 configure                  | 140 +++++++++++++++++++++++++++++++++++++
 configure.ac               |  13 ++++
 meson.build                |  16 +++++
 meson_options.txt          |   3 +
 src/Makefile.global.in     |   4 ++
 src/include/pg_config.h.in |   3 +
 6 files changed, 179 insertions(+)

diff --git a/configure b/configure
index fe22bc71d0c..6141cdb0256 100755
--- a/configure
+++ b/configure
@@ -718,6 +718,9 @@ LIBCURL_CPPFLAGS
 LIBCURL_LIBS
 LIBCURL_CFLAGS
 with_libcurl
+LIBARCHIVE_LIBS
+LIBARCHIVE_CFLAGS
+with_libarchive
 with_uuid
 LIBURING_LIBS
 LIBURING_CFLAGS
@@ -877,6 +880,7 @@ with_libedit_preferred
 with_liburing
 with_uuid
 with_ossp_uuid
+with_libarchive
 with_libcurl
 with_libnuma
 with_libxml
@@ -911,6 +915,8 @@ ICU_CFLAGS
 ICU_LIBS
 LIBURING_CFLAGS
 LIBURING_LIBS
+LIBARCHIVE_CFLAGS
+LIBARCHIVE_LIBS
 LIBCURL_CFLAGS
 LIBCURL_LIBS
 LIBNUMA_CFLAGS
@@ -1596,6 +1602,7 @@ Optional Packages:
   --with-liburing         build with io_uring support, for asynchronous I/O
   --with-uuid=LIB         build contrib/uuid-ossp using LIB (bsd,e2fs,ossp)
   --with-ossp-uuid        obsolete spelling of --with-uuid=ossp
+  --with-libarchive       build with libarchive support
   --with-libcurl          build with libcurl support
   --with-libnuma          build with libnuma support
   --with-libxml           build with XML support
@@ -1635,6 +1642,10 @@ Some influential environment variables:
               C compiler flags for LIBURING, overriding pkg-config
   LIBURING_LIBS
               linker flags for LIBURING, overriding pkg-config
+  LIBARCHIVE_CFLAGS
+              C compiler flags for LIBARCHIVE, overriding pkg-config
+  LIBARCHIVE_LIBS
+              linker flags for LIBARCHIVE, overriding pkg-config
   LIBCURL_CFLAGS
               C compiler flags for LIBCURL, overriding pkg-config
   LIBCURL_LIBS
@@ -8912,6 +8923,135 @@ fi
 
 
 
+#
+# libarchive
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with libarchive support" >&5
+$as_echo_n "checking whether to build with libarchive support... " >&6; }
+
+
+
+# Check whether --with-libarchive was given.
+if test "${with_libarchive+set}" = set; then :
+  withval=$with_libarchive;
+  case $withval in
+    yes)
+
+$as_echo "#define USE_LIBARCHIVE 1" >>confdefs.h
+
+      ;;
+    no)
+      :
+      ;;
+    *)
+      as_fn_error $? "no argument expected for --with-libarchive option" "$LINENO" 5
+      ;;
+  esac
+
+else
+  with_libarchive=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_libarchive" >&5
+$as_echo "$with_libarchive" >&6; }
+
+if test "$with_libarchive" = yes ; then
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libarchive" >&5
+$as_echo_n "checking for libarchive... " >&6; }
+
+if test -n "$LIBARCHIVE_CFLAGS"; then
+    pkg_cv_LIBARCHIVE_CFLAGS="$LIBARCHIVE_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libarchive\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "libarchive") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_LIBARCHIVE_CFLAGS=`$PKG_CONFIG --cflags "libarchive" 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+if test -n "$LIBARCHIVE_LIBS"; then
+    pkg_cv_LIBARCHIVE_LIBS="$LIBARCHIVE_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libarchive\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "libarchive") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_LIBARCHIVE_LIBS=`$PKG_CONFIG --libs "libarchive" 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+        { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+        _pkg_short_errors_supported=yes
+else
+        _pkg_short_errors_supported=no
+fi
+        if test $_pkg_short_errors_supported = yes; then
+	        LIBARCHIVE_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libarchive" 2>&1`
+        else
+	        LIBARCHIVE_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libarchive" 2>&1`
+        fi
+	# Put the nasty error message in config.log where it belongs
+	echo "$LIBARCHIVE_PKG_ERRORS" >&5
+
+	as_fn_error $? "Package requirements (libarchive) were not met:
+
+$LIBARCHIVE_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables LIBARCHIVE_CFLAGS
+and LIBARCHIVE_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+        { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+	{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old.  Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables LIBARCHIVE_CFLAGS
+and LIBARCHIVE_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+	LIBARCHIVE_CFLAGS=$pkg_cv_LIBARCHIVE_CFLAGS
+	LIBARCHIVE_LIBS=$pkg_cv_LIBARCHIVE_LIBS
+        { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+fi
+
+
 #
 # libcurl
 #
diff --git a/configure.ac b/configure.ac
index 6873b7546dd..ad75e2200c7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1059,6 +1059,19 @@ fi
 AC_SUBST(with_uuid)
 
 
+#
+# libarchive
+#
+AC_MSG_CHECKING([whether to build with libarchive support])
+PGAC_ARG_BOOL(with, libarchive, no, [build with libarchive support],
+              [AC_DEFINE([USE_LIBARCHIVE], 1, [Define to 1 to build with libarchive support. (--with-libarchive)])])
+AC_MSG_RESULT([$with_libarchive])
+AC_SUBST(with_libarchive)
+if test "$with_libarchive" = yes ; then
+  PKG_CHECK_MODULES(LIBARCHIVE, libarchive)
+fi
+
+
 #
 # libcurl
 #
diff --git a/meson.build b/meson.build
index 6bc74c2ba79..cce5fc40d05 100644
--- a/meson.build
+++ b/meson.build
@@ -923,6 +923,22 @@ endif
 
 
 ###############################################################
+# Library: libarchive
+###############################################################
+
+libarchive_opt = get_option('libarchive')
+if not libarchive_opt.disabled()
+  libarchive = dependency('libarchive', required: libarchive_opt)
+else
+  libarchive = not_found_dep
+endif
+if libarchive.found()
+  cdata.set('USE_LIBARCHIVE', 1)
+endif
+
+
+
+###############################################################
 # Library: LLVM
 ###############################################################
 
diff --git a/meson_options.txt b/meson_options.txt
index 6a793f3e479..671ffe127ff 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -100,6 +100,9 @@ option('icu', type: 'feature', value: 'auto',
 option('ldap', type: 'feature', value: 'auto',
   description: 'LDAP support')
 
+option('libarchive', type : 'feature', value: 'auto',
+  description: 'libarchive support')
+
 option('libcurl', type : 'feature', value: 'auto',
   description: 'libcurl support')
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index a7699b026bb..060165bd27d 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -195,6 +195,7 @@ with_systemd	= @with_systemd@
 with_gssapi	= @with_gssapi@
 with_krb_srvnam	= @with_krb_srvnam@
 with_ldap	= @with_ldap@
+with_libarchive	= @with_libarchive@
 with_libcurl	= @with_libcurl@
 with_libnuma	= @with_libnuma@
 with_liburing	= @with_liburing@
@@ -224,6 +225,9 @@ krb_srvtab = @krb_srvtab@
 ICU_CFLAGS		= @ICU_CFLAGS@
 ICU_LIBS		= @ICU_LIBS@
 
+LIBARCHIVE_CFLAGS	= @LIBARCHIVE_CFLAGS@
+LIBARCHIVE_LIBS		= @LIBARCHIVE_LIBS@
+
 LIBNUMA_CFLAGS		= @LIBNUMA_CFLAGS@
 LIBNUMA_LIBS		= @LIBNUMA_LIBS@
 
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index d8d61918aff..62c2cc93902 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -695,6 +695,9 @@
 /* Define to 1 to build with LDAP support. (--with-ldap) */
 #undef USE_LDAP
 
+/* Define to 1 to build with libarchive support. (--with-libarchive) */
+#undef USE_LIBARCHIVE
+
 /* Define to 1 to build with libcurl support. (--with-libcurl) */
 #undef USE_LIBCURL
 
-- 
2.53.0

From be73511ccaa67adfb09b1f93588970c3898664b0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Sat, 4 Apr 2026 16:48:14 +1300
Subject: [PATCH 2/4] libarchive: Provide astreamer_libarchive.c.

This allows modern tar files (and potentially other unrelated formats) to
be consumed from a file, with support for various compression
algorithms.

This astreamer is unusual in that it produces data rather than having
data pushed into it with astreamer_content().  The wrapper
astreamer_pull() is used to signal that difference, though it just calls
astreamer_content() with NULL data.

(Proof-of-concept)
---
 src/fe_utils/Makefile               |   4 +
 src/fe_utils/astreamer_libarchive.c | 257 ++++++++++++++++++++++++++++
 src/fe_utils/meson.build            |   4 +
 src/include/fe_utils/astreamer.h    |  12 ++
 src/tools/pgindent/typedefs.list    |   1 +
 5 files changed, 278 insertions(+)
 create mode 100644 src/fe_utils/astreamer_libarchive.c

diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index cbfbf93ac69..f6c88a73ee7 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -40,6 +40,10 @@ OBJS = \
 	string_utils.o \
 	version.o
 
+ifeq ($(with_libarchive), yes)
+OBJS += astreamer_libarchive.o
+endif
+
 ifeq ($(PORTNAME), win32)
 override CPPFLAGS += -DFD_SETSIZE=1024
 endif
diff --git a/src/fe_utils/astreamer_libarchive.c b/src/fe_utils/astreamer_libarchive.c
new file mode 100644
index 00000000000..d57853171f4
--- /dev/null
+++ b/src/fe_utils/astreamer_libarchive.c
@@ -0,0 +1,257 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_libarchive.c
+ *
+ * This module reads from archives using https://www.libarchive.org/.
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/fe_utils/astreamer_libarchive.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <archive.h>
+#include <archive_entry.h>
+
+#include "common/logging.h"
+#include "fe_utils/astreamer.h"
+
+/* This is the data size we'll try to stream at once. */
+#define ASTREAMER_LIBARCHIVE_READER_BUFFER_SIZE (128 * 1024)
+
+typedef struct astreamer_libarchive_reader
+{
+	astreamer	base;
+	astreamer_member member;
+	struct archive *archive;
+	bool		end_of_file;
+	bool		end_of_archive;
+	char		data[ASTREAMER_LIBARCHIVE_READER_BUFFER_SIZE];
+} astreamer_libarchive_reader;
+
+static void astreamer_libarchive_reader_content(astreamer *streamer,
+												astreamer_member *member,
+												const char *data, int len,
+												astreamer_archive_context context);
+static void astreamer_libarchive_reader_finalize(astreamer *streamer);
+static void astreamer_libarchive_reader_free(astreamer *streamer);
+
+static const astreamer_ops astreamer_libarchive_reader_ops = {
+	.content = astreamer_libarchive_reader_content,
+	.finalize = astreamer_libarchive_reader_finalize,
+	.free = astreamer_libarchive_reader_free
+};
+
+/*
+ * Create an astreamer that decodes 'pathname' with libarchive and feeds its
+ * contents to 'next'.  This streamer is a source that must be the first in
+ * the chain, and content should be produced by calling
+ * astreamer_pull(streamer).
+ */
+astreamer *
+astreamer_libarchive_reader_new_pathname(astreamer *next,
+										 const char *pathname)
+{
+	astreamer_libarchive_reader *streamer;
+	int			r;
+
+	streamer = palloc0_object(astreamer_libarchive_reader);
+	*((const astreamer_ops **) &streamer->base.bbs_ops) =
+		&astreamer_libarchive_reader_ops;
+	streamer->base.bbs_next = next;
+
+	/* Prepare to read tar archives with any known compression filter. */
+	streamer->archive = archive_read_new();
+	if (streamer->archive == NULL)
+		pg_fatal("out of memory");
+	if (archive_read_support_format_tar(streamer->archive) != ARCHIVE_OK)
+		pg_fatal("libarchive: could not initialize tar format: %s",
+				 archive_error_string(streamer->archive));
+	if (archive_read_support_filter_all(streamer->archive) != ARCHIVE_OK)
+		pg_fatal("libarchive: could not initialize filters: %s",
+				 archive_error_string(streamer->archive));
+
+	/* Open file. */
+	r = archive_read_open_filename(streamer->archive,
+								   pathname,
+								   ASTREAMER_LIBARCHIVE_READER_BUFFER_SIZE);
+	if (r != ARCHIVE_OK)
+		pg_fatal("libarchive: could not open \"%s\": %s",
+				 pathname,
+				 archive_error_string(streamer->archive));
+
+	/* Start by wanting a new file. */
+	streamer->end_of_file = true;
+	streamer->end_of_archive = false;
+
+	return &streamer->base;
+}
+
+/* Fill in an astreamer member given a libarchive entry. */
+static void
+astreamer_libarchive_reader_fill_member(astreamer_member *member,
+										struct archive_entry *entry)
+{
+	strlcpy(member->pathname,
+			archive_entry_pathname(entry),
+			sizeof(member->pathname));
+	member->size = archive_entry_size(entry);
+	member->mode = archive_entry_mode(entry);
+	member->uid = archive_entry_uid(entry);
+	member->gid = archive_entry_gid(entry);
+	switch (archive_entry_filetype(entry))
+	{
+		case AE_IFREG:
+			member->is_regular = true;
+			break;
+		case AE_IFDIR:
+			member->is_directory = true;
+			break;
+		case AE_IFLNK:
+			member->is_symlink = true;
+			strlcpy(member->linktarget,
+					archive_entry_symlink(entry),
+					sizeof(member->linktarget));
+			break;
+		default:
+			break;
+	}
+}
+
+static void
+astreamer_libarchive_reader_content(astreamer *streamer,
+									astreamer_member *member,
+									const char *data_ignored,
+									int len_ignored,
+									astreamer_archive_context context)
+{
+	astreamer_libarchive_reader *mystreamer;
+	ssize_t		size;
+
+	/*
+	 * This should be reached by calling astreamer_pull().
+	 *
+	 * If libarchive had a non-blocking or push API (cf discussion in
+	 * libarchive issue #1268), then we could push raw data in here, like
+	 * astreamer_tar_parser.
+	 *
+	 * Given only a blocking interface, we have to ask it to pull data into
+	 * our astreamer pipeline.  The amount it reads at once is bounded by
+	 * ASTREAMER_LIBARCHIVE_READER_BUFFER_SIZE, and we'll return control after
+	 * emitting just one data chunk so that the caller has the chance to
+	 * give up early.
+	 */
+	Assert(member == NULL);
+	Assert(data_ignored == NULL);
+	Assert(len_ignored == 0);
+	Assert(context == ASTREAMER_UNKNOWN);
+
+	mystreamer = (astreamer_libarchive_reader *) streamer;
+
+	while (!mystreamer->end_of_archive)
+	{
+		/* Do we need a new file? */
+		if (mystreamer->end_of_file)
+		{
+			struct archive_entry *entry;
+
+			/* Start next file, or discover end of archive. */
+			switch (archive_read_next_header(mystreamer->archive, &entry))
+			{
+				case ARCHIVE_RETRY:
+					continue;
+				case ARCHIVE_FATAL:
+					pg_fatal("libarchive: %s",
+							 archive_error_string(mystreamer->archive));
+					break;
+				case ARCHIVE_WARN:
+					pg_log_warning("libarchive: %s",
+								   archive_error_string(mystreamer->archive));
+					pg_fallthrough;
+				case ARCHIVE_OK:
+					/* Send file header, then fall through to send one chunk. */
+					mystreamer->end_of_file = false;
+					astreamer_libarchive_reader_fill_member(&mystreamer->member,
+															entry);
+					astreamer_content(mystreamer->base.bbs_next,
+									  &mystreamer->member,
+									  NULL,
+									  0,
+									  ASTREAMER_MEMBER_HEADER);
+					break;
+				case ARCHIVE_EOF:
+					/* End of archive. */
+					mystreamer->end_of_archive = true;
+					astreamer_content(mystreamer->base.bbs_next,
+									  NULL,
+									  NULL,
+									  0,
+									  ASTREAMER_ARCHIVE_TRAILER);
+					return;
+				default:
+					pg_fatal("unexpected result from archive_read_next_header()");
+					break;
+			}
+		}
+
+		/* Stream a chunk of data, or discover end of file. */
+		Assert(!mystreamer->end_of_file);
+		size = archive_read_data(mystreamer->archive,
+								 mystreamer->data,
+								 sizeof(mystreamer->data));
+		switch (size)
+		{
+			case ARCHIVE_RETRY:
+				continue;
+			case ARCHIVE_FATAL:
+				pg_fatal("libarchive: %s",
+						 archive_error_string(mystreamer->archive));
+				pg_unreachable();
+			case ARCHIVE_WARN:
+				pg_log_warning("libarchive: %s",
+							   archive_error_string(mystreamer->archive));
+				continue;
+			default:
+				break;
+		}
+
+		if (size == 0)
+		{
+			/* Send trailer, and go around to start another file. */
+			mystreamer->end_of_file = true;
+			astreamer_content(mystreamer->base.bbs_next,
+							  &mystreamer->member,
+							  NULL,
+							  0,
+							  ASTREAMER_MEMBER_TRAILER);
+			continue;
+		}
+
+		/* Stream large chunk and return. */
+		astreamer_content(mystreamer->base.bbs_next,
+						  &mystreamer->member,
+						  mystreamer->data,
+						  size,
+						  ASTREAMER_MEMBER_CONTENTS);
+		return;
+	}
+}
+
+static void
+astreamer_libarchive_reader_finalize(astreamer *streamer)
+{
+	astreamer_finalize(streamer->bbs_next);
+}
+
+static void
+astreamer_libarchive_reader_free(astreamer *streamer)
+{
+	astreamer_libarchive_reader *mystreamer;
+
+	mystreamer = (astreamer_libarchive_reader *) streamer;
+	archive_free(mystreamer->archive);
+	pfree(streamer);
+}
diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build
index 86befca192e..6b95c36e9a5 100644
--- a/src/fe_utils/meson.build
+++ b/src/fe_utils/meson.build
@@ -21,6 +21,10 @@ fe_utils_sources = files(
   'version.c',
 )
 
+if libarchive.found()
+  fe_utils_sources += 'astreamer_libarchive.c'
+endif
+
 psqlscan = custom_target('psqlscan',
   input: 'psqlscan.l',
   output: 'psqlscan.c',
diff --git a/src/include/fe_utils/astreamer.h b/src/include/fe_utils/astreamer.h
index 8329e4efbc5..c6c54e954e9 100644
--- a/src/include/fe_utils/astreamer.h
+++ b/src/include/fe_utils/astreamer.h
@@ -142,6 +142,13 @@ astreamer_content(astreamer *streamer, astreamer_member *member,
 	streamer->bbs_ops->content(streamer, member, data, len, context);
 }
 
+/* Variant for astreamers that produce data themselves. */
+static inline void
+astreamer_pull(astreamer *streamer)
+{
+	astreamer_content(streamer, NULL, NULL, 0, ASTREAMER_UNKNOWN);
+}
+
 /* Finalize a astreamer. */
 static inline void
 astreamer_finalize(astreamer *streamer)
@@ -228,4 +235,9 @@ extern astreamer *astreamer_tar_parser_new(astreamer *next);
 extern astreamer *astreamer_tar_terminator_new(astreamer *next);
 extern astreamer *astreamer_tar_archiver_new(astreamer *next);
 
+#ifdef USE_LIBARCHIVE
+extern astreamer *astreamer_libarchive_reader_new_pathname(astreamer *next,
+														   const char *pathname);
+#endif
+
 #endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c72f6c59573..1bb3a2bafd4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3564,6 +3564,7 @@ astreamer_archive_context
 astreamer_extractor
 astreamer_gzip_decompressor
 astreamer_gzip_writer
+astreamer_libarchive_reader
 astreamer_lz4_frame
 astreamer_member
 astreamer_ops
-- 
2.53.0

From b54e6c7b05d9bf34bde60b72f3a8d1989b2b17dc Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Sun, 5 Apr 2026 03:07:19 +1200
Subject: [PATCH 3/4] fixup: Use more efficient zero-copy API?

We can pass a pointer to data in libarchive's internal buffer directly
to the next streamer, avoiding one copy.  To do this we also have to
expand any sparse regions ourselves.

XXX not sure it's worth the complexity for non-performance critical
code?
---
 src/fe_utils/astreamer_libarchive.c | 63 ++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/src/fe_utils/astreamer_libarchive.c b/src/fe_utils/astreamer_libarchive.c
index d57853171f4..d28c4ed915f 100644
--- a/src/fe_utils/astreamer_libarchive.c
+++ b/src/fe_utils/astreamer_libarchive.c
@@ -29,7 +29,8 @@ typedef struct astreamer_libarchive_reader
 	struct archive *archive;
 	bool		end_of_file;
 	bool		end_of_archive;
-	char		data[ASTREAMER_LIBARCHIVE_READER_BUFFER_SIZE];
+	pgoff_t		offset;
+	char		zeroes[8192];
 } astreamer_libarchive_reader;
 
 static void astreamer_libarchive_reader_content(astreamer *streamer,
@@ -121,6 +122,27 @@ astreamer_libarchive_reader_fill_member(astreamer_member *member,
 	}
 }
 
+/* Emit zeroes up to offset. */
+static bool
+astreamer_libarchive_reader_expand_sparse(astreamer_libarchive_reader *mystreamer,
+										  pgoff_t offset)
+{
+	size_t		size;
+
+	while (mystreamer->offset < offset)
+	{
+		size = offset - mystreamer->offset;
+		size = Min(size, sizeof(mystreamer->zeroes));
+		astreamer_content(mystreamer->base.bbs_next,
+						  &mystreamer->member,
+						  mystreamer->zeroes,
+						  size,
+						  ASTREAMER_MEMBER_CONTENTS);
+		mystreamer->offset += size;
+	}
+	return true;
+}
+
 static void
 astreamer_libarchive_reader_content(astreamer *streamer,
 									astreamer_member *member,
@@ -129,7 +151,9 @@ astreamer_libarchive_reader_content(astreamer *streamer,
 									astreamer_archive_context context)
 {
 	astreamer_libarchive_reader *mystreamer;
-	ssize_t		size;
+	const void *data;
+	size_t		size;
+	pgoff_t		offset;
 
 	/*
 	 * This should be reached by calling astreamer_pull().
@@ -174,6 +198,7 @@ astreamer_libarchive_reader_content(astreamer *streamer,
 				case ARCHIVE_OK:
 					/* Send file header, then fall through to send one chunk. */
 					mystreamer->end_of_file = false;
+					mystreamer->offset = 0;
 					astreamer_libarchive_reader_fill_member(&mystreamer->member,
 															entry);
 					astreamer_content(mystreamer->base.bbs_next,
@@ -197,12 +222,19 @@ astreamer_libarchive_reader_content(astreamer *streamer,
 			}
 		}
 
-		/* Stream a chunk of data, or discover end of file. */
+		/*
+		 * Stream a chunk of data, or discover end of file.
+		 *
+		 * It would be a bit simpler to use archive_read_data(), but this
+		 * interface removes the need for copying to an output buffer.  In
+		 * exchange for that, we have to deal with expanding (rare) sparse
+		 * file zeroes.
+		 */
 		Assert(!mystreamer->end_of_file);
-		size = archive_read_data(mystreamer->archive,
-								 mystreamer->data,
-								 sizeof(mystreamer->data));
-		switch (size)
+		switch (archive_read_data_block(mystreamer->archive,
+										&data,
+										&size,
+										&offset))
 		{
 			case ARCHIVE_RETRY:
 				continue;
@@ -213,11 +245,20 @@ astreamer_libarchive_reader_content(astreamer *streamer,
 			case ARCHIVE_WARN:
 				pg_log_warning("libarchive: %s",
 							   archive_error_string(mystreamer->archive));
-				continue;
+				break;
+			case ARCHIVE_EOF:
+				size = 0;
+				break;
+			case ARCHIVE_OK:
+				break;
 			default:
+				pg_fatal("unexpected result from archive_read_data_block()");
 				break;
 		}
 
+		/* Expand any intervening sparse region. */
+		astreamer_libarchive_reader_expand_sparse(mystreamer, offset);
+
 		if (size == 0)
 		{
 			/* Send trailer, and go around to start another file. */
@@ -230,12 +271,14 @@ astreamer_libarchive_reader_content(astreamer *streamer,
 			continue;
 		}
 
-		/* Stream large chunk and return. */
+		/* Stream large chunk directly from libarchive's buffer and return. */
+		Assert(mystreamer->offset == offset);
 		astreamer_content(mystreamer->base.bbs_next,
 						  &mystreamer->member,
-						  mystreamer->data,
+						  data,
 						  size,
 						  ASTREAMER_MEMBER_CONTENTS);
+		mystreamer->offset += size;
 		return;
 	}
 }
-- 
2.53.0

From 7769392066efcd9c00a5c2be722eb91af10787a4 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Sun, 5 Apr 2026 02:40:56 +1200
Subject: [PATCH 4/4] pg_waldump: Use astreamer_libarchive.c.

If this build supports libarchive, use astreamer_libarchive_reader
instead of astreamer_tar_parser to read WAL archives.  This allows
modern tar formats with more types of compression to be used.

(Proof-of-concept)
---
 src/bin/pg_waldump/Makefile          |  5 +++++
 src/bin/pg_waldump/archive_waldump.c | 31 +++++++++++++++++++++++++++-
 src/bin/pg_waldump/meson.build       |  2 +-
 src/bin/pg_waldump/t/001_basic.pl    |  9 +++++++-
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index aabb87566a2..09005bf4ba4 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -23,6 +23,11 @@ OBJS = \
 override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
 
+ifeq ($(with_libarchive), yes)
+# XXX figure out where this should go
+LDFLAGS_INTERNAL += $(LIBARCHIVE_LIBS)
+endif
+
 RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
 RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
 
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index e4a4bf44a7e..938a253ecab 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -129,12 +129,30 @@ void
 init_archive_reader(XLogDumpPrivate *privateInfo,
 					pg_compress_algorithm compression)
 {
-	int			fd;
 	astreamer  *streamer;
 	ArchivedWALFile *entry = NULL;
 	XLogLongPageHeader longhdr;
 	ArchivedWAL_iterator iter;
 
+#ifdef USE_LIBARCHIVE
+	char	   *pathname = NULL;
+
+	/* Open tar archive with libarchive. */
+	streamer = astreamer_waldump_new(privateInfo);
+	if (privateInfo->archive_dir)
+		pathname = psprintf("%s/%s",
+							privateInfo->archive_dir,
+							privateInfo->archive_name);
+	streamer =
+		astreamer_libarchive_reader_new_pathname(streamer,
+												 pathname ?
+												 pathname :
+												 privateInfo->archive_name);
+	if (pathname)
+		pfree(pathname);
+#else
+	int			fd;
+
 	/* Open tar archive and store its file descriptor */
 	fd = open_file_in_directory(privateInfo->archive_dir,
 								privateInfo->archive_name);
@@ -157,6 +175,7 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
 		streamer = astreamer_lz4_decompressor_new(streamer);
 	else if (compression == PG_COMPRESSION_ZSTD)
 		streamer = astreamer_zstd_decompressor_new(streamer);
+#endif
 
 	privateInfo->archive_streamer = streamer;
 
@@ -286,10 +305,12 @@ free_archive_reader(XLogDumpPrivate *privateInfo)
 		privateInfo->archive_read_buf = NULL;
 	}
 
+#ifndef USE_LIBARCHIVE
 	/* Close the file. */
 	if (close(privateInfo->archive_fd) != 0)
 		pg_log_error("could not close file \"%s\": %m",
 					 privateInfo->archive_name);
+#endif
 }
 
 /*
@@ -537,12 +558,19 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
 static bool
 read_archive_file(XLogDumpPrivate *privateInfo)
 {
+#ifndef USE_LIBARCHIVE
 	int			rc;
+#endif
 
 	/* Fail if we already reached EOF in a prior call. */
 	if (privateInfo->archive_fd_eof)
 		return false;
 
+#ifdef USE_LIBARCHIVE
+	/* Tell libarchive to read more data. */
+	astreamer_pull(privateInfo->archive_streamer);
+#else
+
 	/* Try to read some more data. */
 	rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf,
 			  privateInfo->archive_read_buf_size);
@@ -569,6 +597,7 @@ read_archive_file(XLogDumpPrivate *privateInfo)
 		/* Set flag to ensure we don't finalize more than once. */
 		privateInfo->archive_fd_eof = true;
 	}
+#endif
 
 	return true;
 }
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 5296f21b82c..0b2c4021107 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -19,7 +19,7 @@ endif
 
 pg_waldump = executable('pg_waldump',
   pg_waldump_sources,
-  dependencies: [frontend_code, libpq, lz4, zstd],
+  dependencies: [frontend_code, libarchive, libpq, lz4, zstd],
   c_args: ['-DFRONTEND'], # needed for xlogreader et al
   kwargs: default_bin_args,
 )
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 7dd1c3dd63e..62a15228b38 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -11,7 +11,13 @@ use Test::More;
 use List::Util qw(shuffle);
 
 my $tar = $ENV{TAR};
-my @tar_p_flags = tar_portability_options($tar);
+my @tar_p_flags;
+
+# If we don't have libarchive, then we tell tar to stick to ustar format that
+# astreamer_tar.c can decode.  Otherwise we should be able to accept anything
+# that any current tar produces.
+@tar_p_flags = tar_portability_options($tar)
+  if !check_pg_config("#define USE_LIBARCHIVE");
 
 program_help_ok('pg_waldump');
 program_version_ok('pg_waldump');
@@ -373,6 +379,7 @@ my @scenarios = (
 		'compression_flags' => '-czf',
 		'is_archive' => 1,
 		'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+		  || check_pg_config("#define USE_LIBARCHIVE")
 	});
 
 for my $scenario (@scenarios)
-- 
2.53.0
