On 3/26/24 15:09, Jakub Wartak wrote:
> On Sat, Mar 23, 2024 at 6:57 PM Tomas Vondra
> <tomas.von...@enterprisedb.com> wrote:
> 
>> On 3/23/24 14:47, Tomas Vondra wrote:
>>> On 3/23/24 13:38, Robert Haas wrote:
>>>> On Fri, Mar 22, 2024 at 8:26 PM Thomas Munro <thomas.mu...@gmail.com> 
>>>> wrote:
> [..]
>>> Yeah, that's in write_reconstructed_file() and the patch does not touch
>>> that at all. I agree it would be nice to use copy_file_range() in this
>>> part too, and it doesn't seem it'd be that hard to do, I think.
>>>
>>> It seems we'd just need a "fork" that either calls pread/pwrite or
>>> copy_file_range, depending on checksums and what was requested.
>>>
>>
>> Here's a patch to use copy_file_range in write_reconstructed_file too,
>> when requested/possible. One thing that I'm not sure about is whether to
>> do pg_fatal() if --copy-file-range but the platform does not support it.
> [..]
> 
> Hi Tomas, so I gave a go to the below patches today:
> - v20240323-2-0001-pg_combinebackup-allow-using-clone-copy_.patch
> - v20240323-2-0002-write_reconstructed_file.patch
> 
> My assessment:
> 
> v20240323-2-0001-pg_combinebackup-allow-using-clone-copy_.patch -
> looks like more or less good to go

There's some issues with the --dry-run, pointed out by Robert. Should be
fixed in the attached version.

> v20240323-2-0002-write_reconstructed_file.patch - needs work and
> without that clone/copy_file_range() good effects are unlikely
> 
> Given Debian 12, ~100GB DB, (pgbench -i -s 7000 , and some additional
> tables with GiST and GIN indexes , just to see more WAL record types)
> and with backups sizes in MB like that:
> 
> 106831  full
> 2823    incr.1 # captured after some time with pgbench -R 100
> 165     incr.2 # captured after some time with pgbench -R 100
> 
> Test cmd: rm -rf outtest; sync; sync ; sync; echo 3 | sudo tee
> /proc/sys/vm/drop_caches ; time /usr/pgsql17/bin/pg_combinebackup -o
> outtest full incr.1 incr.2
> 
> Test results of various copies on small I/O constrained XFS device:
> normal copy: 31m47.407s
> --clone copy: error: file cloning not supported on this platform (it's
> due #ifdef of having COPY_FILE_RANGE available)
> --copy-file-range: aborted, as it was taking too long , I was
> expecting it to accelerate, but it did not... obviously this is the
> transparent failover in case of calculating checksums...
> --manifest-checksums=NONE --copy-file-range: BUG, it keep on appending
> to just one file e.g. outtest/base/5/16427.29 with 200GB+ ?? and ended
> up with ENOSPC [more on this later]

That's really strange.

> --manifest-checksums=NONE --copy-file-range without v20240323-2-0002: 
> 27m23.887s
> --manifest-checksums=NONE --copy-file-range with v20240323-2-0002 and
> loop-fix: 5m1.986s but it creates corruption as it stands
> 

Thanks. I plan to do more similar tests, once my machines get done with
some other stuff.

> Issues:
> 
> 1. https://cirrus-ci.com/task/5937513653600256?logs=mingw_cross_warning#L327
> compains about win32/mingw:
> 
> [15:47:27.184] In file included from copy_file.c:22:
> [15:47:27.184] copy_file.c: In function ‘copy_file’:
> [15:47:27.184] ../../../src/include/common/logging.h:134:6: error:
> this statement may fall through [-Werror=implicit-fallthrough=]
> [15:47:27.184]   134 |   if (unlikely(__pg_log_level <= PG_LOG_DEBUG)) \
> [15:47:27.184]       |      ^
> [15:47:27.184] copy_file.c:96:5: note: in expansion of macro ‘pg_log_debug’
> [15:47:27.184]    96 |     pg_log_debug("would copy \"%s\" to \"%s\"
> (copy_file_range)",
> [15:47:27.184]       |     ^~~~~~~~~~~~
> [15:47:27.184] copy_file.c:99:4: note: here
> [15:47:27.184]    99 |    case COPY_MODE_COPYFILE:
> [15:47:27.184]       |    ^~~~
> [15:47:27.184] cc1: all warnings being treated as errors
> 

Yup, missing break.

> 2. I do not know what's the consensus between --clone and
> --copy-file-range , but if we have #ifdef FICLONE clone_works() #elif
> HAVE_COPY_FILE_RANGE copy_file_range_only_works() then we should also
> apply the same logic to the --help so that --clone is not visible
> there (for consistency?). Also the "error: file cloning not supported
> on this platform " is technically incorrect, Linux does support
> ioctl(FICLONE) and copy_file_range(), but we are just not choosing one
> over another (so technically it is "available"). Nitpicking I know.
> 

That's a good question, I'm not sure. But whatever we do, we should do
the same thing in pg_upgrade. Maybe there's some sort of precedent?

> 3. [v20240323-2-0002-write_reconstructed_file.patch]: The mentioned
> ENOSPACE spiral-of-death-bug symptoms are like that:
> 
> strace:
> copy_file_range(8, [697671680], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697679872], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697688064], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697696256], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697704448], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697712640], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697720832], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697729024], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697737216], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697745408], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697753600], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697761792], 9, NULL, 8192, 0) = 8192
> copy_file_range(8, [697769984], 9, NULL, 8192, 0) = 8192
> 
> Notice that dest_off_t (poutoff) is NULL.
> 
> (gdb) where
> #0  0x00007f2cd56f6733 in copy_file_range (infd=8,
> pinoff=pinoff@entry=0x7f2cd53f54e8, outfd=outfd@entry=9,
> poutoff=poutoff@entry=0x0,
>     length=length@entry=8192, flags=flags@entry=0) at
> ../sysdeps/unix/sysv/linux/copy_file_range.c:28
> #1  0x00005555ecd077f4 in write_reconstructed_file
> (copy_mode=COPY_MODE_COPY_FILE_RANGE, dry_run=false, debug=true,
> checksum_ctx=0x7ffc4cdb7700,
>     offsetmap=<optimized out>, sourcemap=0x7f2cd54f6010,
> block_length=<optimized out>, output_filename=0x7ffc4cdba910
> "outtest/base/5/16427.29",
>     input_filename=0x7ffc4cdba510
> "incr.2/base/5/INCREMENTAL.16427.29") at reconstruct.c:648
> #2  reconstruct_from_incremental_file
> (input_filename=input_filename@entry=0x7ffc4cdba510
> "incr.2/base/5/INCREMENTAL.16427.29",
>     output_filename=output_filename@entry=0x7ffc4cdba910
> "outtest/base/5/16427.29",
> relative_path=relative_path@entry=0x7ffc4cdbc670 "base/5",
>     bare_file_name=bare_file_name@entry=0x5555ee2056ef "16427.29",
> n_prior_backups=n_prior_backups@entry=2,
>     prior_backup_dirs=prior_backup_dirs@entry=0x7ffc4cdbf248,
> manifests=0x5555ee137a10, manifest_path=0x7ffc4cdbad10
> "base/5/16427.29",
>     checksum_type=CHECKSUM_TYPE_NONE, checksum_length=0x7ffc4cdb9864,
> checksum_payload=0x7ffc4cdb9868, debug=true, dry_run=false,
>     copy_method=COPY_MODE_COPY_FILE_RANGE) at reconstruct.c:327
> 
> .. it's a spiral of death till ENOSPC. Reverting the
> v20240323-2-0002-write_reconstructed_file.patch helps. The problem
> lies in that do-wb=-inifity-loop (?) along with NULL for destination
> off_t. This seem to solves that thingy(?):
> 
> -                       do
> -                       {
> -                               wb = copy_file_range(s->fd,
> &offsetmap[i], wfd, NULL, BLCKSZ, 0);
> +                       //do
> +                       //{
> +                               wb = copy_file_range(s->fd,
> &offsetmap[i], wfd, &offsetmap[i], BLCKSZ, 0);
>                                 if (wb < 0)
>                                         pg_fatal("error while copying
> file range from \"%s\" to \"%s\": %m",
> 
> input_filename, output_filename);
> -                       } while (wb > 0);
> +                       //} while (wb > 0);
>  #else
> 
> ...so that way I've got it down to 5mins.
> 

Yeah, that retry logic is wrong. I ended up copying the check from the
"regular" copy branch, which simply bails out if copy_file_range returns
anything but the expected 8192.

I wonder if this should deal with partial writes, though. I mean, it's
allowed copy_file_range() copies only some of the bytes - I don't know
how often / in what situations that happens, though ... And if we want
to handle that for copy_file_range(), pwrite() needs the same treatment.

> 3. .. but onn startup I've got this after trying psql login: invalid
> page in block 0 of relation base/5/1259  . I've again reverted the
> v20240323-2-0002 to see if that helped for next-round of
> pg_combinebackup --manifest-checksums=NONE --copy-file-range and after
> 32mins of waiting it did help indeed: I was able to login and select
> counts worked and matched properly the data. I've reapplied the
> v20240323-2-0002 (with my fix to prevent that endless loop) and the
> issue was again(!) there. Probably it's related to the destination
> offset. I couldn't find more time to look on it today and the setup
> was big 100GB on slow device, so just letting You know as fast as
> possible.
> 

Can you see if you can still reproduce this with the attached version?

> 4. More efficiency is on the table option (optional patch node ; just
> for completeness; I dont think we have time for that? ): even if
> v20240323-2-0002 would work, the problem is that it would be sending
> syscall for every 8kB. We seem to be performing lots of per-8KB
> syscalls which hinder performance (both in copy_file_range and in
> normal copy):
> 
> pread64(8, ""..., 8192, 369115136)      = 8192 // 369115136 + 8192 =
> 369123328 (matches next pread offset)
> write(9, ""..., 8192)                   = 8192
> pread64(8, ""..., 8192, 369123328)      = 8192 // 369123328 + 8192 = 369131520
> write(9, ""..., 8192)                   = 8192
> pread64(8, ""..., 8192, 369131520)      = 8192 // and so on
> write(9, ""..., 8192)                   = 8192
> 
> Apparently there's no merging of adjacent IO/s, so pg_combinebackup
> wastes lots of time on issuing instead small syscalls but it could
> let's say do single pread/write (or even copy_file_range()). I think
> it was not evident in my earlier testing (200GB; 39min vs ~40s) as I
> had much smaller modifications in my incremental (think of 99% of
> static data).
> 

Yes, I've been thinking about exactly this optimization, but I think
we're way past proposing this for PG17. The changes that would require
in reconstruct_from_incremental_file are way too significant. Has to
wait for PG18 ;-)

I do think there's more on the table, as mentioned by Thomas a couple
days ago - maybe we shouldn't approach clone/copy_file_range merely as
an optimization to save time, it might be entirely reasonable to do this
simply to allow the filesystem to do CoW magic and save space (even if
we need to read the data and recalculate the checksum, which now
disables these copy methods).


> 5. I think we should change the subject with new patch revision, so
> that such functionality for incremental backups is not buried down in
> the pg_upgrade thread ;)
> 

OK.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 13387b49b33cdb2a16c3d336368cd48c79f4dc76 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Tue, 19 Mar 2024 15:16:29 +0100
Subject: [PATCH v20240326 1/2] pg_combinebackup - allow using
 clone/copy_file_range

---
 doc/src/sgml/ref/pg_combinebackup.sgml      |  34 ++++
 src/bin/pg_combinebackup/copy_file.c        | 166 ++++++++++++++++----
 src/bin/pg_combinebackup/copy_file.h        |  18 ++-
 src/bin/pg_combinebackup/pg_combinebackup.c |  18 ++-
 src/bin/pg_combinebackup/reconstruct.c      |   5 +-
 src/bin/pg_combinebackup/reconstruct.h      |   5 +-
 6 files changed, 206 insertions(+), 40 deletions(-)

diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml
index 8a0a600c2b2..60a60e3fae6 100644
--- a/doc/src/sgml/ref/pg_combinebackup.sgml
+++ b/doc/src/sgml/ref/pg_combinebackup.sgml
@@ -185,6 +185,40 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--clone</option></term>
+      <listitem>
+       <para>
+        Use efficient file cloning (also known as <quote>reflinks</quote> on
+        some systems) instead of copying files to the new cluster.  This can
+        result in near-instantaneous copying of the data files, giving the
+        speed advantages of <option>-k</option>/<option>--link</option> while
+        leaving the old cluster untouched.
+       </para>
+
+       <para>
+        File cloning is only supported on some operating systems and file
+        systems.  If it is selected but not supported, the
+        <application>pg_combinebackup</application> run will error.  At present,
+        it is supported on Linux (kernel 4.5 or later) with Btrfs and XFS (on
+        file systems created with reflink support), and on macOS with APFS.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--copy-file-range</option></term>
+      <listitem>
+       <para>
+        Use the <function>copy_file_range</function> system call for efficient
+        copying.  On some file systems this gives results similar to
+        <option>--clone</option>, sharing physical disk blocks, while on others
+        it may still copy blocks, but do so via an optimized path.  At present,
+        it is supported on Linux and FreeBSD.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
        <term><option>-V</option></term>
        <term><option>--version</option></term>
diff --git a/src/bin/pg_combinebackup/copy_file.c b/src/bin/pg_combinebackup/copy_file.c
index e6d2423278a..a690ecb8e12 100644
--- a/src/bin/pg_combinebackup/copy_file.c
+++ b/src/bin/pg_combinebackup/copy_file.c
@@ -14,6 +14,7 @@
 #include <copyfile.h>
 #endif
 #include <fcntl.h>
+#include <limits.h>
 #include <sys/stat.h>
 #include <unistd.h>
 
@@ -24,6 +25,10 @@
 static void copy_file_blocks(const char *src, const char *dst,
 							 pg_checksum_context *checksum_ctx);
 
+static void copy_file_clone(const char *src, const char *dst);
+
+static void copy_file_by_range(const char *src, const char *dst);
+
 #ifdef WIN32
 static void copy_file_copyfile(const char *src, const char *dst);
 #endif
@@ -35,8 +40,11 @@ static void copy_file_copyfile(const char *src, const char *dst);
  */
 void
 copy_file(const char *src, const char *dst,
-		  pg_checksum_context *checksum_ctx, bool dry_run)
+		  pg_checksum_context *checksum_ctx, bool dry_run,
+		  CopyMode copy_mode)
 {
+	char   *strategy_name = NULL;
+
 	/*
 	 * In dry-run mode, we don't actually copy anything, nor do we read any
 	 * data from the source file, but we do verify that we can open it.
@@ -52,57 +60,86 @@ copy_file(const char *src, const char *dst,
 	}
 
 	/*
+	 *
+	 * If we need to compute a checksum, but the user perhaps requested
+	 * a special copy method that does not support this, fallback to the
+	 * default block-by-block copy. We don't want to fail if just one of
+	 * many files requires checksum, etc.
+	 *
 	 * If we don't need to compute a checksum, then we can use any special
 	 * operating system primitives that we know about to copy the file; this
-	 * may be quicker than a naive block copy.
+	 * may be quicker than a naive block copy. We only do this for WIN32.
+	 * On other operating systems the user has to explicitly specify one of
+	 * the available primitives - there may be multiple, we don't know which
+	 * are reliable/preferred.
 	 */
-	if (checksum_ctx->type == CHECKSUM_TYPE_NONE)
+	if (checksum_ctx->type != CHECKSUM_TYPE_NONE)
 	{
-		char	   *strategy_name = NULL;
-		void		(*strategy_implementation) (const char *, const char *) = NULL;
-
+		/* fallback to block-by-block copy */
+		copy_mode = COPY_MODE_COPY;
+	}
 #ifdef WIN32
-		strategy_name = "CopyFile";
-		strategy_implementation = copy_file_copyfile;
+	else
+	{
+		copy_mode = COPY_MODE_COPYFILE;
+	}
 #endif
 
-		if (strategy_name != NULL)
-		{
-			if (dry_run)
-				pg_log_debug("would copy \"%s\" to \"%s\" using strategy %s",
-							 src, dst, strategy_name);
-			else
-			{
-				pg_log_debug("copying \"%s\" to \"%s\" using strategy %s",
-							 src, dst, strategy_name);
-				(*strategy_implementation) (src, dst);
-			}
-			return;
-		}
+	/* Determine the name of the copy strategy for use in log messages. */
+	switch (copy_mode)
+	{
+		case COPY_MODE_CLONE:
+			strategy_name = "clone";
+			break;
+		case COPY_MODE_COPY:
+			/* leave NULL for simple block-by-block copy */
+			break;
+		case COPY_MODE_COPY_FILE_RANGE:
+			strategy_name = "copy_file_range";
+			break;
+#ifdef WIN32
+		case COPY_MODE_COPYFILE:
+			strategy_name = "CopyFile";
+			break;
+#endif
 	}
 
-	/*
-	 * Fall back to the simple approach of reading and writing all the blocks,
-	 * feeding them into the checksum context as we go.
-	 */
 	if (dry_run)
 	{
-		if (checksum_ctx->type == CHECKSUM_TYPE_NONE)
+		if (strategy_name)
+			pg_log_debug("would copy \"%s\" to \"%s\" using strategy %s",
+						 src, dst, strategy_name);
+		else
 			pg_log_debug("would copy \"%s\" to \"%s\"",
 						 src, dst);
-		else
-			pg_log_debug("would copy \"%s\" to \"%s\" and checksum with %s",
-						 src, dst, pg_checksum_type_name(checksum_ctx->type));
 	}
 	else
 	{
-		if (checksum_ctx->type == CHECKSUM_TYPE_NONE)
+
+		if (strategy_name)
+			pg_log_debug("copying \"%s\" to \"%s\" using strategy %s",
+						 src, dst, strategy_name);
+		else
 			pg_log_debug("copying \"%s\" to \"%s\"",
 						 src, dst);
-		else
-			pg_log_debug("copying \"%s\" to \"%s\" and checksumming with %s",
-						 src, dst, pg_checksum_type_name(checksum_ctx->type));
-		copy_file_blocks(src, dst, checksum_ctx);
+
+		switch (copy_mode)
+		{
+			case COPY_MODE_CLONE:
+				copy_file_clone(src, dst);
+				break;
+			case COPY_MODE_COPY:
+				copy_file_blocks(src, dst, checksum_ctx);
+				break;
+			case COPY_MODE_COPY_FILE_RANGE:
+				copy_file_by_range(src, dst);
+				break;
+#ifdef WIN32
+			case COPY_MODE_COPYFILE:
+				copy_file_copyfile(src, dst);
+				break;
+#endif
+		}
 	}
 }
 
@@ -156,6 +193,67 @@ copy_file_blocks(const char *src, const char *dst,
 	close(dest_fd);
 }
 
+static void
+copy_file_clone(const char *src, const char *dest)
+{
+#if defined(HAVE_COPYFILE) && defined(COPYFILE_CLONE_FORCE)
+	if (copyfile(src, dest, NULL, COPYFILE_CLONE_FORCE) < 0)
+		pg_fatal("error while cloning file \"%s\" to \"%s\": %m", src, dest);
+#elif defined(__linux__) && defined(FICLONE)
+	{
+		if ((src_fd = open(src, O_RDONLY | PG_BINARY, 0)) < 0)
+			pg_fatal("could not open file \"%s\": %m", src);
+
+		if ((dest_fd = open(dest, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
+							pg_file_create_mode)) < 0)
+			pg_fatal("could not create file \"%s\": %m", dest);
+
+		if (ioctl(dest_fd, FICLONE, src_fd) < 0)
+		{
+			int			save_errno = errno;
+
+			unlink(dest);
+
+			pg_fatal("error while cloning file \"%s\" to \"%s\": %s",
+					 src, dest);
+		}
+	}
+#else
+	pg_fatal("file cloning not supported on this platform");
+#endif
+}
+
+static void
+copy_file_by_range(const char *src, const char *dest)
+{
+#if defined(HAVE_COPY_FILE_RANGE)
+	int			src_fd;
+	int			dest_fd;
+	ssize_t		nbytes;
+
+	if ((src_fd = open(src, O_RDONLY | PG_BINARY, 0)) < 0)
+		pg_fatal("could not open file \"%s\": %m", src);
+
+	if ((dest_fd = open(dest, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
+						pg_file_create_mode)) < 0)
+		pg_fatal("could not create file \"%s\": %m", dest);
+
+	do
+	{
+		nbytes = copy_file_range(src_fd, NULL, dest_fd, NULL, SSIZE_MAX, 0);
+		if (nbytes < 0)
+			pg_fatal("error while copying file range from \"%s\" to \"%s\": %m",
+					 src, dest);
+	} while (nbytes > 0);
+
+	close(src_fd);
+	close(dest_fd);
+#else
+	pg_fatal("copy_file_range not supported on this platform");
+#endif
+}
+
+/* XXX maybe this should do the check internally, same as the other functions? */
 #ifdef WIN32
 static void
 copy_file_copyfile(const char *src, const char *dst)
diff --git a/src/bin/pg_combinebackup/copy_file.h b/src/bin/pg_combinebackup/copy_file.h
index 0f6bc09403f..3a1c5eb764f 100644
--- a/src/bin/pg_combinebackup/copy_file.h
+++ b/src/bin/pg_combinebackup/copy_file.h
@@ -11,9 +11,25 @@
 #ifndef COPY_FILE_H
 #define COPY_FILE_H
 
+#include "c.h"
 #include "common/checksum_helper.h"
+#include "common/file_utils.h"
+
+/*
+ * Enumeration to denote copy modes
+ */
+typedef enum CopyMode
+{
+	COPY_MODE_CLONE,
+	COPY_MODE_COPY,
+	COPY_MODE_COPY_FILE_RANGE,
+#ifdef WIN32
+	COPY_MODE_COPYFILE,
+#endif
+} CopyMode;
 
 extern void copy_file(const char *src, const char *dst,
-					  pg_checksum_context *checksum_ctx, bool dry_run);
+					  pg_checksum_context *checksum_ctx, bool dry_run,
+					  CopyMode copy_mode);
 
 #endif							/* COPY_FILE_H */
diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index 74f8be9eeac..b6e1e62e160 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -69,6 +69,7 @@ typedef struct cb_options
 	pg_checksum_type manifest_checksums;
 	bool		no_manifest;
 	DataDirSyncMethod sync_method;
+	CopyMode	copy_method;
 } cb_options;
 
 /*
@@ -129,6 +130,8 @@ main(int argc, char *argv[])
 		{"manifest-checksums", required_argument, NULL, 1},
 		{"no-manifest", no_argument, NULL, 2},
 		{"sync-method", required_argument, NULL, 3},
+		{"clone", no_argument, NULL, 4},
+		{"copy-file-range", no_argument, NULL, 5},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -156,6 +159,7 @@ main(int argc, char *argv[])
 	memset(&opt, 0, sizeof(opt));
 	opt.manifest_checksums = CHECKSUM_TYPE_CRC32C;
 	opt.sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+	opt.copy_method = COPY_MODE_COPY;
 
 	/* process command-line options */
 	while ((c = getopt_long(argc, argv, "dnNPo:T:",
@@ -192,6 +196,12 @@ main(int argc, char *argv[])
 				if (!parse_sync_method(optarg, &opt.sync_method))
 					exit(1);
 				break;
+			case 4:
+				opt.copy_method = COPY_MODE_CLONE;
+				break;
+			case 5:
+				opt.copy_method = COPY_MODE_COPY_FILE_RANGE;
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -696,6 +706,8 @@ help(const char *progname)
 			 "                            use algorithm for manifest checksums\n"));
 	printf(_("      --no-manifest         suppress generation of backup manifest\n"));
 	printf(_("      --sync-method=METHOD  set method for syncing files to disk\n"));
+	printf(_("      --clone               clone (reflink) instead of copying files\n"));
+	printf(_("      --copy-file-range     copy using copy_file_range() syscall\n"));
 	printf(_("  -?, --help                show this help, then exit\n"));
 
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
@@ -937,7 +949,8 @@ process_directory_recursively(Oid tsoid,
 											  &checksum_length,
 											  &checksum_payload,
 											  opt->debug,
-											  opt->dry_run);
+											  opt->dry_run,
+											  opt->copy_method);
 		}
 		else
 		{
@@ -993,7 +1006,8 @@ process_directory_recursively(Oid tsoid,
 
 			/* Actually copy the file. */
 			snprintf(ofullpath, MAXPGPATH, "%s/%s", ofulldir, de->d_name);
-			copy_file(ifullpath, ofullpath, &checksum_ctx, opt->dry_run);
+			copy_file(ifullpath, ofullpath, &checksum_ctx, opt->dry_run,
+					  opt->copy_method);
 
 			/*
 			 * If copy_file() performed a checksum calculation for us, then
diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index 41f06bb26b5..f5c7af8a23c 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -90,7 +90,8 @@ reconstruct_from_incremental_file(char *input_filename,
 								  int *checksum_length,
 								  uint8 **checksum_payload,
 								  bool debug,
-								  bool dry_run)
+								  bool dry_run,
+								  CopyMode copy_method)
 {
 	rfile	  **source;
 	rfile	   *latest_source = NULL;
@@ -319,7 +320,7 @@ reconstruct_from_incremental_file(char *input_filename,
 	 */
 	if (copy_source != NULL)
 		copy_file(copy_source->filename, output_filename,
-				  &checksum_ctx, dry_run);
+				  &checksum_ctx, dry_run, copy_method);
 	else
 	{
 		write_reconstructed_file(input_filename, output_filename,
diff --git a/src/bin/pg_combinebackup/reconstruct.h b/src/bin/pg_combinebackup/reconstruct.h
index 8e33a8a95a0..726f94389f3 100644
--- a/src/bin/pg_combinebackup/reconstruct.h
+++ b/src/bin/pg_combinebackup/reconstruct.h
@@ -13,7 +13,9 @@
 #ifndef RECONSTRUCT_H
 #define RECONSTRUCT_H
 
+#include "c.h"
 #include "common/checksum_helper.h"
+#include "common/file_utils.h"
 #include "load_manifest.h"
 
 extern void reconstruct_from_incremental_file(char *input_filename,
@@ -28,6 +30,7 @@ extern void reconstruct_from_incremental_file(char *input_filename,
 											  int *checksum_length,
 											  uint8 **checksum_payload,
 											  bool debug,
-											  bool dry_run);
+											  bool dry_run,
+											  CopyMode copy_mode);
 
 #endif
-- 
2.44.0

From ccd879c90da8c5383d997a4d0a5188d2497313f9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Sat, 23 Mar 2024 18:26:21 +0100
Subject: [PATCH v20240326 2/2] write_reconstructed_file

---
 src/bin/pg_combinebackup/reconstruct.c | 33 +++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index f5c7af8a23c..eec7860f0e3 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -59,7 +59,8 @@ static void write_reconstructed_file(char *input_filename,
 									 off_t *offsetmap,
 									 pg_checksum_context *checksum_ctx,
 									 bool debug,
-									 bool dry_run);
+									 bool dry_run,
+									 CopyMode copy_mode);
 static void read_bytes(rfile *rf, void *buffer, unsigned length);
 
 /*
@@ -325,7 +326,8 @@ reconstruct_from_incremental_file(char *input_filename,
 	{
 		write_reconstructed_file(input_filename, output_filename,
 								 block_length, sourcemap, offsetmap,
-								 &checksum_ctx, debug, dry_run);
+								 &checksum_ctx, debug, dry_run,
+								 copy_method);
 		debug_reconstruction(n_prior_backups + 1, source, dry_run);
 	}
 
@@ -528,7 +530,8 @@ write_reconstructed_file(char *input_filename,
 						 off_t *offsetmap,
 						 pg_checksum_context *checksum_ctx,
 						 bool debug,
-						 bool dry_run)
+						 bool dry_run,
+						 CopyMode copy_mode)
 {
 	int			wfd = -1;
 	unsigned	i;
@@ -630,6 +633,30 @@ write_reconstructed_file(char *input_filename,
 		if (dry_run)
 			continue;
 
+		/*
+		 * If requested, copy the block using copy_file_range.
+		 *
+		 * We can'd do this if the block needs to be zero-filled or when we
+		 * need to update checksum.
+		 */
+		if ((copy_mode == COPY_MODE_COPY_FILE_RANGE) &&
+			(s != NULL) && (checksum_ctx->type == CHECKSUM_TYPE_NONE))
+		{
+#if defined(HAVE_COPY_FILE_RANGE)
+			wb = copy_file_range(s->fd, &offsetmap[i], wfd, NULL, BLCKSZ, 0);
+
+			if (wb < 0)
+				pg_fatal("error while copying file range from \"%s\" to \"%s\": %m",
+						 input_filename, output_filename);
+			else if (wb != BLCKSZ)
+				pg_fatal("could not write file \"%s\": wrote only %d of %d bytes",
+						 output_filename, wb, BLCKSZ);
+#else
+			pg_fatal("copy_file_range not supported on this platform");
+#endif
+			continue;
+		}
+
 		/* Read or zero-fill the block as appropriate. */
 		if (s == NULL)
 		{
-- 
2.44.0

Reply via email to