Hi,

On Thu, 9 May 2024 at 19:29, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, May 8, 2024 at 10:34 AM Nazir Bilal Yavuz <byavu...@gmail.com> wrote:
> > > I'm not so sure about the GUC name. On the one hand, it feels like
> > > createdb should be spelled out as create_database, but on the other
> > > hand, the GUC name is quite long already. Then again, why would we
> > > make this specific to CREATE DATABASE in the first place? Would we
> > > also want alter_tablespace_file_copy_method and
> > > basic_archive.file_copy_method?
> >
> > I agree that it is already quite long, because of that I chose the
> > createdb as a prefix. I did not think that file cloning was planned to
> > be used in other places. If that is the case, does something like
> > 'preferred_copy_method' work? Then, we would mention which places will
> > be affected with this GUC in the docs.
>
> I would go with file_copy_method rather than preferred_copy_method,
> because (1) there's nothing "preferred" about it if we're using it
> unconditionally and (2) it's nice to clarify that we're talking about
> copying files rather than anything else.

Done.

>
> My personal enthusiasm for making platform-specific file copy methods
> usable all over PostgreSQL is quite limited. However, it is my
> observation that other people seem far more enthusiastic about it than
> I am. For example, consider how quickly it got added to
> pg_combinebackup. So, I suspect it's smart to plan on anything we add
> in this area getting used in a bunch of places. And perhaps it is even
> best to think about making it work in all of those places right from
> the start. If we build support into copydir and copy_file, then we
> just get everything that uses those, and all that remains is to
> document was is covered (and add comments so that future patch authors
> know they should further update the documentation).

I updated the documentation and put a comment on top of the copydir()
function to inform that further changes and uses of this function may
require documentation updates.

I also changed O_RDWR to O_WRONLY flag in the clone_file() function
based on Raniers' feedback. Also, does this feature need tests? I
thought about possible test cases but since this feature requires
specific file systems to run, I could not find any.

v6 is attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft
From 33fcb21730683c628f81451e21a0cf2455fd41be Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavu...@gmail.com>
Date: Thu, 16 May 2024 15:22:46 +0300
Subject: [PATCH v6] Introduce file_copy_method GUC

This GUC can be set to either COPY (default) or CLONE (if system
supports it).

If CLONE option is chosen, similar to COPY; but attempting to use
efficient file copying system calls.  The kernel has the opportunity to
share block ranges in copy-on-write file systems, or maybe push down
the copy to network file systems and storage devices.

Currently works on Linux, FreeBSD and macOS.  More systems could be
supported.

Author: Thomas Munro <thomas.mu...@gmail.com>
Author: Nazir Bilal Yavuz <byavu...@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com
---
 src/include/storage/copydir.h                 |  9 ++
 src/backend/storage/file/copydir.c            | 98 ++++++++++++++++++-
 .../utils/activity/wait_event_names.txt       |  1 +
 src/backend/utils/misc/guc_tables.c           | 12 +++
 src/backend/utils/misc/postgresql.conf.sample |  5 +
 doc/src/sgml/config.sgml                      | 35 +++++++
 doc/src/sgml/ref/alter_database.sgml          |  3 +-
 doc/src/sgml/ref/create_database.sgml         |  4 +-
 src/tools/pgindent/typedefs.list              |  1 +
 9 files changed, 165 insertions(+), 3 deletions(-)

diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index a25e258f479..6edc3ea4f69 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -13,6 +13,15 @@
 #ifndef COPYDIR_H
 #define COPYDIR_H
 
+typedef enum FileCopyMethod
+{
+	FILE_COPY_METHOD_COPY,
+	FILE_COPY_METHOD_CLONE,
+} FileCopyMethod;
+
+/* GUC parameters */
+extern PGDLLIMPORT int file_copy_method;
+
 extern void copydir(const char *fromdir, const char *todir, bool recurse);
 extern void copy_file(const char *fromfile, const char *tofile);
 
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index d4fbe542077..48e756ce284 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -21,17 +21,42 @@
 #include <fcntl.h>
 #include <unistd.h>
 
+#ifdef HAVE_COPYFILE_H
+#include <copyfile.h>
+#endif
+
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
+#include "utils/guc.h"
+
+/* GUCs */
+int			file_copy_method = FILE_COPY_METHOD_COPY;
+
+/*
+ * GUC support
+ */
+const struct config_enum_entry file_copy_method_options[] = {
+	{"copy", FILE_COPY_METHOD_COPY, false},
+#if (defined(HAVE_COPYFILE) && defined(COPYFILE_CLONE_FORCE)) || defined(HAVE_COPY_FILE_RANGE)
+	{"clone", FILE_COPY_METHOD_CLONE, false},
+#endif
+	{NULL, 0, false}
+};
+
+static void clone_file(const char *fromfile, const char *tofile);
 
 /*
  * copydir: copy a directory
  *
  * If recurse is false, subdirectories are ignored.  Anything that's not
  * a directory or a regular file is ignored.
+ *
+ * This function uses a file_copy_method GUC to determine copy method.
+ * Further changes and uses of this function may require documentation
+ * updates.
  */
 void
 copydir(const char *fromdir, const char *todir, bool recurse)
@@ -71,7 +96,12 @@ copydir(const char *fromdir, const char *todir, bool recurse)
 				copydir(fromfile, tofile, true);
 		}
 		else if (xlde_type == PGFILETYPE_REG)
-			copy_file(fromfile, tofile);
+		{
+			if (file_copy_method == FILE_COPY_METHOD_CLONE)
+				clone_file(fromfile, tofile);
+			else
+				copy_file(fromfile, tofile);
+		}
 	}
 	FreeDir(xldir);
 
@@ -214,3 +244,69 @@ copy_file(const char *fromfile, const char *tofile)
 
 	pfree(buffer);
 }
+
+/*
+ * clone one file
+ */
+static void
+clone_file(const char *fromfile, const char *tofile)
+{
+#if defined(HAVE_COPYFILE) && defined(COPYFILE_CLONE_FORCE)
+	if (copyfile(fromfile, tofile, NULL, COPYFILE_CLONE_FORCE) < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not clone file \"%s\" to \"%s\": %m",
+						fromfile, tofile)));
+#elif defined(HAVE_COPY_FILE_RANGE)
+	int			srcfd;
+	int			dstfd;
+	ssize_t		nbytes;
+
+	srcfd = OpenTransientFile(fromfile, O_RDONLY | PG_BINARY);
+	if (srcfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", fromfile)));
+
+	dstfd = OpenTransientFile(tofile, O_WRONLY | O_CREAT | O_EXCL | PG_BINARY);
+	if (dstfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not create file \"%s\": %m", tofile)));
+
+	do
+	{
+		/* If we got a cancel signal during the copy of the file, quit */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Don't copy too much at once, so we can check for interrupts from
+		 * time to time if this falls back to a slow copy.
+		 */
+		pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_COPY);
+		nbytes = copy_file_range(srcfd, NULL, dstfd, NULL, 1024 * 1024, 0);
+		if (nbytes < 0 && errno != EINTR)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not clone file \"%s\" to \"%s\": %m",
+							fromfile, tofile)));
+		pgstat_report_wait_end();
+	}
+	while (nbytes != 0);
+
+	if (CloseTransientFile(dstfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", tofile)));
+
+	if (CloseTransientFile(srcfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", fromfile)));
+#else
+	/*
+	 * If there is no CLONE support, clone_file() should not be called.
+	 */
+	pg_unreachable();
+#endif
+}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 87cbca28118..91c3d644bec 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -201,6 +201,7 @@ CONTROL_FILE_SYNC	"Waiting for the <filename>pg_control</filename> file to reach
 CONTROL_FILE_SYNC_UPDATE	"Waiting for an update to the <filename>pg_control</filename> file to reach durable storage."
 CONTROL_FILE_WRITE	"Waiting for a write to the <filename>pg_control</filename> file."
 CONTROL_FILE_WRITE_UPDATE	"Waiting for a write to update the <filename>pg_control</filename> file."
+COPY_FILE_COPY	"Waiting for a file copy operation."
 COPY_FILE_READ	"Waiting for a read during a file copy operation."
 COPY_FILE_WRITE	"Waiting for a write during a file copy operation."
 DATA_FILE_EXTEND	"Waiting for a relation data file to be extended."
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index ea2b0577bc6..3f528846e96 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -71,6 +71,7 @@
 #include "replication/slotsync.h"
 #include "replication/syncrep.h"
 #include "storage/bufmgr.h"
+#include "storage/copydir.h"
 #include "storage/large_object.h"
 #include "storage/pg_shmem.h"
 #include "storage/predicate.h"
@@ -491,6 +492,7 @@ extern const struct config_enum_entry archive_mode_options[];
 extern const struct config_enum_entry recovery_target_action_options[];
 extern const struct config_enum_entry wal_sync_method_options[];
 extern const struct config_enum_entry dynamic_shared_memory_options[];
+extern const struct config_enum_entry file_copy_method_options[];
 
 /*
  * GUC option variables that are exported from this module
@@ -4974,6 +4976,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"file_copy_method", PGC_POSTMASTER, RESOURCES_DISK,
+			gettext_noop("Selects the file copy method."),
+			NULL
+		},
+		&file_copy_method,
+		FILE_COPY_METHOD_COPY, file_copy_method_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"wal_sync_method", PGC_SIGHUP, WAL_SETTINGS,
 			gettext_noop("Selects the method used for forcing WAL updates to disk."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 83d5df8e460..208cbbb338a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -178,6 +178,11 @@
 #max_notify_queue_pages = 1048576	# limits the number of SLRU pages allocated
 					# for NOTIFY / LISTEN queue
 
+#file_copy_method = copy	# the default is the first option
+					# 	copy
+					# 	clone (if your system supports)
+					# (change requires restart)
+
 # - Kernel Resources -
 
 #max_files_per_process = 1000		# min 64
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3777cfa52a7..1b876cb5115 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2280,6 +2280,41 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc_file_copy_method" xreflabel="file_copy_method">
+      <term><varname>file_copy_method</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>file_copy_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the copy method that will be used while copying files.
+        Possible values are <literal>COPY</literal> (default) and
+        <literal>CLONE</literal> (if your system supports).  This option
+        currently effects <literal>FILE_COPY</literal> strategy in
+        <command>CREATE DATABASE ...</command> and <command> ALTER DATABASE ...
+        SET TABLESPACE ... </command> commands.  This parameter can only be set
+        at server start.
+       </para>
+
+       <para>
+        The <literal>CLONE</literal> method works the same way as
+        <literal>COPY</literal> method, except that it uses efficient file
+        cloning (also known as <quote>reflinks</quote> on
+        some systems) instead of copying files to the new data directory,
+        which can result in near-instantaneous copying of the data files.
+       </para>
+
+       <para>
+        File cloning is only supported on some operating systems and file
+        systems.  If it is selected but not supported, the server will error
+        and not start.  At present, it is supported on Linux (kernel 4.5 or
+        later) with Btrfs and XFS (on file systems created with reflink
+        support), and on macOS with APFS.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-notify-queue-pages" xreflabel="max_notify_queue_pages">
       <term><varname>max_notify_queue_pages</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/ref/alter_database.sgml b/doc/src/sgml/ref/alter_database.sgml
index 2479c41e8d6..9d8ec677555 100644
--- a/doc/src/sgml/ref/alter_database.sgml
+++ b/doc/src/sgml/ref/alter_database.sgml
@@ -82,7 +82,8 @@ ALTER DATABASE <replaceable class="parameter">name</replaceable> RESET ALL
    default tablespace to the new tablespace.  The new default tablespace
    must be empty for this database, and no one can be connected to
    the database.  Tables and indexes in non-default tablespaces are
-   unaffected.
+   unaffected.  The copy method used while moving could be changed by
+   <xref linkend="guc_file_copy_method"/> option.
   </para>
 
   <para>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 7653cb902ee..62f57eb4c32 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -138,7 +138,9 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
         log volume substantially, especially if the template database is large,
         it also forces the system to perform a checkpoint both before and
         after the creation of the new database. In some situations, this may
-        have a noticeable negative impact on overall system performance.
+        have a noticeable negative impact on overall system performance. The
+        method used in <literal>FILE_COPY</literal> strategy could be changed
+        by <xref linkend="guc_file_copy_method"/> option.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2b83c340fb7..204bc7ae69e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -769,6 +769,7 @@ FieldSelect
 FieldStore
 File
 FileBackupMethod
+FileCopyMethod
 FileFdwExecutionState
 FileFdwPlanState
 FileNameMap
-- 
2.43.0

Reply via email to