Here is an attempt at creating something that can be back-patched.  0001
simply replaces calls to durable_rename_excl() with durable_rename() and is
intended to be back-patched.  0002 removes the definition of
durable_rename_excl() and is _not_ intended for back-patching.  I imagine
0002 will need to be held back for v16devel.

I think back-patching 0001 will encounter a couple of small obstacles.  For
example, the call in basic_archive won't exist on most of the
back-branches, and durable_rename_excl() was named durable_link_or_rename()
before v13.  I don't mind producing a patch for each back-branch if needed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From d489c2bff029db6e07e5028788faf869c35f886b Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Tue, 26 Apr 2022 11:56:50 -0700
Subject: [PATCH v3 1/2] Replace calls to durable_rename_excl() with
 durable_rename().

durable_rename_excl() attempts to avoid overwriting any existing
files by using link() and unlink(), but it falls back to rename()
on some platforms (e.g., Windows), which offers no such overwrite
protection.  Most callers use durable_rename_excl() just in case
there is an existing file, but in practice there shouldn't be one.
basic_archive used it to avoid overwriting an archive concurrently
created by another server, but as mentioned above, it will still
overwrite files on some platforms.

Furthermore, failures during durable_rename_excl() can result in
multiple hard links to the same file.  My testing demonstrated that
it was possible to end up with two links to the same file in pg_wal
after a crash just before unlink() during WAL recycling.
Specifically, the test produced links to the same file for the
current WAL file and the next one because the half-recycled WAL
file was re-recycled upon restarting.  This seems likely to lead to
WAL corruption.

This change replaces all calls to durable_rename_excl() with
durable_rename().  This removes the protection against
accidentally overwriting an existing file, but some platforms are
already living without it, and ordinarily no such file should exist.
The function itself is left around in case any extensions are using
it.  It will be removed in v16 via a follow-up commit.

Back-patch to all supported versions.  Before v13,
durable_rename_excl() was named durable_link_or_rename().

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220418182336.GA2298576%40nathanxps13
---
 contrib/basic_archive/basic_archive.c |  5 +++--
 src/backend/access/transam/timeline.c | 14 ++------------
 src/backend/access/transam/xlog.c     |  8 ++------
 3 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index e7efbfb9c3..ed33854c57 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -281,9 +281,10 @@ basic_archive_file_internal(const char *file, const char *path)
 
 	/*
 	 * Sync the temporary file to disk and move it to its final destination.
-	 * This will fail if destination already exists.
+	 * Note that this will overwrite any existing file, but this is only
+	 * possible if someone else created the file since the stat() above.
 	 */
-	(void) durable_rename_excl(temp, destination, ERROR);
+	(void) durable_rename(temp, destination, ERROR);
 
 	ereport(DEBUG1,
 			(errmsg("archived \"%s\" via basic_archive", file)));
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index be21968293..128f754e87 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -441,12 +441,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	 * Now move the completed history file into place with its final name.
 	 */
 	TLHistoryFilePath(path, newTLI);
-
-	/*
-	 * Perform the rename using link if available, paranoidly trying to avoid
-	 * overwriting an existing file (there shouldn't be one).
-	 */
-	durable_rename_excl(tmppath, path, ERROR);
+	durable_rename(tmppath, path, ERROR);
 
 	/* The history file can be archived immediately. */
 	if (XLogArchivingActive())
@@ -519,12 +514,7 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
 	 * Now move the completed history file into place with its final name.
 	 */
 	TLHistoryFilePath(path, tli);
-
-	/*
-	 * Perform the rename using link if available, paranoidly trying to avoid
-	 * overwriting an existing file (there shouldn't be one).
-	 */
-	durable_rename_excl(tmppath, path, ERROR);
+	durable_rename(tmppath, path, ERROR);
 }
 
 /*
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61cda56c6f..f49194a8b5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3323,14 +3323,10 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 		}
 	}
 
-	/*
-	 * Perform the rename using link if available, paranoidly trying to avoid
-	 * overwriting an existing file (there shouldn't be one).
-	 */
-	if (durable_rename_excl(tmppath, path, LOG) != 0)
+	if (durable_rename(tmppath, path, LOG) != 0)
 	{
 		LWLockRelease(ControlFileLock);
-		/* durable_rename_excl already emitted log message */
+		/* durable_rename already emitted log message */
 		return false;
 	}
 
-- 
2.25.1

From 398350968f35f0974f1668e06be1adad4a7f7e3c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Tue, 26 Apr 2022 12:38:23 -0700
Subject: [PATCH v3 2/2] Remove durable_rename_excl().

A previous commit replaced all calls to this function with
durable_rename(), but the function itself was not removed in
back-branches since extensions may use it.  This change removes the
function from v16devel.

Do not back-patch.

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220418182336.GA2298576%40nathanxps13
---
 src/backend/storage/file/fd.c  | 63 ----------------------------------
 src/include/pg_config_manual.h |  7 ----
 src/include/storage/fd.h       |  1 -
 3 files changed, 71 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 24704b6a02..f904f60c08 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -807,69 +807,6 @@ durable_unlink(const char *fname, int elevel)
 	return 0;
 }
 
-/*
- * durable_rename_excl -- rename a file in a durable manner.
- *
- * Similar to durable_rename(), except that this routine tries (but does not
- * guarantee) not to overwrite the target file.
- *
- * Note that a crash in an unfortunate moment can leave you with two links to
- * the target file.
- *
- * Log errors with the caller specified severity.
- *
- * On Windows, using a hard link followed by unlink() causes concurrency
- * issues, while a simple rename() does not cause that, so be careful when
- * changing the logic of this routine.
- *
- * Returns 0 if the operation succeeded, -1 otherwise. Note that errno is not
- * valid upon return.
- */
-int
-durable_rename_excl(const char *oldfile, const char *newfile, int elevel)
-{
-	/*
-	 * Ensure that, if we crash directly after the rename/link, a file with
-	 * valid contents is moved into place.
-	 */
-	if (fsync_fname_ext(oldfile, false, false, elevel) != 0)
-		return -1;
-
-#ifdef HAVE_WORKING_LINK
-	if (link(oldfile, newfile) < 0)
-	{
-		ereport(elevel,
-				(errcode_for_file_access(),
-				 errmsg("could not link file \"%s\" to \"%s\": %m",
-						oldfile, newfile)));
-		return -1;
-	}
-	unlink(oldfile);
-#else
-	if (rename(oldfile, newfile) < 0)
-	{
-		ereport(elevel,
-				(errcode_for_file_access(),
-				 errmsg("could not rename file \"%s\" to \"%s\": %m",
-						oldfile, newfile)));
-		return -1;
-	}
-#endif
-
-	/*
-	 * Make change persistent in case of an OS crash, both the new entry and
-	 * its parent directory need to be flushed.
-	 */
-	if (fsync_fname_ext(newfile, false, false, elevel) != 0)
-		return -1;
-
-	/* Same for parent directory */
-	if (fsync_parent_path(newfile, elevel) != 0)
-		return -1;
-
-	return 0;
-}
-
 /*
  * InitFileAccess --- initialize this module during backend startup
  *
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 84ce5a4a5d..830804fdfb 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -163,13 +163,6 @@
 #define USE_BARRIER_SMGRRELEASE
 #endif
 
-/*
- * Define this if your operating system supports link()
- */
-#if !defined(WIN32) && !defined(__CYGWIN__)
-#define HAVE_WORKING_LINK 1
-#endif
-
 /*
  * USE_POSIX_FADVISE controls whether Postgres will attempt to use the
  * posix_fadvise() kernel call.  Usually the automatic configure tests are
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 69549b000f..2b4a8e0ffe 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -187,7 +187,6 @@ extern void fsync_fname(const char *fname, bool isdir);
 extern int	fsync_fname_ext(const char *fname, bool isdir, bool ignore_perm, int elevel);
 extern int	durable_rename(const char *oldfile, const char *newfile, int loglevel);
 extern int	durable_unlink(const char *fname, int loglevel);
-extern int	durable_rename_excl(const char *oldfile, const char *newfile, int loglevel);
 extern void SyncDataDirectory(void);
 extern int	data_sync_elevel(int elevel);
 
-- 
2.25.1
