On Mon, May 02, 2022 at 10:39:07AM -0700, Nathan Bossart wrote:
> On Mon, May 02, 2022 at 07:48:18PM +0900, Michael Paquier wrote:
>> The WAL receiver upgrades the ERROR to a FATAL, and restarts
>> streaming shortly after.  Using durable_rename() would not be an issue
>> here.
> 
> Thanks for investigating this one.  I think I agree that we should simply
> switch to durable_rename() (without a file existence check beforehand).

Here is a new patch set.  For now, I've only removed the file existence
check in writeTimeLineHistoryFile().  I don't know if I'm totally convinced
that there isn't a problem here (e.g., due to concurrent .ready file
creation), but since some platforms have been using rename() for some time,
I don't know how worried we should be.  I thought about adding some kind of
locking between the WAL receiver and startup processes, but that seems
excessive.  Alternatively, we could just fix xlog.c as proposed earlier
[0].  AFAICT that is the only caller that can experience problems due to
the multiple-hard-link issue.  All other callers are simply renaming a
temporary file into place, and the temporary file can be discarded if left
behind after a crash.

[0] https://postgr.es/m/20220407182954.GA1231544%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 312860b4dc3794428c1e31528c05cf4b1ad06f00 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Tue, 26 Apr 2022 11:56:50 -0700
Subject: [PATCH v5 1/2] Replace calls to durable_rename_excl() with
 durable_rename().

durable_rename_excl() attempts to avoid overwriting any existing
files by using link() and unlink(), but it falls back to rename()
on some platforms (e.g., Windows), which offers no such overwrite
protection.  Most callers use durable_rename_excl() just in case
there is an existing file, but in practice there shouldn't be one.
basic_archive used it to avoid overwriting an archive concurrently
created by another server, but as mentioned above, it will still
overwrite files on some platforms.

Furthermore, failures during durable_rename_excl() can result in
multiple hard links to the same file.  My testing demonstrated that
it was possible to end up with two links to the same file in pg_wal
after a crash just before unlink() during WAL recycling.
Specifically, the test produced links to the same file for the
current WAL file and the next one because the half-recycled WAL
file was re-recycled upon restarting.  This seems likely to lead to
WAL corruption.

This change replaces all calls to durable_rename_excl() with
durable_rename().  This removes the protection against
accidentally overwriting an existing file, but some platforms are
already living without it, and ordinarily there shouldn't be one.
The function itself is left around in case any extensions are using
it.  It will be removed in v16 via a follow-up commit.

Back-patch to all supported versions.  Before v13,
durable_rename_excl() was named durable_link_or_rename().

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220418182336.GA2298576%40nathanxps13
---
 contrib/basic_archive/basic_archive.c |  5 +++--
 src/backend/access/transam/timeline.c | 18 +++++-------------
 src/backend/access/transam/xlog.c     |  9 +++------
 3 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index e7efbfb9c3..ed33854c57 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -281,9 +281,10 @@ basic_archive_file_internal(const char *file, const char *path)
 
 	/*
 	 * Sync the temporary file to disk and move it to its final destination.
-	 * This will fail if destination already exists.
+	 * Note that this will overwrite any existing file, but this is only
+	 * possible if someone else created the file since the stat() above.
 	 */
-	(void) durable_rename_excl(temp, destination, ERROR);
+	(void) durable_rename(temp, destination, ERROR);
 
 	ereport(DEBUG1,
 			(errmsg("archived \"%s\" via basic_archive", file)));
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index be21968293..e0a2a8ea68 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -441,12 +441,8 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	 * Now move the completed history file into place with its final name.
 	 */
 	TLHistoryFilePath(path, newTLI);
-
-	/*
-	 * Perform the rename using link if available, paranoidly trying to avoid
-	 * overwriting an existing file (there shouldn't be one).
-	 */
-	durable_rename_excl(tmppath, path, ERROR);
+	Assert(access(path, F_OK) != 0 && errno == ENOENT);
+	durable_rename(tmppath, path, ERROR);
 
 	/* The history file can be archived immediately. */
 	if (XLogArchivingActive())
@@ -516,15 +512,11 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
 				 errmsg("could not close file \"%s\": %m", tmppath)));
 
 	/*
-	 * Now move the completed history file into place with its final name.
+	 * Now move the completed history file into place with its final name,
+	 * replacing any existing file with the same name.
 	 */
 	TLHistoryFilePath(path, tli);
-
-	/*
-	 * Perform the rename using link if available, paranoidly trying to avoid
-	 * overwriting an existing file (there shouldn't be one).
-	 */
-	durable_rename_excl(tmppath, path, ERROR);
+	durable_rename(tmppath, path, ERROR);
 }
 
 /*
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61cda56c6f..76ec80c950 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3323,14 +3323,11 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 		}
 	}
 
-	/*
-	 * Perform the rename using link if available, paranoidly trying to avoid
-	 * overwriting an existing file (there shouldn't be one).
-	 */
-	if (durable_rename_excl(tmppath, path, LOG) != 0)
+	Assert(access(path, F_OK) != 0 && errno == ENOENT);
+	if (durable_rename(tmppath, path, LOG) != 0)
 	{
 		LWLockRelease(ControlFileLock);
-		/* durable_rename_excl already emitted log message */
+		/* durable_rename already emitted log message */
 		return false;
 	}
 
-- 
2.25.1

>From b239f1be33ea0e54b8e661312df52af49c090882 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Tue, 26 Apr 2022 12:38:23 -0700
Subject: [PATCH v5 2/2] Remove durable_rename_excl().

A previous commit replaced all calls to this function with
durable_rename(), but the function itself was not removed in back-
branches since extensions may use it.  This change removes the
function from v16devel.

Do not back-patch.

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220418182336.GA2298576%40nathanxps13
---
 src/backend/storage/file/fd.c  | 63 ----------------------------------
 src/include/pg_config_manual.h |  7 ----
 src/include/storage/fd.h       |  1 -
 3 files changed, 71 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 24704b6a02..f904f60c08 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -807,69 +807,6 @@ durable_unlink(const char *fname, int elevel)
 	return 0;
 }
 
-/*
- * durable_rename_excl -- rename a file in a durable manner.
- *
- * Similar to durable_rename(), except that this routine tries (but does not
- * guarantee) not to overwrite the target file.
- *
- * Note that a crash in an unfortunate moment can leave you with two links to
- * the target file.
- *
- * Log errors with the caller specified severity.
- *
- * On Windows, using a hard link followed by unlink() causes concurrency
- * issues, while a simple rename() does not cause that, so be careful when
- * changing the logic of this routine.
- *
- * Returns 0 if the operation succeeded, -1 otherwise. Note that errno is not
- * valid upon return.
- */
-int
-durable_rename_excl(const char *oldfile, const char *newfile, int elevel)
-{
-	/*
-	 * Ensure that, if we crash directly after the rename/link, a file with
-	 * valid contents is moved into place.
-	 */
-	if (fsync_fname_ext(oldfile, false, false, elevel) != 0)
-		return -1;
-
-#ifdef HAVE_WORKING_LINK
-	if (link(oldfile, newfile) < 0)
-	{
-		ereport(elevel,
-				(errcode_for_file_access(),
-				 errmsg("could not link file \"%s\" to \"%s\": %m",
-						oldfile, newfile)));
-		return -1;
-	}
-	unlink(oldfile);
-#else
-	if (rename(oldfile, newfile) < 0)
-	{
-		ereport(elevel,
-				(errcode_for_file_access(),
-				 errmsg("could not rename file \"%s\" to \"%s\": %m",
-						oldfile, newfile)));
-		return -1;
-	}
-#endif
-
-	/*
-	 * Make change persistent in case of an OS crash, both the new entry and
-	 * its parent directory need to be flushed.
-	 */
-	if (fsync_fname_ext(newfile, false, false, elevel) != 0)
-		return -1;
-
-	/* Same for parent directory */
-	if (fsync_parent_path(newfile, elevel) != 0)
-		return -1;
-
-	return 0;
-}
-
 /*
  * InitFileAccess --- initialize this module during backend startup
  *
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 84ce5a4a5d..830804fdfb 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -163,13 +163,6 @@
 #define USE_BARRIER_SMGRRELEASE
 #endif
 
-/*
- * Define this if your operating system supports link()
- */
-#if !defined(WIN32) && !defined(__CYGWIN__)
-#define HAVE_WORKING_LINK 1
-#endif
-
 /*
  * USE_POSIX_FADVISE controls whether Postgres will attempt to use the
  * posix_fadvise() kernel call.  Usually the automatic configure tests are
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 69549b000f..2b4a8e0ffe 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -187,7 +187,6 @@ extern void fsync_fname(const char *fname, bool isdir);
 extern int	fsync_fname_ext(const char *fname, bool isdir, bool ignore_perm, int elevel);
 extern int	durable_rename(const char *oldfile, const char *newfile, int loglevel);
 extern int	durable_unlink(const char *fname, int loglevel);
-extern int	durable_rename_excl(const char *oldfile, const char *newfile, int loglevel);
 extern void SyncDataDirectory(void);
 extern int	data_sync_elevel(int elevel);
 
-- 
2.25.1

Reply via email to