While chatting to Robert and Andres about all this, a new idea came
up.  Or, rather, one of the first ideas that was initially rejected,
now resurrected to try out a suggestion of Andres’s on how to
de-pessimise it.  Unfortunately, it also suffers from Windows-specific
problems that I originally mentioned at the top of this thread but
had since repressed.  Arrrghgh.

First, the good news:

We could write out a whole new control file, and durable_rename() it
into place.  We don’t want to do that in general, because we don’t
want to slow down UpdateMinRecoveryPoint().  The new concept is to do
that only if a backup is in progress.  That requires a bit of
interlocking with backup start/stop (ie when runningBackups is
changing in shmem, we don’t want to overlap with UpdateControlFile()'s
decision on how to do it).  Here is a patch to try that out.  No more
weasel wording needed for the docs; basebackup and low-level file
system backup should always see an atomic control file (and
occasionally also copy a harmless pg_control.tmp file).  Then we only
need the gross retry-until-stable hack for front-end programs.

And the bad news:

In my catalogue-of-Windows-weirdness project[1], I learned in v3-0003 that:

+ fd = open(path, O_CREAT | O_EXCL | O_RDWR | PG_BINARY, 0777);
+ PG_EXPECT_SYS(fd >= 0, "touch name1.txt");
+ PG_REQUIRE_SYS(fd < 0 || close(fd) == 0);
+
+ fd = open(path2, O_RDWR | PG_BINARY, 0777);
+ PG_EXPECT_SYS(fd >= 0, "open name2.txt");
+ make_path(path2, "name2.txt");
+#ifdef WIN32
+
+ /*
+ * Windows can't rename over an open non-unlinked file, even with
+ * have_posix_unlink_semantics.
+ */
+ pgwin32_dirmod_loops = 2; /* minimize looping to fail fast in testing */
+ PG_EXPECT_SYS(rename(path, path2) == -1,
+   "Windows: can't rename name1.txt -> name2.txt while name2.txt is open");
+ PG_EXPECT_EQ(errno, EACCES);
+ PG_EXPECT_SYS(unlink(path) == 0, "unlink name1.txt");
+#else
+ PG_EXPECT_SYS(rename(path, path2) == 0,
+   "POSIX: can rename name1.txt -> name2.txt while name2.txt is open");
+#endif
+ PG_EXPECT_SYS(close(fd) == 0);

Luckily the code in dirmod.c:pgrename() should retry lots of times if
a concurrent transient opener/reader comes along, so I think that
should be OK in practice (but if backups_r_us.exe holds the file open
for 10 seconds while we're trying to rename it, I assume we'll PANIC);
call that problem #1.  What is slightly more disturbing is the clue in
the "Cygwin cleanup" thread[2] that rename() can fail to be 100%
atomic, so that a concurrent call to open() can fail with ENOENT (cf.
the POSIX requirement "... a link named new shall remain visible to
other processes throughout the renaming operation and refer either to
the file referred to by new or old ...").  Call that problem #2, a
problem that already causes us rare breakage (for example: could not
open file "pg_logical/snapshots/0-14FE6B0.snap").

I know that problem #1 can be fixed by applying v3-0004 from [1] but
that leads to impossible decisions like revoking support for non-NTFS
filesystems as discussed in that thread, and we certainly couldn't
back-patch that anyway.  I assume problem #2 can too.

That makes me want to *also* acquire ControlFileLock, for base
backup's read of pg_control.  Even though it seems redundant with the
rename() trick (the rename() trick should be enough for low-level
*and* basebackup on ext4), it would at least avoid the above
Windowsian pathologies during base backups.

I'm sorry for the patch/idea-churn in this thread.  It's like
Whac-a-Mole.  Blasted non-POSIX-compliant moles.  New patches
attached.  Are they getting better?

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2Be13wK0PBX5Z63CCwWm7MfRQuwBRabM_3aKWSko2AUww%40mail.gmail.com
From caf6f0d49d5b24b8c3d1f41b97d7824dadad8f8b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Jul 2023 08:46:19 +1200
Subject: [PATCH v6 1/4] Update control file atomically during backups.

If there is a backup in progress, switch to a rename-based approach when
writing out the control file to avoid problems with torn reads that can
occur on some systems (at least ext4, ntfs).  We don't want to make
UpdateControlFile() slower in general, because it's called frequently
during recovery, so we only do it while runningBackups > 0.  In order to
activate the new behavior without races, we now also acquire
ControlFileLock when backups begin and end.

Back-patch to all supported releases.

Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/backend/access/transam/xlog.c      | 33 +++++++++++++++++++--
 src/bin/pg_checksums/pg_checksums.c    |  2 +-
 src/bin/pg_resetwal/pg_resetwal.c      |  2 +-
 src/bin/pg_rewind/pg_rewind.c          |  2 +-
 src/common/controldata_utils.c         | 41 ++++++++++++++++++++++++--
 src/include/common/controldata_utils.h |  4 ++-
 6 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f7d4750fc0..c4626638e2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -442,7 +442,9 @@ typedef struct XLogCtlInsert
 	/*
 	 * runningBackups is a counter indicating the number of backups currently
 	 * in progress. lastBackupStart is the latest checkpoint redo location
-	 * used as a starting point for an online backup.
+	 * used as a starting point for an online backup.  runningBackups is
+	 * protected by both ControlFileLock and WAL insertion lock (in that
+	 * order), because it affects the behavior of UpdateControlFile().
 	 */
 	int			runningBackups;
 	XLogRecPtr	lastBackupStart;
@@ -4170,7 +4172,13 @@ ReadControlFile(void)
 static void
 UpdateControlFile(void)
 {
-	update_controlfile(DataDir, ControlFile, true);
+	XLogCtlInsert *Insert = &XLogCtl->Insert;
+	bool		atomic;
+
+	Assert(LWLockHeldByMeInMode(ControlFileLock, LW_EXCLUSIVE));
+	atomic = Insert->runningBackups > 0;
+
+	update_controlfile(DataDir, ControlFile, atomic, true);
 }
 
 /*
@@ -5306,9 +5314,12 @@ StartupXLOG(void)
 		 * pg_control with any minimum recovery stop point obtained from a
 		 * backup history file.
 		 *
-		 * No need to hold ControlFileLock yet, we aren't up far enough.
+		 * ControlFileLock not really needed yet, we aren't up far enough, but
+		 * makes assertions simpler.
 		 */
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
 
 		/*
 		 * If there was a backup label file, it's done its job and the info
@@ -8293,6 +8304,12 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 
 	memcpy(state->name, backupidstr, strlen(backupidstr));
 
+	/*
+	 * UpdateControlFile() behaves differently while backups are running, and
+	 * we need to avoid races when the backups start.
+	 */
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
 	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
 	 * during an on-line backup even if not doing so at other times, because
@@ -8318,6 +8335,8 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 	XLogCtl->Insert.runningBackups++;
 	WALInsertLockRelease();
 
+	LWLockRelease(ControlFileLock);
+
 	/*
 	 * Ensure we decrement runningBackups if we fail below. NB -- for this to
 	 * work correctly, it is critical that sessionBackupState is only updated
@@ -8608,6 +8627,12 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
 				 errmsg("WAL level not sufficient for making an online backup"),
 				 errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));
 
+	/*
+	 * Interlocking with UpdateControlFile(), so that it can start using atomic
+	 * mode while backups are running.
+	 */
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
 	/*
 	 * OK to update backup counter and session-level lock.
 	 *
@@ -8637,6 +8662,8 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
 
 	WALInsertLockRelease();
 
+	LWLockRelease(ControlFileLock);
+
 	/*
 	 * If we are taking an online backup from the standby, we confirm that the
 	 * standby has not been promoted during the backup.
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 19eb67e485..80411e2d9f 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -637,7 +637,7 @@ main(int argc, char *argv[])
 		}
 
 		pg_log_info("updating control file");
-		update_controlfile(DataDir, ControlFile, do_sync);
+		update_controlfile(DataDir, ControlFile, false, do_sync);
 
 		if (verbose)
 			printf(_("Data checksum version: %u\n"), ControlFile->data_checksum_version);
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e7ef2b8bd0..1dcc84abea 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -884,7 +884,7 @@ RewriteControlFile(void)
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
-	update_controlfile(".", &ControlFile, true);
+	update_controlfile(".", &ControlFile, false, true);
 }
 
 
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index f7f3b8227f..3a34b4d0dd 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -714,7 +714,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
 	if (!dry_run)
-		update_controlfile(datadir_target, &ControlFile_new, do_sync);
+		update_controlfile(datadir_target, &ControlFile_new, false, do_sync);
 }
 
 static void
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 9723587466..475c2e524c 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -140,14 +140,27 @@ get_controlfile(const char *DataDir, bool *crc_ok_p)
  * optionally used to flush the updated control file.  Note that it is up
  * to the caller to properly lock ControlFileLock when calling this
  * routine in the backend.
+ *
+ * If atomic is true, a slower procedure is used in backend code that renames a
+ * temporary file into place.  This is intended for use while backups are in
+ * progress, to avoid the possibility of a torn read of the control file.  Note
+ * that this refers to atomicity of concurrent reads and writes.  (Atomicity
+ * under power failure is a separate topic; we always assume that writes of
+ * size PG_CONTROL_MAX_SAFE_SIZE are power-failure-atomic, even when 'atomic'
+ * is false here.)
  */
 void
 update_controlfile(const char *DataDir,
-				   ControlFileData *ControlFile, bool do_sync)
+				   ControlFileData *ControlFile,
+				   bool atomic,
+				   bool do_sync)
 {
 	int			fd;
 	char		buffer[PG_CONTROL_FILE_SIZE];
 	char		ControlFilePath[MAXPGPATH];
+#ifndef FRONTEND
+	int			flags = 0;
+#endif
 
 	/* Update timestamp  */
 	ControlFile->time = (pg_time_t) time(NULL);
@@ -167,7 +180,18 @@ update_controlfile(const char *DataDir,
 	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
 	memcpy(buffer, ControlFile, sizeof(ControlFileData));
 
-	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+	/* In atomic mode, we'll write into a new temporary file first. */
+	snprintf(ControlFilePath,
+			 sizeof(ControlFilePath),
+			 "%s/%s%s",
+			 DataDir,
+			 XLOG_CONTROL_FILE,
+			 atomic ? ".tmp" : "");
+#ifndef FRONTEND
+	flags = 0;
+	if (atomic)
+		flags = O_CREAT;
+#endif
 
 #ifndef FRONTEND
 
@@ -175,7 +199,7 @@ update_controlfile(const char *DataDir,
 	 * All errors issue a PANIC, so no need to use OpenTransientFile() and to
 	 * worry about file descriptor leaks.
 	 */
-	if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY)) < 0)
+	if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY | flags)) < 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m",
@@ -236,4 +260,15 @@ update_controlfile(const char *DataDir,
 		pg_fatal("could not close file \"%s\": %m", ControlFilePath);
 #endif
 	}
+
+#ifndef FRONTEND
+	if (atomic)
+	{
+		char		path[MAXPGPATH];
+
+		snprintf(path, sizeof(path), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+
+		durable_rename(ControlFilePath, path, PANIC);
+	}
+#endif
 }
diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h
index 49e7c52d31..0d9b41a2ba 100644
--- a/src/include/common/controldata_utils.h
+++ b/src/include/common/controldata_utils.h
@@ -14,6 +14,8 @@
 
 extern ControlFileData *get_controlfile(const char *DataDir, bool *crc_ok_p);
 extern void update_controlfile(const char *DataDir,
-							   ControlFileData *ControlFile, bool do_sync);
+							   ControlFileData *ControlFile,
+							   bool atomic,
+							   bool do_sync);
 
 #endif							/* COMMON_CONTROLDATA_UTILS_H */
-- 
2.39.2 (Apple Git-143)

From 69fd24b670b24424bc3da3e3c654750cf20826c4 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 24 Nov 2022 13:28:22 +1300
Subject: [PATCH v6 2/4] Try to tolerate torn reads of control file in
 frontend.

Some of our src/bin tools read the control file without any kind of
interlocking against concurrent writes.

Tolerate the torn read that can occur on some systems (ext4, ntfs) by
retrying if checksum fails, until we get two reads in a row with the
same checksum.  This is only a last ditch effort and not guaranteed to
reach the right conclusion with extremely unlucky scheduling, but it
seems at least very likely to.  Thanks to Tom Lane for this suggestion.

Back-patch to all supported releases.

Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru> (earlier version)
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/common/controldata_utils.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 475c2e524c..9745a4c926 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -56,12 +56,22 @@ get_controlfile(const char *DataDir, bool *crc_ok_p)
 	char		ControlFilePath[MAXPGPATH];
 	pg_crc32c	crc;
 	int			r;
+#ifdef FRONTEND
+	pg_crc32c	last_crc;
+	int			retries = 0;
+#endif
 
 	Assert(crc_ok_p);
 
 	ControlFile = palloc_object(ControlFileData);
 	snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
 
+#ifdef FRONTEND
+	INIT_CRC32C(last_crc);
+
+retry:
+#endif
+
 #ifndef FRONTEND
 	if ((fd = OpenTransientFile(ControlFilePath, O_RDONLY | PG_BINARY)) == -1)
 		ereport(ERROR,
@@ -117,6 +127,26 @@ get_controlfile(const char *DataDir, bool *crc_ok_p)
 
 	*crc_ok_p = EQ_CRC32C(crc, ControlFile->crc);
 
+#ifdef FRONTEND
+
+	/*
+	 * With unlucky timing on filesystems that don't implement atomicity of
+	 * concurrent reads and writes, we might have seen garbage if the server
+	 * was writing to the file at the same time.  Keep retrying until we see
+	 * the same CRC twice, with a tiny sleep to give a concurrent writer a
+	 * good chance of making progress.
+	 */
+	if (!*crc_ok_p &&
+		(retries == 0 || !EQ_CRC32C(crc, last_crc)) &&
+		retries < 10)
+	{
+		retries++;
+		last_crc = crc;
+		pg_usleep(10000);
+		goto retry;
+	}
+#endif
+
 	/* Make sure the control file is valid byte order. */
 	if (ControlFile->pg_control_version % 65536 == 0 &&
 		ControlFile->pg_control_version / 65536 != 0)
-- 
2.39.2 (Apple Git-143)

From 22c2acf613f62467efe3029a533d565d735864b2 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 31 Jan 2023 20:01:49 +1300
Subject: [PATCH v6 3/4] Acquire ControlFileLock in base backups.

Even though we now rename() the control file to gain atomicity during
backups, we can avoid some unpleasant retry behaviors or worse on
Windows by also acquiring ControlFileLock during base backups.  (This is
similar to commit a2e97cb2.)

Back-patch to all supported releases.

Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/backend/backup/basebackup.c | 36 ++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 45be21131c..f03496665f 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -24,6 +24,7 @@
 #include "backup/basebackup_target.h"
 #include "commands/defrem.h"
 #include "common/compression.h"
+#include "common/controldata_utils.h"
 #include "common/file_perm.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
@@ -39,6 +40,7 @@
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/lwlock.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -84,7 +86,7 @@ static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfile
 					 struct stat *statbuf, bool missing_ok, Oid dboid,
 					 backup_manifest_info *manifest, const char *spcoid);
 static void sendFileWithContent(bbsink *sink, const char *filename,
-								const char *content,
+								const char *content, int len,
 								backup_manifest_info *manifest);
 static int64 _tarWriteHeader(bbsink *sink, const char *filename,
 							 const char *linktarget, struct stat *statbuf,
@@ -315,7 +317,8 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 
 			if (ti->path == NULL)
 			{
-				struct stat statbuf;
+				ControlFileData *control_file;
+				bool		crc_ok;
 				bool		sendtblspclinks = true;
 				char	   *backup_label;
 
@@ -324,14 +327,14 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 				/* In the main tar, include the backup_label first... */
 				backup_label = build_backup_content(backup_state, false);
 				sendFileWithContent(sink, BACKUP_LABEL_FILE,
-									backup_label, &manifest);
+									backup_label, -1, &manifest);
 				pfree(backup_label);
 
 				/* Then the tablespace_map file, if required... */
 				if (opt->sendtblspcmapfile)
 				{
 					sendFileWithContent(sink, TABLESPACE_MAP,
-										tablespace_map->data, &manifest);
+										tablespace_map->data, -1, &manifest);
 					sendtblspclinks = false;
 				}
 
@@ -340,13 +343,13 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 						sendtblspclinks, &manifest, NULL);
 
 				/* ... and pg_control after everything else. */
-				if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
-					ereport(ERROR,
-							(errcode_for_file_access(),
-							 errmsg("could not stat file \"%s\": %m",
-									XLOG_CONTROL_FILE)));
-				sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
-						 false, InvalidOid, &manifest, NULL);
+				LWLockAcquire(ControlFileLock, LW_SHARED);
+				control_file = get_controlfile(DataDir, &crc_ok);
+				LWLockRelease(ControlFileLock);
+				sendFileWithContent(sink, XLOG_CONTROL_FILE,
+									(char *) control_file, sizeof(*control_file),
+									&manifest);
+				pfree(control_file);
 			}
 			else
 			{
@@ -591,7 +594,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 			 * complete segment.
 			 */
 			StatusFilePath(pathbuf, walFileName, ".done");
-			sendFileWithContent(sink, pathbuf, "", &manifest);
+			sendFileWithContent(sink, pathbuf, "", -1, &manifest);
 		}
 
 		/*
@@ -619,7 +622,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 
 			/* unconditionally mark file as archived */
 			StatusFilePath(pathbuf, fname, ".done");
-			sendFileWithContent(sink, pathbuf, "", &manifest);
+			sendFileWithContent(sink, pathbuf, "", -1, &manifest);
 		}
 
 		/* Properly terminate the tar file. */
@@ -1030,18 +1033,19 @@ SendBaseBackup(BaseBackupCmd *cmd)
  */
 static void
 sendFileWithContent(bbsink *sink, const char *filename, const char *content,
+					int len,
 					backup_manifest_info *manifest)
 {
 	struct stat statbuf;
-	int			bytes_done = 0,
-				len;
+	int			bytes_done = 0;
 	pg_checksum_context checksum_ctx;
 
 	if (pg_checksum_init(&checksum_ctx, manifest->checksum_type) < 0)
 		elog(ERROR, "could not initialize checksum of file \"%s\"",
 			 filename);
 
-	len = strlen(content);
+	if (len < 0)
+		len = strlen(content);
 
 	/*
 	 * Construct a stat struct for the backup_label file we're injecting in
-- 
2.39.2 (Apple Git-143)

From b7b5e85f0e4929441ab27402de5afd32392d9193 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 22 Jul 2023 11:50:27 +1200
Subject: [PATCH v6 4/4] Acquire ControlFileLock in SQL functions.

Commit dc7d70ea added functions that read the control file, but didn't
acquire ControlFileLock.  With unlucky timing and on certain file
systems, they could read corrupted data that is in the process of being
overwritten, and the checksum would fail.

Back-patch to all supported releases.

Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/backend/utils/misc/pg_controldata.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c
index f2c1084797..a1003a464d 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -24,6 +24,7 @@
 #include "common/controldata_utils.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "storage/lwlock.h"
 #include "utils/builtins.h"
 #include "utils/pg_lsn.h"
 #include "utils/timestamp.h"
@@ -42,7 +43,9 @@ pg_control_system(PG_FUNCTION_ARGS)
 		elog(ERROR, "return type must be a row type");
 
 	/* read the control file */
+	LWLockAcquire(ControlFileLock, LW_SHARED);
 	ControlFile = get_controlfile(DataDir, &crc_ok);
+	LWLockRelease(ControlFileLock);
 	if (!crc_ok)
 		ereport(ERROR,
 				(errmsg("calculated CRC checksum does not match value stored in file")));
@@ -80,7 +83,9 @@ pg_control_checkpoint(PG_FUNCTION_ARGS)
 		elog(ERROR, "return type must be a row type");
 
 	/* Read the control file. */
+	LWLockAcquire(ControlFileLock, LW_SHARED);
 	ControlFile = get_controlfile(DataDir, &crc_ok);
+	LWLockRelease(ControlFileLock);
 	if (!crc_ok)
 		ereport(ERROR,
 				(errmsg("calculated CRC checksum does not match value stored in file")));
@@ -169,7 +174,9 @@ pg_control_recovery(PG_FUNCTION_ARGS)
 		elog(ERROR, "return type must be a row type");
 
 	/* read the control file */
+	LWLockAcquire(ControlFileLock, LW_SHARED);
 	ControlFile = get_controlfile(DataDir, &crc_ok);
+	LWLockRelease(ControlFileLock);
 	if (!crc_ok)
 		ereport(ERROR,
 				(errmsg("calculated CRC checksum does not match value stored in file")));
@@ -208,7 +215,9 @@ pg_control_init(PG_FUNCTION_ARGS)
 		elog(ERROR, "return type must be a row type");
 
 	/* read the control file */
+	LWLockAcquire(ControlFileLock, LW_SHARED);
 	ControlFile = get_controlfile(DataDir, &crc_ok);
+	LWLockRelease(ControlFileLock);
 	if (!crc_ok)
 		ereport(ERROR,
 				(errmsg("calculated CRC checksum does not match value stored in file")));
-- 
2.39.2 (Apple Git-143)

Reply via email to