On Fri, Feb 24, 2023 at 11:12 PM Anton A. Melnikov <aamelni...@inbox.ru> wrote:
> On 17.02.2023 06:21, Thomas Munro wrote:
> > BTW there are at least two other places where PostgreSQL already knows
> > that concurrent reads and writes are possibly non-atomic (and we also
> > don't even try to get the alignment right, making the question moot):
> > pg_basebackup, which enables full_page_writes implicitly if you didn't
> > have the GUC on already, and pg_rewind, which refuses to run if you
> > haven't enabled full_page_writes explicitly (as recently discussed on
> > another thread recently; that's an annoying difference, and also an
> > annoying behaviour if you know your storage doesn't really need it!)
>
> It seems a good topic for a separate thread patch. Would you provide a
> link to the thread you mentioned please?

https://www.postgresql.org/message-id/flat/367d01a7-90bb-9b70-4cda-248e81cc475c%40cosium.com

> > Therefore, we need a solution for Windows too.  I tried to write the
> > equivalent code, in the attached.  I learned a few things: Windows
> > locks are mandatory (that is, if you lock a file, reads/writes can
> > fail, unlike Unix).  Combined with async release, that means we
> > probably also need to lock the file in xlog.c, when reading it in
> > xlog.c:ReadControlFile() (see comments).  Before I added that, on a CI
> > run, I found that the read in there would sometimes fail, and adding
> > the locking fixed that.  I am a bit confused, though, because I
> > expected that to be necessary only if someone managed to crash while
> > holding the write lock, which the CI tests shouldn't do.  Do you have
> > any ideas?
>
> Unfortunately, no ideas so far. Was it a pg_control CRC or I/O errors?
> Maybe logs of such a fail were saved somewhere? I would like to see
> them if possible.

I think it was this one:

https://cirrus-ci.com/task/5004082033721344

For example, see subscription/011_generated which failed like this:

2023-02-16 06:57:25.724 GMT [5736][not initialized] PANIC:  could not
read file "global/pg_control": Permission denied

That was fixed after I changed it to also do locking in xlog.c
ReadControlFile(), in the version you tested.  There must be something
I don't understand going on, because that cluster wasn't even running
before: it had just been created by initdb.

But, anyway, I have a new idea that makes that whole problem go away
(though I'd still like to understand what happened there):

With the "pseudo-advisory lock for Windows" idea from the last email,
we don't need to bother with file system level locking in many places.
Just in controldata_utils.c, for FE/BE interlocking (that is, we don't
need to use that for interlocking of concurrent reads and writes that
are entirely in the backend, because we already have an LWLock that we
could use more consistently).  Changes:

1. xlog.c mostly uses no locking
2. basebackup.c now acquires ControlFileLock
3. only controldata_utils.c uses the new file locking, for FE-vs-BE interlocking
4. lock past the end (pseudo-advisory locking for Windows)

Note that when we recover from a basebackup or pg_backup_start()
backup, we use the backup label to find a redo start location in the
WAL (see read_backup_label()), BUT we still read the copied pg_control
file (one that might be too "new", so we don't use its redo pointer).
So it had better not be torn, or the recovery will fail.  So, in this
version I protected that sendFile() with ControlFileLock.  But...

Isn't that a bit strange?  To go down this path we would also need to
document the need to copy the control file with the file locked to
avoid a rare failure, in the pg_backup_start() documentation.  That's
annoying (I don't even know how to do it with easy shell commands;
maybe we should provide a program that locks and cats the file...?).
Could we make better use of the safe copy that we have in the log?
Then the pg_backup_start() subproblem would disappear.  Conceptually,
that'd be just like the way we use FPI for data pages copied during a
backup.  I'm not sure about any of that, though, it's just an idea,
not tested.

> > Or maybe the problem is/was too theoretical before...
>
> As far as i understand, this problem has always been, but the probability of
> this is extremely small in practice, which is directly pointed in
> the docs [4]:
> "So while it is theoretically a weak spot, pg_control does not seem
> to be a problem in practice."

I guess that was talking about power loss atomicity again?  Knowledge
of the read/write atomicity problem seems to be less evenly
distributed (and I think it became more likely in Linux > 3.something;
and the Windows situation possibly hadn't been examined by anyone
before).

> > Here's a patch like that.
>
> In this patch, the problem is solved for the live database and
> maybe remains for some possible cases of an external backup. In a whole,
> i think it can be stated that this is a sensible step forward.
>
> Just like last time, i ran a long stress test under windows with current 
> patch.
> There were no errors for more than 3 days even with fsync=off.

Thanks!
From 62c107aa2e3c52192baf6105d6f13cd8601b7913 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 31 Jan 2023 20:01:49 +1300
Subject: [PATCH v3 1/3] Lock pg_control while reading or writing.

Front-end programs that read pg_control and user-facing functions run
in the backend read pg_control without acquiring ControlFileLock.  If
you're unlucky enough to read() while the backend is in write(), on at
least ext4 or NTFS, you might see partial data.  Use a POSIX advisory
file lock or a Windows mandatory file lock, to avoid this problem.
Since Windows mandatory locks might cause existing external backup tools
to fail in ReadFile(), lock one byte past the end so that only tools
that opt into this scheme are affected by it.

Reviewed-by: Anton A. Melnikov <aamelni...@inbox.ru>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/backend/backup/basebackup.c |   5 ++
 src/common/controldata_utils.c  | 137 ++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+)

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 3fb9451643..7e567300bc 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -39,6 +39,7 @@
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/lwlock.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -345,8 +346,12 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 							(errcode_for_file_access(),
 							 errmsg("could not stat file \"%s\": %m",
 									XLOG_CONTROL_FILE)));
+
+				/* Lock to avoid torn reads, if there is a concurrent write. */
+				LWLockAcquire(ControlFileLock, LW_SHARED);
 				sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
 						 false, InvalidOid, &manifest, NULL);
+				LWLockRelease(ControlFileLock);
 			}
 			else
 			{
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 9723587466..e1b7b936ba 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -39,6 +39,102 @@
 #include "storage/fd.h"
 #endif
 
+/*
+ * Lock the control file until closed.  This is used for the benefit of
+ * frontend/external programs that want to take an atomic copy of the control
+ * file, when though it might be written to concurrently by the server.
+ * On some systems including Windows NTFS and Linux ext4, that might otherwise
+ * cause a torn read.
+ *
+ * (The server also reads and writes the control file directly sometimes, not
+ * through these routines; that's OK, because it uses ControlFileLock, or no
+ * locking when it expects no concurrent writes.)
+ */
+static int
+lock_controlfile(int fd, bool exclusive)
+{
+#ifdef WIN32
+	OVERLAPPED	overlapped = {.Offset = PG_CONTROL_FILE_SIZE};
+	HANDLE		handle;
+
+	handle = (HANDLE) _get_osfhandle(fd);
+	if (handle == INVALID_HANDLE_VALUE)
+	{
+		errno = EBADF;
+		return -1;
+	}
+
+	/*
+	 * On Windows, we lock the first byte past the end of the control file (see
+	 * overlapped.Offset).  This means that we shouldn't cause errors in
+	 * external tools that just want to read the file, but we can block other
+	 * processes using this routine or that know about this protocol.  This
+	 * provides an approximation of Unix's "advisory" locking.
+	 */
+	if (!LockFileEx(handle,
+					exclusive ? LOCKFILE_EXCLUSIVE_LOCK : 0,
+					0,
+					1,
+					0,
+					&overlapped))
+	{
+		_dosmaperr(GetLastError());
+		return -1;
+	}
+
+	return 0;
+#else
+	struct flock lock;
+	int			rc;
+
+	memset(&lock, 0, sizeof(lock));
+	lock.l_type = exclusive ? F_WRLCK : F_RDLCK;
+	lock.l_start = 0;
+	lock.l_whence = SEEK_SET;
+	lock.l_len = 0;
+	lock.l_pid = -1;
+
+	do
+	{
+		rc = fcntl(fd, F_SETLKW, &lock);
+	}
+	while (rc < 0 && errno == EINTR);
+
+	return rc;
+#endif
+}
+
+#ifdef WIN32
+/*
+ * Release lock acquire with lock_controlfile().  On POSIX systems, we don't
+ * bother making an extra system call to release the lock, since it'll be
+ * released on close anyway.  On Windows, explicit release is recommended by
+ * the documentation to make sure it is done synchronously.
+ */
+static int
+unlock_controlfile(int fd)
+{
+	OVERLAPPED	overlapped = {.Offset = PG_CONTROL_FILE_SIZE};
+	HANDLE		handle;
+
+	handle = (HANDLE) _get_osfhandle(fd);
+	if (handle == INVALID_HANDLE_VALUE)
+	{
+		errno = EBADF;
+		return -1;
+	}
+
+	/* Unlock first byte. */
+	if (!UnlockFileEx(handle, 0, 1, 0, &overlapped))
+	{
+		_dosmaperr(GetLastError());
+		return -1;
+	}
+
+	return 0;
+}
+#endif
+
 /*
  * get_controlfile()
  *
@@ -74,6 +170,20 @@ get_controlfile(const char *DataDir, bool *crc_ok_p)
 				 ControlFilePath);
 #endif
 
+	/* Make sure we can read the file atomically. */
+	if (lock_controlfile(fd, false) < 0)
+	{
+#ifndef FRONTEND
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not lock file \"%s\" for reading: %m",
+						ControlFilePath)));
+#else
+		pg_fatal("could not lock file \"%s\" for reading: %m",
+				 ControlFilePath);
+#endif
+	}
+
 	r = read(fd, ControlFile, sizeof(ControlFileData));
 	if (r != sizeof(ControlFileData))
 	{
@@ -97,6 +207,10 @@ get_controlfile(const char *DataDir, bool *crc_ok_p)
 #endif
 	}
 
+#ifdef WIN32
+	unlock_controlfile(fd);
+#endif
+
 #ifndef FRONTEND
 	if (CloseTransientFile(fd) != 0)
 		ereport(ERROR,
@@ -186,6 +300,25 @@ update_controlfile(const char *DataDir,
 		pg_fatal("could not open file \"%s\": %m", ControlFilePath);
 #endif
 
+	/*
+	 * Make sure that any concurrent reader (including frontend programs) can
+	 * read the file atomically.  Note that this refers to atomicity of
+	 * concurrent reads and writes.  For our assumption of atomicity under
+	 * power failure, see PG_CONTROL_MAX_SAFE_SIZE.
+	 */
+	if (lock_controlfile(fd, true) < 0)
+	{
+#ifndef FRONTEND
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not lock file \"%s\" for writing: %m",
+						ControlFilePath)));
+#else
+		pg_fatal("could not lock file \"%s\" for writing: %m",
+				 ControlFilePath);
+#endif
+	}
+
 	errno = 0;
 #ifndef FRONTEND
 	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE);
@@ -225,6 +358,10 @@ update_controlfile(const char *DataDir,
 #endif
 	}
 
+#ifdef WIN32
+	unlock_controlfile(fd);
+#endif
+
 	if (close(fd) != 0)
 	{
 #ifndef FRONTEND
-- 
2.39.2

From 17a942e760f2abec6490f416609f986b7180763c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 16 Feb 2023 20:02:22 +1300
Subject: [PATCH v3 2/3] Make wal_sync_method=fdatasync the default on all
 OSes.

Previously, fdatasync was the default level on Linux and FreeBSD by
special rules, and on OpenBSD and DragonflyBSD because they didn't have
O_DSYNC != O_SYNC or O_DSYNC at all.  For every other system, we'd
choose open_datasync.

Use fdatasync everywhere, for consistency.  This became possible after
commit 9430fb40 added support for Windows, the last known system not to
have it.  This means that we'll now flush caches on consumer drives by
default on Windows, where previously we didn't, which seems like a
better default for crash safety.  Users who want the older behavior can
still request it with wal_sync_method=open_datasync.

Reviewed-by: Anton A. Melnikov <aamelni...@inbox.ru>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 doc/src/sgml/config.sgml              |  5 ++---
 doc/src/sgml/wal.sgml                 |  8 ++++----
 src/bin/pg_test_fsync/pg_test_fsync.c |  4 ++--
 src/include/access/xlogdefs.h         | 14 +-------------
 src/include/port/freebsd.h            |  7 -------
 src/include/port/linux.h              |  8 --------
 6 files changed, 9 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..15d5f2bf2b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3107,9 +3107,8 @@ include_dir 'conf.d'
        <para>
         The <literal>open_</literal>* options also use <literal>O_DIRECT</literal> if available.
         Not all of these choices are available on all platforms.
-        The default is the first method in the above list that is supported
-        by the platform, except that <literal>fdatasync</literal> is the default on
-        Linux and FreeBSD.  The default is not necessarily ideal; it might be
+        The default is <literal>fdatasync</literal>.
+        The default is not necessarily ideal; it might be
         necessary to change this setting or other aspects of your system
         configuration in order to create a crash-safe configuration or
         achieve optimal performance.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index ed7929cbcd..ffc22a7ded 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -106,12 +106,12 @@
     <listitem>
       <para>
         On <productname>Windows</productname>, if <varname>wal_sync_method</varname> is
-        <literal>open_datasync</literal> (the default), write caching can be disabled
+        <literal>open_datasync</literal>, write caching can be disabled
         by unchecking <literal>My Computer\Open\<replaceable>disk drive</replaceable>\Properties\Hardware\Properties\Policies\Enable write caching on the disk</literal>.
         Alternatively, set <varname>wal_sync_method</varname> to
-        <literal>fdatasync</literal> (NTFS only), <literal>fsync</literal> or
-        <literal>fsync_writethrough</literal>, which prevent
-        write caching.
+        <literal>fdatasync</literal> (the default), <literal>fsync</literal> or
+        <literal>fsync_writethrough</literal>, which use system calls that
+        flush write caches.
       </para>
     </listitem>
 
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 3d5e8f30ab..45bd2daf2e 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -292,7 +292,7 @@ test_sync(int writes_per_op)
 		printf(_("\nCompare file sync methods using one %dkB write:\n"), XLOG_BLCKSZ_K);
 	else
 		printf(_("\nCompare file sync methods using two %dkB writes:\n"), XLOG_BLCKSZ_K);
-	printf(_("(in wal_sync_method preference order, except fdatasync is Linux's default)\n"));
+	printf(_("(fdatasync is the default)\n"));
 
 	/*
 	 * Test open_datasync if available
@@ -326,7 +326,7 @@ test_sync(int writes_per_op)
 #endif
 
 /*
- * Test fdatasync if available
+ * Test fdatasync
  */
 	printf(LABEL_FORMAT, "fdatasync");
 	fflush(stdout);
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index fe794c7740..a628976902 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -64,19 +64,7 @@ typedef uint32 TimeLineID;
  */
 typedef uint16 RepOriginId;
 
-/*
- * This chunk of hackery attempts to determine which file sync methods
- * are available on the current platform, and to choose an appropriate
- * default method.
- *
- * Note that we define our own O_DSYNC on Windows, but not O_SYNC.
- */
-#if defined(PLATFORM_DEFAULT_SYNC_METHOD)
-#define DEFAULT_SYNC_METHOD		PLATFORM_DEFAULT_SYNC_METHOD
-#elif defined(O_DSYNC) && (!defined(O_SYNC) || O_DSYNC != O_SYNC)
-#define DEFAULT_SYNC_METHOD		SYNC_METHOD_OPEN_DSYNC
-#else
+/* Default synchronization method for WAL. */
 #define DEFAULT_SYNC_METHOD		SYNC_METHOD_FDATASYNC
-#endif
 
 #endif							/* XLOG_DEFS_H */
diff --git a/src/include/port/freebsd.h b/src/include/port/freebsd.h
index 0e3fde55d6..2e36d3da4f 100644
--- a/src/include/port/freebsd.h
+++ b/src/include/port/freebsd.h
@@ -1,8 +1 @@
 /* src/include/port/freebsd.h */
-
-/*
- * Set the default wal_sync_method to fdatasync.  xlogdefs.h's normal rules
- * would prefer open_datasync on FreeBSD 13+, but that is not a good choice on
- * many systems.
- */
-#define PLATFORM_DEFAULT_SYNC_METHOD	SYNC_METHOD_FDATASYNC
diff --git a/src/include/port/linux.h b/src/include/port/linux.h
index 7a6e46cdbb..acd867606c 100644
--- a/src/include/port/linux.h
+++ b/src/include/port/linux.h
@@ -12,11 +12,3 @@
  * to have a kernel version test here.
  */
 #define HAVE_LINUX_EIDRM_BUG
-
-/*
- * Set the default wal_sync_method to fdatasync.  With recent Linux versions,
- * xlogdefs.h's normal rules will prefer open_datasync, which (a) doesn't
- * perform better and (b) causes outright failures on ext4 data=journal
- * filesystems, because those don't support O_DIRECT.
- */
-#define PLATFORM_DEFAULT_SYNC_METHOD	SYNC_METHOD_FDATASYNC
-- 
2.39.2

From e28b7bae80a711a8c23fc2332e653097a931e2bb Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 31 Jan 2023 21:08:12 +1300
Subject: [PATCH v3 3/3] Apply wal_sync_method to pg_control file.

When writing out the control file frequently on a standby, we can go a
little faster on some systems by using fdatasync() instead of fsync(),
with no loss of safety.  We already respected the special
"wal_sync_method=fsync_writethrough" via a circuitous route, but we
might as well support the full range of methods.

Reviewed-by: Anton A. Melnikov <aamelni...@inbox.ru>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/backend/access/transam/xlog.c      | 19 ++++++++-
 src/common/controldata_utils.c         | 59 ++++++++++++++++++++------
 src/include/common/controldata_utils.h |  7 ++-
 3 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 87af608d15..4ad4899256 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4167,7 +4167,24 @@ ReadControlFile(void)
 static void
 UpdateControlFile(void)
 {
-	update_controlfile(DataDir, ControlFile, true);
+	int			sync_op;
+
+	switch (sync_method)
+	{
+		case SYNC_METHOD_FDATASYNC:
+			sync_op = UPDATE_CONTROLFILE_FDATASYNC;
+			break;
+		case SYNC_METHOD_OPEN:
+			sync_op = UPDATE_CONTROLFILE_O_SYNC;
+			break;
+		case SYNC_METHOD_OPEN_DSYNC:
+			sync_op = UPDATE_CONTROLFILE_O_DSYNC;
+			break;
+		default:
+			sync_op = UPDATE_CONTROLFILE_FSYNC;
+	}
+
+	update_controlfile(DataDir, ControlFile, sync_op);
 }
 
 /*
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index e1b7b936ba..0c81f66dc4 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -250,18 +250,39 @@ get_controlfile(const char *DataDir, bool *crc_ok_p)
  * update_controlfile()
  *
  * Update controlfile values with the contents given by caller.  The
- * contents to write are included in "ControlFile". "do_sync" can be
- * optionally used to flush the updated control file.  Note that it is up
- * to the caller to properly lock ControlFileLock when calling this
- * routine in the backend.
+ * contents to write are included in "ControlFile".  "sync_op" can be set
+ * to 0 or a synchronization method UPDATE_CONTROLFILE_XXX.  In frontend code,
+ * only 0 and non-zero (fsync) are meaningful.
+ * Note that it is up to the caller to properly lock ControlFileLock when
+ * calling this routine in the backend.
  */
 void
 update_controlfile(const char *DataDir,
-				   ControlFileData *ControlFile, bool do_sync)
+				   ControlFileData *ControlFile,
+				   int sync_op)
 {
 	int			fd;
 	char		buffer[PG_CONTROL_FILE_SIZE];
 	char		ControlFilePath[MAXPGPATH];
+	int			open_flag;
+
+	switch (sync_op)
+	{
+#ifndef FRONTEND
+#ifdef O_SYNC
+		case UPDATE_CONTROLFILE_O_SYNC:
+			open_flag = O_SYNC;
+			break;
+#endif
+#ifdef O_DSYNC
+		case UPDATE_CONTROLFILE_O_DSYNC:
+			open_flag = O_DSYNC;
+			break;
+#endif
+#endif
+		default:
+			open_flag = 0;
+	}
 
 	/* Update timestamp  */
 	ControlFile->time = (pg_time_t) time(NULL);
@@ -289,13 +310,13 @@ update_controlfile(const char *DataDir,
 	 * All errors issue a PANIC, so no need to use OpenTransientFile() and to
 	 * worry about file descriptor leaks.
 	 */
-	if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY)) < 0)
+	if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY | open_flag)) < 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m",
 						ControlFilePath)));
 #else
-	if ((fd = open(ControlFilePath, O_WRONLY | PG_BINARY,
+	if ((fd = open(ControlFilePath, O_WRONLY | PG_BINARY | open_flag,
 				   pg_file_create_mode)) == -1)
 		pg_fatal("could not open file \"%s\": %m", ControlFilePath);
 #endif
@@ -342,17 +363,29 @@ update_controlfile(const char *DataDir,
 	pgstat_report_wait_end();
 #endif
 
-	if (do_sync)
+	if (sync_op != 0)
 	{
 #ifndef FRONTEND
 		pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE);
-		if (pg_fsync(fd) != 0)
-			ereport(PANIC,
-					(errcode_for_file_access(),
-					 errmsg("could not fsync file \"%s\": %m",
-							ControlFilePath)));
+		if (sync_op == UPDATE_CONTROLFILE_FDATASYNC)
+		{
+			if (fdatasync(fd) != 0)
+				ereport(PANIC,
+						(errcode_for_file_access(),
+						 errmsg("could not fdatasync file \"%s\": %m",
+								ControlFilePath)));
+		}
+		else if (sync_op == UPDATE_CONTROLFILE_FSYNC)
+		{
+			if (pg_fsync(fd) != 0)
+				ereport(PANIC,
+						(errcode_for_file_access(),
+						 errmsg("could not fdatasync file \"%s\": %m",
+								ControlFilePath)));
+		}
 		pgstat_report_wait_end();
 #else
+		/* In frontend code, non-zero sync_op gets you plain fsync(). */
 		if (fsync(fd) != 0)
 			pg_fatal("could not fsync file \"%s\": %m", ControlFilePath);
 #endif
diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h
index 49e7c52d31..879455c8fb 100644
--- a/src/include/common/controldata_utils.h
+++ b/src/include/common/controldata_utils.h
@@ -12,8 +12,13 @@
 
 #include "catalog/pg_control.h"
 
+#define UPDATE_CONTROLFILE_FSYNC 1
+#define UPDATE_CONTROLFILE_FDATASYNC 2
+#define UPDATE_CONTROLFILE_O_SYNC 3
+#define UPDATE_CONTROLFILE_O_DSYNC 4
+
 extern ControlFileData *get_controlfile(const char *DataDir, bool *crc_ok_p);
 extern void update_controlfile(const char *DataDir,
-							   ControlFileData *ControlFile, bool do_sync);
+							   ControlFileData *ControlFile, int sync_op);
 
 #endif							/* COMMON_CONTROLDATA_UTILS_H */
-- 
2.39.2

Reply via email to