On Wed, Feb 1, 2023 at 5:04 PM Anton A. Melnikov <aamelni...@inbox.ru> wrote:
> On 31.01.2023 14:38, Thomas Munro wrote:
> > Here's an experimental patch for that alternative.  I wonder if
> > someone would want to be able to turn it off for some reason -- maybe
> > some NFS problem?  It's less back-patchable, but maybe more
> > principled?
>
> It looks very strange to me that there may be cases where the cluster data
> is stored in NFS. Can't figure out how this can be useful.

Heh.  There are many interesting failure modes, but people do it.  I
guess my more general question when introducing any new system call
into this code is how some unusual system I can't even access is going
to break.  Maybe some obscure filesystem will fail with EOPNOTSUPP, or
take 5 seconds and then fail because there is no lock server
configured or whatever, so that's why I don't think we can back-patch
it, and we probably need a way to turn it off.

> i think this variant of the patch is a normal solution
> of the problem, not workaround. Found no problems on Linux.
> +1 for this variant.

I prefer it too.

> Might add a custom error message for EDEADLK
> since it absent in errcode_for_file_access()?

Ah, good thought.  I think it shouldn't happen™, so it's OK that
errcode_for_file_access() would classify it as ERRCODE_INTERNAL_ERROR.

Other interesting errors are:

ENOLCK: system limits exceeded; PANIC seems reasonable
EOPNOTSUPP: this file doesn't support locking (seen on FreeBSD man
pages, not on POSIX)

> > I don't know if Windows suffers from this type of problem.
> > Unfortunately its equivalent functionality LockFile() looks non-ideal
> > for this purpose: if your program crashes, the documentation is very
> > vague on when exactly it is released by the OS, but it's not
> > immediately on process exit.  That seems non-ideal for a control file
> > you might want to access again very soon after a crash, to be able to
> > recover.
>
> Unfortunately i've not had time to reproduce the problem and apply this patch 
> on
> Windows yet but i'm going to do it soon on windows 10. If a crc error
> will occur there, then we might use the workaround from the first
> variant of the patch.

Thank you for investigating.  I am afraid to read your results.

One idea would be to proceed with LockFile() for Windows, with a note
suggesting you file a bug with your OS vendor if you ever need it to
get unstuck.  Googling this subject, I read that MongoDB used to
suffer from stuck lock files, until an OS bug report led to recent
versions releasing locks more promptly.  I find that sufficiently
scary that I would want to default the feature to off on Windows, even
if your testing shows that it does really need it.

> > A thought in passing: if UpdateMinRecoveryPoint() performance is an
> > issue, maybe we should figure out how to use fdatasync() instead of
> > fsync().
>
> May be choose it in accordance with GUC wal_sync_method?

Here's a patch like that.  I don't know if it's a good idea for
wal_sync_method to affect other kinds of files or not, but, then, it
already does (fsync_writethough changes this behaviour).   I would
only want to consider this if we also stop choosing "open_datasync" as
a default on at least Windows.
From f55c98daefc4eec7f05413edf92b3b1e7070e1ef Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 31 Jan 2023 21:08:12 +1300
Subject: [PATCH] Apply wal_sync_method to pg_control file.

When writing out the control file frequently on a standby, we can go a
little faster on some systems by using fdatasync() instead of fsync(),
with no loss of safety.  We already respected the special
"wal_sync_method=fsync_writethrough" via a circuitous route, but we
might as well support the full range of methods.
---
 src/backend/access/transam/xlog.c      | 19 ++++++++-
 src/common/controldata_utils.c         | 59 ++++++++++++++++++++------
 src/include/common/controldata_utils.h |  7 ++-
 3 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..d0a7a3d7eb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4165,7 +4165,24 @@ ReadControlFile(void)
 static void
 UpdateControlFile(void)
 {
-	update_controlfile(DataDir, ControlFile, true);
+	int			sync_op;
+
+	switch (sync_method)
+	{
+		case SYNC_METHOD_FDATASYNC:
+			sync_op = UPDATE_CONTROLFILE_FDATASYNC;
+			break;
+		case SYNC_METHOD_OPEN:
+			sync_op = UPDATE_CONTROLFILE_O_SYNC;
+			break;
+		case SYNC_METHOD_OPEN_DSYNC:
+			sync_op = UPDATE_CONTROLFILE_O_DSYNC;
+			break;
+		default:
+			sync_op = UPDATE_CONTROLFILE_FSYNC;
+	}
+
+	update_controlfile(DataDir, ControlFile, sync_op);
 }
 
 /*
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 9723587466..1bc1c6f8d4 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -136,18 +136,39 @@ get_controlfile(const char *DataDir, bool *crc_ok_p)
  * update_controlfile()
  *
  * Update controlfile values with the contents given by caller.  The
- * contents to write are included in "ControlFile". "do_sync" can be
- * optionally used to flush the updated control file.  Note that it is up
- * to the caller to properly lock ControlFileLock when calling this
- * routine in the backend.
+ * contents to write are included in "ControlFile".  "sync_op" can be set
+ * to 0 or a synchronization method UPDATE_CONTROLFILE_XXX.  In frontend code,
+ * only 0 and non-zero (fsync) are meaningful.
+ * Note that it is up to the caller to properly lock ControlFileLock when
+ * calling this routine in the backend.
  */
 void
 update_controlfile(const char *DataDir,
-				   ControlFileData *ControlFile, bool do_sync)
+				   ControlFileData *ControlFile,
+				   int sync_op)
 {
 	int			fd;
 	char		buffer[PG_CONTROL_FILE_SIZE];
 	char		ControlFilePath[MAXPGPATH];
+	int			open_flag;
+
+	switch (sync_op)
+	{
+#ifndef FRONTEND
+#ifdef O_SYNC
+		case UPDATE_CONTROLFILE_O_SYNC:
+			open_flag = O_SYNC;
+			break;
+#endif
+#ifdef O_DSYNC
+		case UPDATE_CONTROLFILE_O_DSYNC:
+			open_flag = O_DSYNC;
+			break;
+#endif
+#endif
+		default:
+			open_flag = 0;
+	}
 
 	/* Update timestamp  */
 	ControlFile->time = (pg_time_t) time(NULL);
@@ -175,13 +196,13 @@ update_controlfile(const char *DataDir,
 	 * All errors issue a PANIC, so no need to use OpenTransientFile() and to
 	 * worry about file descriptor leaks.
 	 */
-	if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY)) < 0)
+	if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY | open_flag)) < 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m",
 						ControlFilePath)));
 #else
-	if ((fd = open(ControlFilePath, O_WRONLY | PG_BINARY,
+	if ((fd = open(ControlFilePath, O_WRONLY | PG_BINARY | open_flag,
 				   pg_file_create_mode)) == -1)
 		pg_fatal("could not open file \"%s\": %m", ControlFilePath);
 #endif
@@ -209,17 +230,29 @@ update_controlfile(const char *DataDir,
 	pgstat_report_wait_end();
 #endif
 
-	if (do_sync)
+	if (sync_op != 0)
 	{
 #ifndef FRONTEND
 		pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE);
-		if (pg_fsync(fd) != 0)
-			ereport(PANIC,
-					(errcode_for_file_access(),
-					 errmsg("could not fsync file \"%s\": %m",
-							ControlFilePath)));
+		if (sync_op == UPDATE_CONTROLFILE_FDATASYNC)
+		{
+			if (fdatasync(fd) != 0)
+				ereport(PANIC,
+						(errcode_for_file_access(),
+						 errmsg("could not fdatasync file \"%s\": %m",
+								ControlFilePath)));
+		}
+		else if (sync_op == UPDATE_CONTROLFILE_FSYNC)
+		{
+			if (pg_fsync(fd) != 0)
+				ereport(PANIC,
+						(errcode_for_file_access(),
+						 errmsg("could not fdatasync file \"%s\": %m",
+								ControlFilePath)));
+		}
 		pgstat_report_wait_end();
 #else
+		/* In frontend code, non-zero sync_op gets you plain fsync(). */
 		if (fsync(fd) != 0)
 			pg_fatal("could not fsync file \"%s\": %m", ControlFilePath);
 #endif
diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h
index 49e7c52d31..879455c8fb 100644
--- a/src/include/common/controldata_utils.h
+++ b/src/include/common/controldata_utils.h
@@ -12,8 +12,13 @@
 
 #include "catalog/pg_control.h"
 
+#define UPDATE_CONTROLFILE_FSYNC 1
+#define UPDATE_CONTROLFILE_FDATASYNC 2
+#define UPDATE_CONTROLFILE_O_SYNC 3
+#define UPDATE_CONTROLFILE_O_DSYNC 4
+
 extern ControlFileData *get_controlfile(const char *DataDir, bool *crc_ok_p);
 extern void update_controlfile(const char *DataDir,
-							   ControlFileData *ControlFile, bool do_sync);
+							   ControlFileData *ControlFile, int sync_op);
 
 #endif							/* COMMON_CONTROLDATA_UTILS_H */
-- 
2.35.1

Reply via email to