On Wed, Feb 1, 2023 at 5:04 PM Anton A. Melnikov <[email protected]> wrote: > On 31.01.2023 14:38, Thomas Munro wrote: > > Here's an experimental patch for that alternative. I wonder if > > someone would want to be able to turn it off for some reason -- maybe > > some NFS problem? It's less back-patchable, but maybe more > > principled? > > It looks very strange to me that there may be cases where the cluster data > is stored in NFS. Can't figure out how this can be useful.
Heh. There are many interesting failure modes, but people do it. I guess my more general question when introducing any new system call into this code is how some unusual system I can't even access is going to break. Maybe some obscure filesystem will fail with EOPNOTSUPP, or take 5 seconds and then fail because there is no lock server configured or whatever, so that's why I don't think we can back-patch it, and we probably need a way to turn it off. > i think this variant of the patch is a normal solution > of the problem, not workaround. Found no problems on Linux. > +1 for this variant. I prefer it too. > Might add a custom error message for EDEADLK > since it absent in errcode_for_file_access()? Ah, good thought. I think it shouldn't happen™, so it's OK that errcode_for_file_access() would classify it as ERRCODE_INTERNAL_ERROR. Other interesting errors are: ENOLCK: system limits exceeded; PANIC seems reasonable EOPNOTSUPP: this file doesn't support locking (seen on FreeBSD man pages, not on POSIX) > > I don't know if Windows suffers from this type of problem. > > Unfortunately its equivalent functionality LockFile() looks non-ideal > > for this purpose: if your program crashes, the documentation is very > > vague on when exactly it is released by the OS, but it's not > > immediately on process exit. That seems non-ideal for a control file > > you might want to access again very soon after a crash, to be able to > > recover. > > Unfortunately i've not had time to reproduce the problem and apply this patch > on > Windows yet but i'm going to do it soon on windows 10. If a crc error > will occur there, then we might use the workaround from the first > variant of the patch. Thank you for investigating. I am afraid to read your results. One idea would be to proceed with LockFile() for Windows, with a note suggesting you file a bug with your OS vendor if you ever need it to get unstuck. Googling this subject, I read that MongoDB used to suffer from stuck lock files, until an OS bug report led to recent versions releasing locks more promptly. I find that sufficiently scary that I would want to default the feature to off on Windows, even if your testing shows that it does really need it. > > A thought in passing: if UpdateMinRecoveryPoint() performance is an > > issue, maybe we should figure out how to use fdatasync() instead of > > fsync(). > > May be choose it in accordance with GUC wal_sync_method? Here's a patch like that. I don't know if it's a good idea for wal_sync_method to affect other kinds of files or not, but, then, it already does (fsync_writethough changes this behaviour). I would only want to consider this if we also stop choosing "open_datasync" as a default on at least Windows.
From f55c98daefc4eec7f05413edf92b3b1e7070e1ef Mon Sep 17 00:00:00 2001 From: Thomas Munro <[email protected]> Date: Tue, 31 Jan 2023 21:08:12 +1300 Subject: [PATCH] Apply wal_sync_method to pg_control file. When writing out the control file frequently on a standby, we can go a little faster on some systems by using fdatasync() instead of fsync(), with no loss of safety. We already respected the special "wal_sync_method=fsync_writethrough" via a circuitous route, but we might as well support the full range of methods. --- src/backend/access/transam/xlog.c | 19 ++++++++- src/common/controldata_utils.c | 59 ++++++++++++++++++++------ src/include/common/controldata_utils.h | 7 ++- 3 files changed, 70 insertions(+), 15 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index fb4c860bde..d0a7a3d7eb 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -4165,7 +4165,24 @@ ReadControlFile(void) static void UpdateControlFile(void) { - update_controlfile(DataDir, ControlFile, true); + int sync_op; + + switch (sync_method) + { + case SYNC_METHOD_FDATASYNC: + sync_op = UPDATE_CONTROLFILE_FDATASYNC; + break; + case SYNC_METHOD_OPEN: + sync_op = UPDATE_CONTROLFILE_O_SYNC; + break; + case SYNC_METHOD_OPEN_DSYNC: + sync_op = UPDATE_CONTROLFILE_O_DSYNC; + break; + default: + sync_op = UPDATE_CONTROLFILE_FSYNC; + } + + update_controlfile(DataDir, ControlFile, sync_op); } /* diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c index 9723587466..1bc1c6f8d4 100644 --- a/src/common/controldata_utils.c +++ b/src/common/controldata_utils.c @@ -136,18 +136,39 @@ get_controlfile(const char *DataDir, bool *crc_ok_p) * update_controlfile() * * Update controlfile values with the contents given by caller. The - * contents to write are included in "ControlFile". "do_sync" can be - * optionally used to flush the updated control file. Note that it is up - * to the caller to properly lock ControlFileLock when calling this - * routine in the backend. + * contents to write are included in "ControlFile". "sync_op" can be set + * to 0 or a synchronization method UPDATE_CONTROLFILE_XXX. In frontend code, + * only 0 and non-zero (fsync) are meaningful. + * Note that it is up to the caller to properly lock ControlFileLock when + * calling this routine in the backend. */ void update_controlfile(const char *DataDir, - ControlFileData *ControlFile, bool do_sync) + ControlFileData *ControlFile, + int sync_op) { int fd; char buffer[PG_CONTROL_FILE_SIZE]; char ControlFilePath[MAXPGPATH]; + int open_flag; + + switch (sync_op) + { +#ifndef FRONTEND +#ifdef O_SYNC + case UPDATE_CONTROLFILE_O_SYNC: + open_flag = O_SYNC; + break; +#endif +#ifdef O_DSYNC + case UPDATE_CONTROLFILE_O_DSYNC: + open_flag = O_DSYNC; + break; +#endif +#endif + default: + open_flag = 0; + } /* Update timestamp */ ControlFile->time = (pg_time_t) time(NULL); @@ -175,13 +196,13 @@ update_controlfile(const char *DataDir, * All errors issue a PANIC, so no need to use OpenTransientFile() and to * worry about file descriptor leaks. */ - if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY)) < 0) + if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY | open_flag)) < 0) ereport(PANIC, (errcode_for_file_access(), errmsg("could not open file \"%s\": %m", ControlFilePath))); #else - if ((fd = open(ControlFilePath, O_WRONLY | PG_BINARY, + if ((fd = open(ControlFilePath, O_WRONLY | PG_BINARY | open_flag, pg_file_create_mode)) == -1) pg_fatal("could not open file \"%s\": %m", ControlFilePath); #endif @@ -209,17 +230,29 @@ update_controlfile(const char *DataDir, pgstat_report_wait_end(); #endif - if (do_sync) + if (sync_op != 0) { #ifndef FRONTEND pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE); - if (pg_fsync(fd) != 0) - ereport(PANIC, - (errcode_for_file_access(), - errmsg("could not fsync file \"%s\": %m", - ControlFilePath))); + if (sync_op == UPDATE_CONTROLFILE_FDATASYNC) + { + if (fdatasync(fd) != 0) + ereport(PANIC, + (errcode_for_file_access(), + errmsg("could not fdatasync file \"%s\": %m", + ControlFilePath))); + } + else if (sync_op == UPDATE_CONTROLFILE_FSYNC) + { + if (pg_fsync(fd) != 0) + ereport(PANIC, + (errcode_for_file_access(), + errmsg("could not fdatasync file \"%s\": %m", + ControlFilePath))); + } pgstat_report_wait_end(); #else + /* In frontend code, non-zero sync_op gets you plain fsync(). */ if (fsync(fd) != 0) pg_fatal("could not fsync file \"%s\": %m", ControlFilePath); #endif diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h index 49e7c52d31..879455c8fb 100644 --- a/src/include/common/controldata_utils.h +++ b/src/include/common/controldata_utils.h @@ -12,8 +12,13 @@ #include "catalog/pg_control.h" +#define UPDATE_CONTROLFILE_FSYNC 1 +#define UPDATE_CONTROLFILE_FDATASYNC 2 +#define UPDATE_CONTROLFILE_O_SYNC 3 +#define UPDATE_CONTROLFILE_O_DSYNC 4 + extern ControlFileData *get_controlfile(const char *DataDir, bool *crc_ok_p); extern void update_controlfile(const char *DataDir, - ControlFileData *ControlFile, bool do_sync); + ControlFileData *ControlFile, int sync_op); #endif /* COMMON_CONTROLDATA_UTILS_H */ -- 2.35.1
