Re: should frontend tools use syncfs() ?
On Mon, Oct 09, 2023 at 02:34:27PM -0500, Nathan Bossart wrote: > On Mon, Oct 09, 2023 at 11:14:39AM -0500, Nathan Bossart wrote: >> Thanks. I've made a couple of small changes, but otherwise I think this >> one is just about ready. > > I forgot to rename one thing. Here's a v13 with that fixed. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Wed, Oct 04, 2023 at 10:29:07AM -0500, Nathan Bossart wrote: > On Wed, Sep 27, 2023 at 01:56:08PM +0100, Peter Eisentraut wrote: >> We have a collection of platform-specific notes in chapter 19, including >> file-system-related notes in section 19.2. Maybe it could be put there? > > I will give this a try. I started on this, but I couldn't shake the feeling that this wasn't the right place for these notes. This chapter is about setting up a server, and the syncfs() notes do apply to the recovery_init_sync_method configuration parameter, but it also applies to a number of server/client applications. I've been looking around, and I haven't found a great place to move this section to. IMO some of the other appendices have similar amounts of information (e.g., Date/Time Support, The Source Code Repository, Color Support), so maybe a dedicated appendix isn't too extreme. Another option could be to introduce a new section for platform-specific notes, but that would just make this section even larger for now. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Mon, Oct 09, 2023 at 11:14:39AM -0500, Nathan Bossart wrote: > Thanks. I've made a couple of small changes, but otherwise I think this > one is just about ready. I forgot to rename one thing. Here's a v13 with that fixed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From b1829fea8f81cceceabd37585ab74a290505d0d5 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 9 Oct 2023 11:07:29 -0500 Subject: [PATCH v13 1/1] Align names of variables, etc. in wal_sync_method-related code. Author: Maxim Orlov Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com --- src/backend/access/transam/xlog.c | 58 ++--- src/backend/storage/file/fd.c | 4 +- src/backend/utils/misc/guc_tables.c | 8 ++-- src/include/access/xlog.h | 15 +--- src/include/access/xlogdefs.h | 8 ++-- src/include/port/freebsd.h | 2 +- src/include/port/linux.h| 2 +- src/include/utils/guc_hooks.h | 2 +- src/tools/pgindent/typedefs.list| 1 + 9 files changed, 52 insertions(+), 48 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index fcbde10529..a0acd67772 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -130,7 +130,7 @@ bool *wal_consistency_checking = NULL; bool wal_init_zero = true; bool wal_recycle = true; bool log_checkpoints = true; -int sync_method = DEFAULT_SYNC_METHOD; +int wal_sync_method = DEFAULT_WAL_SYNC_METHOD; int wal_level = WAL_LEVEL_REPLICA; int CommitDelay = 0; /* precommit delay in microseconds */ int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ @@ -171,17 +171,17 @@ static bool check_wal_consistency_checking_deferred = false; /* * GUC support */ -const struct config_enum_entry sync_method_options[] = { - {"fsync", SYNC_METHOD_FSYNC, false}, +const struct config_enum_entry wal_sync_method_options[] = { + {"fsync", WAL_SYNC_METHOD_FSYNC, false}, #ifdef HAVE_FSYNC_WRITETHROUGH - {"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH, false}, + {"fsync_writethrough", WAL_SYNC_METHOD_FSYNC_WRITETHROUGH, false}, #endif - {"fdatasync", SYNC_METHOD_FDATASYNC, false}, + {"fdatasync", WAL_SYNC_METHOD_FDATASYNC, false}, #ifdef O_SYNC - {"open_sync", SYNC_METHOD_OPEN, false}, + {"open_sync", WAL_SYNC_METHOD_OPEN, false}, #endif #ifdef O_DSYNC - {"open_datasync", SYNC_METHOD_OPEN_DSYNC, false}, + {"open_datasync", WAL_SYNC_METHOD_OPEN_DSYNC, false}, #endif {NULL, 0, false} }; @@ -2328,8 +2328,8 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) * have no open file or the wrong one. However, we do not need to * fsync more than one file. */ - if (sync_method != SYNC_METHOD_OPEN && - sync_method != SYNC_METHOD_OPEN_DSYNC) + if (wal_sync_method != WAL_SYNC_METHOD_OPEN && + wal_sync_method != WAL_SYNC_METHOD_OPEN_DSYNC) { if (openLogFile >= 0 && !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo, @@ -2959,7 +2959,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, */ *added = false; fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | - get_sync_bit(sync_method)); + get_sync_bit(wal_sync_method)); if (fd < 0) { if (errno != ENOENT) @@ -3124,7 +3124,7 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli) /* Now open original target segment (might not be file I just made) */ fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | - get_sync_bit(sync_method)); + get_sync_bit(wal_sync_method)); if (fd < 0) ereport(ERROR, (errcode_for_file_access(), @@ -3356,7 +3356,7 @@ XLogFileOpen(XLogSegNo segno, TimeLineID tli) XLogFilePath(path, tli, segno, wal_segment_size); fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | - get_sync_bit(sync_method)); + get_sync_bit(wal_sync_method)); if (fd < 0) ereport(PANIC, (errcode_for_file_access(), @@ -8118,16 +8118,16 @@ get_sync_bit(int method) * not included in the enum option array, and therefore will never * be seen here. */ - case SYNC_METHOD_FSYNC: - case SYNC_METHOD_FSYNC_WRITETHROUGH: - case SYNC_METHOD_FDATASYNC: + case WAL_SYNC_METHOD_FSYNC: + case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH: + case WAL_SYNC_METHOD_FDATASYNC: return o_direct_flag; #ifdef O_SYNC - case SYNC_METHOD_OPEN: + case WAL_SYNC_METHOD_OPEN: return O_SYNC | o_direct_flag; #endif #ifdef O_DSYNC - case SYNC_METHOD_OPEN_DSYNC: + case WAL_SYNC_METHOD_OPEN_DSYNC: return O_DSYNC | o_direct_flag; #endif default: @@ -8141,9 +8141,9 @@ get_sync_bit(int method) * GUC support */ void -assign_xlog_sync_method(int new_sync_method, void *extra) +assign_wal_sync_method(int new_wal_sync_method, void *extra) { - if (sync_method != new_sync_method) + if (wal_sync_method != new_wal_sync_method) { /* * To ensure that no blocks escape unsync
Re: should frontend tools use syncfs() ?
On Mon, Oct 09, 2023 at 01:12:16PM +0300, Maxim Orlov wrote: > On Fri, 6 Oct 2023 at 22:35, Nathan Bossart > wrote: >> From a quick skim, this one looks pretty good to me. Would you mind adding >> it to the commitfest so that it doesn't get lost? I will aim to take a >> closer look at it next week. > > Sounds good, thanks a lot! > > https://commitfest.postgresql.org/45/4609/ Thanks. I've made a couple of small changes, but otherwise I think this one is just about ready. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 481efed2b47ab1cf37308a789698069d1a84561b Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 9 Oct 2023 11:07:29 -0500 Subject: [PATCH v12 1/1] Align names of variables, etc. in wal_sync_method-related code. Author: Maxim Orlov Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com --- src/backend/access/transam/xlog.c | 58 ++--- src/backend/storage/file/fd.c | 4 +- src/backend/utils/misc/guc_tables.c | 8 ++-- src/include/access/xlog.h | 15 +--- src/include/access/xlogdefs.h | 8 ++-- src/include/port/freebsd.h | 2 +- src/include/port/linux.h| 2 +- src/include/utils/guc_hooks.h | 2 +- src/tools/pgindent/typedefs.list| 1 + 9 files changed, 52 insertions(+), 48 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index fcbde10529..a0acd67772 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -130,7 +130,7 @@ bool *wal_consistency_checking = NULL; bool wal_init_zero = true; bool wal_recycle = true; bool log_checkpoints = true; -int sync_method = DEFAULT_SYNC_METHOD; +int wal_sync_method = DEFAULT_WAL_SYNC_METHOD; int wal_level = WAL_LEVEL_REPLICA; int CommitDelay = 0; /* precommit delay in microseconds */ int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ @@ -171,17 +171,17 @@ static bool check_wal_consistency_checking_deferred = false; /* * GUC support */ -const struct config_enum_entry sync_method_options[] = { - {"fsync", SYNC_METHOD_FSYNC, false}, +const struct config_enum_entry wal_sync_method_options[] = { + {"fsync", WAL_SYNC_METHOD_FSYNC, false}, #ifdef HAVE_FSYNC_WRITETHROUGH - {"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH, false}, + {"fsync_writethrough", WAL_SYNC_METHOD_FSYNC_WRITETHROUGH, false}, #endif - {"fdatasync", SYNC_METHOD_FDATASYNC, false}, + {"fdatasync", WAL_SYNC_METHOD_FDATASYNC, false}, #ifdef O_SYNC - {"open_sync", SYNC_METHOD_OPEN, false}, + {"open_sync", WAL_SYNC_METHOD_OPEN, false}, #endif #ifdef O_DSYNC - {"open_datasync", SYNC_METHOD_OPEN_DSYNC, false}, + {"open_datasync", WAL_SYNC_METHOD_OPEN_DSYNC, false}, #endif {NULL, 0, false} }; @@ -2328,8 +2328,8 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) * have no open file or the wrong one. However, we do not need to * fsync more than one file. */ - if (sync_method != SYNC_METHOD_OPEN && - sync_method != SYNC_METHOD_OPEN_DSYNC) + if (wal_sync_method != WAL_SYNC_METHOD_OPEN && + wal_sync_method != WAL_SYNC_METHOD_OPEN_DSYNC) { if (openLogFile >= 0 && !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo, @@ -2959,7 +2959,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, */ *added = false; fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | - get_sync_bit(sync_method)); + get_sync_bit(wal_sync_method)); if (fd < 0) { if (errno != ENOENT) @@ -3124,7 +3124,7 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli) /* Now open original target segment (might not be file I just made) */ fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | - get_sync_bit(sync_method)); + get_sync_bit(wal_sync_method)); if (fd < 0) ereport(ERROR, (errcode_for_file_access(), @@ -3356,7 +3356,7 @@ XLogFileOpen(XLogSegNo segno, TimeLineID tli) XLogFilePath(path, tli, segno, wal_segment_size); fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | - get_sync_bit(sync_method)); + get_sync_bit(wal_sync_method)); if (fd < 0) ereport(PANIC, (errcode_for_file_access(), @@ -8118,16 +8118,16 @@ get_sync_bit(int method) * not included in the enum option array, and therefore will never * be seen here. */ - case SYNC_METHOD_FSYNC: - case SYNC_METHOD_FSYNC_WRITETHROUGH: - case SYNC_METHOD_FDATASYNC: + case WAL_SYNC_METHOD_FSYNC: + case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH: + case WAL_SYNC_METHOD_FDATASYNC: return o_direct_flag; #ifdef O_SYNC - case SYNC_METHOD_OPEN: + case WAL_SYNC_METHOD_OPEN: return O_SYNC | o_direct_flag; #endif #ifdef O_DSYNC - case SYNC_METHOD_OPEN_DSYNC: + case WAL_SYNC_METHOD_OPEN_DSYNC: return O_DSYNC | o_direct_flag; #endif default: @@ -8141,9 +8141,9 @@ get_sync_bit(int method) * GUC support */ void -assign_
Re: should frontend tools use syncfs() ?
On Fri, 6 Oct 2023 at 22:35, Nathan Bossart wrote: > From a quick skim, this one looks pretty good to me. Would you mind adding > it to the commitfest so that it doesn't get lost? I will aim to take a > closer look at it next week. > Sounds good, thanks a lot! https://commitfest.postgresql.org/45/4609/ -- Best regards, Maxim Orlov.
Re: should frontend tools use syncfs() ?
On Fri, Oct 06, 2023 at 10:50:11AM +0300, Maxim Orlov wrote: > Back to the patch v11. I don’t understand a bit, what we should do next? > Make a separate thread or put this one on commitfest? >From a quick skim, this one looks pretty good to me. Would you mind adding it to the commitfest so that it doesn't get lost? I will aim to take a closer look at it next week. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
Back to the patch v11. I don’t understand a bit, what we should do next? Make a separate thread or put this one on commitfest? -- Best regards, Maxim Orlov.
Re: should frontend tools use syncfs() ?
On Wed, Sep 27, 2023 at 01:56:08PM +0100, Peter Eisentraut wrote: > I think it's a bit much to add a whole appendix for that little content. I'm inclined to agree. > We have a collection of platform-specific notes in chapter 19, including > file-system-related notes in section 19.2. Maybe it could be put there? I will give this a try. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On 17.08.23 04:50, Michael Paquier wrote: Yeah, this crossed my mind. Do you know of any existing examples of options with links to a common section? One problem with this approach is that there are small differences in the wording for some of the frontend utilities, so it might be difficult to cleanly unite these sections. The closest thing I can think of is Color Support in section Appendixes, that describes something shared across a lot of binaries (that would be 6 tools with this patch). I think it's a bit much to add a whole appendix for that little content. We have a collection of platform-specific notes in chapter 19, including file-system-related notes in section 19.2. Maybe it could be put there?
Re: should frontend tools use syncfs() ?
On Wed, 20 Sept 2023 at 22:08, Nathan Bossart wrote: > I think we should also consider renaming things like SYNC_METHOD_FSYNC to > WAL_SYNC_METHOD_FSYNC, and sync_method_options to wal_sync_method_options. > I've already rename sync_method_options in previous patch. 34 @@ -171,7 +171,7 @@ static bool check_wal_consistency_checking_deferred = false; 35 /* 36 * GUC support 37 */ 38 -const struct config_enum_entry sync_method_options[] = { 39 +const struct config_enum_entry wal_sync_method_options[] = { As for SYNC_METHOD_FSYNC rename, PFA patch. Also make enum for WAL sync methods instead of defines. -- Best regards, Maxim Orlov. v11-0001-Fix-conflicting-types-for-sync_method.patch Description: Binary data
Re: should frontend tools use syncfs() ?
On Wed, Sep 20, 2023 at 03:12:56PM +0300, Maxim Orlov wrote: > As a solution, I suggest renaming sync_method in xlog module to > wal_sync_method. In fact, > appropriate GUC for this variable, called "wal_sync_method" and I see no > reason not to use > the exact same name for a variable in xlog module. +1 I think we should also consider renaming things like SYNC_METHOD_FSYNC to WAL_SYNC_METHOD_FSYNC, and sync_method_options to wal_sync_method_options. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Thu, 7 Sept 2023 at 03:34, Nathan Bossart wrote: > Committed. > Hi! Great job! But here is one problem I've encountered during working on some unrelated stuff. How we have two different things call the same name – sync_method. One in xlog: intsync_method = DEFAULT_SYNC_METHOD; ...and another one in "bins": static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC; In current include order, this is not a problem, but imagine you add a couple of new includes, for example: --- a/src/include/storage/bufpage.h +++ b/src/include/storage/bufpage.h @@ -18,6 +18,8 @@ #include "storage/block.h" #include "storage/item.h" #include "storage/off.h" +#include "postgres.h" +#include "utils/rel.h" And build will be broken, because we how have two different things called "sync_method" with different types: In file included from .../src/bin/pg_rewind/pg_rewind.c:33: In file included from .../src/include/storage/bufpage.h:22: In file included from .../src/include/utils/rel.h:18: .../src/include/access/xlog.h:27:24: error: redeclaration of 'sync_method' with a different type: 'int' vs 'DataDirSyncMethod' (aka 'enum DataDirSyncMethod') extern PGDLLIMPORT int sync_method; ... As a solution, I suggest renaming sync_method in xlog module to wal_sync_method. In fact, appropriate GUC for this variable, called "wal_sync_method" and I see no reason not to use the exact same name for a variable in xlog module. -- Best regards, Maxim Orlov. diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h index 424ecba028f..3aa1fc60cb4 100644 --- a/src/include/storage/bufpage.h +++ b/src/include/storage/bufpage.h @@ -18,6 +18,8 @@ #include "storage/block.h" #include "storage/item.h" #include "storage/off.h" +#include "postgres.h" +#include "utils/rel.h" /* * A postgres disk page is an abstraction layered on top of a postgres v10-0001-Fix-conflicting-types-for-sync_method.patch Description: Binary data
Re: should frontend tools use syncfs() ?
On Tue, Sep 05, 2023 at 05:08:53PM -0700, Nathan Bossart wrote: > I've committed 0001 for now. I plan to commit the rest in the next couple > of days. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
I've committed 0001 for now. I plan to commit the rest in the next couple of days. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Fri, Sep 01, 2023 at 01:19:13PM -0500, Justin Pryzby wrote: > What about (per git grep no-sync doc) pg_receivewal? I don't think it's applicable there, either. IIUC that option specifies whether to sync the data as it is streamed over. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Fri, Sep 01, 2023 at 11:08:51AM -0700, Nathan Bossart wrote: > > This should probably give a distinct error when syncfs is not supported > > than when it's truely recognized. > > Later versions of the patch should have this. Oops, right. > > The patch should handle pg_dumpall, too. > > It looks like pg_dumpall only ever fsyncs a single file, so I don't think > it is really needed there. What about (per git grep no-sync doc) pg_receivewal? -- Justin
Re: should frontend tools use syncfs() ?
Thanks for taking a look. On Fri, Sep 01, 2023 at 12:58:10PM -0500, Justin Pryzby wrote: >> +if (!user_opts.sync_method) >> +user_opts.sync_method = pg_strdup("fsync"); > > why pstrdup? I believe I was just following the precedent set by some of the other options. >> +parse_sync_method(const char *optarg, SyncMethod *sync_method) >> +{ >> +if (strcmp(optarg, "fsync") == 0) >> +*sync_method = SYNC_METHOD_FSYNC; >> +#ifdef HAVE_SYNCFS >> +else if (strcmp(optarg, "syncfs") == 0) >> +*sync_method = SYNC_METHOD_SYNCFS; >> +#endif >> +else >> +{ >> +pg_log_error("unrecognized sync method: %s", optarg); >> +return false; >> +} > > This should probably give a distinct error when syncfs is not supported > than when it's truely recognized. Later versions of the patch should have this. > The patch should handle pg_dumpall, too. It looks like pg_dumpall only ever fsyncs a single file, so I don't think it is really needed there. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
> + if (!user_opts.sync_method) > + user_opts.sync_method = pg_strdup("fsync"); why pstrdup? > +parse_sync_method(const char *optarg, SyncMethod *sync_method) > +{ > + if (strcmp(optarg, "fsync") == 0) > + *sync_method = SYNC_METHOD_FSYNC; > +#ifdef HAVE_SYNCFS > + else if (strcmp(optarg, "syncfs") == 0) > + *sync_method = SYNC_METHOD_SYNCFS; > +#endif > + else > + { > + pg_log_error("unrecognized sync method: %s", optarg); > + return false; > + } This should probably give a distinct error when syncfs is not supported than when it's truely recognized. The patch should handle pg_dumpall, too. Note that /bin/sync doesn't try to de-duplicate, it does just what you tell it. $ strace -e openat,syncfs,fsync sync / / / -f ... openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 +++ exited with 0 +++
Re: should frontend tools use syncfs() ?
On Fri, Sep 01, 2023 at 10:40:12AM +0900, Michael Paquier wrote: > That should be OK this way. The extra running time is not really > visible, right? AFAICT it is negligible. Presumably it could take a little longer if there is a lot to sync on the file system, but I don't know if that's worth worrying about. > +command_ok([ 'initdb', '-S', $datadir, '--sync-method', 'fsync' ], > + 'sync method fsync'); > > Removing this one may be fine, actually, because we test the sync > paths on other places like pg_dump. Done. > This split is OK by me, so WFM. Cool. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From bc2210a3cee0a852b65f74b544a0c0d23c59b78f Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Aug 2023 19:05:14 -0700 Subject: [PATCH v9 1/4] move PG_TEMP_FILE* macros to file_utils.h --- src/backend/backup/basebackup.c | 2 +- src/backend/postmaster/postmaster.c | 1 + src/backend/storage/file/fileset.c | 1 + src/bin/pg_checksums/pg_checksums.c | 10 -- src/bin/pg_rewind/filemap.c | 2 +- src/include/common/file_utils.h | 4 src/include/storage/fd.h| 4 7 files changed, 8 insertions(+), 16 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 45be21131c..5d66014499 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -25,6 +25,7 @@ #include "commands/defrem.h" #include "common/compression.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -37,7 +38,6 @@ #include "storage/bufpage.h" #include "storage/checksum.h" #include "storage/dsm_impl.h" -#include "storage/fd.h" #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d7bfb28ff3..54e9bfb8c4 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -94,6 +94,7 @@ #include "access/xlogrecovery.h" #include "catalog/pg_control.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "common/ip.h" #include "common/pg_prng.h" #include "common/string.h" diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c index e9951b0269..84cae32548 100644 --- a/src/backend/storage/file/fileset.c +++ b/src/backend/storage/file/fileset.c @@ -25,6 +25,7 @@ #include "catalog/pg_tablespace.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "miscadmin.h" #include "storage/ipc.h" diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 19eb67e485..9011a19b4e 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -52,16 +52,6 @@ typedef enum PG_MODE_ENABLE } PgChecksumMode; -/* - * Filename components. - * - * XXX: fd.h is not declared here as frontend side code is not able to - * interact with the backend-side definitions for the various fsync - * wrappers. - */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index bd5c598e20..58280d9abc 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -27,12 +27,12 @@ #include #include "catalog/pg_tablespace_d.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "common/string.h" #include "datapagemap.h" #include "filemap.h" #include "pg_rewind.h" -#include "storage/fd.h" /* * Define a hash table which we can use to store information about the files diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index b7efa1226d..dd1532bcb0 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd, extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); +/* Filename components */ +#define PG_TEMP_FILES_DIR "pgsql_tmp" +#define PG_TEMP_FILE_PREFIX "pgsql_tmp" + #endif /* FILE_UTILS_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 6791a406fc..aea30c0622 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -195,8 +195,4 @@ extern int durable_unlink(const char *fname, int elevel); extern void SyncDataDirectory(void); extern int data_sync_elevel(int elevel); -/* Filename components */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - #endif /* FD_H */ -- 2.25.1 >From ae51ea0182e88398a5c8ebd644fea9e98229e1d0 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 31 Aug 2023 07:38:42 -0700 Subject: [PATCH v9 2/4] make common enum for sync methods --- src/backend/storage/file/fd.c
Re: should frontend tools use syncfs() ?
On Thu, Aug 31, 2023 at 08:48:58AM -0700, Nathan Bossart wrote: > On Thu, Aug 31, 2023 at 02:30:33PM +0900, Michael Paquier wrote: > > - Should we have some regression tests? We should only need one test > > in one of the binaries to be able to stress the new code paths of > > file_utils.c with syncfs. The cheapest one may be pg_dump with a > > dump in directory format? Note that we have tests there that depend > > on lz4 or gzip existing, which are conditional. > > I added one for initdb in v8. +my $supports_syncfs = check_pg_config("#define HAVE_SYNCFS 1"); That should be OK this way. The extra running time is not really visible, right? +command_ok([ 'initdb', '-S', $datadir, '--sync-method', 'fsync' ], + 'sync method fsync'); Removing this one may be fine, actually, because we test the sync paths on other places like pg_dump. > Ha, I was just thinking about this, too. I actually split it into 3 > patches. The first adds DataDirSyncMethod and uses it for > recovery_init_sync_method. The second adds syncfs() support in > file_utils.c. And the third adds the ability to specify syncfs in the > frontend utilities. WDYT? This split is OK by me, so WFM. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Thu, Aug 31, 2023 at 02:30:33PM +0900, Michael Paquier wrote: > - Should we have some regression tests? We should only need one test > in one of the binaries to be able to stress the new code paths of > file_utils.c with syncfs. The cheapest one may be pg_dump with a > dump in directory format? Note that we have tests there that depend > on lz4 or gzip existing, which are conditional. I added one for initdb in v8. > - Perhaps 0002 should be split into two parts? The first patch could > introduce DataDirSyncMethod in file_utils.h with the new routines in > file_utils.h (including syncfs support), and the second patch would > plug the new option to all the binaries. In the first patch, I would > hardcode DATA_DIR_SYNC_METHOD_FSYNC. Ha, I was just thinking about this, too. I actually split it into 3 patches. The first adds DataDirSyncMethod and uses it for recovery_init_sync_method. The second adds syncfs() support in file_utils.c. And the third adds the ability to specify syncfs in the frontend utilities. WDYT? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From e627b8ea4dcc01ac9e2fe3a21e3b3cc109b815e4 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Aug 2023 19:05:14 -0700 Subject: [PATCH v8 1/4] move PG_TEMP_FILE* macros to file_utils.h --- src/backend/backup/basebackup.c | 2 +- src/backend/postmaster/postmaster.c | 1 + src/backend/storage/file/fileset.c | 1 + src/bin/pg_checksums/pg_checksums.c | 10 -- src/bin/pg_rewind/filemap.c | 2 +- src/include/common/file_utils.h | 4 src/include/storage/fd.h| 4 7 files changed, 8 insertions(+), 16 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 45be21131c..5d66014499 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -25,6 +25,7 @@ #include "commands/defrem.h" #include "common/compression.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -37,7 +38,6 @@ #include "storage/bufpage.h" #include "storage/checksum.h" #include "storage/dsm_impl.h" -#include "storage/fd.h" #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d7bfb28ff3..54e9bfb8c4 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -94,6 +94,7 @@ #include "access/xlogrecovery.h" #include "catalog/pg_control.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "common/ip.h" #include "common/pg_prng.h" #include "common/string.h" diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c index e9951b0269..84cae32548 100644 --- a/src/backend/storage/file/fileset.c +++ b/src/backend/storage/file/fileset.c @@ -25,6 +25,7 @@ #include "catalog/pg_tablespace.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "miscadmin.h" #include "storage/ipc.h" diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 19eb67e485..9011a19b4e 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -52,16 +52,6 @@ typedef enum PG_MODE_ENABLE } PgChecksumMode; -/* - * Filename components. - * - * XXX: fd.h is not declared here as frontend side code is not able to - * interact with the backend-side definitions for the various fsync - * wrappers. - */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index bd5c598e20..58280d9abc 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -27,12 +27,12 @@ #include #include "catalog/pg_tablespace_d.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "common/string.h" #include "datapagemap.h" #include "filemap.h" #include "pg_rewind.h" -#include "storage/fd.h" /* * Define a hash table which we can use to store information about the files diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index b7efa1226d..dd1532bcb0 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd, extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); +/* Filename components */ +#define PG_TEMP_FILES_DIR "pgsql_tmp" +#define PG_TEMP_FILE_PREFIX "pgsql_tmp" + #endif /* FILE_UTILS_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 6791a406fc..aea30c0622 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -195,8 +195,4 @@ extern int durable_unlink(const char *fname,
Re: should frontend tools use syncfs() ?
On Tue, Aug 29, 2023 at 06:14:08PM -0700, Nathan Bossart wrote: > That seems fair enough. I did this in v7. I restructured fsync_pgdata() > and fsync_dir_recurse() so that any new sync methods should cause compiler > warnings until they are implemented. That's pretty cool and easier to maintain in the long term. After sleeping on it, there are two things that popped up in my mind that may be worth considering: - Should we have some regression tests? We should only need one test in one of the binaries to be able to stress the new code paths of file_utils.c with syncfs. The cheapest one may be pg_dump with a dump in directory format? Note that we have tests there that depend on lz4 or gzip existing, which are conditional. - Perhaps 0002 should be split into two parts? The first patch could introduce DataDirSyncMethod in file_utils.h with the new routines in file_utils.h (including syncfs support), and the second patch would plug the new option to all the binaries. In the first patch, I would hardcode DATA_DIR_SYNC_METHOD_FSYNC. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Wed, Aug 30, 2023 at 09:10:47AM +0900, Michael Paquier wrote: > I understand that I'm perhaps sounding pedantic about fsync_pgdata().. > But, after thinking more about it, I would still make this code fail > hard with an exit(EXIT_FAILURE) to let any C code calling directly > this routine with sync_method = DATA_DIR_SYNC_METHOD_SYNCFS know that > the build does not allow the use of this option when we don't have > HAVE_SYNCFS. parse_sync_method() offers some protection, but adding > this restriction also in the execution path is more friendly than > falling back silently to the default of flushing each file if > fsync_pgdata() is called with syncfs but the build does not support > it. At least that's more predictible. That seems fair enough. I did this in v7. I restructured fsync_pgdata() and fsync_dir_recurse() so that any new sync methods should cause compiler warnings until they are implemented. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From f5ecb3c8b5d397c94a9f8ab9a053b3a0cfd7f854 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Aug 2023 19:05:14 -0700 Subject: [PATCH v7 1/2] move PG_TEMP_FILE* macros to file_utils.h --- src/backend/backup/basebackup.c | 2 +- src/backend/postmaster/postmaster.c | 1 + src/backend/storage/file/fileset.c | 1 + src/bin/pg_checksums/pg_checksums.c | 10 -- src/bin/pg_rewind/filemap.c | 2 +- src/include/common/file_utils.h | 4 src/include/storage/fd.h| 4 7 files changed, 8 insertions(+), 16 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 45be21131c..5d66014499 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -25,6 +25,7 @@ #include "commands/defrem.h" #include "common/compression.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -37,7 +38,6 @@ #include "storage/bufpage.h" #include "storage/checksum.h" #include "storage/dsm_impl.h" -#include "storage/fd.h" #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d7bfb28ff3..54e9bfb8c4 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -94,6 +94,7 @@ #include "access/xlogrecovery.h" #include "catalog/pg_control.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "common/ip.h" #include "common/pg_prng.h" #include "common/string.h" diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c index e9951b0269..84cae32548 100644 --- a/src/backend/storage/file/fileset.c +++ b/src/backend/storage/file/fileset.c @@ -25,6 +25,7 @@ #include "catalog/pg_tablespace.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "miscadmin.h" #include "storage/ipc.h" diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 19eb67e485..9011a19b4e 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -52,16 +52,6 @@ typedef enum PG_MODE_ENABLE } PgChecksumMode; -/* - * Filename components. - * - * XXX: fd.h is not declared here as frontend side code is not able to - * interact with the backend-side definitions for the various fsync - * wrappers. - */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index bd5c598e20..58280d9abc 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -27,12 +27,12 @@ #include #include "catalog/pg_tablespace_d.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "common/string.h" #include "datapagemap.h" #include "filemap.h" #include "pg_rewind.h" -#include "storage/fd.h" /* * Define a hash table which we can use to store information about the files diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index b7efa1226d..dd1532bcb0 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd, extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); +/* Filename components */ +#define PG_TEMP_FILES_DIR "pgsql_tmp" +#define PG_TEMP_FILE_PREFIX "pgsql_tmp" + #endif /* FILE_UTILS_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 6791a406fc..aea30c0622 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -195,8 +195,4 @@ extern int durable_unlink(const char *fname, int elevel); extern void SyncDataDirectory(void); extern int data_sync_elevel(int elevel); -/* Filename components */
Re: should frontend tools use syncfs() ?
On Tue, Aug 29, 2023 at 08:45:59AM -0700, Nathan Bossart wrote: > rebased 0001 looks OK, worth its own, independent, commit. I understand that I'm perhaps sounding pedantic about fsync_pgdata().. But, after thinking more about it, I would still make this code fail hard with an exit(EXIT_FAILURE) to let any C code calling directly this routine with sync_method = DATA_DIR_SYNC_METHOD_SYNCFS know that the build does not allow the use of this option when we don't have HAVE_SYNCFS. parse_sync_method() offers some protection, but adding this restriction also in the execution path is more friendly than falling back silently to the default of flushing each file if fsync_pgdata() is called with syncfs but the build does not support it. At least that's more predictible. I'm fine with the doc changes. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 1bb28cfe5d40670386ae663e14c8854dc1b5486d Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Aug 2023 19:05:14 -0700 Subject: [PATCH v6 1/2] move PG_TEMP_FILE* macros to file_utils.h --- src/backend/backup/basebackup.c | 2 +- src/backend/postmaster/postmaster.c | 1 + src/backend/storage/file/fileset.c | 1 + src/bin/pg_checksums/pg_checksums.c | 10 -- src/bin/pg_rewind/filemap.c | 2 +- src/include/common/file_utils.h | 4 src/include/storage/fd.h| 4 7 files changed, 8 insertions(+), 16 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 45be21131c..5d66014499 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -25,6 +25,7 @@ #include "commands/defrem.h" #include "common/compression.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -37,7 +38,6 @@ #include "storage/bufpage.h" #include "storage/checksum.h" #include "storage/dsm_impl.h" -#include "storage/fd.h" #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d7bfb28ff3..54e9bfb8c4 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -94,6 +94,7 @@ #include "access/xlogrecovery.h" #include "catalog/pg_control.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "common/ip.h" #include "common/pg_prng.h" #include "common/string.h" diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c index e9951b0269..84cae32548 100644 --- a/src/backend/storage/file/fileset.c +++ b/src/backend/storage/file/fileset.c @@ -25,6 +25,7 @@ #include "catalog/pg_tablespace.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "miscadmin.h" #include "storage/ipc.h" diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 19eb67e485..9011a19b4e 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -52,16 +52,6 @@ typedef enum PG_MODE_ENABLE } PgChecksumMode; -/* - * Filename components. - * - * XXX: fd.h is not declared here as frontend side code is not able to - * interact with the backend-side definitions for the various fsync - * wrappers. - */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index bd5c598e20..58280d9abc 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -27,12 +27,12 @@ #include #include "catalog/pg_tablespace_d.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "common/string.h" #include "datapagemap.h" #include "filemap.h" #include "pg_rewind.h" -#include "storage/fd.h" /* * Define a hash table which we can use to store information about the files diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index b7efa1226d..dd1532bcb0 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd, extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); +/* Filename components */ +#define PG_TEMP_FILES_DIR "pgsql_tmp" +#define PG_TEMP_FILE_PREFIX "pgsql_tmp" + #endif /* FILE_UTILS_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 6791a406fc..aea30c0622 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -195,8 +195,4 @@ extern int durable_unlink(const char *fname, int elevel); extern void SyncDataDirectory(void); extern int data_sync_elevel(int elevel); -/* Filename components */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - #endif /* FD_H */ -- 2.25.1 >From 81e87db711bb90f4641b41b4f863f1f5064eb05d Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jul 2023 15:56:24 -0700 Subject: [PATCH v6 2/2] allow syncfs in frontend utilities --- doc/src/sgml/config.sgml | 12 +--- doc/src/sgml/filelist.sgml| 1 + doc/src/sgml/postgres.sgml| 1 + doc/src/sgml/ref/initdb.sgml | 22 +++ doc/src/sgml/ref/pg_basebackup.sgml | 25 doc/src/sgml/ref/pg_checksums.sgml| 22 +++ doc/src/sgml/ref/pg_dump.sgml | 21 +++ doc/src/sgml/ref/pg_rewind.sgml | 22 +++ doc/src/sgml/ref/pgupgrade.sgml | 23 +++ doc/src/sgml/syncfs.sgml | 36 +++ src/backend/storage/file/fd.c | 4 +- src/backend/utils/misc/guc_tables.c
Re: should frontend tools use syncfs() ?
On Tue, Aug 22, 2023 at 12:53:53PM +0900, Michael Paquier wrote: > On Mon, Aug 21, 2023 at 07:06:32PM -0700, Nathan Bossart wrote: >> This would look something like the attached patch. I think this is nicer. >> With this patch, we don't have to choose between including fd.h or >> redefining the macros in the frontend code. > > Yes, this one is moving the needle in the good direction. +1. Great. Here is a new patch set that includes this change as well as the suggested documentation updates. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 25357aefb8956e12b4555881346c5b4a73c1f4e5 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Aug 2023 19:05:14 -0700 Subject: [PATCH v5 1/2] move PG_TEMP_FILE* macros to file_utils.h --- src/backend/backup/basebackup.c | 2 +- src/backend/postmaster/postmaster.c | 1 + src/backend/storage/file/fileset.c | 1 + src/bin/pg_checksums/pg_checksums.c | 10 -- src/bin/pg_rewind/filemap.c | 2 +- src/include/common/file_utils.h | 4 src/include/storage/fd.h| 4 7 files changed, 8 insertions(+), 16 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 45be21131c..5d66014499 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -25,6 +25,7 @@ #include "commands/defrem.h" #include "common/compression.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -37,7 +38,6 @@ #include "storage/bufpage.h" #include "storage/checksum.h" #include "storage/dsm_impl.h" -#include "storage/fd.h" #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 9c8ec779f9..d375dcb795 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -94,6 +94,7 @@ #include "access/xlogrecovery.h" #include "catalog/pg_control.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "common/ip.h" #include "common/pg_prng.h" #include "common/string.h" diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c index e9951b0269..84cae32548 100644 --- a/src/backend/storage/file/fileset.c +++ b/src/backend/storage/file/fileset.c @@ -25,6 +25,7 @@ #include "catalog/pg_tablespace.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "miscadmin.h" #include "storage/ipc.h" diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 19eb67e485..9011a19b4e 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -52,16 +52,6 @@ typedef enum PG_MODE_ENABLE } PgChecksumMode; -/* - * Filename components. - * - * XXX: fd.h is not declared here as frontend side code is not able to - * interact with the backend-side definitions for the various fsync - * wrappers. - */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index bd5c598e20..58280d9abc 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -27,12 +27,12 @@ #include #include "catalog/pg_tablespace_d.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "common/string.h" #include "datapagemap.h" #include "filemap.h" #include "pg_rewind.h" -#include "storage/fd.h" /* * Define a hash table which we can use to store information about the files diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index b7efa1226d..dd1532bcb0 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd, extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); +/* Filename components */ +#define PG_TEMP_FILES_DIR "pgsql_tmp" +#define PG_TEMP_FILE_PREFIX "pgsql_tmp" + #endif /* FILE_UTILS_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 6791a406fc..aea30c0622 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -195,8 +195,4 @@ extern int durable_unlink(const char *fname, int elevel); extern void SyncDataDirectory(void); extern int data_sync_elevel(int elevel); -/* Filename components */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - #endif /* FD_H */ -- 2.25.1 >From 4cc8395720a3af1916f3baada04826177783ec80 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jul 2023 15:56:24 -0700 Subject: [PATCH v5 2/2] allow syncfs in frontend utilities --- doc/src/sgml/config.sgml | 12 +--- doc/src/sgml/filelist.sgml| 1 + doc/src
Re: should frontend tools use syncfs() ?
On Mon, Aug 21, 2023 at 07:06:32PM -0700, Nathan Bossart wrote: > This would look something like the attached patch. I think this is nicer. > With this patch, we don't have to choose between including fd.h or > redefining the macros in the frontend code. Yes, this one is moving the needle in the good direction. +1. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Tue, Aug 22, 2023 at 10:50:01AM +0900, Michael Paquier wrote: > On Mon, Aug 21, 2023 at 06:44:07PM -0700, Nathan Bossart wrote: >> I'm hoping there's a simpler path forward here. pg_rewind only needs the >> following lines from fd.h: >> >> /* Filename components */ >> #define PG_TEMP_FILES_DIR "pgsql_tmp" >> #define PG_TEMP_FILE_PREFIX "pgsql_tmp" >> >> Maybe we could move these to file_utils.h instead. WDYT? > > I guess so.. At the same time, something can be said about > pg_checksums that redeclares PG_TEMP_FILE_PREFIX and PG_TEMP_FILES_DIR > because it does not want to include fd.h and its sync routines. This would look something like the attached patch. I think this is nicer. With this patch, we don't have to choose between including fd.h or redefining the macros in the frontend code. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From af40f64933b5d8829b7e1641ce57e4ef8fd5a2c2 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Aug 2023 19:05:14 -0700 Subject: [PATCH 1/1] move PG_TEMP_FILE* macros to file_utils.h --- src/backend/backup/basebackup.c | 2 +- src/backend/storage/file/fileset.c | 1 + src/bin/pg_checksums/pg_checksums.c | 10 -- src/bin/pg_rewind/filemap.c | 2 +- src/include/common/file_utils.h | 4 src/include/storage/fd.h| 4 6 files changed, 7 insertions(+), 16 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 45be21131c..5d66014499 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -25,6 +25,7 @@ #include "commands/defrem.h" #include "common/compression.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -37,7 +38,6 @@ #include "storage/bufpage.h" #include "storage/checksum.h" #include "storage/dsm_impl.h" -#include "storage/fd.h" #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c index e9951b0269..84cae32548 100644 --- a/src/backend/storage/file/fileset.c +++ b/src/backend/storage/file/fileset.c @@ -25,6 +25,7 @@ #include "catalog/pg_tablespace.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "miscadmin.h" #include "storage/ipc.h" diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 19eb67e485..9011a19b4e 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -52,16 +52,6 @@ typedef enum PG_MODE_ENABLE } PgChecksumMode; -/* - * Filename components. - * - * XXX: fd.h is not declared here as frontend side code is not able to - * interact with the backend-side definitions for the various fsync - * wrappers. - */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index bd5c598e20..58280d9abc 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -27,12 +27,12 @@ #include #include "catalog/pg_tablespace_d.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "common/string.h" #include "datapagemap.h" #include "filemap.h" #include "pg_rewind.h" -#include "storage/fd.h" /* * Define a hash table which we can use to store information about the files diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index b7efa1226d..dd1532bcb0 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd, extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); +/* Filename components */ +#define PG_TEMP_FILES_DIR "pgsql_tmp" +#define PG_TEMP_FILE_PREFIX "pgsql_tmp" + #endif /* FILE_UTILS_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 6791a406fc..aea30c0622 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -195,8 +195,4 @@ extern int durable_unlink(const char *fname, int elevel); extern void SyncDataDirectory(void); extern int data_sync_elevel(int elevel); -/* Filename components */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - #endif /* FD_H */ -- 2.25.1
Re: should frontend tools use syncfs() ?
On Mon, Aug 21, 2023 at 06:44:07PM -0700, Nathan Bossart wrote: > I'm hoping there's a simpler path forward here. pg_rewind only needs the > following lines from fd.h: > > /* Filename components */ > #define PG_TEMP_FILES_DIR "pgsql_tmp" > #define PG_TEMP_FILE_PREFIX "pgsql_tmp" > > Maybe we could move these to file_utils.h instead. WDYT? I guess so.. At the same time, something can be said about pg_checksums that redeclares PG_TEMP_FILE_PREFIX and PG_TEMP_FILES_DIR because it does not want to include fd.h and its sync routines. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Tue, Aug 22, 2023 at 08:56:26AM +0900, Michael Paquier wrote: > --- a/src/include/storage/fd.h > +++ b/src/include/storage/fd.h > @@ -43,15 +43,11 @@ > #ifndef FD_H > #define FD_H > > +#ifndef FRONTEND > + > #include > #include > > Ugh. So you need this part because pg_rewind's filemap.c includes > fd.h, and pg_rewind also needs file_utils.h. This is not the fault of > your patch, but this does not make the situation better, either.. It > looks like we need to think harder about this layer. An improvement > would be to split file_utils.c so as its frontend-only code is moved > to OBJS_FRONTEND in a new file with a new header? It should be OK to > keep DataDirSyncMethod in file_utils.h as long as the split is clean. I'm hoping there's a simpler path forward here. pg_rewind only needs the following lines from fd.h: /* Filename components */ #define PG_TEMP_FILES_DIR "pgsql_tmp" #define PG_TEMP_FILE_PREFIX "pgsql_tmp" Maybe we could move these to file_utils.h instead. WDYT? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Fri, Aug 18, 2023 at 09:01:11AM -0700, Nathan Bossart wrote: > On Thu, Aug 17, 2023 at 12:50:31PM +0900, Michael Paquier wrote: >> SyncMethod may be a bit too generic as name for the option structure. >> How about a PGSyncMethod or pg_sync_method? > > In v4, I renamed this to DataDirSyncMethod and merged it with > RecoveryInitSyncMethod. I'm not wedded to the name, but that seemed > generic enough for both use-cases. As an aside, we need to be careful to > distinguish these options from those for wal_sync_method. Okay. >>> Yeah, this crossed my mind. Do you know of any existing examples of >>> options with links to a common section? One problem with this approach is >>> that there are small differences in the wording for some of the frontend >>> utilities, so it might be difficult to cleanly unite these sections. >> >> The closest thing I can think of is Color Support in section >> Appendixes, that describes something shared across a lot of binaries >> (that would be 6 tools with this patch). > > If I added a "syncfs() Caveats" appendix for the common parts of the docs, > it would only say something like the following: > > Using syncfs may be a lot faster than using fsync, because it doesn't > need to open each file one by one. On the other hand, it may be slower > if a file system is shared by other applications that modify a lot of > files, since those files will also be written to disk. Furthermore, on > versions of Linux before 5.8, I/O errors encountered while writing data > to disk may not be reported to the calling program, and relevant error > messages may appear only in kernel logs. > > Does that seem reasonable? It would reduce the duplication a little bit, > but I'm not sure it's really much of an improvement in this case. This would cut 60% (?) of the documentation added by the patch for these six tools, so that looks like an improvement to me. Perhaps other may disagree, so more opinions are welcome. --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -43,15 +43,11 @@ #ifndef FD_H #define FD_H +#ifndef FRONTEND + #include #include Ugh. So you need this part because pg_rewind's filemap.c includes fd.h, and pg_rewind also needs file_utils.h. This is not the fault of your patch, but this does not make the situation better, either.. It looks like we need to think harder about this layer. An improvement would be to split file_utils.c so as its frontend-only code is moved to OBJS_FRONTEND in a new file with a new header? It should be OK to keep DataDirSyncMethod in file_utils.h as long as the split is clean. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Mon, Aug 21, 2023 at 04:08:46PM -0400, Robert Haas wrote: > Doesn't seem worth it to me. I think --no-sync is more intuitive than > --sync-method=none, it's certainly shorter, and it's a pretty > important setting because we use it when running the regression tests. No arguments against that ;) -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Wed, Aug 16, 2023 at 11:50 PM Michael Paquier wrote: > >> Do we actually need --no-sync at all if --sync-method is around? We > >> could have an extra --sync-method=none at option level with --no-sync > >> still around mainly for compatibility? Or perhaps that's just > >> over-designing things? > > > > I don't have a strong opinion. We could take up deprecating --no-sync in a > > follow-up thread, though. Like you said, we'll probably need to keep it > > around for backward compatibility, so it might not be worth the trouble. > > Okay, maybe that's not worth it. Doesn't seem worth it to me. I think --no-sync is more intuitive than --sync-method=none, it's certainly shorter, and it's a pretty important setting because we use it when running the regression tests. -- Robert Haas EDB: http://www.enterprisedb.com
Re: should frontend tools use syncfs() ?
On Thu, Aug 17, 2023 at 12:50:31PM +0900, Michael Paquier wrote: > On Wed, Aug 16, 2023 at 08:17:05AM -0700, Nathan Bossart wrote: >> The patch does have the following note: >> >> +On Linux, syncfs may be used instead to ask the >> +operating system to synchronize the whole file systems that contain >> the >> +data directory, the WAL files, and each tablespace. >> >> Do you think that is sufficient, or do you think we should really clearly >> explain that you could end up calling syncfs() on the same file system a >> few times if your tablespaces are on the same disk? I personally feel >> like that'd be a bit too verbose for the already lengthy descriptions of >> this setting. > > It does not hurt to mention that the code syncfs()-es each tablespace > path (not in-place tablespaces), ignoring locations that share the > same mounting point, IMO. For that, we'd better rely on > get_dirent_type() like the normal sync path. But it doesn't ignore tablespace locations that share the same mount point. It simply calls syncfs() for each tablespace path, just like recovery_init_sync_method. >> If syncfs() is not available, SYNC_METHOD_SYNCFS won't even be defined, and >> parse_sync_method() should fail if "syncfs" is specified. Furthermore, the >> relevant part of fsync_pgdata() won't be compiled in whenever HAVE_SYNCFS >> is not defined. > > That feels structurally inconsistent with what we do with other > option sets that have library dependencies. For example, look at > compression.h and what happens for pg_compress_algorithm. So, it > seems to me that it would be more friendly to list SYNC_METHOD_SYNCFS > all the time in SyncMethod even if HAVE_SYNCFS is not around, and at > least generate a warning rather than having a platform-dependent set > of options? Done. > SyncMethod may be a bit too generic as name for the option structure. > How about a PGSyncMethod or pg_sync_method? In v4, I renamed this to DataDirSyncMethod and merged it with RecoveryInitSyncMethod. I'm not wedded to the name, but that seemed generic enough for both use-cases. As an aside, we need to be careful to distinguish these options from those for wal_sync_method. >> Yeah, this crossed my mind. Do you know of any existing examples of >> options with links to a common section? One problem with this approach is >> that there are small differences in the wording for some of the frontend >> utilities, so it might be difficult to cleanly unite these sections. > > The closest thing I can think of is Color Support in section > Appendixes, that describes something shared across a lot of binaries > (that would be 6 tools with this patch). If I added a "syncfs() Caveats" appendix for the common parts of the docs, it would only say something like the following: Using syncfs may be a lot faster than using fsync, because it doesn't need to open each file one by one. On the other hand, it may be slower if a file system is shared by other applications that modify a lot of files, since those files will also be written to disk. Furthermore, on versions of Linux before 5.8, I/O errors encountered while writing data to disk may not be reported to the calling program, and relevant error messages may appear only in kernel logs. Does that seem reasonable? It would reduce the duplication a little bit, but I'm not sure it's really much of an improvement in this case. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 5edd4b4570617a897043086cf203af57a5596e11 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jul 2023 15:56:24 -0700 Subject: [PATCH v4 1/1] allow syncfs in frontend utilities --- doc/src/sgml/ref/initdb.sgml | 27 doc/src/sgml/ref/pg_basebackup.sgml | 30 + doc/src/sgml/ref/pg_checksums.sgml| 27 doc/src/sgml/ref/pg_dump.sgml | 27 doc/src/sgml/ref/pg_rewind.sgml | 27 doc/src/sgml/ref/pgupgrade.sgml | 29 + src/backend/storage/file/fd.c | 4 +- src/backend/utils/misc/guc_tables.c | 7 ++- src/bin/initdb/initdb.c | 12 +++- src/bin/pg_basebackup/pg_basebackup.c | 12 +++- src/bin/pg_checksums/pg_checksums.c | 9 ++- src/bin/pg_dump/pg_backup.h | 4 +- src/bin/pg_dump/pg_backup_archiver.c | 14 +++-- src/bin/pg_dump/pg_backup_archiver.h | 1 + src/bin/pg_dump/pg_backup_directory.c | 2 +- src/bin/pg_dump/pg_dump.c | 10 ++- src/bin/pg_rewind/file_ops.c | 2 +- src/bin/pg_rewind/pg_rewind.c | 9 +++ src/bin/pg_rewind/pg_rewind.h | 2 + src/bin/pg_upgrade/option.c | 13 src/bin/pg_upgrade/pg_upgrade.c | 6 +- src/bin/pg_upgrade/pg_upgrade.h | 1 + src/common/file_utils.c | 91 ++- src/fe_utils/option_utils.c | 24 +++ src/include/common/file_utils.h | 15 - src/in
Re: should frontend tools use syncfs() ?
On Wed, Aug 16, 2023 at 08:17:05AM -0700, Nathan Bossart wrote: > On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote: >> On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote: >> +else >> +{ >> +while (errno = 0, (de = readdir(dir)) != NULL) >> +{ >> +charsubpath[MAXPGPATH * 2]; >> + >> +if (strcmp(de->d_name, ".") == 0 || >> +strcmp(de->d_name, "..") == 0) >> +continue; >> >> It seems to me that there is no need to do that for in-place >> tablespaces. There are relative paths in pg_tblspc/, so they would be >> taken care of by the syncfs() done on the main data folder. >> >> This does not really check if the mount points of each tablespace is >> different, as well. For example, imagine that you have two >> tablespaces within the same disk, syncfs() twice. Perhaps, the >> current logic is OK anyway as long as the behavior is optional, but it >> should be explained in the docs, at least. > > True. But I don't know if checking the mount point of each tablespace is > worth the complexity. Perhaps worth a note, this would depend on statvfs(), which is not that portable the last time I looked at it (NetBSD, some BSD-ish? And of course WIN32). > In the worst case, we'll call syncfs() on the same > file system a few times, which is probably still much faster in most cases. > FWIW this is what recovery_init_sync_method does today, and I'm not aware > of any complaints about this behavior. Hmm. Okay. > The patch does have the following note: > > +On Linux, syncfs may be used instead to ask the > +operating system to synchronize the whole file systems that contain > the > +data directory, the WAL files, and each tablespace. > > Do you think that is sufficient, or do you think we should really clearly > explain that you could end up calling syncfs() on the same file system a > few times if your tablespaces are on the same disk? I personally feel > like that'd be a bit too verbose for the already lengthy descriptions of > this setting. It does not hurt to mention that the code syncfs()-es each tablespace path (not in-place tablespaces), ignoring locations that share the same mounting point, IMO. For that, we'd better rely on get_dirent_type() like the normal sync path. >> I'm finding a bit confusing that fsync_pgdata() is coded in such a way >> that it does a silent fallback to the cascading syncs through >> walkdir() when syncfs is specified but not available in the build. >> Perhaps an error is more helpful because one would then know that they >> are trying something that's not around? > > If syncfs() is not available, SYNC_METHOD_SYNCFS won't even be defined, and > parse_sync_method() should fail if "syncfs" is specified. Furthermore, the > relevant part of fsync_pgdata() won't be compiled in whenever HAVE_SYNCFS > is not defined. That feels structurally inconsistent with what we do with other option sets that have library dependencies. For example, look at compression.h and what happens for pg_compress_algorithm. So, it seems to me that it would be more friendly to list SYNC_METHOD_SYNCFS all the time in SyncMethod even if HAVE_SYNCFS is not around, and at least generate a warning rather than having a platform-dependent set of options? SyncMethod may be a bit too generic as name for the option structure. How about a PGSyncMethod or pg_sync_method? >> I am a bit concerned about the amount of duplication this patch >> introduces in the docs. Perhaps this had better be moved into a new >> section of the docs to explain the tradeoffs, with each tool linking >> to it? > > Yeah, this crossed my mind. Do you know of any existing examples of > options with links to a common section? One problem with this approach is > that there are small differences in the wording for some of the frontend > utilities, so it might be difficult to cleanly unite these sections. The closest thing I can think of is Color Support in section Appendixes, that describes something shared across a lot of binaries (that would be 6 tools with this patch). >> Do we actually need --no-sync at all if --sync-method is around? We >> could have an extra --sync-method=none at option level with --no-sync >> still around mainly for compatibility? Or perhaps that's just >> over-designing things? > > I don't have a strong opinion. We could take up deprecating --no-sync in a > follow-up thread, though. Like you said, we'll probably need to keep it > around for backward compatibility, so it might not be worth the trouble. Okay, maybe that's not worth it. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Wed, Aug 16, 2023 at 08:23:25AM -0700, Nathan Bossart wrote: > Ah, it looks like this code used to treat fsync() errors as non-fatal, but > it was changed in commit 1420617. I still find it a bit strange that some > errors that prevent a file from being sync'd are non-fatal while others > _are_ fatal, but that is probably a topic for another thread. Right. That rings a bell. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Wed, Aug 16, 2023 at 08:17:05AM -0700, Nathan Bossart wrote: > On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote: >> + pg_log_error("could not synchronize file system for file \"%s\": >> %m", path); >> + (void) close(fd); >> + exit(EXIT_FAILURE); >> >> walkdir() reports errors and does not consider these fatal. Why the >> early exit()? > > I know it claims to, but fsync_fname() does exit when fsync() fails: > > returncode = fsync(fd); > > /* >* Some OSes don't allow us to fsync directories at all, so we can > ignore >* those errors. Anything else needs to be reported. >*/ > if (returncode != 0 && !(isdir && (errno == EBADF || errno == EINVAL))) > { > pg_log_error("could not fsync file \"%s\": %m", fname); > (void) close(fd); > exit(EXIT_FAILURE); > } > > I suspect that the current code does not treat failures for things like > open() as fatal because it's likely due to a lack of permissions on the > file, but it does treat failures to fsync() as fatal because it is more > likely to indicate that ѕomething is very wrong. I don't know whether this > reasoning is sound, but I tried to match the current convention in the > syncfs() code. Ah, it looks like this code used to treat fsync() errors as non-fatal, but it was changed in commit 1420617. I still find it a bit strange that some errors that prevent a file from being sync'd are non-fatal while others _are_ fatal, but that is probably a topic for another thread. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
Thanks for taking a look. On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote: > On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote: >> I ran a couple of tests for pg_upgrade with 100k tables (created using the >> script here [0]) in order to demonstrate the potential benefits of this >> patch. > > That shows some nice numbers with many files, indeed. How does the > size of each file influence the difference in time? IME the number of files tends to influence the duration much more than the size. I assume this is because most files are already sync'd in these code paths that loop through every file. > +else > +{ > +while (errno = 0, (de = readdir(dir)) != NULL) > +{ > +charsubpath[MAXPGPATH * 2]; > + > +if (strcmp(de->d_name, ".") == 0 || > +strcmp(de->d_name, "..") == 0) > +continue; > > It seems to me that there is no need to do that for in-place > tablespaces. There are relative paths in pg_tblspc/, so they would be > taken care of by the syncfs() done on the main data folder. > > This does not really check if the mount points of each tablespace is > different, as well. For example, imagine that you have two > tablespaces within the same disk, syncfs() twice. Perhaps, the > current logic is OK anyway as long as the behavior is optional, but it > should be explained in the docs, at least. True. But I don't know if checking the mount point of each tablespace is worth the complexity. In the worst case, we'll call syncfs() on the same file system a few times, which is probably still much faster in most cases. FWIW this is what recovery_init_sync_method does today, and I'm not aware of any complaints about this behavior. The patch does have the following note: +On Linux, syncfs may be used instead to ask the +operating system to synchronize the whole file systems that contain the +data directory, the WAL files, and each tablespace. Do you think that is sufficient, or do you think we should really clearly explain that you could end up calling syncfs() on the same file system a few times if your tablespaces are on the same disk? I personally feel like that'd be a bit too verbose for the already lengthy descriptions of this setting. > I'm finding a bit confusing that fsync_pgdata() is coded in such a way > that it does a silent fallback to the cascading syncs through > walkdir() when syncfs is specified but not available in the build. > Perhaps an error is more helpful because one would then know that they > are trying something that's not around? If syncfs() is not available, SYNC_METHOD_SYNCFS won't even be defined, and parse_sync_method() should fail if "syncfs" is specified. Furthermore, the relevant part of fsync_pgdata() won't be compiled in whenever HAVE_SYNCFS is not defined. > + pg_log_error("could not synchronize file system for file \"%s\": %m", > path); > + (void) close(fd); > + exit(EXIT_FAILURE); > > walkdir() reports errors and does not consider these fatal. Why the > early exit()? I know it claims to, but fsync_fname() does exit when fsync() fails: returncode = fsync(fd); /* * Some OSes don't allow us to fsync directories at all, so we can ignore * those errors. Anything else needs to be reported. */ if (returncode != 0 && !(isdir && (errno == EBADF || errno == EINVAL))) { pg_log_error("could not fsync file \"%s\": %m", fname); (void) close(fd); exit(EXIT_FAILURE); } I suspect that the current code does not treat failures for things like open() as fatal because it's likely due to a lack of permissions on the file, but it does treat failures to fsync() as fatal because it is more likely to indicate that ѕomething is very wrong. I don't know whether this reasoning is sound, but I tried to match the current convention in the syncfs() code. > I am a bit concerned about the amount of duplication this patch > introduces in the docs. Perhaps this had better be moved into a new > section of the docs to explain the tradeoffs, with each tool linking > to it? Yeah, this crossed my mind. Do you know of any existing examples of options with links to a common section? One problem with this approach is that there are small differences in the wording for some of the frontend utilities, so it might be difficult to cleanly unite these sections. > Do we actually need --no-sync at all if --sync-method is around? We > could have an extra --sync-method=none at option level with --no-sync > still around mainly for compatibility? Or perhaps that's just > over-designing things? I don't have a strong opinion. We could take up deprecating --no-sync in a follow-up thread, though. Like you said, we'll probably need to keep it around for backward compatibility, so it might not be worth t
Re: should frontend tools use syncfs() ?
On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote: > I ran a couple of tests for pg_upgrade with 100k tables (created using the > script here [0]) in order to demonstrate the potential benefits of this > patch. That shows some nice numbers with many files, indeed. How does the size of each file influence the difference in time? +else +{ +while (errno = 0, (de = readdir(dir)) != NULL) +{ +charsubpath[MAXPGPATH * 2]; + +if (strcmp(de->d_name, ".") == 0 || +strcmp(de->d_name, "..") == 0) +continue; It seems to me that there is no need to do that for in-place tablespaces. There are relative paths in pg_tblspc/, so they would be taken care of by the syncfs() done on the main data folder. This does not really check if the mount points of each tablespace is different, as well. For example, imagine that you have two tablespaces within the same disk, syncfs() twice. Perhaps, the current logic is OK anyway as long as the behavior is optional, but it should be explained in the docs, at least. I'm finding a bit confusing that fsync_pgdata() is coded in such a way that it does a silent fallback to the cascading syncs through walkdir() when syncfs is specified but not available in the build. Perhaps an error is more helpful because one would then know that they are trying something that's not around? + pg_log_error("could not synchronize file system for file \"%s\": %m", path); + (void) close(fd); + exit(EXIT_FAILURE); walkdir() reports errors and does not consider these fatal. Why the early exit()? I am a bit concerned about the amount of duplication this patch introduces in the docs. Perhaps this had better be moved into a new section of the docs to explain the tradeoffs, with each tool linking to it? Do we actually need --no-sync at all if --sync-method is around? We could have an extra --sync-method=none at option level with --no-sync still around mainly for compatibility? Or perhaps that's just over-designing things? -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
I ran a couple of tests for pg_upgrade with 100k tables (created using the script here [0]) in order to demonstrate the potential benefits of this patch. pg_upgrade --sync-method fsync real5m50.072s user0m10.606s sys 0m40.298s pg_upgrade --sync-method syncfs real3m44.096s user0m8.906s sys 0m26.398s pg_upgrade --no-sync real3m27.697s user0m9.056s sys 0m26.605s [0] https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Mon, Jul 31, 2023 at 11:39:46AM -0700, Nathan Bossart wrote: > I just realized I forgot to update the --help output for these utilities. > I'll do that in the next version of the patch. Done in v3. Sorry for the noise. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From be52e4625754fa36703423e8b67d2f6cd2a16a09 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jul 2023 15:56:24 -0700 Subject: [PATCH v3 1/1] allow syncfs in frontend utilities --- doc/src/sgml/ref/initdb.sgml | 27 doc/src/sgml/ref/pg_basebackup.sgml | 30 + doc/src/sgml/ref/pg_checksums.sgml| 27 doc/src/sgml/ref/pg_dump.sgml | 27 doc/src/sgml/ref/pg_rewind.sgml | 27 doc/src/sgml/ref/pgupgrade.sgml | 29 + src/bin/initdb/initdb.c | 12 +++- src/bin/pg_basebackup/pg_basebackup.c | 12 +++- src/bin/pg_checksums/pg_checksums.c | 9 ++- src/bin/pg_dump/pg_backup.h | 4 +- src/bin/pg_dump/pg_backup_archiver.c | 13 ++-- src/bin/pg_dump/pg_backup_archiver.h | 1 + src/bin/pg_dump/pg_backup_directory.c | 2 +- src/bin/pg_dump/pg_dump.c | 10 ++- src/bin/pg_rewind/file_ops.c | 2 +- src/bin/pg_rewind/pg_rewind.c | 9 +++ src/bin/pg_rewind/pg_rewind.h | 2 + src/bin/pg_upgrade/option.c | 13 src/bin/pg_upgrade/pg_upgrade.c | 6 +- src/bin/pg_upgrade/pg_upgrade.h | 1 + src/common/file_utils.c | 91 ++- src/fe_utils/option_utils.c | 18 ++ src/include/common/file_utils.h | 17 - src/include/fe_utils/option_utils.h | 3 + src/include/storage/fd.h | 4 ++ src/tools/pgindent/typedefs.list | 1 + 26 files changed, 376 insertions(+), 21 deletions(-) diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index 22f1011781..872fef5705 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -365,6 +365,33 @@ PostgreSQL documentation + + --sync-method + + +When set to fsync, which is the default, +initdb will recursively open and synchronize all +files in the data directory. The search for files will follow symbolic +links for the WAL directory and each configured tablespace. + + +On Linux, syncfs may be used instead to ask the +operating system to synchronize the whole file systems that contain the +data directory, the WAL files, and each tablespace. This may be a lot +faster than the fsync setting, because it doesn't +need to open each file one by one. On the other hand, it may be slower +if a file system is shared by other applications that modify a lot of +files, since those files will also be written to disk. Furthermore, on +versions of Linux before 5.8, I/O errors encountered while writing data +to disk may not be reported to initdb, and relevant +error messages may appear only in kernel logs. + + +This option has no effect when --no-sync is used. + + + + -S --sync-only diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 79d3e657c3..af8eb43251 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -594,6 +594,36 @@ PostgreSQL documentation + + --sync-method + + +When set to fsync, which is the default, +pg_basebackup will recursively open and synchronize +all files in the backup directory. When the plain format is used, the +search for files will follow symbolic links for the WAL directory and +each configured tablespace. + + +On Linux, syncfs may be used instead to ask the +operating system to synchronize the whole file system that contains the +backup directory. When the plain format is used, +pg_basebackup will also synchronize the file systems +that contain the WAL files and each tablespace. This may be a lot +faster than the fsync setting, because it doesn't +need to open each file one by one. On the other hand, it may be slower +if a file system is shared by other applications that modify a lot of +files, since those files will also be written to disk. Furthermore, on +versions of Linux before 5.8, I/O errors encountered while writing data +to disk may not be reported to pg_basebackup, and +relevant error messages may appear only in kernel logs. + + +This option has no effect when --no-sync is used. + + + + -v --verbose diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml index a3d0b0f0a3..70fb65a474 100644 --- a/
Re: should frontend tools use syncfs() ?
On Mon, Jul 31, 2023 at 10:51:38AM -0700, Nathan Bossart wrote: > Here is a new version of the patch with documentation updates and a couple > other small improvements. I just realized I forgot to update the --help output for these utilities. I'll do that in the next version of the patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Sat, Jul 29, 2023 at 02:40:10PM -0700, Nathan Bossart wrote: > I was about to start a new thread, but I found this one with some good > preliminary discussion. I came to the same conclusion about introducing a > new option instead of using syncfs() by default wherever it is available. > The attached patch is still a work-in-progress, but it seems to behave as > expected. I began investigating this because I noticed that the > sync-data-directory step on pg_upgrade takes quite a while when there are > many files, and I am looking for ways to reduce the amount of downtime > required for pg_upgrade. > > The attached patch adds a new --sync-method option to the relevant frontend > utilities, but I am not wedded to that name/approach. Here is a new version of the patch with documentation updates and a couple other small improvements. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 6b65605c90703ee3aebca0b90c771c9f14b59545 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jul 2023 15:56:24 -0700 Subject: [PATCH v2 1/1] allow syncfs in frontend utilities --- doc/src/sgml/ref/initdb.sgml | 27 doc/src/sgml/ref/pg_basebackup.sgml | 30 + doc/src/sgml/ref/pg_checksums.sgml| 27 doc/src/sgml/ref/pg_dump.sgml | 27 doc/src/sgml/ref/pg_rewind.sgml | 27 doc/src/sgml/ref/pgupgrade.sgml | 29 + src/bin/initdb/initdb.c | 11 +++- src/bin/pg_basebackup/pg_basebackup.c | 10 ++- src/bin/pg_checksums/pg_checksums.c | 8 ++- src/bin/pg_dump/pg_backup.h | 4 +- src/bin/pg_dump/pg_backup_archiver.c | 13 ++-- src/bin/pg_dump/pg_backup_archiver.h | 1 + src/bin/pg_dump/pg_backup_directory.c | 2 +- src/bin/pg_dump/pg_dump.c | 9 ++- src/bin/pg_rewind/file_ops.c | 2 +- src/bin/pg_rewind/pg_rewind.c | 8 +++ src/bin/pg_rewind/pg_rewind.h | 2 + src/bin/pg_upgrade/option.c | 12 src/bin/pg_upgrade/pg_upgrade.c | 6 +- src/bin/pg_upgrade/pg_upgrade.h | 1 + src/common/file_utils.c | 91 ++- src/fe_utils/option_utils.c | 18 ++ src/include/common/file_utils.h | 17 - src/include/fe_utils/option_utils.h | 3 + src/include/storage/fd.h | 4 ++ src/tools/pgindent/typedefs.list | 1 + 26 files changed, 369 insertions(+), 21 deletions(-) diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index 22f1011781..872fef5705 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -365,6 +365,33 @@ PostgreSQL documentation + + --sync-method + + +When set to fsync, which is the default, +initdb will recursively open and synchronize all +files in the data directory. The search for files will follow symbolic +links for the WAL directory and each configured tablespace. + + +On Linux, syncfs may be used instead to ask the +operating system to synchronize the whole file systems that contain the +data directory, the WAL files, and each tablespace. This may be a lot +faster than the fsync setting, because it doesn't +need to open each file one by one. On the other hand, it may be slower +if a file system is shared by other applications that modify a lot of +files, since those files will also be written to disk. Furthermore, on +versions of Linux before 5.8, I/O errors encountered while writing data +to disk may not be reported to initdb, and relevant +error messages may appear only in kernel logs. + + +This option has no effect when --no-sync is used. + + + + -S --sync-only diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 79d3e657c3..af8eb43251 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -594,6 +594,36 @@ PostgreSQL documentation + + --sync-method + + +When set to fsync, which is the default, +pg_basebackup will recursively open and synchronize +all files in the backup directory. When the plain format is used, the +search for files will follow symbolic links for the WAL directory and +each configured tablespace. + + +On Linux, syncfs may be used instead to ask the +operating system to synchronize the whole file system that contains the +backup directory. When the plain format is used, +pg_basebackup will also synchronize the file systems +that contain the WAL files and each tablespace. This may be a lot +faster than the fsync setting, because it doesn't +need to open each file one by one. On the other hand, it may be slower +
Re: should frontend tools use syncfs() ?
On Wed, Apr 13, 2022 at 06:54:12AM -0500, Justin Pryzby wrote: > I didn't pursue this patch, as it's easier for me to use /bin/sync -f. > Someone > should adopt it if interested. I was about to start a new thread, but I found this one with some good preliminary discussion. I came to the same conclusion about introducing a new option instead of using syncfs() by default wherever it is available. The attached patch is still a work-in-progress, but it seems to behave as expected. I began investigating this because I noticed that the sync-data-directory step on pg_upgrade takes quite a while when there are many files, and I am looking for ways to reduce the amount of downtime required for pg_upgrade. The attached patch adds a new --sync-method option to the relevant frontend utilities, but I am not wedded to that name/approach. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 4f484cd0268e63da8226d78dd21a8d7e29aa2b78 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jul 2023 15:56:24 -0700 Subject: [PATCH v1 1/1] allow syncfs in frontend utilities --- src/bin/initdb/initdb.c | 11 +++- src/bin/pg_basebackup/pg_basebackup.c | 8 ++- src/bin/pg_checksums/pg_checksums.c | 8 ++- src/bin/pg_rewind/file_ops.c | 2 +- src/bin/pg_rewind/pg_rewind.c | 8 +++ src/bin/pg_rewind/pg_rewind.h | 2 + src/bin/pg_upgrade/option.c | 12 src/bin/pg_upgrade/pg_upgrade.c | 6 +- src/bin/pg_upgrade/pg_upgrade.h | 1 + src/common/file_utils.c | 79 ++- src/fe_utils/option_utils.c | 18 ++ src/include/common/file_utils.h | 15 - src/include/fe_utils/option_utils.h | 3 + src/include/storage/fd.h | 4 ++ src/tools/pgindent/typedefs.list | 1 + 15 files changed, 168 insertions(+), 10 deletions(-) diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 3f4167682a..908263ee62 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -76,6 +76,7 @@ #include "common/restricted_token.h" #include "common/string.h" #include "common/username.h" +#include "fe_utils/option_utils.h" #include "fe_utils/string_utils.h" #include "getopt_long.h" #include "mb/pg_wchar.h" @@ -165,6 +166,7 @@ static bool data_checksums = false; static char *xlog_dir = NULL; static char *str_wal_segment_size_mb = NULL; static int wal_segment_size_mb; +static SyncMethod sync_method = SYNC_METHOD_FSYNC; /* internal vars */ @@ -3106,6 +3108,7 @@ main(int argc, char *argv[]) {"locale-provider", required_argument, NULL, 15}, {"icu-locale", required_argument, NULL, 16}, {"icu-rules", required_argument, NULL, 17}, + {"sync-method", required_argument, NULL, 18}, {NULL, 0, NULL, 0} }; @@ -3285,6 +3288,10 @@ main(int argc, char *argv[]) case 17: icu_rules = pg_strdup(optarg); break; + case 18: +if (!parse_sync_method(optarg, &sync_method)) + exit(1); +break; default: /* getopt_long already emitted a complaint */ pg_log_error_hint("Try \"%s --help\" for more information.", progname); @@ -3332,7 +3339,7 @@ main(int argc, char *argv[]) fputs(_("syncing data to disk ... "), stdout); fflush(stdout); - fsync_pgdata(pg_data, PG_VERSION_NUM); + fsync_pgdata(pg_data, PG_VERSION_NUM, sync_method); check_ok(); return 0; } @@ -3409,7 +3416,7 @@ main(int argc, char *argv[]) { fputs(_("syncing data to disk ... "), stdout); fflush(stdout); - fsync_pgdata(pg_data, PG_VERSION_NUM); + fsync_pgdata(pg_data, PG_VERSION_NUM, sync_method); check_ok(); } else diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 1dc8efe0cb..548e764b2f 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -148,6 +148,7 @@ static bool verify_checksums = true; static bool manifest = true; static bool manifest_force_encode = false; static char *manifest_checksums = NULL; +static SyncMethod sync_method = SYNC_METHOD_FSYNC; static bool success = false; static bool made_new_pgdata = false; @@ -2203,7 +2204,7 @@ BaseBackup(char *compression_algorithm, char *compression_detail, } else { - (void) fsync_pgdata(basedir, serverVersion); + (void) fsync_pgdata(basedir, serverVersion, sync_method); } } @@ -2281,6 +2282,7 @@ main(int argc, char **argv) {"no-manifest", no_argument, NULL, 5}, {"manifest-force-encode", no_argument, NULL, 6}, {"manifest-checksums", required_argument, NULL, 7}, + {"sync-method", required_argument, NULL, 8}, {NULL, 0, NULL, 0} }; int c; @@ -2452,6 +2454,10 @@ main(int argc, char **argv) case 7: manifest_checksums = pg_strdup(optarg); break; + case 8: +if (!parse_sync_method(optarg, &sync_method)) + exit(1); +break; default: /* getopt_long already emitted a complaint */ pg_log_error_hint("Try \"%s --
Re: should frontend tools use syncfs() ?
On Thu, Sep 30, 2021 at 12:49:36PM +0900, Michael Paquier wrote: > On Wed, Sep 29, 2021 at 07:43:41PM -0500, Justin Pryzby wrote: > > Forking this thread in which Thomas implemented syncfs for the startup > > process > > (61752afb2). > > https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BSG9jSW3ekwib0cSdC0yD-jReJ21X4bZAmqxoWTLTc2A%40mail.gmail.com > > > > Is there any reason that initdb/pg_basebackup/pg_checksums/pg_rewind > > shouldn't > > use syncfs() ? > > That makes sense. > > > do_syncfs() is in src/backend/ so would need to be duplicated^Wimplemented > > in > > common. > > The fd handling in the backend makes things tricky if trying to plug > in a common interface, so I'd rather do that as this is frontend-only > code. > > > They can't use the GUC, so need to add an cmdline option or look at an > > environment variable. > > fsync_pgdata() is going to manipulate many inodes anyway, because > that's a code path designed to do so. If we know that syncfs() is > just going to be better, I'd rather just call it by default if > available and not add new switches to all the frontend tools in need > of flushing the data folder, switches that are not documented in your > patch. It is a draft/POC, after all. The argument against using syncfs by default is that it could be worse than recursive fsync if a tiny 200MB postgres instance lives on a shared filesystem along with other, larger applications (maybe a larger postgres instance). There's also an argument that syncfs might be unreliable in the case of a write error. (But I agreed with Thomas' earlier assessment: that claim caries little weight since fsync() itself wasn't reliable for 20some years). I didn't pursue this patch, as it's easier for me to use /bin/sync -f. Someone should adopt it if interested. -- Justin
Re: should frontend tools use syncfs() ?
On Thu, Sep 30, 2021 at 05:08:24PM +1300, Thomas Munro wrote: > On Thu, Sep 30, 2021 at 4:49 PM Michael Paquier wrote: > > fsync_pgdata() is going to manipulate many inodes anyway, because > > that's a code path designed to do so. If we know that syncfs() is > > just going to be better, I'd rather just call it by default if > > available and not add new switches to all the frontend tools in need > > of flushing the data folder, switches that are not documented in your > > patch. > > If we want this it should be an option, because it flushes out data > other than the pgdata dir, and it doesn't report errors on old > kernels. I ran into bad performance of initdb --sync-only shortly after adding it to my db migration script, so added initdb --syncfs. I found that with sufficiently recent coreutils, I can do what's wanted by calling /bin/sync -f /datadir Since it's not integrated into initdb, it's necessary to include each tablespace and wal. -- Justin
Re: should frontend tools use syncfs() ?
On Thu, Sep 30, 2021 at 05:08:24PM +1300, Thomas Munro wrote: > If we want this it should be an option, because it flushes out data > other than the pgdata dir, and it doesn't report errors on old > kernels. Oh, OK, thanks. That's the part about 5.8. The only option controlling if sync is used now in those binaries is --no-sync. Should we use a different design for the option rather than a --syncfs? Something like --sync={on,off,syncfs,fsync} could be a possibility, for example. -- Michael signature.asc Description: PGP signature
Re: should frontend tools use syncfs() ?
On Thu, Sep 30, 2021 at 4:49 PM Michael Paquier wrote: > fsync_pgdata() is going to manipulate many inodes anyway, because > that's a code path designed to do so. If we know that syncfs() is > just going to be better, I'd rather just call it by default if > available and not add new switches to all the frontend tools in need > of flushing the data folder, switches that are not documented in your > patch. If we want this it should be an option, because it flushes out data other than the pgdata dir, and it doesn't report errors on old kernels.
Re: should frontend tools use syncfs() ?
On Wed, Sep 29, 2021 at 07:43:41PM -0500, Justin Pryzby wrote: > Forking this thread in which Thomas implemented syncfs for the startup process > (61752afb2). > https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BSG9jSW3ekwib0cSdC0yD-jReJ21X4bZAmqxoWTLTc2A%40mail.gmail.com > > Is there any reason that initdb/pg_basebackup/pg_checksums/pg_rewind shouldn't > use syncfs() ? That makes sense. > do_syncfs() is in src/backend/ so would need to be duplicated^Wimplemented in > common. The fd handling in the backend makes things tricky if trying to plug in a common interface, so I'd rather do that as this is frontend-only code. > They can't use the GUC, so need to add an cmdline option or look at an > environment variable. fsync_pgdata() is going to manipulate many inodes anyway, because that's a code path designed to do so. If we know that syncfs() is just going to be better, I'd rather just call it by default if available and not add new switches to all the frontend tools in need of flushing the data folder, switches that are not documented in your patch. -- Michael signature.asc Description: PGP signature