Re: Add system identifier to backup manifest
On Thu, Mar 14, 2024 at 11:05 AM Amul Sul wrote: > Thank you, Robert. Thanks for the patch! -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
On Thu, Mar 14, 2024 at 12:48 AM Robert Haas wrote: > On Fri, Mar 8, 2024 at 12:14 AM Amul Sul wrote: > > Thank you for the improvement. > > > > The caller of verify_control_file() has the full path of the control > file that > > can pass it and avoid recomputing. With this change, it doesn't really > need > > verifier_context argument -- only the manifest's system identifier is > enough > > along with the control file path. Did the same in the attached delta > patch > > for v11-0002 patch, please have a look, thanks. > > Those seem like sensible changes. I incorporated them and committed. I > also: > > * ran pgindent, which changed a bit of your formatting > * changed some BAIL_OUT calls to die; I think we should hardly ever be > using BAIL_OUT, as that terminates the entire TAP test run, not just > the current file > Thank you, Robert. Regards, Amul
Re: Add system identifier to backup manifest
On Fri, Mar 8, 2024 at 12:14 AM Amul Sul wrote: > Thank you for the improvement. > > The caller of verify_control_file() has the full path of the control file that > can pass it and avoid recomputing. With this change, it doesn't really need > verifier_context argument -- only the manifest's system identifier is enough > along with the control file path. Did the same in the attached delta patch > for v11-0002 patch, please have a look, thanks. Those seem like sensible changes. I incorporated them and committed. I also: * ran pgindent, which changed a bit of your formatting * changed some BAIL_OUT calls to die; I think we should hardly ever be using BAIL_OUT, as that terminates the entire TAP test run, not just the current file Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
On Fri, Mar 8, 2024 at 1:22 AM Robert Haas wrote: > On Thu, Mar 7, 2024 at 9:16 AM Robert Haas wrote: > > It could. I just thought this was clearer. I agree that it's a bit > > long, but I don't think this is worth bikeshedding very much. If at a > > later time somebody feels strongly that it needs to be changed, so be > > it. Right now, getting on with the business at hand is more important, > > IMHO. > > Here's a new version of the patch set, rebased over my version of 0001 > and with various other corrections: > > * Tidy up grammar in documentation. > * In manifest_process_version, the test checked whether the manifest > version == 1, but the comment talked about versions >= 2. Make the > comment match the code. > * In load_backup_manifest, avoid changing the existing placement of a > variable declaration. > * Rename verify_system_identifier to verify_control_file because if we > were verifying multiple things about the control file we'd still want > to only read it one. > * Tweak the coding of verify_backup_file and verify_control_file to > avoid repeated path construction. > * Remove saw_system_identifier_field. This looks like it's trying to > enforce a rule that the system identifier must immediately follow the > version, but we don't insist on anything like that for files or wal > ranges, so there seems to be no reason to do it here. > * Remove bogus "unrecognized top-level field" test from > 005_bad_manifest.pl. The JSON included here doesn't include any > unrecognized top-level field, so the fact that we were getting that > error message was wrong. After removing saw_system_identifier_field, > we no longer get the wrong error message any more, so the test started > failing. > * Remove "expected system identifier" test from 005_bad_manifest.pl. > This was basically a test that saw_system_identifier_field was > working. > * Header comment adjustment for > json_manifest_finalize_system_identifier. The last sentence was > cut-and-pasted from somewhere that it made sense to here, where it > doesn't. There's only ever one system identifier. > > Thank you for the improvement. The caller of verify_control_file() has the full path of the control file that can pass it and avoid recomputing. With this change, it doesn't really need verifier_context argument -- only the manifest's system identifier is enough along with the control file path. Did the same in the attached delta patch for v11-0002 patch, please have a look, thanks. Regards, Amul v11-0002-delta.patch Description: Binary data
Re: Add system identifier to backup manifest
On Thu, Mar 7, 2024 at 9:16 AM Robert Haas wrote: > It could. I just thought this was clearer. I agree that it's a bit > long, but I don't think this is worth bikeshedding very much. If at a > later time somebody feels strongly that it needs to be changed, so be > it. Right now, getting on with the business at hand is more important, > IMHO. Here's a new version of the patch set, rebased over my version of 0001 and with various other corrections: * Tidy up grammar in documentation. * In manifest_process_version, the test checked whether the manifest version == 1, but the comment talked about versions >= 2. Make the comment match the code. * In load_backup_manifest, avoid changing the existing placement of a variable declaration. * Rename verify_system_identifier to verify_control_file because if we were verifying multiple things about the control file we'd still want to only read it one. * Tweak the coding of verify_backup_file and verify_control_file to avoid repeated path construction. * Remove saw_system_identifier_field. This looks like it's trying to enforce a rule that the system identifier must immediately follow the version, but we don't insist on anything like that for files or wal ranges, so there seems to be no reason to do it here. * Remove bogus "unrecognized top-level field" test from 005_bad_manifest.pl. The JSON included here doesn't include any unrecognized top-level field, so the fact that we were getting that error message was wrong. After removing saw_system_identifier_field, we no longer get the wrong error message any more, so the test started failing. * Remove "expected system identifier" test from 005_bad_manifest.pl. This was basically a test that saw_system_identifier_field was working. * Header comment adjustment for json_manifest_finalize_system_identifier. The last sentence was cut-and-pasted from somewhere that it made sense to here, where it doesn't. There's only ever one system identifier. I plan to commit this, barring objections. -- Robert Haas EDB: http://www.enterprisedb.com v11-0001-Expose-new-function-get_controlfile_by_exact_pat.patch Description: Binary data v11-0002-Add-the-system-identifier-to-backup-manifests.patch Description: Binary data
Re: Add system identifier to backup manifest
On Wed, Mar 6, 2024 at 11:22 PM Amul Sul wrote: >> You are not changing silently the internals of get_controlfile(), so >> no objections here. The name of the new routine could be shorter, but >> being short of ideas what you are proposing looks fine by me. > > Could be get_controlfile_by_path() ? It could. I just thought this was clearer. I agree that it's a bit long, but I don't think this is worth bikeshedding very much. If at a later time somebody feels strongly that it needs to be changed, so be it. Right now, getting on with the business at hand is more important, IMHO. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
On Thu, Mar 7, 2024 at 9:37 AM Michael Paquier wrote: > On Wed, Mar 06, 2024 at 11:05:36AM -0500, Robert Haas wrote: > > So with that in mind, here's my proposal. This is an adjustment of > > Amit's previous refactoring patch. He renamed the existing > > get_controlfile() to get_dir_controlfile() and made a new > > get_controlfile() that accepted the path to the control file itself. I > > chose to instead leave the existing get_controlfile() alone and add a > > new get_controlfile_by_exact_path(). I think this is better, because > > most of the existing callers find it more convenient to pass the path > > to the data directory rather than the path to the controlfile, so the > > patch is smaller this way, and less prone to cause headaches for > > people back-patching or maintaining out-of-core code. But it still > > gives us a way to avoid repeatedly constructing the same pathname. > > Yes, that was my primary concern with the previous versions of the > patch. > > > If nobody objects, I plan to commit this version. > > You are not changing silently the internals of get_controlfile(), so > no objections here. The name of the new routine could be shorter, but > being short of ideas what you are proposing looks fine by me. > Could be get_controlfile_by_path() ? Regards, Amul
Re: Add system identifier to backup manifest
On Wed, Mar 06, 2024 at 11:05:36AM -0500, Robert Haas wrote: > So with that in mind, here's my proposal. This is an adjustment of > Amit's previous refactoring patch. He renamed the existing > get_controlfile() to get_dir_controlfile() and made a new > get_controlfile() that accepted the path to the control file itself. I > chose to instead leave the existing get_controlfile() alone and add a > new get_controlfile_by_exact_path(). I think this is better, because > most of the existing callers find it more convenient to pass the path > to the data directory rather than the path to the controlfile, so the > patch is smaller this way, and less prone to cause headaches for > people back-patching or maintaining out-of-core code. But it still > gives us a way to avoid repeatedly constructing the same pathname. Yes, that was my primary concern with the previous versions of the patch. > If nobody objects, I plan to commit this version. You are not changing silently the internals of get_controlfile(), so no objections here. The name of the new routine could be shorter, but being short of ideas what you are proposing looks fine by me. -- Michael signature.asc Description: PGP signature
Re: Add system identifier to backup manifest
On Mon, Mar 4, 2024 at 2:47 PM Robert Haas wrote: > I don't have an enormously strong opinion on what the right thing to > do is here either, but I am not convinced that the change proposed by > Michael is an improvement. After all, that leaves us with the > situation where we know the path to the control file in three > different places. First, verify_backup_file() does a strcmp() against > the string "global/pg_control" to decide whether to call > verify_backup_file(). Then, verify_system_identifier() uses that > string to construct a pathname to the file that it will be read. Then, > get_controlfile() reconstructs the same pathname using it's own logic. > That's all pretty disagreeable. Hard-coded constants are hard to avoid > completely, but here it looks an awful lot like we're trying to > hardcode the same constant into as many different places as we can. > The now-dropped patch seems like an effort to avoid this, and while > it's possible that it wasn't the best way to avoid this, I still think > avoiding it somehow is probably the right idea. So with that in mind, here's my proposal. This is an adjustment of Amit's previous refactoring patch. He renamed the existing get_controlfile() to get_dir_controlfile() and made a new get_controlfile() that accepted the path to the control file itself. I chose to instead leave the existing get_controlfile() alone and add a new get_controlfile_by_exact_path(). I think this is better, because most of the existing callers find it more convenient to pass the path to the data directory rather than the path to the controlfile, so the patch is smaller this way, and less prone to cause headaches for people back-patching or maintaining out-of-core code. But it still gives us a way to avoid repeatedly constructing the same pathname. If nobody objects, I plan to commit this version. -- Robert Haas EDB: http://www.enterprisedb.com v10-0001-Expose-new-function-get_controlfile_by_exact_pat.patch Description: Binary data
Re: Add system identifier to backup manifest
On Mon, Mar 4, 2024 at 12:35 AM Amul Sul wrote: > Yes, you are correct. Both the current caller of get_controlfile() has > access to the root directory. > > I have dropped the 0002 patch -- I don't have a very strong opinion to > refactor > get_controlfile() apart from saying that it might be good to have both > versions :) . I don't have an enormously strong opinion on what the right thing to do is here either, but I am not convinced that the change proposed by Michael is an improvement. After all, that leaves us with the situation where we know the path to the control file in three different places. First, verify_backup_file() does a strcmp() against the string "global/pg_control" to decide whether to call verify_backup_file(). Then, verify_system_identifier() uses that string to construct a pathname to the file that it will be read. Then, get_controlfile() reconstructs the same pathname using it's own logic. That's all pretty disagreeable. Hard-coded constants are hard to avoid completely, but here it looks an awful lot like we're trying to hardcode the same constant into as many different places as we can. The now-dropped patch seems like an effort to avoid this, and while it's possible that it wasn't the best way to avoid this, I still think avoiding it somehow is probably the right idea. I get a compiler warning with 0002, too: ../pgsql/src/backend/backup/basebackup_incremental.c:960:22: warning: call to undeclared function 'GetSystemIdentifier'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] system_identifier = GetSystemIdentifier(); ^ 1 warning generated. But I've committed 0001. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
anifest_file(char *manifest_path) { int fd; struct stat statbuf; @@ -396,8 +389,8 @@ parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p, manifest_files_hash *ht; char *buffer; int rc; - parser_context private_context; JsonManifestParseContext context; + manifest_data *result; /* Open the manifest file. */ if ((fd = open(manifest_path, O_RDONLY | PG_BINARY, 0)) < 0) @@ -436,10 +429,9 @@ parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p, close(fd); /* Parse the manifest. */ - private_context.ht = ht; - private_context.first_wal_range = NULL; - private_context.last_wal_range = NULL; - context.private_data = &private_context; + result = pg_malloc0(sizeof(manifest_data)); + result->files = ht; + context.private_data = result; context.per_file_cb = verifybackup_per_file_cb; context.per_wal_range_cb = verifybackup_per_wal_range_cb; context.error_cb = report_manifest_error; @@ -448,9 +440,7 @@ parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p, /* Done with the buffer. */ pfree(buffer); - /* Return the file hash table and WAL range list we constructed. */ - *ht_p = ht; - *first_wal_range_p = private_context.first_wal_range; + return result; } /* @@ -480,8 +470,8 @@ verifybackup_per_file_cb(JsonManifestParseContext *context, pg_checksum_type checksum_type, int checksum_length, uint8 *checksum_payload) { - parser_context *pcxt = context->private_data; - manifest_files_hash *ht = pcxt->ht; + manifest_data *manifest = context->private_data; + manifest_files_hash *ht = manifest->files; manifest_file *m; bool found; @@ -508,7 +498,7 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context, TimeLineID tli, XLogRecPtr start_lsn, XLogRecPtr end_lsn) { - parser_context *pcxt = context->private_data; + manifest_data *manifest = context->private_data; manifest_wal_range *range; /* Allocate and initialize a struct describing this WAL range. */ @@ -516,15 +506,15 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context, range->tli = tli; range->start_lsn = start_lsn; range->end_lsn = end_lsn; - range->prev = pcxt->last_wal_range; + range->prev = manifest->last_wal_range; range->next = NULL; /* Add it to the end of the list. */ - if (pcxt->first_wal_range == NULL) - pcxt->first_wal_range = range; + if (manifest->first_wal_range == NULL) + manifest->first_wal_range = range; else - pcxt->last_wal_range->next = range; - pcxt->last_wal_range = range; + manifest->last_wal_range->next = range; + manifest->last_wal_range = range; } /* @@ -639,7 +629,7 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath) } /* Check whether there's an entry in the manifest hash. */ - m = manifest_files_lookup(context->ht, relpath); + m = manifest_files_lookup(context->manifest->files, relpath); if (m == NULL) { report_backup_error(context, @@ -679,11 +669,12 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath) static void report_extra_backup_files(verifier_context *context) { + manifest_data *manifest = context->manifest; manifest_files_iterator it; manifest_file *m; - manifest_files_start_iterate(context->ht, &it); - while ((m = manifest_files_iterate(context->ht, &it)) != NULL) + manifest_files_start_iterate(manifest->files, &it); + while ((m = manifest_files_iterate(manifest->files, &it)) != NULL) if (!m->matched && !should_ignore_relpath(context, m->pathname)) report_backup_error(context, "\"%s\" is present in the manifest but not on disk", @@ -698,13 +689,14 @@ report_extra_backup_files(verifier_context *context) static void verify_backup_checksums(verifier_context *context) { + manifest_data *manifest = context->manifest; manifest_files_iterator it; manifest_file *m; progress_report(false); - manifest_files_start_iterate(context->ht, &it); - while ((m = manifest_files_iterate(context->ht, &it)) != NULL) + manifest_files_start_iterate(manifest->files, &it); + while ((m = manifest_files_iterate(manifest->files, &it)) != NULL) { if (should_verify_checksum(m) && !should_ignore_relpath(context, m->pathname)) @@ -833,9 +825,10 @@ verify_file_checksum(verifier_context *context, manifest_file *m, */ static void parse_required_wal(verifier_context *context, char *pg_waldump_path, - char *wal_directory, manifest_wal_range *first_wal_range) + char *wal_directory) { - manifest_wal_range *this_wal_range = first_wal_range; + manifest_data *manifest = context->manifest; + manifest_wal_range *this_wal_range = manifest->first_wal_range; while (this_wal_range != NULL) { -- 2.18.0 From 5166d2b1707f16933a8b51066d97aa1145ee8fbb Mon Sep 17 00:00:00 2001
Re: Add system identifier to backup manifest
On Wed, Feb 21, 2024 at 10:01 AM Michael Paquier wrote: > On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote: > > On Mon, Feb 19, 2024 at 4:22 AM Michael Paquier > wrote: > >> And the new option should be documented at the top of the init() > >> routine for perldoc. > > > > Added in the attached version. > > I've done some wordsmithing on 0001 and it is OK, so I've applied it > to move the needle. Hope that helps. > Thank you very much. Regards, Amul
Re: Add system identifier to backup manifest
On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote: > Agreed, now they will have an error as _could not read file "": Is > a directory_. But, IIUC, that what usually happens with the dev version, and > the extension needs to be updated for compatibility with the newer version for > the same reason. I was reading through the remaining pieces of the patch set, and are you sure that there is a need for 0002 at all? The only reason why get_dir_controlfile() is introduced is to be able to get the contents of a control file with a full path to it, and not a data folder. Now, if I look closely, with 0002~0004 applied, the only two callers of get_controlfile() are pg_combinebackup.c and pg_verifybackup.c. Both of them have an access to the backup directories, which point to the root of the data folder. pg_combinebackup can continue to use backup_dirs[i]. pg_verifybackup has an access to the backup directory in the context data, if I'm reading this right, so you could just use that in verify_system_identifier(). -- Michael signature.asc Description: PGP signature
Re: Add system identifier to backup manifest
On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote: > On Mon, Feb 19, 2024 at 4:22 AM Michael Paquier wrote: >> And the new option should be documented at the top of the init() >> routine for perldoc. > > Added in the attached version. I've done some wordsmithing on 0001 and it is OK, so I've applied it to move the needle. Hope that helps. -- Michael signature.asc Description: PGP signature
Re: Add system identifier to backup manifest
files changed, 26 insertions(+), 12 deletions(-) diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 55435dbcf3a..ea8eb3624d9 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -44,7 +44,7 @@ pg_control_system(PG_FUNCTION_ARGS) /* read the control file */ LWLockAcquire(ControlFileLock, LW_SHARED); - ControlFile = get_controlfile(DataDir, &crc_ok); + ControlFile = get_dir_controlfile(DataDir, &crc_ok); LWLockRelease(ControlFileLock); if (!crc_ok) ereport(ERROR, @@ -84,7 +84,7 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) /* Read the control file. */ LWLockAcquire(ControlFileLock, LW_SHARED); - ControlFile = get_controlfile(DataDir, &crc_ok); + ControlFile = get_dir_controlfile(DataDir, &crc_ok); LWLockRelease(ControlFileLock); if (!crc_ok) ereport(ERROR, @@ -175,7 +175,7 @@ pg_control_recovery(PG_FUNCTION_ARGS) /* read the control file */ LWLockAcquire(ControlFileLock, LW_SHARED); - ControlFile = get_controlfile(DataDir, &crc_ok); + ControlFile = get_dir_controlfile(DataDir, &crc_ok); LWLockRelease(ControlFileLock); if (!crc_ok) ereport(ERROR, @@ -216,7 +216,7 @@ pg_control_init(PG_FUNCTION_ARGS) /* read the control file */ LWLockAcquire(ControlFileLock, LW_SHARED); - ControlFile = get_controlfile(DataDir, &crc_ok); + ControlFile = get_dir_controlfile(DataDir, &crc_ok); LWLockRelease(ControlFileLock); if (!crc_ok) ereport(ERROR, diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 9e6fd435f60..4bdd85f42e0 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -545,7 +545,7 @@ main(int argc, char *argv[]) } /* Read the control file and check compatibility */ - ControlFile = get_controlfile(DataDir, &crc_ok); + ControlFile = get_dir_controlfile(DataDir, &crc_ok); if (!crc_ok) pg_fatal("pg_control CRC value is incorrect"); diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c index 31ead7f4058..00c018351ac 100644 --- a/src/bin/pg_combinebackup/pg_combinebackup.c +++ b/src/bin/pg_combinebackup/pg_combinebackup.c @@ -534,7 +534,7 @@ check_control_files(int n_backups, char **backup_dirs) controlpath = psprintf("%s/%s", backup_dirs[i], "global/pg_control"); pg_log_debug("reading \"%s\"", controlpath); - control_file = get_controlfile(backup_dirs[i], &crc_ok); + control_file = get_controlfile(controlpath, &crc_ok); /* Control file contents not meaningful if CRC is bad. */ if (!crc_ok) diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index 93e0837947c..1e615131612 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -165,7 +165,7 @@ main(int argc, char *argv[]) } /* get a copy of the control file */ - ControlFile = get_controlfile(DataDir, &crc_ok); + ControlFile = get_dir_controlfile(DataDir, &crc_ok); if (!crc_ok) { pg_log_warning("calculated CRC checksum does not match value stored in control file"); diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 6900b27675e..1d887c1b9df 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -2174,7 +2174,7 @@ get_control_dbstate(void) { DBState ret; bool crc_ok; - ControlFileData *control_file_data = get_controlfile(pg_data, &crc_ok); + ControlFileData *control_file_data = get_dir_controlfile(pg_data, &crc_ok); if (!crc_ok) { diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c index 92e8fed6b2e..43566920fed 100644 --- a/src/common/controldata_utils.c +++ b/src/common/controldata_utils.c @@ -49,11 +49,10 @@ * file data is correct. */ ControlFileData * -get_controlfile(const char *DataDir, bool *crc_ok_p) +get_controlfile(const char *ControlFilePath, bool *crc_ok_p) { ControlFileData *ControlFile; int fd; - char ControlFilePath[MAXPGPATH]; pg_crc32c crc; int r; #ifdef FRONTEND @@ -64,7 +63,6 @@ get_controlfile(const char *DataDir, bool *crc_ok_p) Assert(crc_ok_p); ControlFile = palloc_object(ControlFileData); - snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir); #ifdef FRONTEND INIT_CRC32C(last_crc); @@ -162,6 +160,21 @@ retry: return ControlFile; } +/* + * get_dir_controlfile() + * + * Get controlfile values of the given data directory. + */ +ControlFileData * +get_dir_controlfile(const char *DataDir, bool *crc_ok_p) +{ + char ControlFilePath[MAXPGPATH]; + + snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir); + + return get_controlfile(ControlFilePath, crc_ok_p); +} + /* * update_controlfile() * diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_
Re: Add system identifier to backup manifest
On Thu, Feb 15, 2024 at 05:41:46PM +0530, Robert Haas wrote: > On Thu, Feb 15, 2024 at 3:05 PM Amul Sul wrote: > > Kindly have a look at the attached version. > > IMHO, 0001 looks fine, except probably the comment could be phrased a > bit more nicely. And the new option should be documented at the top of the init() routine for perldoc. > That can be left for whoever commits this to > wordsmith. Michael, what are your plans? Not much, so feel free to not wait for me. I've just read through the patch because I like the idea/feature :) > 0002 seems like a reasonable approach, but there's a hunk in the wrong > patch: 0004 modifies pg_combinebackup's check_control_files to use > get_dir_controlfile() rather than git_controlfile(), but it looks to > me like that should be part of 0002. I'm slightly concerned about 0002 that silently changes the meaning of get_controlfile(). That would impact extension code without people knowing about it when compiling, just when they run their stuff under 17~. -- Michael signature.asc Description: PGP signature
Re: Add system identifier to backup manifest
On Thu, Feb 15, 2024 at 3:05 PM Amul Sul wrote: > Kindly have a look at the attached version. IMHO, 0001 looks fine, except probably the comment could be phrased a bit more nicely. That can be left for whoever commits this to wordsmith. Michael, what are your plans? 0002 seems like a reasonable approach, but there's a hunk in the wrong patch: 0004 modifies pg_combinebackup's check_control_files to use get_dir_controlfile() rather than git_controlfile(), but it looks to me like that should be part of 0002. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
On Thu, Feb 15, 2024 at 7:18 AM Michael Paquier wrote: > On Wed, Feb 14, 2024 at 12:29:07PM +0530, Amul Sul wrote: > > Ok, I did that way in the attached version, I have passed the control > file's > > full path as a second argument to verify_system_identifier() what we > gets in > > verify_backup_file(), but that is not doing any useful things with it, > > since we > > were using get_controlfile() to open the control file, which takes the > > directory as an input and computes the full path on its own. > > I've read through the patch, and that's pretty cool. > Thank you for looking into this. > -static void > -parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p, > - manifest_wal_range > **first_wal_range_p) > +static manifest_data * > +parse_manifest_file(char *manifest_path) > > In 0001, should the comment describing this routine be updated as > well? > Ok, updated in the attached version. > > + identifier with pg_control of the backup directory or fails > verification > > This is missing a markup here. > Done, in the attached version. > > + PostgreSQL 17, it is 2; in older > versions, > + it is 1. > > Perhaps a couple of s here. > Done. > + if (strcmp(parse->manifest_version, "1") != 0 && > + strcmp(parse->manifest_version, "2") != 0) > + json_manifest_parse_failure(parse->context, > + > "unexpected manifest version"); > + > + /* Parse version. */ > + version = strtoi64(parse->manifest_version, &ep, 10); > + if (*ep) > + json_manifest_parse_failure(parse->context, > + > "manifest version not an integer"); > + > + /* Invoke the callback for version */ > + context->version_cb(context, version); > > Shouldn't these two checks be reversed? And is there actually a need > for the first check at all knowing that the version callback should be > in charge of performing the validation vased on the version received? > Make sense, reversed the order. I think, particular allowed versions should be placed at the central place, and the callback can check and react on the versions suitable to them, IMHO. > +my $node2; > +{ > + local $ENV{'INITDB_TEMPLATE'} = undef; > > Not sure that it is a good idea to duplicate this pattern twice. > Shouldn't this be something we'd want to control with an option in the > init() method instead? > Yes, I did that in a separate patch, see 0001 patch. > +static void > +verify_system_identifier(verifier_context *context, char *controlpath) > > Relying both on controlpath, being a full path to the control file > including the data directory, and context->backup_directory to read > the contents of the control file looks a bit weird. Wouldn't it be > cleaner to just use one of them? > Well, yes, I had to have the same feeling, how about having another function that can accept a full path of pg_control? I tried in the 0002 patch, where the original function is renamed to get_dir_controlfile(), which accepts the data directory path as before, and get_controlfile() now accepts the full path of the pg_control file. Kindly have a look at the attached version. Regards, Amul v7-0004-Add-system-identifier-to-backup-manifest.patch Description: Binary data v7-0003-pg_verifybackup-code-refactor.patch Description: Binary data v7-0001-Add-option-to-force-initdb-in-PostgreSQL-Test-Clu.patch Description: Binary data v7-0002-Code-refactor-get_controlfile-to-accept-full-path.patch Description: Binary data
Re: Add system identifier to backup manifest
On Wed, Feb 14, 2024 at 12:29:07PM +0530, Amul Sul wrote: > Ok, I did that way in the attached version, I have passed the control file's > full path as a second argument to verify_system_identifier() what we gets in > verify_backup_file(), but that is not doing any useful things with it, > since we > were using get_controlfile() to open the control file, which takes the > directory as an input and computes the full path on its own. I've read through the patch, and that's pretty cool. -static void -parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p, - manifest_wal_range **first_wal_range_p) +static manifest_data * +parse_manifest_file(char *manifest_path) In 0001, should the comment describing this routine be updated as well? + identifier with pg_control of the backup directory or fails verification This is missing a markup here. + PostgreSQL 17, it is 2; in older versions, + it is 1. Perhaps a couple of s here. + if (strcmp(parse->manifest_version, "1") != 0 && + strcmp(parse->manifest_version, "2") != 0) + json_manifest_parse_failure(parse->context, + "unexpected manifest version"); + + /* Parse version. */ + version = strtoi64(parse->manifest_version, &ep, 10); + if (*ep) + json_manifest_parse_failure(parse->context, + "manifest version not an integer"); + + /* Invoke the callback for version */ + context->version_cb(context, version); Shouldn't these two checks be reversed? And is there actually a need for the first check at all knowing that the version callback should be in charge of performing the validation vased on the version received? +my $node2; +{ + local $ENV{'INITDB_TEMPLATE'} = undef; Not sure that it is a good idea to duplicate this pattern twice. Shouldn't this be something we'd want to control with an option in the init() method instead? +static void +verify_system_identifier(verifier_context *context, char *controlpath) Relying both on controlpath, being a full path to the control file including the data directory, and context->backup_directory to read the contents of the control file looks a bit weird. Wouldn't it be cleaner to just use one of them? -- Michael signature.asc Description: PGP signature
Re: Add system identifier to backup manifest
On Fri, Feb 2, 2024 at 12:03 AM Robert Haas wrote: > On Thu, Feb 1, 2024 at 2:18 AM Amul Sul wrote: > > I intended to minimize the out param of parse_manifest_file(), which > currently > > returns manifest_files_hash and manifest_wal_range, and I need two more > -- > > manifest versions and the system identifier. > > Sure, but you could do context.ht = manifest_data->files instead of > context.manifest = manifest_data. The question isn't whether you > should return the whole manifest_data from parse_manifest_file -- I > agree with that decision -- but rather whether you should feed the > whole thing through into the context, or just the file hash. > > > Yeah, we can do that, but I think it is a bit inefficient to have > strcmp() > > check for the pg_control file on each verify_backup_file() call, > despite, we > > know that path. Also, I think, we need additional handling to ensure > that the > > system identifier has been verified in verify_backup_file(), what if the > > pg_control file itself missing from the backup -- might be a rare case, > but > > possible. > > > > For now, we can do the system identifier validation after > > verify_backup_directory(). > > Yes, that's another option, but I don't think it's as good. > > Suppose you do it that way. Then what will happen when the file is > altogether missing or inaccessible? I think verify_backup_file() will > complain, and then you'll have to do something ugly to avoid having > verify_system_identifier() emit the same complaint all over again. > Remember, unless you find some complicated way of passing data around, > it won't know whether verify_backup_file() emitted a warning or not -- > it can redo the stat() and see what happens, but it's not absolutely > guaranteed to be the same as what happened before. Making sure that > you always emit any given complaint once rather than twice or zero > times is going to be tricky. > > It seems more natural to me to just accept the (small) cost of a > strcmp() in verify_backup_file(). If the initial stat() fails, it > emits whatever complaint is appropriate and returns and the logic to > check the system identifier is never reached. If it succeeds, you can > proceed to try to open the file and do what you need to do. > Ok, I did that way in the attached version, I have passed the control file's full path as a second argument to verify_system_identifier() what we gets in verify_backup_file(), but that is not doing any useful things with it, since we were using get_controlfile() to open the control file, which takes the directory as an input and computes the full path on its own. Regards, Amul v6-0002-Add-system-identifier-to-backup-manifest.patch Description: Binary data v6-0001-pg_verifybackup-code-refactor.patch Description: Binary data
Re: Add system identifier to backup manifest
On Thu, Feb 1, 2024 at 2:18 AM Amul Sul wrote: > I intended to minimize the out param of parse_manifest_file(), which currently > returns manifest_files_hash and manifest_wal_range, and I need two more -- > manifest versions and the system identifier. Sure, but you could do context.ht = manifest_data->files instead of context.manifest = manifest_data. The question isn't whether you should return the whole manifest_data from parse_manifest_file -- I agree with that decision -- but rather whether you should feed the whole thing through into the context, or just the file hash. > Yeah, we can do that, but I think it is a bit inefficient to have strcmp() > check for the pg_control file on each verify_backup_file() call, despite, we > know that path. Also, I think, we need additional handling to ensure that the > system identifier has been verified in verify_backup_file(), what if the > pg_control file itself missing from the backup -- might be a rare case, but > possible. > > For now, we can do the system identifier validation after > verify_backup_directory(). Yes, that's another option, but I don't think it's as good. Suppose you do it that way. Then what will happen when the file is altogether missing or inaccessible? I think verify_backup_file() will complain, and then you'll have to do something ugly to avoid having verify_system_identifier() emit the same complaint all over again. Remember, unless you find some complicated way of passing data around, it won't know whether verify_backup_file() emitted a warning or not -- it can redo the stat() and see what happens, but it's not absolutely guaranteed to be the same as what happened before. Making sure that you always emit any given complaint once rather than twice or zero times is going to be tricky. It seems more natural to me to just accept the (small) cost of a strcmp() in verify_backup_file(). If the initial stat() fails, it emits whatever complaint is appropriate and returns and the logic to check the system identifier is never reached. If it succeeds, you can proceed to try to open the file and do what you need to do. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
On Thu, Feb 1, 2024 at 3:06 AM Robert Haas wrote: > On Thu, Jan 25, 2024 at 2:52 AM Amul Sul wrote: > > Thank you for the review-comments, updated version attached. > > I generally agree with 0001. I spent a long time thinking about your > decision to make verifier_context contain a pointer to manifest_data > instead of, as it does currently, a pointer to manifest_files_hash. I > don't think that's a horrible idea, but it also doesn't seem to be > used anywhere currently. One advantage of the current approach is that > we know that none of the code downstream of verify_backup_directory() > or verify_backup_checksums() actually cares about anything other than > the manifest_files_hash. That's kind of nice. If we didn't change this > as you have done here, then we would need to continue passing the WAL > ranges to parse_required_walI() and the system identifier would have > to be passed explicitly to the code that checks the system identifier, > but that's not such a bad thing, either. It makes it clear which > functions are using which information. > I intended to minimize the out param of parse_manifest_file(), which currently returns manifest_files_hash and manifest_wal_range, and I need two more -- manifest versions and the system identifier. But before you go change anything there, exactly when should 0002 be > checking the system identifier in the control file? What happens now > is that we first walk over the directory tree and make sure we have > the files (verify_backup_directory) and then go through and verify > checksums in a second pass (verify_backup_checksums). We do this > because it lets us report problems that can be detected cheaply -- > like missing files -- relatively quickly, and problems that are more > expensive to detect -- like mismatching checksums -- only after we've > reported all the cheap-to-detect problems. At what stage should we > verify the control file? I don't really like verifying it first, as > you've done, because I think the error message change in > 004_options.pl is a clear regression. When the whole directory is > missing, it's much more pleasant to complain about the directory being > missing than some file inside the directory being missing. > > What I'd be inclined to suggest is that you have verify_backup_file() > notice when the file it's being asked to verify is the control file, > and have it check the system identifier at that stage. I think if you > do that, then the error message change in 004_options.pl goes away. > Now, to do that, you'd need to have the whole manifest_data available > from the context, not just the manifest_files_hash, so that you can > see the expected system identifier. And, interestingly, if you take > this approach, then it appears to me that 0001 is correct as-is and > doesn't need any changes. > Yeah, we can do that, but I think it is a bit inefficient to have strcmp() check for the pg_control file on each verify_backup_file() call, despite, we know that path. Also, I think, we need additional handling to ensure that the system identifier has been verified in verify_backup_file(), what if the pg_control file itself missing from the backup -- might be a rare case, but possible. For now, we can do the system identifier validation after verify_backup_directory(). Regards, Amul
Re: Add system identifier to backup manifest
On Thu, Jan 25, 2024 at 2:52 AM Amul Sul wrote: > Thank you for the review-comments, updated version attached. I generally agree with 0001. I spent a long time thinking about your decision to make verifier_context contain a pointer to manifest_data instead of, as it does currently, a pointer to manifest_files_hash. I don't think that's a horrible idea, but it also doesn't seem to be used anywhere currently. One advantage of the current approach is that we know that none of the code downstream of verify_backup_directory() or verify_backup_checksums() actually cares about anything other than the manifest_files_hash. That's kind of nice. If we didn't change this as you have done here, then we would need to continue passing the WAL ranges to parse_required_walI() and the system identifier would have to be passed explicitly to the code that checks the system identifier, but that's not such a bad thing, either. It makes it clear which functions are using which information. But before you go change anything there, exactly when should 0002 be checking the system identifier in the control file? What happens now is that we first walk over the directory tree and make sure we have the files (verify_backup_directory) and then go through and verify checksums in a second pass (verify_backup_checksums). We do this because it lets us report problems that can be detected cheaply -- like missing files -- relatively quickly, and problems that are more expensive to detect -- like mismatching checksums -- only after we've reported all the cheap-to-detect problems. At what stage should we verify the control file? I don't really like verifying it first, as you've done, because I think the error message change in 004_options.pl is a clear regression. When the whole directory is missing, it's much more pleasant to complain about the directory being missing than some file inside the directory being missing. What I'd be inclined to suggest is that you have verify_backup_file() notice when the file it's being asked to verify is the control file, and have it check the system identifier at that stage. I think if you do that, then the error message change in 004_options.pl goes away. Now, to do that, you'd need to have the whole manifest_data available from the context, not just the manifest_files_hash, so that you can see the expected system identifier. And, interestingly, if you take this approach, then it appears to me that 0001 is correct as-is and doesn't need any changes. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
p_per_file_cb; context.per_wal_range_cb = verifybackup_per_wal_range_cb; context.error_cb = report_manifest_error; @@ -448,9 +440,7 @@ parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p, /* Done with the buffer. */ pfree(buffer); - /* Return the file hash table and WAL range list we constructed. */ - *ht_p = ht; - *first_wal_range_p = private_context.first_wal_range; + return result; } /* @@ -480,8 +470,8 @@ verifybackup_per_file_cb(JsonManifestParseContext *context, pg_checksum_type checksum_type, int checksum_length, uint8 *checksum_payload) { - parser_context *pcxt = context->private_data; - manifest_files_hash *ht = pcxt->ht; + manifest_data *manifest = context->private_data; + manifest_files_hash *ht = manifest->files; manifest_file *m; bool found; @@ -508,7 +498,7 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context, TimeLineID tli, XLogRecPtr start_lsn, XLogRecPtr end_lsn) { - parser_context *pcxt = context->private_data; + manifest_data *manifest = context->private_data; manifest_wal_range *range; /* Allocate and initialize a struct describing this WAL range. */ @@ -516,15 +506,15 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context, range->tli = tli; range->start_lsn = start_lsn; range->end_lsn = end_lsn; - range->prev = pcxt->last_wal_range; + range->prev = manifest->last_wal_range; range->next = NULL; /* Add it to the end of the list. */ - if (pcxt->first_wal_range == NULL) - pcxt->first_wal_range = range; + if (manifest->first_wal_range == NULL) + manifest->first_wal_range = range; else - pcxt->last_wal_range->next = range; - pcxt->last_wal_range = range; + manifest->last_wal_range->next = range; + manifest->last_wal_range = range; } /* @@ -639,7 +629,7 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath) } /* Check whether there's an entry in the manifest hash. */ - m = manifest_files_lookup(context->ht, relpath); + m = manifest_files_lookup(context->manifest->files, relpath); if (m == NULL) { report_backup_error(context, @@ -679,11 +669,12 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath) static void report_extra_backup_files(verifier_context *context) { + manifest_data *manifest = context->manifest; manifest_files_iterator it; manifest_file *m; - manifest_files_start_iterate(context->ht, &it); - while ((m = manifest_files_iterate(context->ht, &it)) != NULL) + manifest_files_start_iterate(manifest->files, &it); + while ((m = manifest_files_iterate(manifest->files, &it)) != NULL) if (!m->matched && !should_ignore_relpath(context, m->pathname)) report_backup_error(context, "\"%s\" is present in the manifest but not on disk", @@ -698,13 +689,14 @@ report_extra_backup_files(verifier_context *context) static void verify_backup_checksums(verifier_context *context) { + manifest_data *manifest = context->manifest; manifest_files_iterator it; manifest_file *m; progress_report(false); - manifest_files_start_iterate(context->ht, &it); - while ((m = manifest_files_iterate(context->ht, &it)) != NULL) + manifest_files_start_iterate(manifest->files, &it); + while ((m = manifest_files_iterate(manifest->files, &it)) != NULL) { if (should_verify_checksum(m) && !should_ignore_relpath(context, m->pathname)) @@ -833,9 +825,10 @@ verify_file_checksum(verifier_context *context, manifest_file *m, */ static void parse_required_wal(verifier_context *context, char *pg_waldump_path, - char *wal_directory, manifest_wal_range *first_wal_range) + char *wal_directory) { - manifest_wal_range *this_wal_range = first_wal_range; + manifest_data *manifest = context->manifest; + manifest_wal_range *this_wal_range = manifest->first_wal_range; while (this_wal_range != NULL) { -- 2.18.0 From 8f7e9348f3c454d47abc8afa5b75d0b499d4d8dd Mon Sep 17 00:00:00 2001 From: Amul Sul Date: Mon, 22 Jan 2024 11:19:30 +0530 Subject: [PATCH v5 2/2] Add system identifier to backup manifest The backup manifest will be having a new item as "System-Identifier" and its value is from "ControlFileData.system_identifier" when manifest generated. This help identify the correct database server and/or backup while taking subsequent backup. --- doc/src/sgml/backup-manifest.sgml | 15 +++- doc/src/sgml/ref/pg_verifybackup.sgml | 3 +- src/backend/backup/backup_manifest.c | 7 +- src/backend/backup/basebackup_incremental.c | 39 + src/bin/pg_basebackup/t/010_pg_basebackup.pl | 20 + src/bin/pg_combinebackup/load_manifest.c | 33 +++- src/bin/pg_combinebackup/load_manifest.h | 1 + src/bin/pg_combinebackup/pg_combinebackup.c
Re: Add system identifier to backup manifest
On Mon, Jan 22, 2024 at 2:22 AM Amul Sul wrote: > Thinking a bit more on this, I realized parse_manifest_file() has many out > parameters. Instead parse_manifest_file() should simply return manifest data > like load_backup_manifest(). Attached 0001 patch doing the same, and removed > parser_context structure, and added manifest_data, and did the required > adjustments to pg_verifybackup code. InitializeBackupManifest(&manifest, opt->manifest, - opt->manifest_checksum_type); + opt->manifest_checksum_type, +GetSystemIdentifier()); InitializeBackupManifest() can just call GetSystemIdentifier() itself, instead of passing another parameter, I think. + if (manifest_version == 1) + context->error_cb(context, + "%s: backup manifest doesn't support incremental changes", + private_context->backup_directory); I think this is weird. First, there doesn't seem to be any reason to bounce through error_cb() here. You could just call pg_fatal(), as we do elsewhere in this file. Second, there doesn't seem to be any need to include the backup directory in this error message. We include the file name (not the directory name) in errors that pertain to the file itself, like if we can't open or read it. But we don't do that for semantic errors about the manifest contents (cf. combinebackup_per_file_cb). This file would need a lot fewer charges if you didn't feed the backup directory name through here. Third, the error message is not well-chosen, because a user who looks at it won't understand WHY the manifest doesn't support incremental changes. I suggest "backup manifest version 1 does not support incremental backup". + /* Incremental backups supported on manifest version 2 or later */ + if (manifest_version == 1) + context->error_cb(context, + "incremental backups cannot be taken for this backup"); Let's use the same error message as in the previous case here also. + for (i = 0; i < n_backups; i++) + { + if (manifests[i]->system_identifier != system_identifier) + { + char *controlpath; + + controlpath = psprintf("%s/%s", prior_backup_dirs[i], "global/pg_control"); + + pg_fatal("manifest is from different database system: manifest database system identifier is %llu, %s system identifier is %llu", +(unsigned long long) manifests[i]->system_identifier, +controlpath, +(unsigned long long) system_identifier); + } + } check_control_files() already verifies that all of the control files contain the same system identifier as each other, so what we're really checking here is that the backup manifest in each directory has the same system identifier as the control file in that same directory. One problem is that backup manifests are optional here, as per the comment in load_backup_manifests(), so you need to skip over NULL entries cleanly to avoid seg faulting if one is missing. I also think the error message should be changed. How about "%s: manifest system identifier is %llu, but control file has %llu"? + context->error_cb(context, + "manifest is from different database system: manifest database system identifier is %llu, pg_control database system identifier is %llu", + (unsigned long long) manifest_system_identifier, + (unsigned long long) system_identifier); And here, while I'm kibitzing, how about "manifest system identifier is %llu, but this system's identifier is %llu"? - qr/could not open directory/, + qr/could not open file/, I don't think that the expected error message here should be changing. Does it really, with the latest patch version? Why? Can we fix that? + else if (!parse->saw_system_identifier_field && + strcmp(parse->manifest_version, "1") != 0) I don't think this has any business testing the manifest version. That's something to sort out at some later stage. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
On Mon, Jan 22, 2024 at 10:08 AM Amul Sul wrote: > > > On Fri, Jan 19, 2024 at 10:36 PM Amul Sul wrote: > >> On Wed, Jan 17, 2024 at 8:40 PM Robert Haas >> wrote: >> >>> >>> >> Updated version is attached. >> > > Another updated version attached -- fix missing manifest version check in > pg_verifybackup before system identifier validation. > Thinking a bit more on this, I realized parse_manifest_file() has many out parameters. Instead parse_manifest_file() should simply return manifest data like load_backup_manifest(). Attached 0001 patch doing the same, and removed parser_context structure, and added manifest_data, and did the required adjustments to pg_verifybackup code. Regards, Amul v4-0001-pg_verifybackup-code-refactor.patch Description: Binary data v4-0002-Add-system-identifier-to-backup-manifest.patch Description: Binary data
Re: Add system identifier to backup manifest
On Fri, Jan 19, 2024 at 10:36 PM Amul Sul wrote: > On Wed, Jan 17, 2024 at 8:40 PM Robert Haas wrote: > >> >> > Updated version is attached. > Another updated version attached -- fix missing manifest version check in pg_verifybackup before system identifier validation. Regards, Amul From 7ff9e85acbf0789d16d29dc316ce2bd9382ac879 Mon Sep 17 00:00:00 2001 From: Amul Sul Date: Mon, 22 Jan 2024 10:05:20 +0530 Subject: [PATCH v3] Add system identifier to backup manifest The backup manifest will be having a new item as "System-Identifier" and its value is from "ControlFileData.system_identifier" when manifest generated. This help identify the correct database server and/or backup while taking subsequent backup. --- doc/src/sgml/backup-manifest.sgml | 15 ++- doc/src/sgml/ref/pg_verifybackup.sgml | 3 +- src/backend/backup/backup_manifest.c | 9 +- src/backend/backup/basebackup.c | 3 +- src/backend/backup/basebackup_incremental.c | 39 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 20 src/bin/pg_combinebackup/load_manifest.c | 62 ++-- src/bin/pg_combinebackup/load_manifest.h | 1 + src/bin/pg_combinebackup/pg_combinebackup.c | 33 ++- src/bin/pg_combinebackup/t/005_integrity.pl | 14 +++ src/bin/pg_combinebackup/write_manifest.c | 8 +- src/bin/pg_combinebackup/write_manifest.h | 3 +- src/bin/pg_verifybackup/pg_verifybackup.c | 99 ++- src/bin/pg_verifybackup/t/003_corruption.pl | 31 +- src/bin/pg_verifybackup/t/004_options.pl | 2 +- src/bin/pg_verifybackup/t/005_bad_manifest.pl | 11 +++ src/common/parse_manifest.c | 83 +++- src/include/backup/backup_manifest.h | 3 +- src/include/common/parse_manifest.h | 6 ++ 19 files changed, 413 insertions(+), 32 deletions(-) diff --git a/doc/src/sgml/backup-manifest.sgml b/doc/src/sgml/backup-manifest.sgml index 771be1310a..a67462e3eb 100644 --- a/doc/src/sgml/backup-manifest.sgml +++ b/doc/src/sgml/backup-manifest.sgml @@ -37,7 +37,20 @@ PostgreSQL-Backup-Manifest-Version - The associated value is always the integer 1. + The associated value is an integer. Beginning in + PostgreSQL 17, it is 2; in older versions, + it is 1. + + + + + +System-Identifier + + + The associated integer value is an unique system identifier to ensure + file match up with the installation that produced them. Available only + when PostgreSQL-Backup-Manifest-Version is 2 and later. diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml index 36335e0a18..3051517d92 100644 --- a/doc/src/sgml/ref/pg_verifybackup.sgml +++ b/doc/src/sgml/ref/pg_verifybackup.sgml @@ -53,7 +53,8 @@ PostgreSQL documentation Backup verification proceeds in four stages. First, pg_verifybackup reads the backup_manifest file. If that file - does not exist, cannot be read, is malformed, or fails verification + does not exist, cannot be read, is malformed, fails to match the system + identifier with pg_control of the backup directory or fails verification against its own internal checksum, pg_verifybackup will terminate with a fatal error. diff --git a/src/backend/backup/backup_manifest.c b/src/backend/backup/backup_manifest.c index 2c34e59752..612ff3add2 100644 --- a/src/backend/backup/backup_manifest.c +++ b/src/backend/backup/backup_manifest.c @@ -56,7 +56,8 @@ IsManifestEnabled(backup_manifest_info *manifest) void InitializeBackupManifest(backup_manifest_info *manifest, backup_manifest_option want_manifest, - pg_checksum_type manifest_checksum_type) + pg_checksum_type manifest_checksum_type, + uint64 system_identifier) { memset(manifest, 0, sizeof(backup_manifest_info)); manifest->checksum_type = manifest_checksum_type; @@ -79,8 +80,10 @@ InitializeBackupManifest(backup_manifest_info *manifest, if (want_manifest != MANIFEST_OPTION_NO) AppendToManifest(manifest, - "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n" - "\"Files\": ["); + "{ \"PostgreSQL-Backup-Manifest-Version\": 2,\n" + "\"System-Identifier\": " UINT64_FORMAT ",\n" + "\"Files\": [", + system_identifier); } /* diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index d5b8ca21b7..315efc7536 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -256,7 +256,8 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, backup_started_in_recovery = RecoveryInProgress(); InitializeBackupManifest(&manifest, opt->manifest, - opt->manifest_checksum_type); + opt->manifest_checksum_type,
Re: Add system identifier to backup manifest
On Thu, Jan 18, 2024 at 6:39 AM Sravan Kumar wrote: > I have also done a review of the patch and some testing. The patch looks > good, and I agree with Robert's comments. > Thank you for your review, testing and the offline discussion. Regards, Amul
Re: Add system identifier to backup manifest
d happen later (e.g. "0/0" can't be a the correct backup end LSN > but we don't figure that out while parsing, but rather later). I think > you should actually move validation of the system identifier to the > point where the directory walk encounters the control file (and update > the docs and tests to match that decision). Imagine if you wanted to > validate a tar-format backup; then you wouldn't have random access to > the directory. You'd see the manifest file first, and then all the > files in a random order, with one chance to look at each one. > > Agree. I have moved the system identifier validation after manifest parsing. But, not in the directory walkthrough since in pg_combinebackup, we don't really needed to open the pg_control file to get the system identifier, which we have from the check_control_files(). > (This is, in fact, a feature I think we should implement.) > > - if (strcmp(token, "1") != 0) > + parse->manifest_version = atoi(token); > + if (parse->manifest_version != 1 && parse->manifest_version != 2) > json_manifest_parse_failure(parse->context, > "unexpected manifest version"); > > Please either (a) don't do a string-to-integer conversion and just > strcmp() twice or (b) use strtol so that you can check that it > succeeded. I don't want to accept manifest version 1a as 1. > Understood, corrected in the attached version. > +/* > + * Validate manifest system identifier against the database server system > + * identifier. > + */ > > This comment assumes you know what the callback is going to do, but > you don't. This should be more like the comment for > json_manifest_finalize_file or json_manifest_finalize_wal_range. > Ok. Updated version is attached. Regards, Amul From 567b498dab623b60279c4f5df909bcb564b4f81a Mon Sep 17 00:00:00 2001 From: Amul Sul Date: Fri, 19 Jan 2024 22:21:12 +0530 Subject: [PATCH v2] Add system identifier to backup manifest The backup manifest will be having a new item as "System-Identifier" and its value is from "ControlFileData.system_identifier" when manifest generated. This help identify the correct database server and/or backup while taking subsequent backup. --- doc/src/sgml/backup-manifest.sgml | 15 +++- doc/src/sgml/ref/pg_verifybackup.sgml | 3 +- src/backend/backup/backup_manifest.c | 9 +- src/backend/backup/basebackup.c | 3 +- src/backend/backup/basebackup_incremental.c | 39 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 20 + src/bin/pg_combinebackup/load_manifest.c | 62 +++-- src/bin/pg_combinebackup/load_manifest.h | 1 + src/bin/pg_combinebackup/pg_combinebackup.c | 33 +-- src/bin/pg_combinebackup/t/005_integrity.pl | 14 +++ src/bin/pg_combinebackup/write_manifest.c | 8 +- src/bin/pg_combinebackup/write_manifest.h | 3 +- src/bin/pg_verifybackup/pg_verifybackup.c | 90 ++- src/bin/pg_verifybackup/t/003_corruption.pl | 31 ++- src/bin/pg_verifybackup/t/004_options.pl | 2 +- src/bin/pg_verifybackup/t/005_bad_manifest.pl | 11 +++ src/common/parse_manifest.c | 83 - src/include/backup/backup_manifest.h | 3 +- src/include/common/parse_manifest.h | 6 ++ 19 files changed, 404 insertions(+), 32 deletions(-) diff --git a/doc/src/sgml/backup-manifest.sgml b/doc/src/sgml/backup-manifest.sgml index 771be1310a..a67462e3eb 100644 --- a/doc/src/sgml/backup-manifest.sgml +++ b/doc/src/sgml/backup-manifest.sgml @@ -37,7 +37,20 @@ PostgreSQL-Backup-Manifest-Version - The associated value is always the integer 1. + The associated value is an integer. Beginning in + PostgreSQL 17, it is 2; in older versions, + it is 1. + + + + + +System-Identifier + + + The associated integer value is an unique system identifier to ensure + file match up with the installation that produced them. Available only + when PostgreSQL-Backup-Manifest-Version is 2 and later. diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml index 36335e0a18..3051517d92 100644 --- a/doc/src/sgml/ref/pg_verifybackup.sgml +++ b/doc/src/sgml/ref/pg_verifybackup.sgml @@ -53,7 +53,8 @@ PostgreSQL documentation Backup verification proceeds in four stages. First, pg_verifybackup reads the backup_manifest file. If that file - does not exist, cannot be read, is malformed, or fails verification + does not exist, cannot be read, is malformed, fails to match the system + identifier with pg_control of the backup directory or fails verification against its own internal checksum, pg_verifybackup will terminate with a fatal error. diff --git a/src/ba
Re: Add system identifier to backup manifest
On Wed, Jan 17, 2024 at 08:46:09AM -0500, Robert Haas wrote: > On Wed, Jan 17, 2024 at 6:45 AM Alvaro Herrera > wrote: >> Hmm, okay, but what if I take a full backup from a primary server and >> later I want an incremental from a standby, or the other way around? >> Will this prevent me from using such a combination? > > The system identifier had BETTER match in such cases. If it doesn't, > somebody's run pg_resetwal on your standby since it was created... and > in that case, no incremental backup for you! There is an even stronger check than that at replay as we also store the system identifier in XLogLongPageHeaderData and cross-check it with the contents of the control file. Having a field in the backup manifest makes for a much faster detection, even if that's not the same as replaying things, it can avoid a lot of problems when combining backup pieces. I'm +1 for Amul's patch concept. -- Michael signature.asc Description: PGP signature
Re: Add system identifier to backup manifest
I have also done a review of the patch and some testing. The patch looks good, and I agree with Robert's comments. On Wed, Jan 17, 2024 at 8:40 PM Robert Haas wrote: > > On Wed, Jan 17, 2024 at 6:31 AM Amul Sul wrote: > > With the attached patch, the backup manifest will have a new key item as > > "System-Identifier" 64-bit integer whose value is derived from pg_control while > > generating it, and the manifest version bumps to 2. > > > > This helps to identify the correct database server and/or backup for the > > subsequent backup operations. pg_verifybackup validates the manifest system > > identifier against the backup control file and fails if they don’t match. > > Similarly, pg_basebackup increment backup will fail if the manifest system > > identifier does not match with the server system identifier. The > > pg_combinebackup is already a bit smarter -- checks the system identifier from > > the pg_control of all the backups, with this patch the manifest system > > identifier also validated. > > Thanks for working on this. Without this, I think what happens is that > you can potentially take an incremental backup from the "wrong" > server, if the states of the systems are such that all of the other > sanity checks pass. When you run pg_combinebackup, it'll discover the > problem and tell you, but you ideally want to discover such errors at > backup time rather than at restore time. This addresses that. And, > overall, I think it's a pretty good patch. But I nonetheless have a > bunch of comments. > > - The associated value is always the integer 1. > + The associated value is the integer, either 1 or 2. > > is an integer. Beginning in PostgreSQL 17, > it is 2; in older versions, it is 1. > > + context.identity_cb = manifest_process_identity; > > I'm not really on board with calling the system identifier "identity" > throughout the patch. I think it should just say system_identifier. If > we were going to abbreviate, I'd prefer something like "sysident" that > looks like it's derived from "system identifier" rather than > "identity" which is a different word altogether. But I don't think we > should abbreviate unless doing so creates *ridiculously* long > identifier names. > > +static void > +manifest_process_identity(JsonManifestParseContext *context, > + int manifest_version, > + uint64 manifest_system_identifier) > +{ > + uint64 system_identifier; > + > + /* Manifest system identifier available in version 2 or later */ > + if (manifest_version == 1) > + return; > > I think you've got the wrong idea here. I think this function would > only get called if System-Identifier is present in the manifest, so if > it's a v1 manifest, this would never get called, so this if-statement > would not ever do anything useful. I think what you should do is (1) > if the client supplies a v1 manifest, reject it, because surely that's > from an older server version that doesn't support incremental backup; > but do that when the version is parsed rather than here; and (2) also > detect and reject the case when it's supposedly a v2 manifest but this > is absent. > > (1) should really be done when the version number is parsed, so I > suspect you may need to add manifest_version_cb. > > +static void > +combinebackup_identity_cb(JsonManifestParseContext *context, > + int manifest_version, > + uint64 manifest_system_identifier) > +{ > + parser_context *private_context = context->private_data; > + uint64 system_identifier = private_context->system_identifier; > + > + /* Manifest system identifier available in version 2 or later */ > + if (manifest_version == 1) > + return; > > Very similar to the above case. Just reject a version 1 manifest as > soon as we see the version number. In this function, set a flag > indicating we saw the system identifier; if at the end of parsing that > flag is not set, kaboom. > > - parse_manifest_file(manifest_path, &context.ht, &first_wal_range); > + parse_manifest_file(manifest_path, &context.ht, &first_wal_range, > + context.backup_directory); > > Don't do this! parse_manifest_file() should just record everything > found in the manifest in the context object. Syntax validation should > happen while parsing the manifest (e.g. "CAT/DOG" is not a valid LSN > and we should reject that at this stage) but semantic validation > should happen later (e.g. "0/0" can't be a the correct backup end LSN > but we don't figure that out while parsing, but rather later). I think > you should actually move validation of the system identifier to the > point where the directory walk encounters the control file (and update > the docs and tests to match that decision). Imagine if you wanted to > validate a tar-format backup; then you wouldn't have random access to > the directory. You'd see the manifest file first, and then all the > files in a random order, with one chance to look at each one. > > (This is, in fact, a feature I think we should implement.) > > - if (strcmp(token, "1") != 0) > +
Re: Add system identifier to backup manifest
On Wed, Jan 17, 2024 at 6:31 AM Amul Sul wrote: > With the attached patch, the backup manifest will have a new key item as > "System-Identifier" 64-bit integer whose value is derived from pg_control > while > generating it, and the manifest version bumps to 2. > > This helps to identify the correct database server and/or backup for the > subsequent backup operations. pg_verifybackup validates the manifest system > identifier against the backup control file and fails if they don’t match. > Similarly, pg_basebackup increment backup will fail if the manifest system > identifier does not match with the server system identifier. The > pg_combinebackup is already a bit smarter -- checks the system identifier from > the pg_control of all the backups, with this patch the manifest system > identifier also validated. Thanks for working on this. Without this, I think what happens is that you can potentially take an incremental backup from the "wrong" server, if the states of the systems are such that all of the other sanity checks pass. When you run pg_combinebackup, it'll discover the problem and tell you, but you ideally want to discover such errors at backup time rather than at restore time. This addresses that. And, overall, I think it's a pretty good patch. But I nonetheless have a bunch of comments. - The associated value is always the integer 1. + The associated value is the integer, either 1 or 2. is an integer. Beginning in PostgreSQL 17, it is 2; in older versions, it is 1. + context.identity_cb = manifest_process_identity; I'm not really on board with calling the system identifier "identity" throughout the patch. I think it should just say system_identifier. If we were going to abbreviate, I'd prefer something like "sysident" that looks like it's derived from "system identifier" rather than "identity" which is a different word altogether. But I don't think we should abbreviate unless doing so creates *ridiculously* long identifier names. +static void +manifest_process_identity(JsonManifestParseContext *context, + int manifest_version, + uint64 manifest_system_identifier) +{ + uint64 system_identifier; + + /* Manifest system identifier available in version 2 or later */ + if (manifest_version == 1) + return; I think you've got the wrong idea here. I think this function would only get called if System-Identifier is present in the manifest, so if it's a v1 manifest, this would never get called, so this if-statement would not ever do anything useful. I think what you should do is (1) if the client supplies a v1 manifest, reject it, because surely that's from an older server version that doesn't support incremental backup; but do that when the version is parsed rather than here; and (2) also detect and reject the case when it's supposedly a v2 manifest but this is absent. (1) should really be done when the version number is parsed, so I suspect you may need to add manifest_version_cb. +static void +combinebackup_identity_cb(JsonManifestParseContext *context, + int manifest_version, + uint64 manifest_system_identifier) +{ + parser_context *private_context = context->private_data; + uint64 system_identifier = private_context->system_identifier; + + /* Manifest system identifier available in version 2 or later */ + if (manifest_version == 1) + return; Very similar to the above case. Just reject a version 1 manifest as soon as we see the version number. In this function, set a flag indicating we saw the system identifier; if at the end of parsing that flag is not set, kaboom. - parse_manifest_file(manifest_path, &context.ht, &first_wal_range); + parse_manifest_file(manifest_path, &context.ht, &first_wal_range, + context.backup_directory); Don't do this! parse_manifest_file() should just record everything found in the manifest in the context object. Syntax validation should happen while parsing the manifest (e.g. "CAT/DOG" is not a valid LSN and we should reject that at this stage) but semantic validation should happen later (e.g. "0/0" can't be a the correct backup end LSN but we don't figure that out while parsing, but rather later). I think you should actually move validation of the system identifier to the point where the directory walk encounters the control file (and update the docs and tests to match that decision). Imagine if you wanted to validate a tar-format backup; then you wouldn't have random access to the directory. You'd see the manifest file first, and then all the files in a random order, with one chance to look at each one. (This is, in fact, a feature I think we should implement.) - if (strcmp(token, "1") != 0) + parse->manifest_version = atoi(token); + if (parse->manifest_version != 1 && parse->manifest_version != 2) json_manifest_parse_failure(parse->context, "unexpected manifest version"); Please either (a) don't do a string-to-integer conversion and just strcmp() twice or (b) use strtol so that you can check that it succeeded. I don't want to accept ma
Re: Add system identifier to backup manifest
On Wed, Jan 17, 2024 at 6:45 AM Alvaro Herrera wrote: > Hmm, okay, but what if I take a full backup from a primary server and > later I want an incremental from a standby, or the other way around? > Will this prevent me from using such a combination? The system identifier had BETTER match in such cases. If it doesn't, somebody's run pg_resetwal on your standby since it was created... and in that case, no incremental backup for you! -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add system identifier to backup manifest
On Wed, Jan 17, 2024 at 5:15 PM Alvaro Herrera wrote: > On 2024-Jan-17, Amul Sul wrote: > > > This helps to identify the correct database server and/or backup for the > > subsequent backup operations. pg_verifybackup validates the manifest > system > > identifier against the backup control file and fails if they don’t match. > > Similarly, pg_basebackup increment backup will fail if the manifest > system > > identifier does not match with the server system identifier. The > > pg_combinebackup is already a bit smarter -- checks the system identifier > > from > > the pg_control of all the backups, with this patch the manifest system > > identifier also validated. > > Hmm, okay, but what if I take a full backup from a primary server and > later I want an incremental from a standby, or the other way around? > Will this prevent me from using such a combination? > Yes, that worked for me where the system identifier was the same on master as well standby. Regards, Amul
Re: Add system identifier to backup manifest
On 2024-Jan-17, Amul Sul wrote: > This helps to identify the correct database server and/or backup for the > subsequent backup operations. pg_verifybackup validates the manifest system > identifier against the backup control file and fails if they don’t match. > Similarly, pg_basebackup increment backup will fail if the manifest system > identifier does not match with the server system identifier. The > pg_combinebackup is already a bit smarter -- checks the system identifier > from > the pg_control of all the backups, with this patch the manifest system > identifier also validated. Hmm, okay, but what if I take a full backup from a primary server and later I want an incremental from a standby, or the other way around? Will this prevent me from using such a combination? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "You're _really_ hosed if the person doing the hiring doesn't understand relational systems: you end up with a whole raft of programmers, none of whom has had a Date with the clue stick." (Andrew Sullivan) https://postgr.es/m/20050809113420.gd2...@phlogiston.dyndns.org
Add system identifier to backup manifest
Hi All, With the attached patch, the backup manifest will have a new key item as "System-Identifier" 64-bit integer whose value is derived from pg_control while generating it, and the manifest version bumps to 2. This helps to identify the correct database server and/or backup for the subsequent backup operations. pg_verifybackup validates the manifest system identifier against the backup control file and fails if they don’t match. Similarly, pg_basebackup increment backup will fail if the manifest system identifier does not match with the server system identifier. The pg_combinebackup is already a bit smarter -- checks the system identifier from the pg_control of all the backups, with this patch the manifest system identifier also validated. For backward compatibility, the manifest system identifier validation will be skipped for version 1. -- Regards, Amul Sul EDB: http://www.enterprisedb.com From 03d21efd7b03d17a55fc1dd1159e0838777f548a Mon Sep 17 00:00:00 2001 From: Amul Sul Date: Wed, 17 Jan 2024 16:12:19 +0530 Subject: [PATCH v1] Add system identifier to backup manifest The backup manifest will be having a new item as "System-Identifier" and its value is from "ControlFileData.system_identifier" when manifest generated. This help identify the correct database server and/or backup while taking subsequent backup. --- doc/src/sgml/backup-manifest.sgml | 13 +++- doc/src/sgml/ref/pg_verifybackup.sgml | 3 +- src/backend/backup/backup_manifest.c | 9 ++- src/backend/backup/basebackup.c | 3 +- src/backend/backup/basebackup_incremental.c | 29 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 20 + src/bin/pg_combinebackup/load_manifest.c | 73 --- src/bin/pg_combinebackup/load_manifest.h | 6 +- src/bin/pg_combinebackup/pg_combinebackup.c | 16 ++-- src/bin/pg_combinebackup/t/005_integrity.pl | 14 src/bin/pg_combinebackup/write_manifest.c | 8 +- src/bin/pg_combinebackup/write_manifest.h | 3 +- src/bin/pg_verifybackup/pg_verifybackup.c | 59 ++- src/bin/pg_verifybackup/t/003_corruption.pl | 16 +++- src/bin/pg_verifybackup/t/004_options.pl | 2 +- src/bin/pg_verifybackup/t/005_bad_manifest.pl | 7 +- src/common/parse_manifest.c | 36 - src/include/backup/backup_manifest.h | 3 +- src/include/common/parse_manifest.h | 4 + 19 files changed, 286 insertions(+), 38 deletions(-) diff --git a/doc/src/sgml/backup-manifest.sgml b/doc/src/sgml/backup-manifest.sgml index 771be1310a..d0fc20ed0f 100644 --- a/doc/src/sgml/backup-manifest.sgml +++ b/doc/src/sgml/backup-manifest.sgml @@ -37,7 +37,18 @@ PostgreSQL-Backup-Manifest-Version - The associated value is always the integer 1. + The associated value is the integer, either 1 or 2. + + + + + +System-Identifier + + + The associated integer value is an unique system identifier to ensure + file match up with the installation that produced them. Available only + when PostgreSQL-Backup-Manifest-Version is 2 and later. diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml index 36335e0a18..3051517d92 100644 --- a/doc/src/sgml/ref/pg_verifybackup.sgml +++ b/doc/src/sgml/ref/pg_verifybackup.sgml @@ -53,7 +53,8 @@ PostgreSQL documentation Backup verification proceeds in four stages. First, pg_verifybackup reads the backup_manifest file. If that file - does not exist, cannot be read, is malformed, or fails verification + does not exist, cannot be read, is malformed, fails to match the system + identifier with pg_control of the backup directory or fails verification against its own internal checksum, pg_verifybackup will terminate with a fatal error. diff --git a/src/backend/backup/backup_manifest.c b/src/backend/backup/backup_manifest.c index 2c34e59752..612ff3add2 100644 --- a/src/backend/backup/backup_manifest.c +++ b/src/backend/backup/backup_manifest.c @@ -56,7 +56,8 @@ IsManifestEnabled(backup_manifest_info *manifest) void InitializeBackupManifest(backup_manifest_info *manifest, backup_manifest_option want_manifest, - pg_checksum_type manifest_checksum_type) + pg_checksum_type manifest_checksum_type, + uint64 system_identifier) { memset(manifest, 0, sizeof(backup_manifest_info)); manifest->checksum_type = manifest_checksum_type; @@ -79,8 +80,10 @@ InitializeBackupManifest(backup_manifest_info *manifest, if (want_manifest != MANIFEST_OPTION_NO) AppendToManifest(manifest, - "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n" - "\"Files\": ["); + "{ \"PostgreSQL-Backup-Manifest-Version\": 2,\n" + "\"System-Identifier\": " UINT64_FORMAT ",\n" +