On 26.03.2019 11:19, Michael Paquier wrote:
> + * This is a simplified and adapted to frontend version of
> + * RestoreArchivedFile function from transam/xlogarchive.c
> + */
> +static int
> +RestoreArchivedWAL(const char *path, const char *xlogfname,
>
> I don't think that we should have duplicates for that, so I would
> recommend refactoring the code so that a unique code path is taken by
> both, especially since the user can fetch the command from
> postgresql.conf.
This comment has been there since the beginning of my work on this patch, and by now it is rather misleading.
Even if we do not take into account obvious differences like error reporting, different log levels based on many conditions, cleanup options and the check for standby mode, restore_command execution during backend recovery and during pg_rewind has one very important difference. If it fails in the backend, then, as stated in the comment 'Remember, we rollforward UNTIL the restore fails so failure here is just part of the process', it is OK. In contrast, if pg_rewind fails to recover some required WAL segment, it definitely means the end of the entire process, since we will then fail to find the last common checkpoint or to extract the page map.
The only part we can share is the construction of restore_command with alias replacement. However, even here the logic is slightly different, since we do not need the %r alias for pg_rewind. The only use case of %r in restore_command I know of is pg_standby, which does not seem to apply to pg_rewind. I have tried to move this part into common code, but it becomes full of conditions and less concise.
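For illustration, the alias replacement that both code paths could in principle share can be sketched as a small standalone C function. This is only a sketch under stated assumptions: the name expand_restore_command() is hypothetical and exists neither in the patch nor in PostgreSQL; it follows the %p/%f/%% rules of the patch's RestoreArchivedWAL(), rejects %r with an error code (the patch calls pg_fatal() instead), and skips make_native_path(), so paths are passed through unchanged.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): expand a restore_command
 * template into buf, replacing %p with xlogpath, %f with xlogfname and
 * %% with a single %.  A % followed by any other character is copied as is.
 * Returns 0 on success, or -1 if the template uses %r (which has no
 * meaning for pg_rewind) or if the result does not fit into buf.
 */
static int
expand_restore_command(char *buf, size_t bufsize, const char *tmpl,
					   const char *xlogpath, const char *xlogfname)
{
	size_t		len = 0;

	if (bufsize == 0)
		return -1;

	for (const char *sp = tmpl; *sp; sp++)
	{
		const char *insert;
		char		single[2] = {'\0', '\0'};
		size_t		n;

		if (*sp == '%' && sp[1] == 'p')
		{
			insert = xlogpath;
			sp++;
		}
		else if (*sp == '%' && sp[1] == 'f')
		{
			insert = xlogfname;
			sp++;
		}
		else if (*sp == '%' && sp[1] == 'r')
			return -1;
		else if (*sp == '%' && sp[1] == '%')
		{
			single[0] = '%';
			insert = single;
			sp++;				/* consume the second % */
		}
		else
		{
			/* Ordinary character, or a % that is not special */
			single[0] = *sp;
			insert = single;
		}

		n = strlen(insert);
		if (len + n >= bufsize)
			return -1;			/* no room for the text plus the terminator */
		memcpy(buf + len, insert, n);
		len += n;
	}
	buf[len] = '\0';
	return 0;
}
```

For example, the template `cp /mnt/archive/%f %p` with the segment name `000000010000000000000003` and the path `pg_wal/000000010000000000000003` expands to `cp /mnt/archive/000000010000000000000003 pg_wal/000000010000000000000003`. Reporting overflow as an error, rather than silently truncating as the `if (dp < endp)` checks in the patch do, is a design choice of this sketch only.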
Please correct me if I am wrong, but it seems that there are enough differences to keep this function separate, don't you think?
> Why two options? Wouldn't use-postgresql-conf alone actually be enough to do the job? Note that "postgres" should always be installed if pg_rewind is present because it is a backend-side utility, so while I don't like adding a dependency to other binaries in one binary, having an option to pass a command directly via the command line of pg_rewind stresses me more.
I am not familiar enough with DBA scenarios where the -R option may be useful, but I have been asked for it a few times. I can only speculate that, for example, someone may want to run a freshly rewound cluster as a master rather than a replica, so its config may differ from the replica's one, where restore_command is surely intended to be. Thus, it is easier to leave the master's config in place and just specify restore_command as a command line argument.
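In the patch, -r works by running `postgres -D <target datadir> -C restore_command` through popen() and reading the first line of its output. The sketch below isolates that step in a hypothetical helper, read_first_output_line(); the name, the draining of extra output, and the choice to also treat a non-zero exit status as a failure are assumptions of this sketch and are not what the patch does (the patch ignores the pclose() result).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): run "cmd" through the shell
 * and copy the first line of its standard output into buf, without the
 * trailing newline (a longer line is truncated to fit).  Returns 0 on
 * success and -1 if the command could not be started, printed nothing,
 * printed an empty first line, or exited with a non-zero status.
 * For -r, cmd would be something like
 * "postgres -D /path/to/target -C restore_command".
 */
static int
read_first_output_line(const char *cmd, char *buf, size_t bufsize)
{
	FILE	   *fp;
	char		discard[256];
	int			status;

	if (bufsize < 2)
		return -1;

	fp = popen(cmd, "r");
	if (fp == NULL)
		return -1;

	if (fgets(buf, (int) bufsize, fp) == NULL)
	{
		pclose(fp);
		return -1;
	}

	/* Drain any remaining output so the child never blocks on a full pipe */
	while (fgets(discard, sizeof(discard), fp) != NULL)
		;

	status = pclose(fp);
	if (status != 0)
		return -1;

	/* Strip the trailing newline, if any */
	buf[strcspn(buf, "\n")] = '\0';

	/* The patch treats an empty value as "restore_command is not set" */
	return buf[0] == '\0' ? -1 : 0;
}
```

As in the patch, the caller would then pg_strdup() the result into restore_command and proceed exactly as with -R.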
> Don't we need to worry about signals interrupting the restore command? It seems to me that some refactoring of the stuff in xlogarchive.c would be in order.
Thank you for pointing me to this place again. Previously, I thought that we should not care about it, since if restore_command was not successful for any reason, then the rewind failed, so we would stop and exit at the upper levels. However, if the failure was due to a signal, then some of the subsequent messages may be misleading, e.g. if the user manually interrupted it for some reason. So I added a similar check here as well.
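The added check relies on PostgreSQL's existing wait_result_is_any_signal() helper. It matters because POSIX system() ignores SIGINT and SIGQUIT in the calling process while the command runs, so an interrupt typed by the user reaches only the child, and pg_rewind learns about it solely through the returned status. The sketch below restates that decision with plain POSIX macros in a hypothetical restore_status_is_signal() helper. It assumes, as the PostgreSQL helper does, that a shell reports a child killed by a signal as exit status 128 + signal number, and it corresponds to calling wait_result_is_any_signal() with include_command_not_found = false, plus an explicit check for a -1 return from system().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not part of the patch): return true if rc, the
 * value returned by system(), indicates that the restore command was
 * terminated by a signal.
 */
static bool
restore_status_is_signal(int rc)
{
	/* system() itself failed (e.g. fork() error): not a signal */
	if (rc == -1)
		return false;

	/* The shell running the command was itself killed by a signal */
	if (WIFSIGNALED(rc))
		return true;

	/*
	 * The shell exited normally but reports that its child was killed;
	 * shells use exit status 128 + signal number for that.
	 */
	if (WIFEXITED(rc) && WEXITSTATUS(rc) > 128)
		return true;

	return false;
}
```

A caller would then exit immediately on true (as the patch does with pg_fatal()) and only print "could not restore file" for ordinary failures.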
An updated version of the patch is attached.

-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
From 9e00f7a7696a88f350e1e328a9758ab85631c813 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov <kondratov.alek...@gmail.com>
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v6] pg_rewind: options to use restore_command from command line or cluster config

Previously, when pg_rewind could not find required WAL files in the
target data directory, the rewind process would fail. One had to
manually figure out which of the required WAL files had already been
moved to the archival storage and copy them back.

This patch adds the possibility to specify restore_command via a command
line option or to use the one specified in postgresql.conf. The
specified restore_command will be used for automatic retrieval of
missing WAL files from archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml       |  30 ++++-
 src/bin/pg_rewind/parsexlog.c         | 167 +++++++++++++++++++++++++-
 src/bin/pg_rewind/pg_rewind.c         |  96 ++++++++++++++-
 src/bin/pg_rewind/pg_rewind.h         |   7 +-
 src/bin/pg_rewind/t/001_basic.pl      |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm     |  84 ++++++++++++-
 8 files changed, 376 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a64ee29e..90e3f22f97 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
    ancestor. In the typical failover scenario where the target cluster was
    shut down soon after the divergence, this is not a problem, but if the
    target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the <filename>pg_wal</filename> directory, or
+   files might no longer be present. In that case, they can be automatically
+   copied by <application>pg_rewind</application> from the WAL archive to the
+   <filename>pg_wal</filename> directory if either <literal>-r</literal> or
+   <literal>-R</literal> option is specified, or
    fetched on startup by configuring <xref linkend="guc-primary-conninfo"/> or
    <xref linkend="guc-restore-command"/>. The use of
    <application>pg_rewind</application> is not limited to failover, e.g. a standby
@@ -200,6 +202,30 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-r</option></term>
+      <term><option>--use-postgresql-conf</option></term>
+      <listitem>
+       <para>
+        Use restore_command in the <filename>postgresql.conf</filename> to
+        retrieve missing in the target <filename>pg_wal</filename> directory
+        WAL files from the WAL archive.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-R <replaceable class="parameter">restore_command</replaceable></option></term>
+      <term><option>--restore-command=<replaceable class="parameter">restore_command</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the restore_command to use for retrieval of the missing
+        in the target <filename>pg_wal</filename> directory WAL files from
+        the WAL archive.
+       </para>
+      </listitem>
+     </varlistentry>
+
     <varlistentry>
      <term><option>--debug</option></term>
      <listitem>
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index e19c265cbb..3dc110be4e 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -12,6 +12,7 @@
 #include "postgres_fe.h"
 
 #include <unistd.h>
+#include <sys/stat.h>
 
 #include "pg_rewind.h"
 #include "filemap.h"
@@ -45,6 +46,7 @@ static char xlogfpath[MAXPGPATH];
 typedef struct XLogPageReadPrivate
 {
 	const char *datadir;
+	const char *restoreCommand;
 	int			tliIndex;
 } XLogPageReadPrivate;
 
@@ -53,6 +55,9 @@ static int SimpleXLogPageRead(XLogReaderState *xlogreader,
 				   int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
 				   TimeLineID *pageTLI);
 
+static int RestoreArchivedWAL(const char *path, const char *xlogfname,
+				   off_t expectedSize, const char *restoreCommand);
+
 /*
  * Read WAL from the datadir/pg_wal, starting from 'startpoint' on timeline
  * index 'tliIndex' in target timeline history, until 'endpoint'. Make note of
@@ -60,7 +65,7 @@ static int SimpleXLogPageRead(XLogReaderState *xlogreader,
  */
 void
 extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
-			   XLogRecPtr endpoint)
+			   XLogRecPtr endpoint, const char *restore_command)
 {
 	XLogRecord *record;
 	XLogReaderState *xlogreader;
@@ -69,6 +74,7 @@ extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
 
 	private.datadir = datadir;
 	private.tliIndex = tliIndex;
+	private.restoreCommand = restore_command;
 	xlogreader = XLogReaderAllocate(WalSegSz, &SimpleXLogPageRead,
 									&private);
 	if (xlogreader == NULL)
@@ -156,7 +162,7 @@ readOneRecord(const char *datadir, XLogRecPtr ptr, int tliIndex)
 void
 findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, int tliIndex,
 				   XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
-				   XLogRecPtr *lastchkptredo)
+				   XLogRecPtr *lastchkptredo, const char *restoreCommand)
 {
 	/* Walk backwards, starting from the given record */
 	XLogRecord *record;
@@ -181,6 +187,7 @@ findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, int tliIndex,
 
 	private.datadir = datadir;
 	private.tliIndex = tliIndex;
+	private.restoreCommand = restoreCommand;
 	xlogreader = XLogReaderAllocate(WalSegSz, &SimpleXLogPageRead,
 									&private);
 	if (xlogreader == NULL)
@@ -291,9 +298,30 @@ SimpleXLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 
 		if (xlogreadfd < 0)
 		{
-			printf(_("could not open file \"%s\": %s\n"), xlogfpath,
-				   strerror(errno));
-			return -1;
+			/*
+			 * If we have no restore_command to execute, then exit.
+			 */
+			if (private->restoreCommand == NULL)
+			{
+				printf(_("could not open file \"%s\": %s\n"), xlogfpath,
+					   strerror(errno));
+				return -1;
+			}
+
+			/*
+			 * Since we have restore_command to execute, then try to retrieve
+			 * missing WAL file from the archive.
+			 */
+			xlogreadfd = RestoreArchivedWAL(private->datadir,
+											xlogfname,
+											WalSegSz,
+											private->restoreCommand);
+
+			if (xlogreadfd < 0)
+				return -1;
+			else
+				pg_log(PG_DEBUG, "using restored from archive version of file \"%s\"\n",
+					   xlogfpath);
 		}
 	}
 
@@ -409,3 +437,132 @@ extractPageInfo(XLogReaderState *record)
 		process_block_change(forknum, rnode, blkno);
 	}
 }
+
+/*
+ * Attempt to retrieve the specified file from off-line archival storage.
+ * If successful return a file descriptor of restored WAL file, else
+ * return -1.
+ *
+ * For fixed-size files, the caller may pass the expected size as an
+ * additional crosscheck on successful recovery. If the file size is not
+ * known, set expectedSize = 0.
+ */
+static int
+RestoreArchivedWAL(const char *path, const char *xlogfname,
+				   off_t expectedSize, const char *restoreCommand)
+{
+	char		xlogpath[MAXPGPATH],
+				xlogRestoreCmd[MAXPGPATH],
+			   *dp,
+			   *endp;
+	const char *sp;
+	int			rc,
+				xlogfd;
+	struct stat stat_buf;
+
+	snprintf(xlogpath, MAXPGPATH, "%s/" XLOGDIR "/%s", path, xlogfname);
+
+	/*
+	 * Construct the command to be executed.
+	 */
+	dp = xlogRestoreCmd;
+	endp = xlogRestoreCmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = restoreCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of target file */
+					sp++;
+					StrNCpy(dp, xlogpath, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of desired file */
+					sp++;
+					StrNCpy(dp, xlogfname, endp - dp);
+					dp += strlen(dp);
+					break;
+				case 'r':
+					/* %r: filename of last restartpoint */
+					pg_fatal("restore_command with %%r cannot be used with pg_rewind.\n");
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	/*
+	 * Execute restore_command, which should copy
+	 * the missing WAL file from archival storage.
+	 */
+	rc = system(xlogRestoreCmd);
+
+	if (rc == 0)
+	{
+		/*
+		 * Command apparently succeeded, but let's make sure the file is
+		 * really there now and has the correct size.
+		 */
+		if (stat(xlogpath, &stat_buf) == 0)
+		{
+			if (expectedSize > 0 && stat_buf.st_size != expectedSize)
+			{
+				printf(_("archive file \"%s\" has wrong size: %lu instead of %lu\n"),
+					   xlogfname, (unsigned long) stat_buf.st_size,
+					   (unsigned long) expectedSize);
+			}
+			else
+			{
+				xlogfd = open(xlogpath, O_RDONLY | PG_BINARY, 0);
+
+				if (xlogfd < 0)
+					printf(_("could not open restored from archive file \"%s\": %s\n"),
+						   xlogpath, strerror(errno));
+				else
+					return xlogfd;
+			}
+		}
+		else
+		{
+			/* Stat failed */
+			printf(_("could not stat file \"%s\": %s\n"),
+				   xlogpath, strerror(errno));
+		}
+	}
+
+	/*
+	 * If the failure was due to any sort of signal, then it will be
+	 * misleading to return message 'could not restore file...' and
+	 * propagate result to the upper levels. We should exit right now.
+	 */
+	if (wait_result_is_any_signal(rc, false))
+		pg_fatal("restore_command failed due to the signal: %s\n",
+				 wait_result_to_str(rc));
+
+	printf(_("could not restore file \"%s\" from archive\n"),
+		   xlogfname);
+
+	return -1;
+}
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 3dcadb9b40..344b67f99b 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -52,11 +52,13 @@ int			WalSegSz;
 char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
+char	   *restore_command = NULL;
 
 bool		debug = false;
 bool		showprogress = false;
 bool		dry_run = false;
 bool		do_sync = true;
+bool		restore_wals = false;
 
 /* Target history */
 TimeLineHistoryEntry *targetHistory;
@@ -75,6 +77,9 @@ usage(const char *progname)
 	printf(_("  -N, --no-sync                  do not wait for changes to be written\n"));
 	printf(_("                                 safely to disk\n"));
 	printf(_("  -P, --progress                 write progress messages\n"));
+	printf(_("  -r, --use-postgresql-conf      use restore_command in the postgresql.conf to\n"));
+	printf(_("                                 retrieve WALs from archive\n"));
+	printf(_("  -R, --restore-command=COMMAND  restore_command\n"));
 	printf(_("      --debug                    write a lot of debug messages\n"));
 	printf(_("  -V, --version                  output version information, then exit\n"));
 	printf(_("  -?, --help                     show this help, then exit\n"));
@@ -94,6 +99,8 @@ main(int argc, char **argv)
 		{"dry-run", no_argument, NULL, 'n'},
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
+		{"use-postgresql-conf", no_argument, NULL, 'r'},
+		{"restore-command", required_argument, NULL, 'R'},
 		{"debug", no_argument, NULL, 3},
 		{NULL, 0, NULL, 0}
 	};
@@ -129,7 +136,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:nNP", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "D:nNPR:r", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -141,6 +148,10 @@ main(int argc, char **argv)
 				showprogress = true;
 				break;
 
+			case 'r':
+				restore_wals = true;
+				break;
+
 			case 'n':
 				dry_run = true;
 				break;
@@ -157,6 +168,10 @@ main(int argc, char **argv)
 				datadir_target = pg_strdup(optarg);
 				break;
 
+			case 'R':
+				restore_command = pg_strdup(optarg);
+				break;
+
 			case 1:				/* --source-pgdata */
 				datadir_source = pg_strdup(optarg);
 				break;
@@ -223,6 +238,78 @@ main(int argc, char **argv)
 
 	umask(pg_mode_mask);
 
+	if (restore_command != NULL)
+	{
+		if (restore_wals)
+		{
+			fprintf(stderr, _("%s: conflicting options: both -r and -R are specified\n"),
+					progname);
+			fprintf(stderr, _("You must run %s with either -r/--use-postgresql-conf "
+							  "or -R/--restore-command.\n"), progname);
+			exit(1);
+		}
+
+		pg_log(PG_DEBUG, "using command line restore_command=\'%s\'.\n", restore_command);
+	}
+	else if (restore_wals)
+	{
+		int			rc;
+		char		postgres_exec_path[MAXPGPATH],
+					postgres_cmd[MAXPGPATH],
+					cmd_output[MAXPGPATH];
+		FILE	   *output_fp;
+
+		/* Find postgres executable. */
+		rc = find_other_exec(argv[0], "postgres",
+							 PG_BACKEND_VERSIONSTR,
+							 postgres_exec_path);
+
+		if (rc < 0)
+		{
+			char		full_path[MAXPGPATH];
+
+			if (find_my_exec(argv[0], full_path) < 0)
+				strlcpy(full_path, progname, sizeof(full_path));
+
+			if (rc == -1)
+				fprintf(stderr,
+						_("the program \"postgres\" is needed by %s "
+						  "but was not found in the\n"
+						  "same directory as \"%s\".\n"
+						  "Check your installation.\n"),
+						progname, full_path);
+			else
+				fprintf(stderr,
+						_("the program \"postgres\" was found by \"%s\"\n"
+						  "but was not the same version as %s.\n"
+						  "Check your installation.\n"),
+						full_path, progname);
+			exit(1);
+		}
+
+		/* Build a command to execute for restore_command GUC retrieval if set. */
+		snprintf(postgres_cmd, sizeof(postgres_cmd), "%s -D %s -C restore_command",
+				 postgres_exec_path, datadir_target);
+
+		if ((output_fp = popen(postgres_cmd, "r")) == NULL ||
+			fgets(cmd_output, sizeof(cmd_output), output_fp) == NULL)
+			pg_fatal("could not get restore_command using %s: %s\n",
+					 postgres_cmd, strerror(errno));
+
+		pclose(output_fp);
+
+		/* Remove trailing newline */
+		if (strchr(cmd_output, '\n') != NULL)
+			*strchr(cmd_output, '\n') = '\0';
+
+		if (!strcmp(cmd_output, ""))
+			pg_fatal("restore_command is not set on the target cluster\n");
+
+		restore_command = pg_strdup(cmd_output);
+
+		pg_log(PG_DEBUG, "using config variable restore_command=\'%s\'.\n", restore_command);
+	}
+
 	/* Connect to remote server */
 	if (connstr_source)
 		libpqConnect(connstr_source);
@@ -294,9 +381,8 @@ main(int argc, char **argv)
 		exit(0);
 	}
 
-	findLastCheckpoint(datadir_target, divergerec,
-					   lastcommontliIndex,
-					   &chkptrec, &chkpttli, &chkptredo);
+	findLastCheckpoint(datadir_target, divergerec, lastcommontliIndex,
+					   &chkptrec, &chkpttli, &chkptredo, restore_command);
 	printf(_("rewinding from last common checkpoint at %X/%X on timeline %u\n"),
 		   (uint32) (chkptrec >> 32), (uint32) chkptrec,
 		   chkpttli);
@@ -319,7 +405,7 @@ main(int argc, char **argv)
 	 */
 	pg_log(PG_PROGRESS, "reading WAL in target\n");
 	extractPageMap(datadir_target, chkptrec, lastcommontliIndex,
-				   ControlFile_target.checkPoint);
+				   ControlFile_target.checkPoint, restore_command);
 	filemap_finalize();
 
 	if (showprogress)
diff --git a/src/bin/pg_rewind/pg_rewind.h b/src/bin/pg_rewind/pg_rewind.h
index 83b2898b8b..08a753475c 100644
--- a/src/bin/pg_rewind/pg_rewind.h
+++ b/src/bin/pg_rewind/pg_rewind.h
@@ -32,11 +32,10 @@ extern int	targetNentries;
 
 /* in parsexlog.c */
 extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
-			   int tliIndex, XLogRecPtr endpoint);
+			   int tliIndex, XLogRecPtr endpoint, const char *restoreCommand);
 extern void findLastCheckpoint(const char *datadir, XLogRecPtr searchptr,
-				   int tliIndex,
-				   XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
-				   XLogRecPtr *lastchkptredo);
+				   int tliIndex, XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
+				   XLogRecPtr *lastchkptredo, const char *restoreCommand);
 
 extern XLogRecPtr readOneRecord(const char *datadir, XLogRecPtr ptr,
 			  int tliIndex);
diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index 115192170e..8a6fa33016 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 10;
+use Test::More tests => 20;
 
 use FindBin;
 use lib $FindBin::RealBin;
@@ -106,5 +106,7 @@ in master, before promotion
 # Run the test in both modes
 run_test('local');
 run_test('remote');
+run_test('archive');
+run_test('archive_conf');
 
 exit(0);
diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index 0562c21549..f42fb5a068 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 6;
+use Test::More tests => 12;
 
 use FindBin;
 use lib $FindBin::RealBin;
@@ -70,5 +70,7 @@ template1
 # Run the test in both modes.
 run_test('local');
 run_test('remote');
+run_test('archive');
+run_test('archive_conf');
 
 exit(0);
diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index c4040bd562..24cec256de 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 8;
 
 use File::Find;
 
@@ -90,5 +90,7 @@ sub run_test
 # Run the test in both modes.
 run_test('local');
 run_test('remote');
+run_test('archive');
+run_test('archive_conf');
 
 exit(0);
diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index 900d452d8b..3a58376347 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -39,7 +39,9 @@ use Carp;
 use Config;
 use Exporter 'import';
 use File::Copy;
-use File::Path qw(rmtree);
+use File::Glob ':bsd_glob';
+use File::Path qw(remove_tree make_path);
+use File::Spec::Functions qw(catdir catfile);
 use IPC::Run qw(run);
 use PostgresNode;
 use TestLib;
@@ -199,6 +201,38 @@ sub promote_standby
 	return;
 }
 
+# Moves WAL files to the temporary location and returns restore_command
+# to get them back.
+sub move_wals
+{
+	my $tmp_dir = shift;
+	my $master_pgdata = shift;
+	my $wals_archive_dir = catdir($tmp_dir, "master_wals_archive");
+	my @wal_files = bsd_glob catfile($master_pgdata, "pg_wal", "0000000*");
+	my $restore_command;
+
+	remove_tree($wals_archive_dir);
+	make_path($wals_archive_dir) or die;
+
+	# Move all old master WAL files to the archive.
+	# Old master should be stopped at this point.
+	foreach my $wal_file (@wal_files)
+	{
+		move($wal_file, "$wals_archive_dir") or die;
+	}
+
+	if ($windows_os)
+	{
+		$restore_command = "copy $wals_archive_dir\\\%f \%p";
+	}
+	else
+	{
+		$restore_command = "cp $wals_archive_dir/\%f \%p";
+	}
+
+	return $restore_command;
+}
+
 sub run_pg_rewind
 {
 	my $test_mode = shift;
@@ -251,6 +285,54 @@ sub run_pg_rewind
 			],
 			'pg_rewind remote');
 	}
+	elsif ($test_mode eq "archive")
+	{
+
+		# Do rewind using a local pgdata as source and
+		# specified directory with target WALs archive.
+		my $restore_command = move_wals($tmp_folder, $master_pgdata);
+
+		# Stop the new master and be ready to perform the rewind.
+ $node_standby->stop; + + command_ok( + [ + 'pg_rewind', + "--debug", + "--source-pgdata=$standby_pgdata", + "--target-pgdata=$master_pgdata", + "--no-sync", + "--restore-command=$restore_command" + ], + 'pg_rewind archive'); + } + elsif ($test_mode eq "archive_conf") + { + + # Do rewind using a local pgdata as source and + # specified directory with target WALs archive. + my $master_conf_path = catfile($master_pgdata, 'postgresql.conf'); + my $restore_command = move_wals($tmp_folder, $master_pgdata); + + # Stop the new master and be ready to perform the rewind. + $node_standby->stop; + + # Add restore_command to postgresql.conf of target cluster. + open(my $conf_fd, ">>", $master_conf_path) or die; + print $conf_fd "\nrestore_command='$restore_command'"; + close $conf_fd; + + command_ok( + [ + 'pg_rewind', + "--debug", + "--source-pgdata=$standby_pgdata", + "--target-pgdata=$master_pgdata", + "--no-sync", + "-r" + ], + 'pg_rewind archive_conf'); + } else { base-commit: c8c885b7a5c8c1175288de1d8aaec3b4ae9050e1 -- 2.17.1