Re: [HACKERS] WAL Restore process during recovery
On Tue, Jan 24, 2012 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote: Cleaned up the points noted, new patch attached in case you wish to review further. Still has bug, so still with me to fix. Thanks! Will review further. v3 patch contains lots of unrelated code changes like the following. - structfieldpid/structfield column of the + structfieldprocpid/structfield column of the You seem to have failed to extract the patch from your repository correctly. So I used v2 patch for the review. Sorry if I comment the things which you've already fixed in v3 patch. Here are the comments. They are almost not serious problems. +/* + * GUC parameters + */ +intWalRestoreDelay = 1; You forget to change guc.c to define wal_restore_delay as a GUC parameter? Or just that source code comment is incorrect? + elog(FATAL, recovery_restore_command is too long); Typo: s/recovery_restore_command/restore_command + InitLatch(walrstr-WALRestoreLatch); /* initialize latch used in main loop */ That latch is shared one. OwnLatch() should be called instead of InitLatch()? If yes, it's better to call DisownLatch() when walrestore exits. Though skipping DisownLatch() would be harmless because the latch is never owned by new process after walrestore exits. + { + /* use volatile pointer to prevent code rearrangement */ + volatile WalRestoreData *walrstr = WalRstr; + + nextFileTli = walrstr-nextFileTli; The declaration of walrstr is not required here because it's already done at the head of WalRestoreNextFile(). + if (restoredFromArchive) + { + /* use volatile pointer to prevent code rearrangement */ + volatile WalRestoreData *walrstr = WalRstr; Same as above. +#define TMPRECOVERYXLOGRECOVERYXLOG ISTM that it's better to move this definition to an include file and we should use it in all the places where the fixed value RECOVERYXLOG is still used. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao masao.fu...@gmail.com wrote: Why does walrestore need to be invoked even when restore_command is not specified? It seems to be useless. We invoke walreceiver only when primary_conninfo is specified now. Similarly we should invoke walrestore only when restore_command is specified? walreceiver is shutdown and restarted in case of failed connection. That never happens with walrestore because the command is run each time - when we issue system(3) a new process is forked to run the command. So there is no specific cleanup to perform and so no reason for a managed cleanup process. So I can't see a specific reason to change that. Do you think it makes a difference? Yes. When restore_command is not specified in recovery.conf, walrestore process doesn't do any useful activity and just wastes CPU cycle. Which might be harmless for a functionality of recovery, but ISTM it's better not to start up walrestore in that case to avoid the waste of cycle. Cleaned up the points noted, new patch attached in case you wish to review further. Still has bug, so still with me to fix. Thanks! Will review further. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Tue, Jan 24, 2012 at 9:43 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao masao.fu...@gmail.com wrote: Why does walrestore need to be invoked even when restore_command is not specified? It seems to be useless. We invoke walreceiver only when primary_conninfo is specified now. Similarly we should invoke walrestore only when restore_command is specified? walreceiver is shutdown and restarted in case of failed connection. That never happens with walrestore because the command is run each time - when we issue system(3) a new process is forked to run the command. So there is no specific cleanup to perform and so no reason for a managed cleanup process. So I can't see a specific reason to change that. Do you think it makes a difference? Yes. When restore_command is not specified in recovery.conf, walrestore process doesn't do any useful activity and just wastes CPU cycle. Which might be harmless for a functionality of recovery, but ISTM it's better not to start up walrestore in that case to avoid the waste of cycle. It just sleeps on a latch when it has nothing to do, so no wasted cycles. Asking the postmaster seemed the easier option, I guess I could have chosen the other way also. I'll look at this when this is the last thing left to resolve to see if that improves things. Cleaned up the points noted, new patch attached in case you wish to review further. Still has bug, so still with me to fix. Thanks! Will review further. Much appreciated. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Tue, Jan 24, 2012 at 6:49 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Jan 24, 2012 at 9:43 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jan 24, 2012 at 12:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao masao.fu...@gmail.com wrote: Why does walrestore need to be invoked even when restore_command is not specified? It seems to be useless. We invoke walreceiver only when primary_conninfo is specified now. Similarly we should invoke walrestore only when restore_command is specified? walreceiver is shutdown and restarted in case of failed connection. That never happens with walrestore because the command is run each time - when we issue system(3) a new process is forked to run the command. So there is no specific cleanup to perform and so no reason for a managed cleanup process. So I can't see a specific reason to change that. Do you think it makes a difference? Yes. When restore_command is not specified in recovery.conf, walrestore process doesn't do any useful activity and just wastes CPU cycle. Which might be harmless for a functionality of recovery, but ISTM it's better not to start up walrestore in that case to avoid the waste of cycle. It just sleeps on a latch when it has nothing to do, so no wasted cycles. Right, since walrestore process wakes up just every 10 seconds, a waste of cycle is low. But what I feel uncomfortable is that walrestore process has nothing to do *from start to end*, when restore_command is not specified, but it's started up. I guess that many people would get surprised at that. Of course, if restore_command can be changed without restarting the server, I agree with you because walrestore process might do an useful activity later. But currently not. Asking the postmaster seemed the easier option, I guess I could have chosen the other way also. I'll look at this when this is the last thing left to resolve to see if that improves things. Okay. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Fri, Jan 20, 2012 at 7:50 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao masao.fu...@gmail.com wrote: Requested update Thanks! Will review. In StartChildProcess(), the code which emits an error when fork of walrestore fails is required. In reaper(), the following comment needs to be updated because an unexpected exit (including FATAL) is treated as a crash in the patch. /* * Was it the wal restore? If exit status is zero (normal) or one * (FATAL exit), we assume everything is all right just like normal * backends. */ if (pid == WalRestorePID) Why does walrestore need to be invoked even when restore_command is not specified? It seems to be useless. We invoke walreceiver only when primary_conninfo is specified now. Similarly we should invoke walrestore only when restore_command is specified? When I set up the file-based log-shipping environment using pg_standby, ran pgbench -i -s2, waited for walrestore to restore at least one WAL file, and created the trigger file, then I encounterd the following error in the standby. sby LOG: startup process requests 00010003 from archive trigger file found: smart failover sby LOG: startup process sees last file was 00010003 sby FATAL: could not rename file pg_xlog/RECOVERYXLOG to pg_xlog/00010003: No such file or directory sby LOG: startup process (PID 11079) exited with exit code 1 sby LOG: terminating any other active server processes When I set up streaming replication with setting restore_command, I got the following messages repeatedly. The WAL file name was always . sby1 LOG: walrestore checking for next file to restore sby1 LOG: restore of is already complete, so sleep In PostmasterStateMachine(), the following comment needs to mention WALRestore. * PM_WAIT_READONLY state ends when we have no regular backends that * have been started during recovery. We kill the startup and * walreceiver processes and transition to PM_WAIT_BACKENDS. Ideally, In walrestore.c, the following comments seem to be incorrect. At least an unexpected exit of WALRestore doesn't start a recovery cycle in the patch. * If the WAL restore exits unexpectedly, the postmaster treats that the same * as a backend crash: shared memory may be corrupted, so remaining backends * should be killed by SIGQUIT and then a recovery cycle started. In walrestore.c + * Main entry point for walrestore process + * + * This is invoked from BootstrapMain, which has already created the basic + * execution environment, but not enabled signals yet. BootstrapMain() doesn't exist, and it should be changed to AuxiliaryProcessMain(). This is not a fault of the patch. There are the same typos in bgwriter.c, walwriter.c and checkpointer.c In walrestore.c +* SIGUSR1 is presently unused; keep it spare in case someday we want this +* process to participate in ProcSignal signalling. The above comment is incorrect because SIGUSR1 is presently used. + /* +* From here on, elog(ERROR) should end with exit(1), not send +* control back to the sigsetjmp block above +*/ + ExitOnAnyError = true; The above is not required because sigsetjmp is not used in walrestore.c + /* Normal exit from the walwriter is here */ + proc_exit(0); /* done */ Typo: s/walwriter/walrestore I've not reviewed the patch enough yet. Will review the patch tomorrow again. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao masao.fu...@gmail.com wrote: I've not reviewed the patch enough yet. Will review the patch tomorrow again. Thanks very much. I'm sure that's enough to keep me busy a few days. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao masao.fu...@gmail.com wrote: Requested update -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ce659ec..469e6d6 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -40,6 +40,7 @@ #include pgstat.h #include postmaster/bgwriter.h #include postmaster/startup.h +#include postmaster/walrestore.h #include replication/walreceiver.h #include replication/walsender.h #include storage/bufmgr.h @@ -187,7 +188,6 @@ static bool InArchiveRecovery = false; static bool restoredFromArchive = false; /* options taken from recovery.conf for archive recovery */ -static char *recoveryRestoreCommand = NULL; static char *recoveryEndCommand = NULL; static char *archiveCleanupCommand = NULL; static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET; @@ -575,8 +575,8 @@ bool reachedConsistency = false; static bool InRedo = false; -/* Have we launched bgwriter during recovery? */ -static bool bgwriterLaunched = false; +/* Have we launched background procs during archive recovery yet? */ +static bool ArchRecoveryBgProcsActive = false; /* * Information logged when we detect a change in one of the parameters @@ -632,8 +632,6 @@ static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, bool randAccess); static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr); static void XLogFileClose(void); -static bool RestoreArchivedFile(char *path, const char *xlogfname, - const char *recovername, off_t expectedSize); static void ExecuteRecoveryCommand(char *command, char *commandName, bool failOnerror); static void PreallocXlogFiles(XLogRecPtr endptr); @@ -2706,19 +2704,47 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli, XLogFileName(xlogfname, tli, log, seg); +#define TMPRECOVERYXLOG RECOVERYXLOG + switch (source) { case XLOG_FROM_ARCHIVE: + /* + * Check to see if the WALRestore process has already put the + * next file in place while we were working. If so, use that. + * If not, get it ourselves. This makes it easier to handle + * initial state before the WALRestore is active, and also + * handles the stop/start logic correctly when we have both + * streaming and file based replication active. + * + * We queue up the next task for WALRestore after we've begun to + * use this file later in XLogFileRead(). + * + * If the WALRestore process is still active, the lock wait makes + * us wait, which is just like we were executing the command + * ourselves and so doesn't alter the logic elsewhere. + */ + if (XLogFileIsNowFullyRestored(tli, log, seg)) + { +snprintf(path, MAXPGPATH, XLOGDIR /%s, TMPRECOVERYXLOG); +restoredFromArchive = true; +break; + } + /* Report recovery progress in PS display */ snprintf(activitymsg, sizeof(activitymsg), waiting for %s, xlogfname); set_ps_display(activitymsg, false); restoredFromArchive = RestoreArchivedFile(path, xlogfname, - RECOVERYXLOG, + TMPRECOVERYXLOG, XLogSegSize); + if (!restoredFromArchive) + { +LWLockRelease(WALRestoreCommandLock); return -1; + } break; case XLOG_FROM_PG_XLOG: @@ -2748,18 +2774,42 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli, if (stat(xlogfpath, statbuf) == 0) { if (unlink(xlogfpath) != 0) + { +LWLockRelease(WALRestoreCommandLock); ereport(FATAL, (errcode_for_file_access(), errmsg(could not remove file \%s\: %m, xlogfpath))); + } reload = true; } if (rename(path, xlogfpath) 0) + { + LWLockRelease(WALRestoreCommandLock); ereport(ERROR, (errcode_for_file_access(), errmsg(could not rename file \%s\ to \%s\: %m, path, xlogfpath))); + } + + /* + * Make sure we recover from the new filename, so we can reuse the + * temporary filename for asynchronous restore actions. + */ + strcpy(path, xlogfpath); + + /* + * Tell the WALRestore process to get the next file now. + * Hopefully it will be ready for use in time for the next call the + * Startup process makes to XLogFileRead(). + * + * It might seem like we should do that earlier but then there is a + * race condition that might lead to replacing RECOVERYXLOG with + * another file before we've copied it. + */ + SetNextWALRestoreLogSeg(tli, log, seg); + LWLockRelease(WALRestoreCommandLock); /* * If the existing segment was replaced, since walsenders might have @@ -2911,8 +2961,11 @@ XLogFileClose(void) * For fixed-size files, the caller may pass the expected size as an * additional crosscheck on successful recovery. If the file size is not * known, set expectedSize = 0. + * + * Must be called with
Re: [HACKERS] WAL Restore process during recovery
On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao masao.fu...@gmail.com wrote: Requested update Thanks! Will review. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote: WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of next WAL file to reduce time delays in file based archive recovery. Handles cases of file-only and streaming/file correctly. Though I've not reviewed the patch deeply yet, I observed the following two problems when I tested the patch. When I set up streaming replication + archive (i.e., restore_command is set) and started the standby, I got the following error: FATAL: all AuxiliaryProcs are in use LOG: walrestore process (PID 18839) exited with exit code 1 Fixed and better documented. When I started an archive recovery without setting restore_command, it successfully finished. Not sure exactly what you mean, but I fixed a bug that might be something you're seeing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ce659ec..469e6d6 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -40,6 +40,7 @@ #include pgstat.h #include postmaster/bgwriter.h #include postmaster/startup.h +#include postmaster/walrestore.h #include replication/walreceiver.h #include replication/walsender.h #include storage/bufmgr.h @@ -187,7 +188,6 @@ static bool InArchiveRecovery = false; static bool restoredFromArchive = false; /* options taken from recovery.conf for archive recovery */ -static char *recoveryRestoreCommand = NULL; static char *recoveryEndCommand = NULL; static char *archiveCleanupCommand = NULL; static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET; @@ -575,8 +575,8 @@ bool reachedConsistency = false; static bool InRedo = false; -/* Have we launched bgwriter during recovery? */ -static bool bgwriterLaunched = false; +/* Have we launched background procs during archive recovery yet? */ +static bool ArchRecoveryBgProcsActive = false; /* * Information logged when we detect a change in one of the parameters @@ -632,8 +632,6 @@ static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, bool randAccess); static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr); static void XLogFileClose(void); -static bool RestoreArchivedFile(char *path, const char *xlogfname, - const char *recovername, off_t expectedSize); static void ExecuteRecoveryCommand(char *command, char *commandName, bool failOnerror); static void PreallocXlogFiles(XLogRecPtr endptr); @@ -2706,19 +2704,47 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli, XLogFileName(xlogfname, tli, log, seg); +#define TMPRECOVERYXLOG RECOVERYXLOG + switch (source) { case XLOG_FROM_ARCHIVE: + /* + * Check to see if the WALRestore process has already put the + * next file in place while we were working. If so, use that. + * If not, get it ourselves. This makes it easier to handle + * initial state before the WALRestore is active, and also + * handles the stop/start logic correctly when we have both + * streaming and file based replication active. + * + * We queue up the next task for WALRestore after we've begun to + * use this file later in XLogFileRead(). + * + * If the WALRestore process is still active, the lock wait makes + * us wait, which is just like we were executing the command + * ourselves and so doesn't alter the logic elsewhere. + */ + if (XLogFileIsNowFullyRestored(tli, log, seg)) + { +snprintf(path, MAXPGPATH, XLOGDIR /%s, TMPRECOVERYXLOG); +restoredFromArchive = true; +break; + } + /* Report recovery progress in PS display */ snprintf(activitymsg, sizeof(activitymsg), waiting for %s, xlogfname); set_ps_display(activitymsg, false); restoredFromArchive = RestoreArchivedFile(path, xlogfname, - RECOVERYXLOG, + TMPRECOVERYXLOG, XLogSegSize); + if (!restoredFromArchive) + { +LWLockRelease(WALRestoreCommandLock); return -1; + } break; case XLOG_FROM_PG_XLOG: @@ -2748,18 +2774,42 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli, if (stat(xlogfpath, statbuf) == 0) { if (unlink(xlogfpath) != 0) + { +LWLockRelease(WALRestoreCommandLock); ereport(FATAL, (errcode_for_file_access(), errmsg(could not remove file \%s\: %m, xlogfpath))); + } reload = true; } if (rename(path, xlogfpath) 0) + { + LWLockRelease(WALRestoreCommandLock); ereport(ERROR, (errcode_for_file_access(), errmsg(could not rename file \%s\ to \%s\: %m, path, xlogfpath))); + } + + /* + * Make sure we recover from the new filename, so we can reuse the + * temporary
Re: [HACKERS] WAL Restore process during recovery
On Fri, Jan 20, 2012 at 4:17 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote: WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of next WAL file to reduce time delays in file based archive recovery. Handles cases of file-only and streaming/file correctly. Though I've not reviewed the patch deeply yet, I observed the following two problems when I tested the patch. When I set up streaming replication + archive (i.e., restore_command is set) and started the standby, I got the following error: FATAL: all AuxiliaryProcs are in use LOG: walrestore process (PID 18839) exited with exit code 1 Fixed and better documented. When I started an archive recovery without setting restore_command, it successfully finished. Not sure exactly what you mean, but I fixed a bug that might be something you're seeing. Thanks! But you forgot to include walrestore.c and .h in the patch. Can you submit the updated version of the patch? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote: WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of next WAL file to reduce time delays in file based archive recovery. Handles cases of file-only and streaming/file correctly. Though I've not reviewed the patch deeply yet, I observed the following two problems when I tested the patch. When I set up streaming replication + archive (i.e., restore_command is set) and started the standby, I got the following error: FATAL: all AuxiliaryProcs are in use LOG: walrestore process (PID 18839) exited with exit code 1 When I started an archive recovery without setting restore_command, it successfully finished. Oh, I did have NUM_AUXILIARY_PROCS increased at one point, but I realised it wasn't needed and removed it. Will change that. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote: WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of next WAL file to reduce time delays in file based archive recovery. Handles cases of file-only and streaming/file correctly. Though I've not reviewed the patch deeply yet, I observed the following two problems when I tested the patch. When I set up streaming replication + archive (i.e., restore_command is set) and started the standby, I got the following error: FATAL: all AuxiliaryProcs are in use LOG: walrestore process (PID 18839) exited with exit code 1 When I started an archive recovery without setting restore_command, it successfully finished. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WAL Restore process during recovery
WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of next WAL file to reduce time delays in file based archive recovery. Handles cases of file-only and streaming/file correctly. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ce659ec..e8b0b69 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -40,6 +40,7 @@ #include pgstat.h #include postmaster/bgwriter.h #include postmaster/startup.h +#include postmaster/walrestore.h #include replication/walreceiver.h #include replication/walsender.h #include storage/bufmgr.h @@ -187,7 +188,6 @@ static bool InArchiveRecovery = false; static bool restoredFromArchive = false; /* options taken from recovery.conf for archive recovery */ -static char *recoveryRestoreCommand = NULL; static char *recoveryEndCommand = NULL; static char *archiveCleanupCommand = NULL; static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET; @@ -575,8 +575,8 @@ bool reachedConsistency = false; static bool InRedo = false; -/* Have we launched bgwriter during recovery? */ -static bool bgwriterLaunched = false; +/* Have we launched background procs during archive recovery yet? */ +static bool ArchRecoveryBgProcsActive = false; /* * Information logged when we detect a change in one of the parameters @@ -632,8 +632,6 @@ static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, bool randAccess); static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr); static void XLogFileClose(void); -static bool RestoreArchivedFile(char *path, const char *xlogfname, - const char *recovername, off_t expectedSize); static void ExecuteRecoveryCommand(char *command, char *commandName, bool failOnerror); static void PreallocXlogFiles(XLogRecPtr endptr); @@ -2706,19 +2704,47 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli, XLogFileName(xlogfname, tli, log, seg); +#define TMPRECOVERYXLOG RECOVERYXLOG + switch (source) { case XLOG_FROM_ARCHIVE: + /* + * Check to see if the WALRestore process has already put the + * next file in place while we were working. If so, use that. + * If not, get it ourselves. This makes it easier to handle + * initial state before the WALRestore is active, and also + * handles the stop/start logic correctly when we have both + * streaming and file based replication active. + * + * We queue up the next task for WALRestore after we've begun to + * use this file later in XLogFileRead(). + * + * If the WALRestore process is still active, the lock wait makes + * us wait, which is just like we were executing the command + * ourselves and so doesn't alter the logic elsewhere. + */ + if (XLogFileIsNowFullyRestored(tli, log, seg)) + { +snprintf(path, MAXPGPATH, XLOGDIR /%s, TMPRECOVERYXLOG); +restoredFromArchive = true; +break; + } + /* Report recovery progress in PS display */ snprintf(activitymsg, sizeof(activitymsg), waiting for %s, xlogfname); set_ps_display(activitymsg, false); restoredFromArchive = RestoreArchivedFile(path, xlogfname, - RECOVERYXLOG, + TMPRECOVERYXLOG, XLogSegSize); + if (!restoredFromArchive) + { +LWLockRelease(WALRestoreCommandLock); return -1; + } break; case XLOG_FROM_PG_XLOG: @@ -2748,18 +2774,42 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli, if (stat(xlogfpath, statbuf) == 0) { if (unlink(xlogfpath) != 0) + { +LWLockRelease(WALRestoreCommandLock); ereport(FATAL, (errcode_for_file_access(), errmsg(could not remove file \%s\: %m, xlogfpath))); + } reload = true; } if (rename(path, xlogfpath) 0) + { + LWLockRelease(WALRestoreCommandLock); ereport(ERROR, (errcode_for_file_access(), errmsg(could not rename file \%s\ to \%s\: %m, path, xlogfpath))); + } + + /* + * Make sure we recover from the new filename, so we can reuse the + * temporary filename for asynchronous restore actions. + */ + strcpy(path, xlogfpath); + + /* + * Tell the WALRestore process to get the next file now. + * Hopefully it will be ready for use in time for the next call the + * Startup process makes to XLogFileRead(). + * + * It might seem like we should do that earlier but then there is a + * race condition that might lead to replacing RECOVERYXLOG with + * another file before we've copied it. + */ + SetNextWALRestoreLogSeg(tli, log, seg); + LWLockRelease(WALRestoreCommandLock); /* * If the existing segment was replaced, since walsenders might have @@ -2911,8 +2961,11 @@ XLogFileClose(void) * For fixed-size files, the caller may pass the expected