Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Michael Paquier Mon, 05 Jan 2015 00:18:08 -0800

On Wed, Dec 31, 2014 at 12:22 AM, Alexey Vasiliev <[email protected]> wrote:
> Tue, 30 Dec 2014 21:39:33 +0900 от Michael Paquier 
> <[email protected]>:
> As I understand now = (pg_time_t) time(NULL); return time in seconds, what is 
> why I multiply value to 1000 to compare with restore_command_retry_interval 
> in milliseconds.
This way of doing is incorrect, you would get a wait time of 1s even
for retries lower than 1s. In order to get the feature working
correctly and to get a comparison granularity sufficient, you need to
use TimetampTz for the tracking of the current and last failure time
and to use the APIs from TimestampTz for difference calculations.


> I am not sure about small retry interval of time, in my cases I need interval 
> bigger 5 seconds (20-40 seconds). Right now I limiting this value be bigger 
> 100 milliseconds.
Other people may disagree here, but I don't see any reason to put
restrictions to this interval of time.

Attached is a patch fixing the feature to use TimestampTz, updating as
well the docs and recovery.conf.sample which was incorrect, on top of
other things I noticed here and there.

Alexey, does this new patch look fine for you?
-- 
Michael

From 9bb71e39b218885466a53c005a87361d4ae889fa Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Mon, 5 Jan 2015 17:10:13 +0900
Subject: [PATCH] Add restore_command_retry_interval to control retries of
 restore_command

restore_command_retry_interval can be specified in recovery.conf to control
the interval of time between retries of restore_command when failures occur.
---
 doc/src/sgml/recovery-config.sgml               | 21 +++++++++++++
 src/backend/access/transam/recovery.conf.sample |  9 ++++++
 src/backend/access/transam/xlog.c               | 42 ++++++++++++++++++++-----
 3 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0c64ff2..0488ae3 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,27 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+      <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+        <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        If <varname>restore_command</> returns nonzero exit status code, retry
+        command after the interval of time specified by this parameter,
+        measured in milliseconds if no unit is specified. Default value is
+        <literal>5s</>.
+       </para>
+       <para>
+        This command is particularly useful for warm standby configurations
+        to leverage the amount of retries done to restore a WAL segment file
+        depending on the activity of a master node.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
 
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..8699a33 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,15 @@
 #
 #recovery_end_command = ''
 #
+#
+# restore_command_retry_interval
+#
+# specifies an optional retry interval for restore_command command, if
+# previous command returned nonzero exit status code. This can be useful
+# to increase or decrease the number of calls of restore_command.
+#
+#restore_command_retry_interval = '5s'
+#
 #---------------------------------------------------------------------------
 # RECOVERY TARGET PARAMETERS
 #---------------------------------------------------------------------------
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5cc7e47..1944e6f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int	restore_command_retry_interval = 5000;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,26 @@ readRecoveryCommandFile(void)
 					(errmsg_internal("trigger_file = '%s'",
 									 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+
+			if (restore_command_retry_interval <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("\"%s\" must have a value strictly positive",
+								"restore_command_retry_interval")));
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -10345,8 +10366,8 @@ static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 							bool fetching_ckpt, XLogRecPtr tliRecPtr)
 {
-	static pg_time_t last_fail_time = 0;
-	pg_time_t	now;
+	TimestampTz now = GetCurrentTimestamp();
+	TimestampTz	last_fail_time = now;
 
 	/*-------
 	 * Standby mode is implemented by a state machine:
@@ -10495,14 +10516,19 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 * machine, so we've exhausted all the options for
 					 * obtaining the requested WAL. We're going to loop back
 					 * and retry from the archive, but if it hasn't been long
-					 * since last attempt, sleep 5 seconds to avoid
-					 * busy-waiting.
+					 * since last attempt, sleep the amount of time specified
+					 * by restore_command_retry_interval to avoid busy-waiting.
 					 */
-					now = (pg_time_t) time(NULL);
-					if ((now - last_fail_time) < 5)
+					now = GetCurrentTimestamp();
+					if (!TimestampDifferenceExceeds(last_fail_time, now,
+													restore_command_retry_interval))
 					{
-						pg_usleep(1000000L * (5 - (now - last_fail_time)));
-						now = (pg_time_t) time(NULL);
+						long		secs;
+						int			microsecs;
+
+						TimestampDifference(last_fail_time, now, &secs, &microsecs);
+						pg_usleep(restore_command_retry_interval * 1000L - (1000000L * secs + 1L * microsecs));
+						now = GetCurrentTimestamp();
 					}
 					last_fail_time = now;
 					currentSource = XLOG_FROM_ARCHIVE;
-- 
2.2.1

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Reply via email to