On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> cfbot claims that this one needs another rebase.

Yeah, the conflict was with the new TAP test file name in
src/test/recovery/meson.build.

> I've spent some time thinking about this one.  I'll admit I'm a bit worried
> about adding more complexity to this state machine, but I also haven't
> thought of any other viable approaches,

Right. I understand that WaitForWALToBecomeAvailable()'s state
machine is a complex piece of code.

> and this still seems like a useful
> feature.  So, for now, I think we should continue with the current
> approach.

Yes, the feature is useful, as described in the docs quoted below:

+        Reading WAL from archive may not always be as efficient and fast as
+        reading from primary. This can be due to the differences in disk types,
+        IO costs, network latencies etc. All of these can impact the recovery
+        performance on standby, and can increase the replication lag on
+        primary. In addition, the primary keeps accumulating WAL needed for the
+        standby while the standby reads WAL from archive, because the standby
+        replication slot stays inactive. To avoid these problems, one can use
+        this parameter to make standby switch to stream mode sooner.

> +        fails to switch to stream mode, it falls back to archive mode. If 
> this
> +        parameter value is specified without units, it is taken as
> +        milliseconds. Default is <literal>5min</literal>. With a lower value
>
> Does this really need to be milliseconds?  I would think that any
> reasonable setting would at least on the order of seconds.

Agreed. Done that way.

> +        attempts. To avoid this, it is recommended to set a reasonable value.
>
> I think we might want to suggest what a "reasonable value" is.

It really depends on the WAL generation rate on the primary. If the
WAL files grow faster, the disk runs out of space sooner, so setting a
lower value to make more frequent WAL source switch attempts can help.
It's hard to suggest a one-size-fits-all value. Therefore, I've
tweaked the docs a bit to reflect the fact that it depends on the WAL
generation rate.
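
Purely as an illustration (the numbers and host names below are made
up, not recommendations from the patch), a standby sitting behind a
slow archive and a busy primary might end up with something like this
in postgresql.conf:

    restore_command = 'cp /mnt/server/archivedir/%f %p'
    primary_conninfo = 'host=primary.example.com user=replicator'
    primary_slot_name = 'standby_slot'
    streaming_replication_retry_interval = 30s  # retry streaming fairly often

whereas a mostly-idle primary could use a much larger interval, or
leave the feature disabled (the default).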

> +       static bool canSwitchSource = false;
> +       bool            switchSource = false;
>
> IIUC "canSwitchSource" indicates that we are trying to force a switch to
> streaming, but we are currently exhausting anything that's present in the
> pg_wal directory,

Right.

> while "switchSource" indicates that we should force a
> switch to streaming right now.

It doesn't indicate a forced switch; it means "I was previously asked
to switch the source via canSwitchSource, and now that I've exhausted
all the WAL in the pg_wal directory, I'll make a source switch
attempt".

> Furthermore, "canSwitchSource" is static
> while "switchSource" is not.

This is because WaitForWALToBecomeAvailable() has to remember the
decision (that streaming_replication_retry_interval has elapsed)
across calls, whereas switchSource is decided afresh within
WaitForWALToBecomeAvailable() on every call.
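
To make that interplay concrete for anyone skimming the thread, here
is a tiny stand-alone sketch (simplified, with invented stub helpers -
not the actual patch code) of how the two flags are intended to
cooperate:

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    XLOG_FROM_ANY,
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_WAL,
    XLOG_FROM_STREAM
} XLogSource;

/* Stubs standing in for the real checks in xlogrecovery.c */
static bool retry_interval_elapsed(void) { return true; }
static bool pg_wal_has_more_wal(void)    { return false; }

static void
wait_for_wal_sketch(XLogSource *currentSource)
{
    /* static: remembers across calls that the interval has elapsed */
    static bool canSwitchSource = false;
    /* non-static: decided afresh on every call */
    bool        switchSource = false;

    if (!canSwitchSource &&
        *currentSource == XLOG_FROM_ARCHIVE &&
        retry_interval_elapsed())
        canSwitchSource = true;

    /* Only switch once everything already in pg_wal is consumed. */
    if (canSwitchSource && !pg_wal_has_more_wal())
        switchSource = true;

    if (switchSource)
    {
        *currentSource = XLOG_FROM_STREAM;
        canSwitchSource = false;
        printf("switched WAL source from archive to stream\n");
    }
}

int
main(void)
{
    XLogSource  src = XLOG_FROM_ARCHIVE;

    wait_for_wal_sketch(&src);
    return 0;
}

The real logic of course lives in WaitForWALToBecomeAvailable() and
SwitchWALSourceToPrimary(); the sketch only mirrors the
static/non-static split described above.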

> Is there any way to simplify this?  For
> example, would it be possible to make an enum that tracks the
> streaming_replication_retry_interval state?

IMHO, the way it is right now is simple enough. If the suggestion is
to have an enum like the one below, that looks like overkill for just
two states.

typedef enum
{
    CAN_SWITCH_SOURCE,
    SWITCH_SOURCE
} XLogSourceSwitchState;

>                         /*
>                          * Don't allow any retry loops to occur during 
> nonblocking
> -                        * readahead.  Let the caller process everything that 
> has been
> -                        * decoded already first.
> +                        * readahead if we failed to read from the current 
> source. Let the
> +                        * caller process everything that has been decoded 
> already first.
>                          */
> -                       if (nonblocking)
> +                       if (nonblocking && lastSourceFailed)
>                                 return XLREAD_WOULDBLOCK;
>
> Why do we skip this when "switchSource" is set?

It was a leftover from an earlier version of the patch - I was running
into an issue back then and had that piece in there. Removed it now.

> +                       /* Reset the WAL source switch state */
> +                       if (switchSource)
> +                       {
> +                               Assert(canSwitchSource);
> +                               Assert(currentSource == XLOG_FROM_STREAM);
> +                               Assert(oldSource == XLOG_FROM_ARCHIVE);
> +                               switchSource = false;
> +                               canSwitchSource = false;
> +                       }
>
> How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE?  Is
> there no way it could be XLOG_FROM_PG_WAL?

No, it can't be XLOG_FROM_PG_WAL. switchSource is set to true only
when canSwitchSource is set to true, which happens only when
currentSource is XLOG_FROM_ARCHIVE (see SwitchWALSourceToPrimary()).

> +#streaming_replication_retry_interval = 5min   # time after which standby
> +                                       # attempts to switch WAL source from 
> archive to
> +                                       # streaming replication
> +                                       # in milliseconds; 0 disables
>
> I think we might want to turn this feature off by default, at least for the
> first release.

Agreed. Done that way.

Please see the attached v21 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 82eb49593a563295a7370aa1d87db94c8aa313db Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 5 Mar 2024 17:37:50 +0000
Subject: [PATCH v21] Allow standby to switch WAL source from archive to
 streaming

A standby typically switches to streaming replication (getting WAL
from the primary) only when receiving WAL from the archive finishes
(no more WAL is left there) or fails for any reason. Reading WAL
from the archive may not always be as efficient and fast as reading
from the primary. This can be due to differences in disk types, IO
costs, network latencies etc. All of these can impact the recovery
performance on the standby and increase the replication lag on the
primary. In addition, the primary keeps accumulating WAL needed for
the standby while the standby reads WAL from the archive, because
the standby's replication slot stays inactive. To avoid these
problems, one can use the new parameter to make the standby switch
to stream mode sooner.

This commit adds a new GUC that specifies the amount of time after
which the standby attempts to switch the WAL source from the WAL
archive to streaming replication (getting WAL from the primary).
However, the standby exhausts all the WAL present in pg_wal before
switching. If the standby fails to switch to stream mode, it falls
back to archive mode.

Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml                      |  49 ++++++++
 doc/src/sgml/high-availability.sgml           |  15 ++-
 src/backend/access/transam/xlogrecovery.c     | 107 +++++++++++++++--
 src/backend/utils/misc/guc_tables.c           |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   3 +
 src/include/access/xlogrecovery.h             |   1 +
 src/test/recovery/meson.build                 |   1 +
 src/test/recovery/t/042_wal_source_switch.pl  | 113 ++++++++++++++++++
 8 files changed, 286 insertions(+), 15 deletions(-)
 create mode 100644 src/test/recovery/t/042_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b38cbd714a..02e79f32fb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5011,6 +5011,55 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-streaming-replication-retry-interval" xreflabel="streaming_replication_retry_interval">
+      <term><varname>streaming_replication_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>streaming_replication_retry_interval</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies amount of time after which standby attempts to switch WAL
+        source from archive to streaming replication (i.e., getting WAL from
+        primary). However, the standby exhausts all the WAL present in
+        <filename>pg_wal</filename> directory before switching. If the standby
+        fails to switch to stream mode, it falls back to archive mode. If this
+        parameter value is specified without units, it is taken as seconds.
+        With a lower value for this parameter, the standby makes frequent WAL
+        source switch attempts. To avoid this, it is recommended to set a
+        value depending on the rate of WAL generation on the primary. If the
+        WAL files grow faster, the disk runs out of space sooner, so setting a
+        value to make frequent WAL source switch attempts can help. The default
+        is zero, disabling this feature. When disabled, the standby typically
+        switches to stream mode only after receiving WAL from archive finishes
+        (i.e., no more WAL left there) or fails for any reason. This parameter
+        can only be set in the <filename>postgresql.conf</filename> file or on
+        the server command line.
+       </para>
+       <note>
+        <para>
+         Standby may not always attempt to switch source from WAL archive to
+         streaming replication at exact
+         <varname>streaming_replication_retry_interval</varname> intervals. For
+         example, if the parameter is set to <literal>1min</literal> and
+         fetching WAL file from archive takes about <literal>2min</literal>,
+         then the source switch attempt happens for the next WAL file after
+         the current WAL file fetched from archive is fully applied.
+        </para>
+       </note>
+       <para>
+        Reading WAL from archive may not always be as efficient and fast as
+        reading from primary. This can be due to the differences in disk types,
+        IO costs, network latencies etc. All of these can impact the recovery
+        performance on standby, and can increase the replication lag on
+        primary. In addition, the primary keeps accumulating WAL needed for the
+        standby while the standby reads WAL from archive, because the standby
+        replication slot stays inactive. To avoid these problems, one can use
+        this parameter to make standby switch to stream mode sooner.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-recovery-min-apply-delay" xreflabel="recovery_min_apply_delay">
       <term><varname>recovery_min_apply_delay</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 236c0af65f..ab2e4293bf 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -628,12 +628,15 @@ protocol to make nodes agree on a serializable transactional order.
     In standby mode, the server continuously applies WAL received from the
     primary server. The standby server can read WAL from a WAL archive
     (see <xref linkend="guc-restore-command"/>) or directly from the primary
-    over a TCP connection (streaming replication). The standby server will
-    also attempt to restore any WAL found in the standby cluster's
-    <filename>pg_wal</filename> directory. That typically happens after a server
-    restart, when the standby replays again WAL that was streamed from the
-    primary before the restart, but you can also manually copy files to
-    <filename>pg_wal</filename> at any time to have them replayed.
+    over a TCP connection (streaming replication) or attempt to switch to
+    streaming replication after reading from archive when
+    <xref linkend="guc-streaming-replication-retry-interval"/> parameter is
+    set. The standby server will also attempt to restore any WAL found in the
+    standby cluster's <filename>pg_wal</filename> directory. That typically
+    happens after a server restart, when the standby replays again WAL that was
+    streamed from the primary before the restart, but you can also manually
+    copy files to <filename>pg_wal</filename> at any time to have them
+    replayed.
    </para>
 
    <para>
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 853b540945..ca73234695 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -91,6 +91,7 @@ TimestampTz recoveryTargetTime;
 const char *recoveryTargetName;
 XLogRecPtr	recoveryTargetLSN;
 int			recovery_min_apply_delay = 0;
+int			streaming_replication_retry_interval = 0;
 
 /* options formerly taken from recovery.conf for XLOG streaming */
 char	   *PrimaryConnInfo = NULL;
@@ -297,6 +298,8 @@ bool		reachedConsistency = false;
 static char *replay_image_masked = NULL;
 static char *primary_image_masked = NULL;
 
+/* Holds the timestamp at which standby switched WAL source to archive */
+static TimestampTz switched_to_archive_at = 0;
 
 /*
  * Shared-memory state for WAL recovery.
@@ -440,6 +443,8 @@ static bool HotStandbyActiveInReplay(void);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void SetLatestXTime(TimestampTz xtime);
 
+static bool SwitchWALSourceToPrimary(void);
+
 /*
  * Initialization of shared memory for WAL recovery
  */
@@ -3541,8 +3546,11 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 							bool nonblocking)
 {
 	static TimestampTz last_fail_time = 0;
+	static bool canSwitchSource = false;
+	bool		switchSource = false;
 	TimestampTz now;
 	bool		streaming_reply_sent = false;
+	XLogSource	readFrom;
 
 	/*-------
 	 * Standby mode is implemented by a state machine:
@@ -3562,6 +3570,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * those actions are taken when reading from the previous source fails, as
 	 * part of advancing to the next state.
 	 *
+	 * Try reading WAL from primary after being in XLOG_FROM_ARCHIVE state for
+	 * at least streaming_replication_retry_interval seconds. However,
+	 * exhaust all the WAL present in pg_wal before switching. If successful,
+	 * the state machine moves to XLOG_FROM_STREAM state, otherwise it falls
+	 * back to XLOG_FROM_ARCHIVE state.
+	 *
 	 * If standby mode is turned off while reading WAL from stream, we move
 	 * to XLOG_FROM_ARCHIVE and reset lastSourceFailed, to force fetching
 	 * the files (which would be required at end of recovery, e.g., timeline
@@ -3585,12 +3599,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		bool		startWalReceiver = false;
 
 		/*
-		 * First check if we failed to read from the current source, and
+		 * First check if we failed to read from the current source or we
+		 * intentionally want to switch the source from archive to stream, and
 		 * advance the state machine if so. The failure to read might've
 		 * happened outside this function, e.g when a CRC check fails on a
 		 * record, or within this loop.
 		 */
-		if (lastSourceFailed)
+		if (lastSourceFailed || switchSource)
 		{
 			/*
 			 * Don't allow any retry loops to occur during nonblocking
@@ -3729,9 +3744,27 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		}
 
 		if (currentSource != oldSource)
-			elog(DEBUG2, "switched WAL source from %s to %s after %s",
-				 xlogSourceNames[oldSource], xlogSourceNames[currentSource],
-				 lastSourceFailed ? "failure" : "success");
+		{
+			/* Save the timestamp at which we are switching to archive */
+			if (currentSource == XLOG_FROM_ARCHIVE)
+				switched_to_archive_at = GetCurrentTimestamp();
+
+			ereport(DEBUG1,
+					errmsg_internal("switched WAL source from %s to %s after %s",
+									xlogSourceNames[oldSource],
+									xlogSourceNames[currentSource],
+									(switchSource ? "timeout" : (lastSourceFailed ? "failure" : "success"))));
+
+			/* Reset the WAL source switch state */
+			if (switchSource)
+			{
+				Assert(canSwitchSource);
+				Assert(currentSource == XLOG_FROM_STREAM);
+				Assert(oldSource == XLOG_FROM_ARCHIVE);
+				switchSource = false;
+				canSwitchSource = false;
+			}
+		}
 
 		/*
 		 * We've now handled possible failure. Try to read from the chosen
@@ -3760,13 +3793,23 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 				if (randAccess)
 					curFileTLI = 0;
 
+				/* See if we can switch WAL source to streaming */
+				if (!canSwitchSource)
+					canSwitchSource = SwitchWALSourceToPrimary();
+
 				/*
 				 * Try to restore the file from archive, or read an existing
-				 * file from pg_wal.
+				 * file from pg_wal. However, before switching WAL source to
+				 * streaming, give it a chance to read all the WAL from
+				 * pg_wal.
 				 */
-				readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
-											  currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
-											  currentSource);
+				if (canSwitchSource)
+					readFrom = XLOG_FROM_PG_WAL;
+				else
+					readFrom = currentSource == XLOG_FROM_ARCHIVE ?
+						XLOG_FROM_ANY : currentSource;
+
+				readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, readFrom);
 				if (readFile >= 0)
 					return XLREAD_SUCCESS;	/* success! */
 
@@ -3774,6 +3817,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 				 * Nope, not found in archive or pg_wal.
 				 */
 				lastSourceFailed = true;
+
+				/*
+				 * Read all the WAL in pg_wal. Now ready to switch to
+				 * streaming.
+				 */
+				if (canSwitchSource)
+					switchSource = true;
+
 				break;
 
 			case XLOG_FROM_STREAM:
@@ -4004,6 +4055,44 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	return XLREAD_FAIL;			/* not reached */
 }
 
+/*
+ * Check if standby can make an attempt to read WAL from primary after reading
+ * from archive for at least a configurable duration.
+ *
+ * Reading WAL from archive may not always be as efficient and fast as reading
+ * from primary. This can be due to the differences in disk types, IO costs,
+ * network latencies etc. All of these can impact the recovery performance on
+ * standby and increase the replication lag on primary. In addition, the
+ * primary keeps accumulating WAL needed for the standby while the standby
+ * reads WAL from archive because the standby replication slot stays inactive.
+ * To avoid these problems, the standby will try to switch to stream mode
+ * sooner.
+ */
+static bool
+SwitchWALSourceToPrimary(void)
+{
+	TimestampTz now;
+
+	if (streaming_replication_retry_interval <= 0 ||
+		!StandbyMode ||
+		currentSource != XLOG_FROM_ARCHIVE)
+		return false;
+
+	now = GetCurrentTimestamp();
+
+	/* First time through */
+	if (switched_to_archive_at == 0)
+	{
+		switched_to_archive_at = now;
+		return false;
+	}
+
+	if (TimestampDifferenceExceeds(switched_to_archive_at, now,
+								   streaming_replication_retry_interval * 1000))
+		return true;
+
+	return false;
+}
 
 /*
  * Determine what log level should be used to report a corrupt WAL record
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 45013582a7..e54d82dd1c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3273,6 +3273,18 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"streaming_replication_retry_interval", PGC_SIGHUP, REPLICATION_STANDBY,
+			gettext_noop("Sets the time after which standby attempts to switch WAL "
+						 "source from archive to streaming replication."),
+			gettext_noop("0 turns this feature off."),
+			GUC_UNIT_S
+		},
+		&streaming_replication_retry_interval,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
 			gettext_noop("Shows the size of write ahead log segments."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index edcc0282b2..6f87209d62 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -369,6 +369,9 @@
 					# in milliseconds; 0 disables
 #wal_retrieve_retry_interval = 5s	# time to wait before retrying to
 					# retrieve WAL after a failed attempt
+#streaming_replication_retry_interval = 0	# time after which standby
+					# attempts to switch WAL source from archive to
+					# streaming replication in seconds; 0 disables
 #recovery_min_apply_delay = 0		# minimum delay for applying changes during recovery
 #sync_replication_slots = off			# enables slot synchronization on the physical standby from the primary
 
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index c423464e8b..73c5a86f4c 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT char *PrimarySlotName;
 extern PGDLLIMPORT char *recoveryRestoreCommand;
 extern PGDLLIMPORT char *recoveryEndCommand;
 extern PGDLLIMPORT char *archiveCleanupCommand;
+extern PGDLLIMPORT int streaming_replication_retry_interval;
 
 /* indirectly set via GUC system */
 extern PGDLLIMPORT TransactionId recoveryTargetXid;
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index c67249500e..3a8ecd5e54 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -50,6 +50,7 @@ tests += {
       't/039_end_of_wal.pl',
       't/040_standby_failover_slots_sync.pl',
       't/041_checkpoint_at_promote.pl',
+      't/042_wal_source_switch.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/042_wal_source_switch.pl b/src/test/recovery/t/042_wal_source_switch.pl
new file mode 100644
index 0000000000..b00ed29f73
--- /dev/null
+++ b/src/test/recovery/t/042_wal_source_switch.pl
@@ -0,0 +1,113 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Checks for WAL source switch feature.
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1, has_archiving => 1);
+
+# Ensure checkpoint doesn't come in our way
+$node_primary->append_conf(
+	'postgresql.conf', qq(
+checkpoint_timeout = 1h
+autovacuum = off
+));
+$node_primary->start;
+
+$node_primary->safe_psql('postgres',
+	"SELECT pg_create_physical_replication_slot('standby_slot')");
+
+# And some content
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE tab_int AS SELECT generate_series(1, 10) AS a");
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+# Create streaming standby from backup
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup(
+	$node_primary, $backup_name,
+	has_streaming => 1,
+	has_restoring => 1);
+
+my $retry_interval = 1;
+$node_standby->append_conf(
+	'postgresql.conf', qq(
+primary_slot_name = 'standby_slot'
+streaming_replication_retry_interval = '${retry_interval}s'
+log_min_messages = 'debug2'
+));
+$node_standby->start;
+
+# Wait until standby has replayed enough data
+$node_primary->wait_for_catchup($node_standby);
+
+# Generate some data on the primary while the standby is down
+$node_standby->stop;
+for my $i (1 .. 10)
+{
+	$node_primary->safe_psql('postgres',
+		"INSERT INTO tab_int VALUES (generate_series(11, 20));");
+	$node_primary->safe_psql('postgres', "SELECT pg_switch_wal();");
+}
+
+# Now wait for replay to complete on standby. We're done waiting when the
+# standby has replayed up to the previously saved primary LSN.
+my $cur_lsn =
+  $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+
+# Generate 1 more WAL file so that we wait predictably for the archiving of
+# all WAL files.
+$node_primary->advance_wal(1);
+
+my $walfile_name =
+  $node_primary->safe_psql('postgres', "SELECT pg_walfile_name('$cur_lsn')");
+
+$node_primary->poll_query_until('postgres',
+	"SELECT count(*) = 1 FROM pg_stat_archiver WHERE last_archived_wal = '$walfile_name';"
+) or die "Timed out while waiting for archiving of WAL by primary";
+
+my $offset = -s $node_standby->logfile;
+
+# Standby initially fetches WAL from archive after the restart. Since it is
+# asked to retry fetching from primary after retry interval
+# (i.e. streaming_replication_retry_interval), it will do so. To mimic the
+# standby spending some time fetching from archive, we use apply delay
+# (i.e. recovery_min_apply_delay) greater than the retry interval, so that for
+# fetching the next WAL file the standby honours retry interval and fetches it
+# from primary.
+my $delay = $retry_interval * 5;
+$node_standby->append_conf(
+	'postgresql.conf', qq(
+recovery_min_apply_delay = '${delay}s'
+));
+$node_standby->start;
+
+# Wait until standby has replayed enough data
+$node_primary->wait_for_catchup($node_standby);
+
+$node_standby->wait_for_log(
+	qr/DEBUG: ( [A-Z0-9]+:)? switched WAL source from archive to stream after timeout/,
+	$offset);
+$node_standby->wait_for_log(
+	qr/LOG: ( [A-Z0-9]+:)? started streaming WAL from primary at .* on timeline .*/,
+	$offset);
+
+# Check that the data from primary is streamed to standby
+my $row_cnt1 =
+  $node_primary->safe_psql('postgres', "SELECT count(*) FROM tab_int;");
+
+my $row_cnt2 =
+  $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int;");
+is($row_cnt1, $row_cnt2, 'data from primary is streamed to standby');
+
+done_testing();
-- 
2.34.1
