Re: Switching XLog source from archive to streaming when primary available

2024-03-23 Thread Bharath Rupireddy
On Mon, Mar 18, 2024 at 11:38 AM Michael Paquier  wrote:
>
> On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote:
> > Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a
> > conflict in meson.build. Please see the attached v23 patch.
>
> I've been reading this patch, and this is a very tricky one.  Please
> be *very* cautious.

Thanks for looking into this.

> +#streaming_replication_retry_interval = 0# time after which standby
> +# attempts to switch WAL source from archive to
> +# streaming replication in seconds; 0 disables
>
> This stuff allows a minimal retry interval of 1s.  Could it be useful
> to have more responsiveness here and allow lower values than that?
> Why not switching the units to be milliseconds?

Nathan had a different view on this to have it on the order of seconds
- https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13.
If set to a too low value, the frequency of standby trying to connect
to primary increases. IMO, the order of seconds seems fine.

> +if (streaming_replication_retry_interval <= 0 ||
> +!StandbyMode ||
> +currentSource != XLOG_FROM_ARCHIVE)
> +return SWITCH_TO_STREAMING_NONE;
>
> Hmm.  Perhaps this should mention why we don't care about the
> consistent point.

Are you asking why we don't care whether the standby reached a
consistent point when switching to streaming mode due to this new
parameter? If this is the ask, the same applies when a standby
typically switches to streaming replication (get WAL
from primary) today, that is when receive from WAL archive finishes
(no more WAL left there) or fails for any reason. The standby doesn't
care about the consistent point even today, it just trusts the WAL
source and makes the switch.

> +/* See if we can switch WAL source to streaming */
> +if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE)
> +wal_source_switch_state = SwitchWALSourceToPrimary();
>
> Rather than a routine that returns as result the value to use for the
> GUC, I'd suggest to let this routine set the GUC as there is only one
> caller of SwitchWALSourceToPrimary().  This can also include a check
> on SWITCH_TO_STREAMING_NONE, based on what I'm reading that.

Firstly, wal_source_switch_state is not a GUC, it's a static variable
to be used across WaitForWALToBecomeAvailable calls. And, if you are
suggesting to turn SwitchWALSourceToPrimary so that it sets
wal_source_switch_state directly, I'd not do that because when
wal_source_switch_state is not SWITCH_TO_STREAMING_NONE, the function
gets called unnecessarily.

> -   if (lastSourceFailed)
> +   if (lastSourceFailed ||
> +   wal_source_switch_state == SWITCH_TO_STREAMING)
>
> Hmm.  This one may be tricky.  I'd recommend a separation between the
> failure in reading from a source and the switch to a new "forced"
> source.

Separation would just add duplicate code. Moreover, the code wrapped
within if (lastSourceFailed) doesn't do any error handling or such, it
just resets a few stuff from the previous source and sets the next
source.

FWIW, please check [1] (and the discussion thereon) for how the
lastSourceFailed flag is being used to consume all the streamed WAL in
pg_wal directly upon detecting promotion trigger file. Therefore, I
see no problem with the way it is right now for this new feature.

[1]
/*
 * Data not here yet. Check for trigger, then wait for
 * walreceiver to wake us up when new WAL arrives.
 */
if (CheckForStandbyTrigger())
{
/*
 * Note that we don't return XLREAD_FAIL immediately
 * here. After being triggered, we still want to
 * replay all the WAL that was already streamed. It's
 * in pg_wal now, so we just treat this as a failure,
 * and the state machine will move on to replay the
 * streamed WAL from pg_wal, and then recheck the
 * trigger and exit replay.
 */
lastSourceFailed = true;

> +if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING)
> +readFrom = XLOG_FROM_PG_WAL;
> +else
> +readFrom = currentSource == XLOG_FROM_ARCHIVE ?
> +XLOG_FROM_ANY : currentSource;
>
> WALSourceSwitchState looks confusing here, and are you sure that this
> is actualy correct?  Shouldn't we still try a READ_FROM_ANY or a read
> from the archives even with a streaming pending.

Please see the discussion starting from
https://www.postgresql.org/message-id/20221008215221.GA894639%40nathanxps13.
We wanted to keep the existing behaviour the same when we

Re: Switching XLog source from archive to streaming when primary available

2024-03-18 Thread Michael Paquier
On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote:
> Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a
> conflict in meson.build. Please see the attached v23 patch.

I've been reading this patch, and this is a very tricky one.  Please
be *very* cautious.

+#streaming_replication_retry_interval = 0# time after which standby
+# attempts to switch WAL source from archive to
+# streaming replication in seconds; 0 disables

This stuff allows a minimal retry interval of 1s.  Could it be useful
to have more responsiveness here and allow lower values than that?
Why not switching the units to be milliseconds?

+if (streaming_replication_retry_interval <= 0 ||
+!StandbyMode ||
+currentSource != XLOG_FROM_ARCHIVE)
+return SWITCH_TO_STREAMING_NONE;

Hmm.  Perhaps this should mention why we don't care about the
consistent point.

+/* See if we can switch WAL source to streaming */
+if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE)
+wal_source_switch_state = SwitchWALSourceToPrimary();

Rather than a routine that returns as result the value to use for the
GUC, I'd suggest to let this routine set the GUC as there is only one
caller of SwitchWALSourceToPrimary().  This can also include a check
on SWITCH_TO_STREAMING_NONE, based on what I'm reading that.

-   if (lastSourceFailed)
+   if (lastSourceFailed ||
+   wal_source_switch_state == SWITCH_TO_STREAMING) 

Hmm.  This one may be tricky.  I'd recommend a separation between the
failure in reading from a source and the switch to a new "forced"
source.

+if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING)
+readFrom = XLOG_FROM_PG_WAL;
+else
+readFrom = currentSource == XLOG_FROM_ARCHIVE ?
+XLOG_FROM_ANY : currentSource;

WALSourceSwitchState looks confusing here, and are you sure that this
is actualy correct?  Shouldn't we still try a READ_FROM_ANY or a read
from the archives even with a streaming pending.

By the way, I am not convinced that what you have is the best
interface ever.  This assumes that we'd always want to switch to
streaming more aggressively.  Could there be a point in also
controlling if we should switch to pg_wal/ or just to archiving more
aggressively as well, aka be able to do the opposite switch of WAL
source?  This design looks somewhat limited to me.  The origin of the
issue is that we don't have a way to control the order of the sources
consumed by WAL replay.  Perhaps something like a replay_source_order
that uses a list would be better, with elements settable to archive,
pg_wal and streaming?
--
Michael


signature.asc
Description: PGP signature


Re: Switching XLog source from archive to streaming when primary available

2024-03-17 Thread Bharath Rupireddy
On Wed, Mar 6, 2024 at 9:49 PM Nathan Bossart  wrote:
>
> > I played with that idea and it came out very nice. Please see the
> > attached v22 patch. Note that personally I didn't like "FORCE" being
> > there in the names, so I've simplified them a bit.
>
> Thanks.  I'd like to spend some time testing this, but from a glance, the
> code appears to be in decent shape.

Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a
conflict in meson.build. Please see the attached v23 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b451e18b7b1f91eb3a6857efe552e7fe97cc7f39 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sun, 17 Mar 2024 05:48:38 +
Subject: [PATCH v23] Allow standby to switch WAL source from archive to
 streaming

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be as efficient and fast as reading from
primary. This can be due to the differences in disk types, IO
costs, network latencies etc.. All of these can impact the
recovery performance on standby and increase the replication lag
on primary. In addition, the primary keeps accumulating WAL needed
for the standby while the standby reads WAL from archive because
the standby replication slot stays inactive. To avoid these
problems, one can use this parameter to make standby switch to
stream mode sooner.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication (getting WAL from primary). However, standby
exhausts all the WAL present in pg_wal before switching. If
standby fails to switch to stream mode, it falls back to archive
mode.

Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  49 
 doc/src/sgml/high-availability.sgml   |  15 ++-
 src/backend/access/transam/xlogrecovery.c | 117 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   3 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/043_wal_source_switch.pl  | 113 +
 8 files changed, 296 insertions(+), 15 deletions(-)
 create mode 100644 src/test/recovery/t/043_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 65a6e6c408..40c2ae93d3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5050,6 +5050,55 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from archive to streaming replication (i.e., getting WAL from
+primary). However, the standby exhausts all the WAL present in
+pg_wal directory before switching. If the standby
+fails to switch to stream mode, it falls back to archive mode. If this
+parameter value is specified without units, it is taken as seconds.
+With a lower value for this parameter, the standby makes frequent WAL
+source switch attempts. To avoid this, it is recommended to set a
+value depending on the rate of WAL generation on the primary. If the
+WAL files grow faster, the disk runs out of space sooner, so setting a
+value to make frequent WAL source switch attempts can help. The default
+is zero, disabling this feature. When disabled, the standby typically
+switches to stream mode only after receiving WAL from archive finishes
+(i.e., no more WAL left there) or fails for any reason. This parameter
+can only be set in the postgresql.conf file or on
+the server command line.
+   
+   
+
+ Standby may not always attempt to switch source from WAL archive to
+ streaming replication at exact
+ streaming_replication_retry_interval intervals. For
+ example, if the parameter is set to 1min and
+ fetching WAL file from archive takes about 2min,
+ then the source switch attempt happens for the next WAL file after
+ the current WAL file fetched from archive is fully applied.
+
+   
+   
+Reading WAL from archive may not always be as efficient and fast as
+reading from primary. This can be due to the differences in disk types,
+IO costs, network latencies etc. All of these can impact the recovery
+ 

Re: Switching XLog source from archive to streaming when primary available

2024-03-06 Thread Nathan Bossart
On Wed, Mar 06, 2024 at 10:02:43AM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 6, 2024 at 1:22 AM Nathan Bossart  
> wrote:
>> I was thinking of something more like
>>
>> typedef enum
>> {
>> NO_FORCE_SWITCH_TO_STREAMING,   /* no switch 
>> necessary */
>> FORCE_SWITCH_TO_STREAMING_PENDING,  /* exhausting pg_wal 
>> */
>> FORCE_SWITCH_TO_STREAMING,  /* switch to 
>> streaming now */
>> } WALSourceSwitchState;
>>
>> At least, that illustrates my mental model of the process here.  IMHO
>> that's easier to follow than two similarly-named bool variables.
> 
> I played with that idea and it came out very nice. Please see the
> attached v22 patch. Note that personally I didn't like "FORCE" being
> there in the names, so I've simplified them a bit.

Thanks.  I'd like to spend some time testing this, but from a glance, the
code appears to be in decent shape.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2024-03-05 Thread Bharath Rupireddy
On Wed, Mar 6, 2024 at 1:22 AM Nathan Bossart  wrote:
>
> I was thinking of something more like
>
> typedef enum
> {
> NO_FORCE_SWITCH_TO_STREAMING,   /* no switch 
> necessary */
> FORCE_SWITCH_TO_STREAMING_PENDING,  /* exhausting pg_wal 
> */
> FORCE_SWITCH_TO_STREAMING,  /* switch to 
> streaming now */
> } WALSourceSwitchState;
>
> At least, that illustrates my mental model of the process here.  IMHO
> that's easier to follow than two similarly-named bool variables.

I played with that idea and it came out very nice. Please see the
attached v22 patch. Note that personally I didn't like "FORCE" being
there in the names, so I've simplified them a bit.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From f2c9d1a170ce4cc536ce9139d7a2e0fdc268be69 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 6 Mar 2024 04:12:26 +
Subject: [PATCH v22] Allow standby to switch WAL source from archive to
 streaming

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be as efficient and fast as reading from
primary. This can be due to the differences in disk types, IO
costs, network latencies etc.. All of these can impact the
recovery performance on standby and increase the replication lag
on primary. In addition, the primary keeps accumulating WAL needed
for the standby while the standby reads WAL from archive because
the standby replication slot stays inactive. To avoid these
problems, one can use this parameter to make standby switch to
stream mode sooner.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication (getting WAL from primary). However, standby
exhausts all the WAL present in pg_wal before switching. If
standby fails to switch to stream mode, it falls back to archive
mode.

Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  49 
 doc/src/sgml/high-availability.sgml   |  15 ++-
 src/backend/access/transam/xlogrecovery.c | 117 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   3 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/042_wal_source_switch.pl  | 113 +
 8 files changed, 296 insertions(+), 15 deletions(-)
 create mode 100644 src/test/recovery/t/042_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b38cbd714a..02e79f32fb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5011,6 +5011,55 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from archive to streaming replication (i.e., getting WAL from
+primary). However, the standby exhausts all the WAL present in
+pg_wal directory before switching. If the standby
+fails to switch to stream mode, it falls back to archive mode. If this
+parameter value is specified without units, it is taken as seconds.
+With a lower value for this parameter, the standby makes frequent WAL
+source switch attempts. To avoid this, it is recommended to set a
+value depending on the rate of WAL generation on the primary. If the
+WAL files grow faster, the disk runs out of space sooner, so setting a
+value to make frequent WAL source switch attempts can help. The default
+is zero, disabling this feature. When disabled, the standby typically
+switches to stream mode only after receiving WAL from archive finishes
+(i.e., no more WAL left there) or fails for any reason. This parameter
+can only be set in the postgresql.conf file or on
+the server command line.
+   
+   
+
+ Standby may not always attempt to switch source from WAL archive to
+ streaming replication at exact
+ streaming_replication_retry_interval intervals. For
+ example, if the parameter is set to 1min and
+ fetching WAL file from archive takes about 2min,
+ then the source switch attempt happens for the next WAL file after
+ the current WAL file fetched from archive is fully applied.
+  

Re: Switching XLog source from archive to streaming when primary available

2024-03-05 Thread Nathan Bossart
On Tue, Mar 05, 2024 at 11:38:37PM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart  
> wrote:
>> Is there any way to simplify this?  For
>> example, would it be possible to make an enum that tracks the
>> streaming_replication_retry_interval state?
> 
> I guess the way it is right now looks simple IMHO. If the suggestion
> is to have an enum like below; it looks overkill for just two states.
> 
> typedef enum
> {
> CAN_SWITCH_SOURCE,
> SWITCH_SOURCE
> } XLogSourceSwitchState;

I was thinking of something more like

typedef enum
{
NO_FORCE_SWITCH_TO_STREAMING,   /* no switch necessary 
*/
FORCE_SWITCH_TO_STREAMING_PENDING,  /* exhausting pg_wal */
FORCE_SWITCH_TO_STREAMING,  /* switch to 
streaming now */
} WALSourceSwitchState;

At least, that illustrates my mental model of the process here.  IMHO
that's easier to follow than two similarly-named bool variables.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2024-03-05 Thread Bharath Rupireddy
On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart  wrote:
>
> cfbot claims that this one needs another rebase.

Yeah, the conflict was with the new TAP test file name in
src/test/recovery/meson.build.

> I've spent some time thinking about this one.  I'll admit I'm a bit worried
> about adding more complexity to this state machine, but I also haven't
> thought of any other viable approaches,

Right. I understand that the WaitForWALToBecomeAvailable()'s state
machine is a complex piece.

> and this still seems like a useful
> feature.  So, for now, I think we should continue with the current
> approach.

Yes, the feature is useful like mentioned in the docs as below:

+Reading WAL from archive may not always be as efficient and fast as
+reading from primary. This can be due to the differences in disk types,
+IO costs, network latencies etc. All of these can impact the recovery
+performance on standby, and can increase the replication lag on
+primary. In addition, the primary keeps accumulating WAL needed for the
+standby while the standby reads WAL from archive, because the standby
+replication slot stays inactive. To avoid these problems, one can use
+this parameter to make standby switch to stream mode sooner.

> +fails to switch to stream mode, it falls back to archive mode. If 
> this
> +parameter value is specified without units, it is taken as
> +milliseconds. Default is 5min. With a lower value
>
> Does this really need to be milliseconds?  I would think that any
> reasonable setting would at least on the order of seconds.

Agreed. Done that way.

> +attempts. To avoid this, it is recommended to set a reasonable value.
>
> I think we might want to suggest what a "reasonable value" is.

It really depends on the WAL generation rate on the primary. If the
WAL files grow faster, the disk runs out of space sooner, so setting a
 value to make frequent WAL source switch attempts can help. It's hard
to suggest a one-size-fits-all value. Therefore, I've tweaked the docs
a bit to reflect the fact  that it depends on the WAL generation rate.

> +   static bool canSwitchSource = false;
> +   boolswitchSource = false;
>
> IIUC "canSwitchSource" indicates that we are trying to force a switch to
> streaming, but we are currently exhausting anything that's present in the
> pg_wal directory,

Right.

> while "switchSource" indicates that we should force a
> switch to streaming right now.

It's not indicating force switch, it says "previously I was asked to
switch source via canSwitchSource, now that I've exhausted all the WAL
from the pg_wal directory, I'll make a source switch attempt now".

> Furthermore, "canSwitchSource" is static
> while "switchSource" is not.

This is because the WaitForWALToBecomeAvailable() has to remember the
decision (that streaming_replication_retry_interval has occurred)
across the calls. And, switchSource is decided within
WaitForWALToBecomeAvailable() for every function call.

> Is there any way to simplify this?  For
> example, would it be possible to make an enum that tracks the
> streaming_replication_retry_interval state?

I guess the way it is right now looks simple IMHO. If the suggestion
is to have an enum like below; it looks overkill for just two states.

typedef enum
{
CAN_SWITCH_SOURCE,
SWITCH_SOURCE
} XLogSourceSwitchState;

> /*
>  * Don't allow any retry loops to occur during 
> nonblocking
> -* readahead.  Let the caller process everything that 
> has been
> -* decoded already first.
> +* readahead if we failed to read from the current 
> source. Let the
> +* caller process everything that has been decoded 
> already first.
>  */
> -   if (nonblocking)
> +   if (nonblocking && lastSourceFailed)
> return XLREAD_WOULDBLOCK;
>
> Why do we skip this when "switchSource" is set?

It was leftover from the initial version of the patch - I was then
encountering some issue and had that piece there. Removed it now.

> +   /* Reset the WAL source switch state */
> +   if (switchSource)
> +   {
> +   Assert(canSwitchSource);
> +   Assert(currentSource == XLOG_FROM_STREAM);
> +   Assert(oldSource == XLOG_FROM_ARCHIVE);
> +   switchSource = false;
> +   canSwitchSource = false;
> +   }
>
> How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE?  Is
> there no way it could be XLOG_FROM_PG_WAL?

No. switchSource is set to true only when canSwitchSource is set to
true, which happens only when currentSource 

Re: Switching XLog source from archive to streaming when primary available

2024-03-04 Thread Nathan Bossart
cfbot claims that this one needs another rebase.

I've spent some time thinking about this one.  I'll admit I'm a bit worried
about adding more complexity to this state machine, but I also haven't
thought of any other viable approaches, and this still seems like a useful
feature.  So, for now, I think we should continue with the current
approach.

+fails to switch to stream mode, it falls back to archive mode. If this
+parameter value is specified without units, it is taken as
+milliseconds. Default is 5min. With a lower value

Does this really need to be milliseconds?  I would think that any
reasonable setting would at least on the order of seconds.

+attempts. To avoid this, it is recommended to set a reasonable value.

I think we might want to suggest what a "reasonable value" is.

+   static bool canSwitchSource = false;
+   boolswitchSource = false;

IIUC "canSwitchSource" indicates that we are trying to force a switch to
streaming, but we are currently exhausting anything that's present in the
pg_wal directory, while "switchSource" indicates that we should force a
switch to streaming right now.  Furthermore, "canSwitchSource" is static
while "switchSource" is not.  Is there any way to simplify this?  For
example, would it be possible to make an enum that tracks the
streaming_replication_retry_interval state?

/*
 * Don't allow any retry loops to occur during 
nonblocking
-* readahead.  Let the caller process everything that 
has been
-* decoded already first.
+* readahead if we failed to read from the current 
source. Let the
+* caller process everything that has been decoded 
already first.
 */
-   if (nonblocking)
+   if (nonblocking && lastSourceFailed)
return XLREAD_WOULDBLOCK;

Why do we skip this when "switchSource" is set?

+   /* Reset the WAL source switch state */
+   if (switchSource)
+   {
+   Assert(canSwitchSource);
+   Assert(currentSource == XLOG_FROM_STREAM);
+   Assert(oldSource == XLOG_FROM_ARCHIVE);
+   switchSource = false;
+   canSwitchSource = false;
+   }

How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE?  Is
there no way it could be XLOG_FROM_PG_WAL?

+#streaming_replication_retry_interval = 5min   # time after which standby
+   # attempts to switch WAL source from 
archive to
+   # streaming replication
+   # in milliseconds; 0 disables

I think we might want to turn this feature off by default, at least for the
first release.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2024-02-19 Thread Bharath Rupireddy
On Tue, Feb 20, 2024 at 11:54 AM Japin Li  wrote:
>
>
> On Tue, 20 Feb 2024 at 13:40, Bharath Rupireddy 
>  wrote:
> > On Mon, Feb 19, 2024 at 8:25 PM Japin Li  wrote:
> >> [2]
> >> +# Ensure checkpoint doesn't come in our way
> >> +$primary->append_conf('postgresql.conf', qq(
> >> +min_wal_size = 2MB
> >> +max_wal_size = 1GB
> >> +checkpoint_timeout = 1h
> >> +   autovacuum = off
> >> +));
> >>
> >> Keeping the same indentation might be better.
> >
> > The autovacuum line looks mis-indented in the patch file. However, I
> > now ran src/tools/pgindent/perltidyrc
> > src/test/recovery/t/041_wal_source_switch.pl on it.
> >
>
> Thanks for updating the patch.  It seems still with the wrong indent.

Thanks. perltidyrc didn't complain about anything on v19. However, I
kept the alignment same as other TAP tests for multi-line append_conf.
If that's not correct, I'll leave it to the committer to decide. PSA
v20 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From beea9f0b8bbc76bb48dd1a5d64a5b52bafd09e6f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 20 Feb 2024 07:31:29 +
Subject: [PATCH v20] Allow standby to switch WAL source from archive to
 streaming

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be as efficient and fast as reading from
primary. This can be due to the differences in disk types, IO
costs, network latencies etc.. All of these can impact the
recovery performance on standby and increase the replication lag
on primary. In addition, the primary keeps accumulating WAL needed
for the standby while the standby reads WAL from archive because
the standby replication slot stays inactive. To avoid these
problems, one can use this parameter to make standby switch to
stream mode sooner.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication (getting WAL from primary). However, standby
exhausts all the WAL present in pg_wal before switching. If
standby fails to switch to stream mode, it falls back to archive
mode.

Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  48 
 doc/src/sgml/high-availability.sgml   |  15 ++-
 src/backend/access/transam/xlogrecovery.c | 115 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/041_wal_source_switch.pl  | 110 +
 8 files changed, 287 insertions(+), 19 deletions(-)
 create mode 100644 src/test/recovery/t/041_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ffd711b7f2..2fbf1ad6e1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4872,6 +4872,54 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from archive to streaming replication (i.e., getting WAL from
+primary). However, the standby exhausts all the WAL present in
+pg_wal directory before switching. If the standby
+fails to switch to stream mode, it falls back to archive mode. If this
+parameter value is specified without units, it is taken as
+milliseconds. Default is 5min. With a lower value
+for this parameter, the standby makes frequent WAL source switch
+attempts. To avoid this, it is recommended to set a reasonable value.
+A setting of 0 disables the feature. When disabled,
+the standby typically switches to stream mode only after receiving WAL
+from archive finishes (i.e., no more WAL left there) or fails for any
+reason. This parameter can only be set in the
+postgresql.conf file or on the server command
+line.
+   
+   
+
+ Standby may not always attempt to switch source from WAL archive to
+ streaming replication at exact
+ streaming_replication_retry_interval intervals. For
+ example, if the parameter is set to 1min and
+ fetching WAL file from archive takes about 2min,
+ then the source switch attempt happens for the next WAL file after
+ current WAL file fetched from archive is fully applied.
+

Re: Switching XLog source from archive to streaming when primary available

2024-02-19 Thread Japin Li

On Tue, 20 Feb 2024 at 13:40, Bharath Rupireddy 
 wrote:
> On Mon, Feb 19, 2024 at 8:25 PM Japin Li  wrote:
>> [2]
>> +# Ensure checkpoint doesn't come in our way
>> +$primary->append_conf('postgresql.conf', qq(
>> +min_wal_size = 2MB
>> +max_wal_size = 1GB
>> +checkpoint_timeout = 1h
>> +   autovacuum = off
>> +));
>>
>> Keeping the same indentation might be better.
>
> The autovacuum line looks mis-indented in the patch file. However, I
> now ran src/tools/pgindent/perltidyrc
> src/test/recovery/t/041_wal_source_switch.pl on it.
>

Thanks for updating the patch.  It seems still with the wrong indent.

diff --git a/src/test/recovery/t/041_wal_source_switch.pl b/src/test/recovery/t/041_wal_source_switch.pl
index 082680bf4a..b5eddba1d5 100644
--- a/src/test/recovery/t/041_wal_source_switch.pl
+++ b/src/test/recovery/t/041_wal_source_switch.pl
@@ -18,9 +18,9 @@ $primary->init(
 # Ensure checkpoint doesn't come in our way
 $primary->append_conf(
 	'postgresql.conf', qq(
-min_wal_size = 2MB
-max_wal_size = 1GB
-checkpoint_timeout = 1h
+	min_wal_size = 2MB
+	max_wal_size = 1GB
+	checkpoint_timeout = 1h
 	autovacuum = off
 ));
 $primary->start;
@@ -85,7 +85,7 @@ my $offset = -s $standby->logfile;
 my $apply_delay = $retry_interval * 5;
 $standby->append_conf(
 	'postgresql.conf', qq(
-recovery_min_apply_delay = '${apply_delay}ms'
+	recovery_min_apply_delay = '${apply_delay}ms'
 ));
 $standby->start;
 


Re: Switching XLog source from archive to streaming when primary available

2024-02-19 Thread Bharath Rupireddy
On Mon, Feb 19, 2024 at 8:25 PM Japin Li  wrote:
>
> > Strengthened tests a bit by using recovery_min_apply_delay to mimic
> > standby spending some time fetching from archive. PSA v18 patch.
>
> Here are some minor comments:

Thanks for taking a look at it.

> [1]
> +primary). However, the standby exhausts all the WAL present in pg_wal
>
> s|pg_wal|pg_wal|g

Done.

> [2]
> +# Ensure checkpoint doesn't come in our way
> +$primary->append_conf('postgresql.conf', qq(
> +min_wal_size = 2MB
> +max_wal_size = 1GB
> +checkpoint_timeout = 1h
> +   autovacuum = off
> +));
>
> Keeping the same indentation might be better.

The autovacuum line looks mis-indented in the patch file. However, I
now ran src/tools/pgindent/perltidyrc
src/test/recovery/t/041_wal_source_switch.pl on it.

Please see the attached v19 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v19-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2024-02-19 Thread Japin Li


On Mon, 19 Feb 2024 at 18:36, Bharath Rupireddy 
 wrote:
> On Wed, Jan 31, 2024 at 6:30 PM Bharath Rupireddy
>  wrote:
>>
>> Needed a rebase due to commit 776621a (conflict in
>> src/test/recovery/meson.build for new TAP test file added). Please
>> find the attached v17 patch.
>
> Strengthened tests a bit by using recovery_min_apply_delay to mimic
> standby spending some time fetching from archive. PSA v18 patch.

Here are some minor comments:

[1]
+primary). However, the standby exhausts all the WAL present in pg_wal

s|pg_wal|pg_wal|g

[2]
+# Ensure checkpoint doesn't come in our way
+$primary->append_conf('postgresql.conf', qq(
+min_wal_size = 2MB
+max_wal_size = 1GB
+checkpoint_timeout = 1h
+   autovacuum = off
+));

Keeping the same indentation might be better.




Re: Switching XLog source from archive to streaming when primary available

2024-02-19 Thread Bharath Rupireddy
On Wed, Jan 31, 2024 at 6:30 PM Bharath Rupireddy
 wrote:
>
> Needed a rebase due to commit 776621a (conflict in
> src/test/recovery/meson.build for new TAP test file added). Please
> find the attached v17 patch.

Strengthened tests a bit by using recovery_min_apply_delay to mimic
standby spending some time fetching from archive. PSA v18 patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v18-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2024-01-31 Thread Bharath Rupireddy
On Wed, Jan 3, 2024 at 4:58 PM Bharath Rupireddy
 wrote:
>
> On Thu, Dec 28, 2023 at 5:26 PM Bharath Rupireddy
>  wrote:
> >
> > I took a closer look at v14 and came up with the following changes:
> >
> > 1. Used advance_wal introduced by commit c161ab74f7.
> > 2. Simplified the core logic and new TAP tests.
> > 3. Reworded the comments and docs.
> > 4. Simplified new DEBUG messages.
> >
> > I've attached the v15 patch for further review.
>
> Per a recent commit c538592, FATAL-ized perl warnings in the newly
> added TAP test and attached the v16 patch.

Needed a rebase due to commit 776621a (conflict in
src/test/recovery/meson.build for new TAP test file added). Please
find the attached v17 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 95a6f6c484dcff68f01ff90d9011abff2f15ad89 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 31 Jan 2024 11:59:02 +
Subject: [PATCH v17] Allow standby to switch WAL source from archive to
 streaming

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be as efficient and fast as reading from
primary. This can be due to the differences in disk types, IO
costs, network latencies etc.. All of these can impact the
recovery performance on standby and increase the replication lag
on primary. In addition, the primary keeps accumulating WAL needed
for the standby while the standby reads WAL from archive because
the standby replication slot stays inactive. To avoid these
problems, one can use this parameter to make standby switch to
stream mode sooner.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication (getting WAL from primary). However, standby
exhausts all the WAL present in pg_wal before switching. If
standby fails to switch to stream mode, it falls back to archive
mode.

Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  47 +++
 doc/src/sgml/high-availability.sgml   |  15 ++-
 src/backend/access/transam/xlogrecovery.c | 115 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/041_wal_source_switch.pl  |  93 ++
 8 files changed, 269 insertions(+), 19 deletions(-)
 create mode 100644 src/test/recovery/t/041_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..f0e45cf49d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4867,6 +4867,53 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from archive to streaming replication (i.e., getting WAL from
+primary). However, the standby exhausts all the WAL present in pg_wal
+before switching. If the standby fails to switch to stream mode, it
+falls back to archive mode. If this parameter value is specified
+without units, it is taken as milliseconds. Default is
+5min. With a lower value for this parameter, the
+standby makes frequent WAL source switch attempts. To avoid this, it is
+recommended to set a reasonable value. A setting of 0
+disables the feature. When disabled, the standby typically switches to
+stream mode only after receiving WAL from archive finishes (i.e., no
+more WAL left there) or fails for any reason. This parameter can only
+be set in the postgresql.conf file or on the
+server command line.
+   
+   
+
+ Standby may not always attempt to switch source from WAL archive to
+ streaming replication at exact
+ streaming_replication_retry_interval intervals. For
+ example, if the parameter is set to 1min and
+ fetching WAL file from archive takes about 2min,
+ then the source switch attempt happens for the next WAL file after
+ current WAL file fetched from archive is fully applied.
+
+   
+   
+Reading WAL from archive may not always be as efficient and fast as
+reading from primary. This can be due to the differences in disk types,
+IO costs, network latencies etc.. All of these can impact the 

Re: Switching XLog source from archive to streaming when primary available

2024-01-03 Thread Bharath Rupireddy
On Thu, Dec 28, 2023 at 5:26 PM Bharath Rupireddy
 wrote:
>
> I took a closer look at v14 and came up with the following changes:
>
> 1. Used advance_wal introduced by commit c161ab74f7.
> 2. Simplified the core logic and new TAP tests.
> 3. Reworded the comments and docs.
> 4. Simplified new DEBUG messages.
>
> I've attached the v15 patch for further review.

Per a recent commit c538592, FATAL-ized perl warnings in the newly
added TAP test and attached the v16 patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v16-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2023-12-28 Thread Bharath Rupireddy
On Sat, Oct 21, 2023 at 11:59 PM Bharath Rupireddy
 wrote:
>
> On Fri, Jul 21, 2023 at 12:38 PM Bharath Rupireddy
>  wrote:
> >
> > Needed a rebase. I'm attaching the v13 patch for further consideration.
>
> Needed a rebase. I'm attaching the v14 patch. It also has the following 
> changes:
>
> - Ran pgindent on the new source code.
> - Ran pgperltidy on the new TAP test.
> - Improved the newly added TAP test a bit. Used the new wait_for_log
> core TAP function in place of custom find_in_log.
>
> Thoughts?

I took a closer look at v14 and came up with the following changes:

1. Used advance_wal introduced by commit c161ab74f7.
2. Simplified the core logic and new TAP tests.
3. Reworded the comments and docs.
4. Simplified new DEBUG messages.

I've attached the v15 patch for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 8bdd3b999343c02ae01a7d15ef7fdd0be25623bd Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 28 Dec 2023 11:14:05 +
Subject: [PATCH v15] Allow standby to switch WAL source from archive to
 streaming

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be as efficient and fast as reading from
primary. This can be due to the differences in disk types, IO
costs, network latencies etc.. All of these can impact the
recovery performance on standby and increase the replication lag
on primary. In addition, the primary keeps accumulating WAL needed
for the standby while the standby reads WAL from archive because
the standby replication slot stays inactive. To avoid these
problems, one can use this parameter to make standby switch to
stream mode sooner.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication (getting WAL from primary). However, standby
exhausts all the WAL present in pg_wal before switching. If
standby fails to switch to stream mode, it falls back to archive
mode.

Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  47 +++
 doc/src/sgml/high-availability.sgml   |  15 ++-
 src/backend/access/transam/xlogrecovery.c | 115 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/040_wal_source_switch.pl  |  93 ++
 8 files changed, 269 insertions(+), 19 deletions(-)
 create mode 100644 src/test/recovery/t/040_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b5624ca884..04aa2fa8d2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4866,6 +4866,53 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from archive to streaming replication (i.e., getting WAL from
+primary). However, the standby exhausts all the WAL present in pg_wal
+before switching. If the standby fails to switch to stream mode, it
+falls back to archive mode. If this parameter value is specified
+without units, it is taken as milliseconds. Default is
+5min. With a lower value for this parameter, the
+standby makes frequent WAL source switch attempts. To avoid this, it is
+recommended to set a reasonable value. A setting of 0
+disables the feature. When disabled, the standby typically switches to
+stream mode only after receiving WAL from archive finishes (i.e., no
+more WAL left there) or fails for any reason. This parameter can only
+be set in the postgresql.conf file or on the
+server command line.
+   
+   
+
+ Standby may not always attempt to switch source from WAL archive to
+ streaming replication at exact
+ streaming_replication_retry_interval intervals. For
+ example, if the parameter is set to 1min and
+ fetching WAL file from archive takes about 2min,
+ then the source switch attempt happens for the next WAL file after
+ current WAL file fetched from archive is fully applied.
+
+   
+   
+Reading WAL from archive may not always be as efficient and fast as
+reading from primary. This can be due to the differences 

Re: Switching XLog source from archive to streaming when primary available

2023-10-21 Thread Bharath Rupireddy
On Fri, Jul 21, 2023 at 12:38 PM Bharath Rupireddy
 wrote:
>
> Needed a rebase. I'm attaching the v13 patch for further consideration.

Needed a rebase. I'm attaching the v14 patch. It also has the following changes:

- Ran pgindent on the new source code.
- Ran pgperltidy on the new TAP test.
- Improved the newly added TAP test a bit. Used the new wait_for_log
core TAP function in place of custom find_in_log.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From e6010e3b2e4c52d32a5d2f3eb2d59954617b221b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 21 Oct 2023 13:41:46 +
Subject: [PATCH v14] Allow standby to switch WAL source from archive to
 streaming

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be as efficient and fast as reading from
primary. This can be due to the differences in disk types, IO
costs, network latencies etc., all impacting recovery
performance on standby. And, while standby is reading WAL from
archive, primary accumulates WAL because the standby's replication
slot stays inactive. To avoid these problems, one can use this
parameter to make standby switch to stream mode sooner.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication (getting WAL from primary). However, standby
exhausts all the WAL present in pg_wal before switching. If
standby fails to switch to stream mode, it falls back to archive
mode.

Reported-by: SATYANARAYANA NARLAPURAM
Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  45 ++
 doc/src/sgml/high-availability.sgml   |  15 +-
 src/backend/access/transam/xlogrecovery.c | 146 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/040_wal_source_switch.pl  | 130 
 8 files changed, 333 insertions(+), 21 deletions(-)
 create mode 100644 src/test/recovery/t/040_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3839c72c86..3a18ba9b26 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4791,6 +4791,51 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (getting WAL from
+primary). However, standby exhausts all the WAL present in pg_wal
+before switching. If standby fails to switch to stream mode, it falls
+back to archive mode. If this parameter's value is specified without
+units, it is taken as milliseconds. Default is five minutes
+(5min). With a lower value for this parameter,
+standby makes frequent WAL source switch attempts. To avoid this, it is
+recommended to set a reasonable value. A setting of 0
+disables the feature. When disabled, standby typically switches to
+stream mode only when receive from WAL archive finishes (no more WAL
+left there) or fails for any reason. This parameter can only be set in
+the postgresql.conf file or on the server command
+line.
+   
+   
+
+ Standby may not always attempt to switch source from WAL archive to
+ streaming replication at exact streaming_replication_retry_interval
+ intervals. For example, if the parameter is set to 1min
+ and fetching WAL file from archive takes about 2min,
+ then the source switch attempt happens for the next WAL file after
+ current WAL file fetched from archive is fully applied.
+
+   
+   
+Reading WAL from archive may not always be as efficient and fast as
+reading from primary. This can be due to the differences in disk types,
+IO costs, network latencies etc., all impacting recovery performance on
+standby. And, while standby is reading WAL from archive, primary
+accumulates WAL because the standby's replication slot stays inactive.
+To avoid these problems, one can use this parameter to make standby
+switch to stream mode sooner.
+   
+  
+ 
+
  
   recovery_min_apply_delay (integer)
   
diff --git 

Re: Switching XLog source from archive to streaming when primary available

2023-07-21 Thread Bharath Rupireddy
On Tue, Apr 25, 2023 at 9:27 PM Bharath Rupireddy
 wrote:
>
> > Needed a rebase. I'm attaching the v11 patch for further review.
>
> Needed a rebase, so attaching the v12 patch. I word-smithed comments
> and docs a bit.

Needed a rebase. I'm attaching the v13 patch for further consideration.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v13-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2023-04-25 Thread Bharath Rupireddy
On Fri, Feb 24, 2023 at 10:26 AM Bharath Rupireddy
 wrote:
>
> On Wed, Nov 16, 2022 at 11:39 AM Bharath Rupireddy
>  wrote:
> >
> > I'm attaching the v10 patch for further review.
>
> Needed a rebase. I'm attaching the v11 patch for further review.

Needed a rebase, so attaching the v12 patch. I word-smithed comments
and docs a bit.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 80a982fb1c0d0737c5144d8ef50bb5e3ff845d07 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 25 Apr 2023 08:36:48 +
Subject: [PATCH v12] Allow standby to switch WAL source from archive to
 streaming replication

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be as efficient and fast as reading from
primary. This can be due to the differences in disk types, IO
costs, network latencies etc., all impacting recovery
performance on standby. And, while standby is reading WAL from
archive, primary accumulates WAL because the standby's replication
slot stays inactive. To avoid these problems, one can use this
parameter to make standby switch to stream mode sooner.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication (getting WAL from primary). However, standby
exhausts all the WAL present in pg_wal before switching. If
standby fails to switch to stream mode, it falls back to archive
mode.

Reported-by: SATYANARAYANA NARLAPURAM
Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  45 ++
 doc/src/sgml/high-availability.sgml   |  15 +-
 src/backend/access/transam/xlogrecovery.c | 144 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/037_wal_source_switch.pl  | 126 +++
 8 files changed, 328 insertions(+), 20 deletions(-)
 create mode 100644 src/test/recovery/t/037_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b56f073a91..8050f981e9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4923,6 +4923,51 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (getting WAL from
+primary). However, standby exhausts all the WAL present in pg_wal
+before switching. If standby fails to switch to stream mode, it falls
+back to archive mode. If this parameter's value is specified without
+units, it is taken as milliseconds. Default is five minutes
+(5min). With a lower value for this parameter,
+standby makes frequent WAL source switch attempts. To avoid this, it is
+recommended to set a reasonable value. A setting of 0
+disables the feature. When disabled, standby typically switches to
+stream mode only when receive from WAL archive finishes (no more WAL
+left there) or fails for any reason. This parameter can only be set in
+the postgresql.conf file or on the server command
+line.
+   
+   
+
+ Standby may not always attempt to switch source from WAL archive to
+ streaming replication at exact streaming_replication_retry_interval
+ intervals. For example, if the parameter is set to 1min
+ and fetching WAL file from archive takes about 2min,
+ then the source switch attempt happens for the next WAL file after
+ current WAL file fetched from archive is fully applied.
+
+   
+   
+Reading WAL from archive may not always be as efficient and fast as
+reading from primary. This can be due to the differences in disk types,
+IO costs, network latencies etc., all impacting recovery performance on
+standby. And, while standby is reading WAL from archive, primary
+accumulates WAL because the standby's replication slot stays inactive.
+To avoid these problems, one can use this parameter to make standby
+switch to stream mode sooner.
+   
+  
+ 
+
  
   recovery_min_apply_delay (integer)
   
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 

Re: Switching XLog source from archive to streaming when primary available

2023-02-23 Thread Bharath Rupireddy
On Wed, Nov 16, 2022 at 11:39 AM Bharath Rupireddy
 wrote:
>
> I'm attaching the v10 patch for further review.

Needed a rebase. I'm attaching the v11 patch for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 6cf6c9bc0a6d6c4325459cddc22fd6fcdc970a03 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 24 Feb 2023 04:38:13 +
Subject: [PATCH v11] Allow standby to switch WAL source from archive to
 streaming replication

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be efficient and cheaper because network
latencies, disk IO cost might differ on the archive as compared
to primary and often the archive may sit far from standby
impacting recovery performance on the standby. Hence reading WAL
from the primary, by setting this parameter, as opposed to the
archive enables the standby to catch up with the primary sooner
thus reducing replication lag and avoiding WAL files accumulation
on the primary.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication. However, exhaust all the WAL present in
pg_wal before switching. If the standby fails to switch to stream
mode, it falls back to archive mode.

Reported-by: SATYANARAYANA NARLAPURAM
Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  46 +
 doc/src/sgml/high-availability.sgml   |   7 +
 src/backend/access/transam/xlogrecovery.c | 158 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/035_wal_source_switch.pl  | 126 ++
 8 files changed, 338 insertions(+), 17 deletions(-)
 create mode 100644 src/test/recovery/t/035_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..fd400ac662 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4884,6 +4884,52 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (get WAL from
+primary). However, exhaust all the WAL present in pg_wal before
+switching. If the standby fails to switch to stream mode, it falls
+back to archive mode.
+If this value is specified without units, it is taken as milliseconds.
+The default is five minutes (5min).
+With a lower setting of this parameter, the standby makes frequent
+WAL source switch attempts when the primary is lost for quite longer.
+To avoid this, set a reasonable value.
+A setting of 0 disables the feature. When disabled,
+the standby typically switches to stream mode, only when receive from
+WAL archive finishes (no more WAL left there) or fails for any reason.
+This parameter can only be set in
+the postgresql.conf file or on the server
+command line.
+   
+   
+Reading WAL from archive may not always be efficient and cheaper
+because network latencies, disk IO cost might differ on the archive as
+compared to primary and often the archive may sit far from standby
+impacting recovery performance on the standby. Hence reading WAL
+from the primary, by setting this parameter, as opposed to the archive
+enables the standby to catch up with the primary sooner thus reducing
+replication lag and avoiding WAL files accumulation on the primary.
+  
+   
+Note that the standby may not always attempt to switch source from
+WAL archive to streaming replication at exact
+streaming_replication_retry_interval intervals.
+For example, if the parameter is set to 1min and
+fetching from WAL archive takes 5min, then the
+source switch attempt happens for the next WAL after current WAL is
+fetched from WAL archive and applied.
+  
+  
+ 
+
  
   recovery_min_apply_delay (integer)
   
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index f180607528..0323f4cc43 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -636,6 +636,13 @@ 

Re: Switching XLog source from archive to streaming when primary available

2023-01-20 Thread Bharath Rupireddy
On Thu, Jan 19, 2023 at 6:20 AM Nathan Bossart  wrote:
>
> On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> > On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart  
> > wrote:
> >> With your patch, we might replay one of these "old" files in pg_wal instead
> >> of the complete version of the file from the archives,
> >
> > That's true even today, without the patch, no? We're not changing the
> > existing behaviour of the state machine. Can you explain how it
> > happens with the patch?
>
> My point is that on HEAD, we will always prefer a complete archive file.
> With your patch, we might instead choose to replay an old file in pg_wal
> because we are artificially advancing the state machine.  IOW even if
> there's a complete archive available, we might not use it.  This is a
> behavior change, but I think it is okay.

Oh, yeah, I too agree that it's okay because manually copying WAL
files directly to pg_wal (which eventually get replayed before
switching to streaming) isn't recommended anyway for production level
servers. I think, we covered it in the documentation that it exhausts
all the WAL present in pg_wal before switching. Isn't that enough?

+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (get WAL from
+primary). However, exhaust all the WAL present in pg_wal before
+switching. If the standby fails to switch to stream mode, it falls
+back to archive mode.

> >> Would you mind testing this scenario?
ndby should receive f6 via archive and replay it (check the
> > replay lsn an> >
>
> I meant testing the scenario where there's an old file in pg_wal, a
> complete file in the archives, and your new GUC forces replay of the
> former.  This might be difficult to do in a TAP test.  Ultimately, I just
> want to validate the assumptions discussed above.

I think testing the scenario [1] is achievable. I could write a TAP
test for it - 
https://github.com/BRupireddy/postgres/tree/prefer_archived_wal_v1.
It's a bit flaky and needs a little more work (1 - writing a custom
script for restore_command  that sleeps only after fetching an
existing WAL file from archive, not sleeping for a history file or a
non-existent WAL file. 2- finding a command-line way to sleep on
Windows.) to stabilize it, but it seems doable. I can spend some more
time, if one thinks that the test is worth adding to the core, perhaps
discussing it separately from this thread.

[1] RestoreArchivedFile():
/*
 * When doing archive recovery, we always prefer an archived log file even
 * if a file of the same name exists in XLOGDIR.  The reason is that the
 * file in XLOGDIR could be an old, un-filled or partly-filled version
 * that was copied and restored as part of backing up $PGDATA.
 *

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2023-01-18 Thread Nathan Bossart
On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart  
> wrote:
>> With your patch, we might replay one of these "old" files in pg_wal instead
>> of the complete version of the file from the archives,
> 
> That's true even today, without the patch, no? We're not changing the
> existing behaviour of the state machine. Can you explain how it
> happens with the patch?

My point is that on HEAD, we will always prefer a complete archive file.
With your patch, we might instead choose to replay an old file in pg_wal
because we are artificially advancing the state machine.  IOW even if
there's a complete archive available, we might not use it.  This is a
behavior change, but I think it is okay.

>> Would you mind testing this scenario?
> 
> How about something like below for testing the above scenario? If it
> looks okay, I can add it as a new TAP test file.
> 
> 1. Generate WAL files f1 and f2 and archive them.
> 2. Check the replay lsn and WAL file name on the standby, when it
> replays upto f2, stop the standby.
> 3. Set recovery to fail on the standby, and stop the standby.
> 4. Generate f3, f4 (partially filled) on the primary.
> 5. Manually copy f3, f4 to the standby's pg_wal.
> 6. Start the standby, since recovery is set to fail, and there're new
> WAL files (f3, f4) under its pg_wal, it must replay those WAL files
> (check the replay lsn and WAL file name, it must be f4) before
> switching to streaming.
> 7. Generate f5 on the primary.
> 8. The standby should receive f5 and replay it (check the replay lsn
> and WAL file name, it must be f5).
> 9. Set streaming to fail on the standby and set recovery to succeed.
> 10. Generate f6 on the primary.
> 11. The standby should receive f6 via archive and replay it (check the
> replay lsn and WAL file name, it must be f6).

I meant testing the scenario where there's an old file in pg_wal, a
complete file in the archives, and your new GUC forces replay of the
former.  This might be difficult to do in a TAP test.  Ultimately, I just
want to validate the assumptions discussed above.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2023-01-17 Thread Bharath Rupireddy
On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart  wrote:
>
> On Tue, Oct 18, 2022 at 07:31:33AM +0530, Bharath Rupireddy wrote:
> > In summary, the standby state machine in WaitForWALToBecomeAvailable()
> > exhausts all the WAL in pg_wal before switching to streaming after
> > failing to fetch from archive. The v8 patch proposed upthread deviates
> > from this behaviour. Hence, attaching v9 patch that keeps the
> > behaviour as-is, that means, the standby exhausts all the WAL in
> > pg_wal before switching to streaming after fetching WAL from archive
> > for at least streaming_replication_retry_interval milliseconds.
>
> I think this is okay.  The following comment explains why archives are
> preferred over existing files in pg_wal:
>
>  * When doing archive recovery, we always prefer an archived log file 
> even
>  * if a file of the same name exists in XLOGDIR.  The reason is that 
> the
>  * file in XLOGDIR could be an old, un-filled or partly-filled version
>  * that was copied and restored as part of backing up $PGDATA.
>
> With your patch, we might replay one of these "old" files in pg_wal instead
> of the complete version of the file from the archives,

That's true even today, without the patch, no? We're not changing the
existing behaviour of the state machine. Can you explain how it
happens with the patch?

On HEAD, after failing to read from the archive, exhaust all wal from
pg_wal and then switch to streaming mode. With the patch, after
reading from the archive for at least
streaming_replication_retry_interval milliseconds, exhaust all wal
from pg_wal and then switch to streaming mode.

> but I think that is
> still correct.  We'll just replay whatever exists in pg_wal (which may be
> un-filled or partly-filled) before attempting streaming.  If that fails,
> we'll go back to trying the archives again.
>
> Would you mind testing this scenario?

How about something like below for testing the above scenario? If it
looks okay, I can add it as a new TAP test file.

1. Generate WAL files f1 and f2 and archive them.
2. Check the replay lsn and WAL file name on the standby, when it
replays upto f2, stop the standby.
3. Set recovery to fail on the standby, and stop the standby.
4. Generate f3, f4 (partially filled) on the primary.
5. Manually copy f3, f4 to the standby's pg_wal.
6. Start the standby, since recovery is set to fail, and there're new
WAL files (f3, f4) under its pg_wal, it must replay those WAL files
(check the replay lsn and WAL file name, it must be f4) before
switching to streaming.
7. Generate f5 on the primary.
8. The standby should receive f5 and replay it (check the replay lsn
and WAL file name, it must be f5).
9. Set streaming to fail on the standby and set recovery to succeed.
10. Generate f6 on the primary.
11. The standby should receive f6 via archive and replay it (check the
replay lsn and WAL file name, it must be f6).

If needed, we can look out for these messages to confirm it works as expected:
elog(DEBUG2, "switched WAL source from %s to %s after %s",
 xlogSourceNames[oldSource], xlogSourceNames[currentSource],
 lastSourceFailed ? "failure" : "success");
ereport(LOG,
(errmsg("restored log file \"%s\" from archive",
xlogfname)));

Essentially, it covers what the documentation
https://www.postgresql.org/docs/devel/warm-standby.html says:

"In standby mode, the server continuously applies WAL received from
the primary server. The standby server can read WAL from a WAL archive
(see restore_command) or directly from the primary over a TCP
connection (streaming replication). The standby server will also
attempt to restore any WAL found in the standby cluster's pg_wal
directory. That typically happens after a server restart, when the
standby replays again WAL that was streamed from the primary before
the restart, but you can also manually copy files to pg_wal at any
time to have them replayed."

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2023-01-11 Thread Nathan Bossart
On Tue, Oct 18, 2022 at 07:31:33AM +0530, Bharath Rupireddy wrote:
> In summary, the standby state machine in WaitForWALToBecomeAvailable()
> exhausts all the WAL in pg_wal before switching to streaming after
> failing to fetch from archive. The v8 patch proposed upthread deviates
> from this behaviour. Hence, attaching v9 patch that keeps the
> behaviour as-is, that means, the standby exhausts all the WAL in
> pg_wal before switching to streaming after fetching WAL from archive
> for at least streaming_replication_retry_interval milliseconds.

I think this is okay.  The following comment explains why archives are
preferred over existing files in pg_wal:

 * When doing archive recovery, we always prefer an archived log file 
even
 * if a file of the same name exists in XLOGDIR.  The reason is that the
 * file in XLOGDIR could be an old, un-filled or partly-filled version
 * that was copied and restored as part of backing up $PGDATA.

With your patch, we might replay one of these "old" files in pg_wal instead
of the complete version of the file from the archives, but I think that is
still correct.  We'll just replay whatever exists in pg_wal (which may be
un-filled or partly-filled) before attempting streaming.  If that fails,
we'll go back to trying the archives again.

Would you mind testing this scenario?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-11-15 Thread Bharath Rupireddy
On Wed, Nov 16, 2022 at 9:38 AM Ian Lawrence Barwick  wrote:
>
> While reviewing the patch backlog, we have determined that this patch adds
> one or more TAP tests but has not added the test to the "meson.build" file.

Thanks for pointing it out. Yeah, the test wasn't picking up on meson
builds. I added the new test file name in
src/test/recovery/meson.build.

I'm attaching the v10 patch for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 70323a5c96a889a21a5339c531193b6440d2e270 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 16 Nov 2022 05:34:41 +
Subject: [PATCH v10] Allow standby to switch WAL source from archive to
 streaming replication

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be efficient and cheaper because network
latencies, disk IO cost might differ on the archive as compared
to primary and often the archive may sit far from standby
impacting recovery performance on the standby. Hence reading WAL
from the primary, by setting this parameter, as opposed to the
archive enables the standby to catch up with the primary sooner
thus reducing replication lag and avoiding WAL files accumulation
on the primary.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication. However, exhaust all the WAL present in
pg_wal before switching. If the standby fails to switch to stream
mode, it falls back to archive mode.

Reported-by: SATYANARAYANA NARLAPURAM
Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  46 +
 doc/src/sgml/high-availability.sgml   |   7 +
 src/backend/access/transam/xlogrecovery.c | 158 --
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/034_wal_source_switch.pl  | 126 ++
 8 files changed, 338 insertions(+), 17 deletions(-)
 create mode 100644 src/test/recovery/t/034_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bd50ea8e48..8a3a9ff296 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4852,6 +4852,52 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (get WAL from
+primary). However, exhaust all the WAL present in pg_wal before
+switching. If the standby fails to switch to stream mode, it falls
+back to archive mode.
+If this value is specified without units, it is taken as milliseconds.
+The default is five minutes (5min).
+With a lower setting of this parameter, the standby makes frequent
+WAL source switch attempts when the primary is lost for quite longer.
+To avoid this, set a reasonable value.
+A setting of 0 disables the feature. When disabled,
+the standby typically switches to stream mode, only when receive from
+WAL archive finishes (no more WAL left there) or fails for any reason.
+This parameter can only be set in
+the postgresql.conf file or on the server
+command line.
+   
+   
+Reading WAL from archive may not always be efficient and cheaper
+because network latencies, disk IO cost might differ on the archive as
+compared to primary and often the archive may sit far from standby
+impacting recovery performance on the standby. Hence reading WAL
+from the primary, by setting this parameter, as opposed to the archive
+enables the standby to catch up with the primary sooner thus reducing
+replication lag and avoiding WAL files accumulation on the primary.
+  
+   
+Note that the standby may not always attempt to switch source from
+WAL archive to streaming replication at exact
+streaming_replication_retry_interval intervals.
+For example, if the parameter is set to 1min and
+fetching from WAL archive takes 5min, then the
+source switch attempt happens for the next WAL after current WAL is
+fetched from WAL archive and applied.
+  
+  
+ 
+
  
   recovery_min_apply_delay (integer)
   

Re: Switching XLog source from archive to streaming when primary available

2022-11-15 Thread Ian Lawrence Barwick
2022年10月18日(火) 11:02 Bharath Rupireddy :
>
> On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart  
> wrote:
> >
> > On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote:
> > > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart  
> > > wrote:
> > >> I wonder if it would be better to simply remove this extra polling of
> > >> pg_wal as a prerequisite to your patch.  The existing commentary leads me
> > >> to think there might not be a strong reason for this behavior, so it 
> > >> could
> > >> be a nice way to simplify your patch.
> > >
> > > I don't think it's a good idea to remove that completely. As said
> > > above, it might help someone, we never know.
> >
> > It would be great to hear whether anyone is using this functionality.  If
> > no one is aware of existing usage and there is no interest in keeping it
> > around, I don't think it would be unreasonable to remove it in v16.
>
> It seems like exhausting all the WAL in pg_wal before switching to
> streaming after failing to fetch from archive is unremovable. I found
> this after experimenting with it, here are my findings:
> 1. The standby has to recover initial WAL files in the pg_wal
> directory even for the normal post-restart/first-time-start case, I
> mean, in non-crash recovery case.
> 2. The standby received WAL files from primary (walreceiver just
> writes and flushes the received WAL to WAL files under pg_wal)
> pretty-fast and/or standby recovery is slow, say both the standby
> connection to primary and archive connection are broken for whatever
> reasons, then it has WAL files to recover in pg_wal directory.
>
> I think the fundamental behaviour for the standy is that it has to
> fully recover to the end of WAL under pg_wal no matter who copies WAL
> files there. I fully understand the consequences of manually copying
> WAL files into pg_wal, for that matter, manually copying/tinkering any
> other files into/under the data directory is something we don't
> recommend and encourage.
>
> In summary, the standby state machine in WaitForWALToBecomeAvailable()
> exhausts all the WAL in pg_wal before switching to streaming after
> failing to fetch from archive. The v8 patch proposed upthread deviates
> from this behaviour. Hence, attaching v9 patch that keeps the
> behaviour as-is, that means, the standby exhausts all the WAL in
> pg_wal before switching to streaming after fetching WAL from archive
> for at least streaming_replication_retry_interval milliseconds.
>
> Please review the v9 patch further.

Thanks for the updated patch.

While reviewing the patch backlog, we have determined that this patch adds
one or more TAP tests but has not added the test to the "meson.build" file.

To do this, locate the relevant "meson.build" file for each test and add it
in the 'tests' dictionary, which will look something like this:

  'tap': {
'tests': [
  't/001_basic.pl',
],
  },

For some additional details please see this Wiki article:

  https://wiki.postgresql.org/wiki/Meson_for_patch_authors

For more information on the meson build system for PostgreSQL see:

  https://wiki.postgresql.org/wiki/Meson


Regards

Ian Barwick




Re: Switching XLog source from archive to streaming when primary available

2022-10-17 Thread Bharath Rupireddy
On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart  wrote:
>
> On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote:
> > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart  
> > wrote:
> >> I wonder if it would be better to simply remove this extra polling of
> >> pg_wal as a prerequisite to your patch.  The existing commentary leads me
> >> to think there might not be a strong reason for this behavior, so it could
> >> be a nice way to simplify your patch.
> >
> > I don't think it's a good idea to remove that completely. As said
> > above, it might help someone, we never know.
>
> It would be great to hear whether anyone is using this functionality.  If
> no one is aware of existing usage and there is no interest in keeping it
> around, I don't think it would be unreasonable to remove it in v16.

It seems like exhausting all the WAL in pg_wal before switching to
streaming after failing to fetch from archive is unremovable. I found
this after experimenting with it, here are my findings:
1. The standby has to recover initial WAL files in the pg_wal
directory even for the normal post-restart/first-time-start case, I
mean, in non-crash recovery case.
2. The standby received WAL files from primary (walreceiver just
writes and flushes the received WAL to WAL files under pg_wal)
pretty-fast and/or standby recovery is slow, say both the standby
connection to primary and archive connection are broken for whatever
reasons, then it has WAL files to recover in pg_wal directory.

I think the fundamental behaviour for the standy is that it has to
fully recover to the end of WAL under pg_wal no matter who copies WAL
files there. I fully understand the consequences of manually copying
WAL files into pg_wal, for that matter, manually copying/tinkering any
other files into/under the data directory is something we don't
recommend and encourage.

In summary, the standby state machine in WaitForWALToBecomeAvailable()
exhausts all the WAL in pg_wal before switching to streaming after
failing to fetch from archive. The v8 patch proposed upthread deviates
from this behaviour. Hence, attaching v9 patch that keeps the
behaviour as-is, that means, the standby exhausts all the WAL in
pg_wal before switching to streaming after fetching WAL from archive
for at least streaming_replication_retry_interval milliseconds.

Please review the v9 patch further.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v9-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2022-10-10 Thread Nathan Bossart
On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote:
> On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart  
> wrote:
>> I wonder if it would be better to simply remove this extra polling of
>> pg_wal as a prerequisite to your patch.  The existing commentary leads me
>> to think there might not be a strong reason for this behavior, so it could
>> be a nice way to simplify your patch.
> 
> I don't think it's a good idea to remove that completely. As said
> above, it might help someone, we never know.

It would be great to hear whether anyone is using this functionality.  If
no one is aware of existing usage and there is no interest in keeping it
around, I don't think it would be unreasonable to remove it in v16.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-10-10 Thread Bharath Rupireddy
On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart  wrote:
>
> > Instead, the simplest would be to just pass XLOG_FROM_WAL to
> > XLogFileReadAnyTLI() when we're about to switch the source to stream
> > mode. This doesn't change the existing behaviour.
>
> It might be more consistent with existing behavior, but one thing I hadn't
> considered is that it might make your proposed feature ineffective when
> users are copying files straight into pg_wal.  IIUC as long as the files
> are present in pg_wal, the source-switch logic won't kick in.

It happens even now, that is, the server will not switch to streaming
mode from the archive after a failure if there's someone continuously
copying WAL files to the pg_wal directory. I have not personally seen
anyone or any service doing that. It doesn't mean that can't happen.
They might do it for some purpose such as 1) to bring back in sync
quickly a standby that's lagging behind the primary after the archive
connection and/or streaming replication connection are/is broken but
many WAL files leftover on the primary 2) before promoting a standby
that's lagging behind the primary for failover or other purposes.
However, I'm not sure if someone does these things on production
servers.

> > Unrelated to this patch, the fact that the standby polls pg_wal is not
> > documented or recommended, is not true, it is actually documented [2].
> > Whether or not we change the docs to be something like [3], is a
> > separate discussion.
>
> I wonder if it would be better to simply remove this extra polling of
> pg_wal as a prerequisite to your patch.  The existing commentary leads me
> to think there might not be a strong reason for this behavior, so it could
> be a nice way to simplify your patch.

I don't think it's a good idea to remove that completely. As said
above, it might help someone, we never know.

I think for this feature, we just need to decide on whether or not
we'd allow pg_wal polling before switching to streaming mode. If we
allow it like in the v8 patch, we can document the behavior.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-10-09 Thread Nathan Bossart
On Sun, Oct 09, 2022 at 02:39:47PM +0530, Bharath Rupireddy wrote:
> We can give it a chance to restore from pg_wal before switching to
> streaming to not change any behaviour of the state machine. But, not
> definitely by setting currentSource to XLOG_FROM_WAL, we basically
> never explicitly set currentSource to XLOG_FROM_WAL, other than when
> not in archive recovery i.e. InArchiveRecovery is false. Also, see the
> comment [1].
> 
> Instead, the simplest would be to just pass XLOG_FROM_WAL to
> XLogFileReadAnyTLI() when we're about to switch the source to stream
> mode. This doesn't change the existing behaviour.

It might be more consistent with existing behavior, but one thing I hadn't
considered is that it might make your proposed feature ineffective when
users are copying files straight into pg_wal.  IIUC as long as the files
are present in pg_wal, the source-switch logic won't kick in.

> Unrelated to this patch, the fact that the standby polls pg_wal is not
> documented or recommended, is not true, it is actually documented [2].
> Whether or not we change the docs to be something like [3], is a
> separate discussion.

I wonder if it would be better to simply remove this extra polling of
pg_wal as a prerequisite to your patch.  The existing commentary leads me
to think there might not be a strong reason for this behavior, so it could
be a nice way to simplify your patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-10-09 Thread Bharath Rupireddy
On Sun, Oct 9, 2022 at 3:22 AM Nathan Bossart  wrote:
>
> As I mentioned upthread [0], I'm still a little concerned that this patch
> will cause the state machine to go straight from archive recovery to
> streaming replication, skipping recovery from pg_wal.
>
> [0] https://postgr.es/m/20220906215704.GA2084086%40nathanxps13

Yes, it goes straight to streaming replication skipping recovery from
pg_wal with the patch.

> I wonder if this
> could be resolved by moving the standby to the pg_wal phase instead.
> Concretely, this line
>
> +   if (switchSource)
> +   break;
>
> would instead change currentSource from XLOG_FROM_ARCHIVE to
> XLOG_FROM_PG_WAL before the call to XLogFileReadAnyTLI().  I suspect the
> behavior would be basically the same, but it would maintain the existing
> ordering.

We can give it a chance to restore from pg_wal before switching to
streaming to not change any behaviour of the state machine. But, not
definitely by setting currentSource to XLOG_FROM_WAL, we basically
never explicitly set currentSource to XLOG_FROM_WAL, other than when
not in archive recovery i.e. InArchiveRecovery is false. Also, see the
comment [1].

Instead, the simplest would be to just pass XLOG_FROM_WAL to
XLogFileReadAnyTLI() when we're about to switch the source to stream
mode. This doesn't change the existing behaviour.

> However, I do see the following note elsewhere in xlogrecovery.c:
>
>  * The segment can be fetched via restore_command, or via walreceiver having
>  * streamed the record, or it can already be present in pg_wal. Checking
>  * pg_wal is mainly for crash recovery, but it will be polled in standby mode
>  * too, in case someone copies a new segment directly to pg_wal. That is not
>  * documented or recommended, though.
>
> Given this information, the present behavior might not be too important,
> but I don't see a point in changing it without good reason.

Yeah, with the attached patch we don't skip pg_wal before switching to
streaming mode.

I've also added a note in the 'Standby Server Operation' section about
the new feature.

Please review the v8 patch further.

Unrelated to this patch, the fact that the standby polls pg_wal is not
documented or recommended, is not true, it is actually documented [2].
Whether or not we change the docs to be something like [3], is a
separate discussion.

[1]
/*
 * We just successfully read a file in pg_wal. We prefer files in
 * the archive over ones in pg_wal, so try the next file again
 * from the archive first.
 */

[2] 
https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION
The standby server will also attempt to restore any WAL found in the
standby cluster's pg_wal directory. That typically happens after a
server restart, when the standby replays again WAL that was streamed
from the primary before the restart, but you can also manually copy
files to pg_wal at any time to have them replayed.

[3]
The standby server will also attempt to restore any WAL found in the
standby cluster's pg_wal directory. That typically happens after a
server restart, when the standby replays again WAL that was streamed
from the primary before the restart, but you can also manually copy
files to pg_wal at any time to have them replayed. However, copying of
WAL files manually is not recommended.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 171e11088cca63e99726b49b1b9b408eed81f299 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sun, 9 Oct 2022 08:11:10 +
Subject: [PATCH v8] Allow standby to switch WAL source from archive to
 streaming replication

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be efficient and cheaper because network
latencies, disk IO cost might differ on the archive as compared
to primary and often the archive may sit far from standby
impacting recovery performance on the standby. Hence reading WAL
from the primary, by setting this parameter, as opposed to the
archive enables the standby to catch up with the primary sooner
thus reducing replication lag and avoiding WAL files accumulation
on the primary.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication. If the standby fails to switch to stream
mode, it falls back to archive mode.

Reported-by: SATYANARAYANA NARLAPURAM
Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  45 ++
 

Re: Switching XLog source from archive to streaming when primary available

2022-10-08 Thread Nathan Bossart
On Mon, Sep 19, 2022 at 07:49:21PM +0530, Bharath Rupireddy wrote:
> SwitchFromArchiveToStreamEnabled() seemed better at this point. I'm
> attaching the v7 patch with that change. Please review it further.

As I mentioned upthread [0], I'm still a little concerned that this patch
will cause the state machine to go straight from archive recovery to
streaming replication, skipping recovery from pg_wal.  I wonder if this
could be resolved by moving the standby to the pg_wal phase instead.
Concretely, this line

+   if (switchSource)
+   break;

would instead change currentSource from XLOG_FROM_ARCHIVE to
XLOG_FROM_PG_WAL before the call to XLogFileReadAnyTLI().  I suspect the
behavior would be basically the same, but it would maintain the existing
ordering.

However, I do see the following note elsewhere in xlogrecovery.c:

 * The segment can be fetched via restore_command, or via walreceiver having
 * streamed the record, or it can already be present in pg_wal. Checking
 * pg_wal is mainly for crash recovery, but it will be polled in standby mode
 * too, in case someone copies a new segment directly to pg_wal. That is not
 * documented or recommended, though.

Given this information, the present behavior might not be too important,
but I don't see a point in changing it without good reason.

[0] https://postgr.es/m/20220906215704.GA2084086%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-19 Thread Bharath Rupireddy
On Fri, Sep 16, 2022 at 4:58 PM Bharath Rupireddy
 wrote:
>
> On Fri, Sep 16, 2022 at 12:06 PM Kyotaro Horiguchi
>  wrote:
> >
> > In other words, it seems to me that the macro name doesn't manifest
> > the condition correctly.
> >
> > I don't think we don't particularly want to do that unconditionally.
> > I wanted just to get rid of the macro from the usage site.  Even if
> > the same condition is used elsewhere, I see it better to write out the
> > condition directly there..
>
> I wanted to avoid a bit of duplicate code there. How about naming that
> macro IsXLOGSourceSwitchToStreamEnabled() or
> SwitchFromArchiveToStreamEnabled() or just SwitchFromArchiveToStream()
> or any other better name?

SwitchFromArchiveToStreamEnabled() seemed better at this point. I'm
attaching the v7 patch with that change. Please review it further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b94327d9af60a44f252251be97ee1efabc964f97 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 19 Sep 2022 14:04:49 +
Subject: [PATCH v7] Allow standby to switch WAL source from archive to
 streaming replication

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be efficient and cheaper because network
latencies, disk IO cost might differ on the archive as compared
to primary and often the archive may sit far from standby
impacting recovery performance on the standby. Hence reading WAL
from the primary, by setting this parameter, as opposed to the
archive enables the standby to catch up with the primary sooner
thus reducing replication lag and avoiding WAL files accumulation
on the primary.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication. If the standby fails to switch to stream
mode, it falls back to archive mode.

Reported-by: SATYANARAYANA NARLAPURAM
Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  45 +++
 src/backend/access/transam/xlogrecovery.c | 124 +++--
 src/backend/utils/misc/guc_tables.c   |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/t/034_wal_source_switch.pl  | 126 ++
 6 files changed, 301 insertions(+), 11 deletions(-)
 create mode 100644 src/test/recovery/t/034_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 700914684d..892442a053 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4840,6 +4840,51 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (get WAL from
+primary). If the standby fails to switch to stream mode, it falls back
+to archive mode.
+If this value is specified without units, it is taken as milliseconds.
+The default is five minutes (5min).
+With a lower setting of this parameter, the standby makes frequent
+WAL source switch attempts when the primary is lost for quite longer.
+To avoid this, set a reasonable value.
+A setting of 0 disables the feature. When disabled,
+the standby typically switches to stream mode, only when receive from
+WAL archive finishes (no more WAL left there) or fails for any reason.
+This parameter can only be set in
+the postgresql.conf file or on the server
+command line.
+   
+   
+Reading WAL from archive may not always be efficient and cheaper
+because network latencies, disk IO cost might differ on the archive as
+compared to primary and often the archive may sit far from standby
+impacting recovery performance on the standby. Hence reading WAL
+from the primary, by setting this parameter, as opposed to the archive
+enables the standby to catch up with the primary sooner thus reducing
+replication lag and avoiding WAL files accumulation on the primary.
+  
+   
+Note that the standby may not always attempt to switch source from
+WAL archive to streaming replication at exact
+streaming_replication_retry_interval intervals.
+For example, if the parameter is set to 1min and
+fetching from WAL archive takes 5min, then the
+source 

Re: Switching XLog source from archive to streaming when primary available

2022-09-16 Thread Bharath Rupireddy
On Fri, Sep 16, 2022 at 12:06 PM Kyotaro Horiguchi
 wrote:
>
> In other words, it seems to me that the macro name doesn't manifest
> the condition correctly.
>
> I don't think we don't particularly want to do that unconditionally.
> I wanted just to get rid of the macro from the usage site.  Even if
> the same condition is used elsewhere, I see it better to write out the
> condition directly there..

I wanted to avoid a bit of duplicate code there. How about naming that
macro IsXLOGSourceSwitchToStreamEnabled() or
SwitchFromArchiveToStreamEnabled() or just SwitchFromArchiveToStream()
or any other better name?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-16 Thread Kyotaro Horiguchi
At Fri, 16 Sep 2022 09:15:58 +0530, Bharath Rupireddy 
 wrote in 
> On Thu, Sep 15, 2022 at 1:52 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy 
> >  wrote in
> > > I'm attaching the v6 patch that's rebased on to the latest HEAD.
> > > Please consider this for review.
> >
> > Thaks for the new version!
> >
> > +#define StreamingReplRetryEnabled() \
> > +   (streaming_replication_retry_interval > 0 && \
> > +StandbyMode && \
> > +currentSource == XLOG_FROM_ARCHIVE)
> >
> > It seems to me a bit too complex..

In other words, it seems to me that the macro name doesn't manifest
the condition correctly.

> I don't think so, it just tells whether a standby is allowed to switch
> source to stream from archive.
> 
> > +   /* Save the timestamp at which we're switching to 
> > archive. */
> > +   if (StreamingReplRetryEnabled())
> > +   switched_to_archive_at = 
> > GetCurrentTimestamp();
> >
> > Anyway we are going to open a file just after this so
> > GetCurrentTimestamp() cannot cause a perceptible degradation.
> > Coulnd't we do that unconditionally, to get rid of the macro?
> 
> Do we really need to do it unconditionally? I don't think so. And, we
> can't get rid of the macro, as we need to check for the current
> source, GUC and standby mode. When this feature is disabled, it
> mustn't execute any extra code IMO.

I don't think we don't particularly want to do that unconditionally.
I wanted just to get rid of the macro from the usage site.  Even if
the same condition is used elsewhere, I see it better to write out the
condition directly there..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Switching XLog source from archive to streaming when primary available

2022-09-15 Thread Bharath Rupireddy
On Thu, Sep 15, 2022 at 1:52 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy 
>  wrote in
> > I'm attaching the v6 patch that's rebased on to the latest HEAD.
> > Please consider this for review.
>
> Thaks for the new version!
>
> +#define StreamingReplRetryEnabled() \
> +   (streaming_replication_retry_interval > 0 && \
> +StandbyMode && \
> +currentSource == XLOG_FROM_ARCHIVE)
>
> It seems to me a bit too complex..

I don't think so, it just tells whether a standby is allowed to switch
source to stream from archive.

> +   /* Save the timestamp at which we're switching to 
> archive. */
> +   if (StreamingReplRetryEnabled())
> +   switched_to_archive_at = 
> GetCurrentTimestamp();
>
> Anyway we are going to open a file just after this so
> GetCurrentTimestamp() cannot cause a perceptible degradation.
> Coulnd't we do that unconditionally, to get rid of the macro?

Do we really need to do it unconditionally? I don't think so. And, we
can't get rid of the macro, as we need to check for the current
source, GUC and standby mode. When this feature is disabled, it
mustn't execute any extra code IMO.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-15 Thread Kyotaro Horiguchi
At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy 
 wrote in 
> I'm attaching the v6 patch that's rebased on to the latest HEAD.
> Please consider this for review.

Thaks for the new version!

+#define StreamingReplRetryEnabled() \
+   (streaming_replication_retry_interval > 0 && \
+StandbyMode && \
+currentSource == XLOG_FROM_ARCHIVE)

It seems to me a bit too complex..

+   /* Save the timestamp at which we're switching to 
archive. */
+   if (StreamingReplRetryEnabled())
+   switched_to_archive_at = GetCurrentTimestamp();

Anyway we are going to open a file just after this so
GetCurrentTimestamp() cannot cause a perceptible degradation.
Coulnd't we do that unconditionally, to get rid of the macro?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Switching XLog source from archive to streaming when primary available

2022-09-14 Thread Bharath Rupireddy
On Mon, Sep 12, 2022 at 11:56 AM Bharath Rupireddy
 wrote:
>
> Please review the attached v5 patch.

I'm attaching the v6 patch that's rebased on to the latest HEAD.
Please consider this for review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v6-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2022-09-12 Thread Bharath Rupireddy
On Mon, Sep 12, 2022 at 9:03 AM Bharath Rupireddy
 wrote:
>
> Please review the attached v4 patch addressing above review comments.

Oops, there's a compiler warning [1] with the v4 patch, fixed it.
Please review the attached v5 patch.

[1] https://cirrus-ci.com/task/5730076611313664?logs=gcc_warning#L450

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 857415757ec21bd8b0195c17694618a1cbf22a57 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 12 Sep 2022 04:44:35 +
Subject: [PATCH v5] Allow standby to switch WAL source from archive to
 streaming replication

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be efficient and cheaper because network
latencies, disk IO cost might differ on the archive as compared
to primary and often the archive may sit far from standby
impacting recovery performance on the standby. Hence reading WAL
from the primary, by setting this parameter, as opposed to the
archive enables the standby to catch up with the primary sooner
thus reducing replication lag and avoiding WAL files accumulation
on the primary.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication. If the standby fails to switch to stream
mode, it falls back to archive mode.

Reported-by: SATYANARAYANA NARLAPURAM
Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  45 +++
 src/backend/access/transam/xlogrecovery.c | 124 +++--
 src/backend/utils/misc/guc.c  |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/t/034_wal_source_switch.pl  | 126 ++
 6 files changed, 301 insertions(+), 11 deletions(-)
 create mode 100644 src/test/recovery/t/034_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..694da93b8c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4840,6 +4840,51 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (get WAL from
+primary). If the standby fails to switch to stream mode, it falls back
+to archive mode.
+If this value is specified without units, it is taken as milliseconds.
+The default is five minutes (5min).
+With a lower setting of this parameter, the standby makes frequent
+WAL source switch attempts when the primary is lost for quite longer.
+To avoid this, set a reasonable value.
+A setting of 0 disables the feature. When disabled,
+the standby typically switches to stream mode, only when receive from
+WAL archive finishes (no more WAL left there) or fails for any reason.
+This parameter can only be set in
+the postgresql.conf file or on the server
+command line.
+   
+   
+Reading WAL from archive may not always be efficient and cheaper
+because network latencies, disk IO cost might differ on the archive as
+compared to primary and often the archive may sit far from standby
+impacting recovery performance on the standby. Hence reading WAL
+from the primary, by setting this parameter, as opposed to the archive
+enables the standby to catch up with the primary sooner thus reducing
+replication lag and avoiding WAL files accumulation on the primary.
+  
+   
+Note that the standby may not always attempt to switch source from
+WAL archive to streaming replication at exact
+streaming_replication_retry_interval intervals.
+For example, if the parameter is set to 1min and
+fetching from WAL archive takes 5min, then the
+source switch attempt happens for the next WAL after current WAL is
+fetched from WAL archive and applied.
+  
+  
+ 
+
  
   recovery_min_apply_delay (integer)
   
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 9a80084a68..35f7985e65 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -65,6 +65,12 @@
 #define RECOVERY_COMMAND_FILE	"recovery.conf"
 #define RECOVERY_COMMAND_DONE	"recovery.done"
 
+

Re: Switching XLog source from archive to streaming when primary available

2022-09-11 Thread Bharath Rupireddy
On Sat, Sep 10, 2022 at 3:35 AM Nathan Bossart  wrote:
>
> Well, for wal_keep_size, using bytes makes sense.  Given you know how much
> disk space you have, you can set this parameter accordingly to avoid
> retaining too much of it for standby servers.  For your proposed parameter,
> it's not so simple.  The same setting could have wildly different timing
> behavior depending on the server.  I still think that a timeout is the most
> intuitive.

Hm. In v3 patch, I've used the timeout approach, but tracking the
duration server spends in XLOG_FROM_ARCHIVE as opposed to tracking
last failed time in streaming from primary.

> So I think it
> would be difficult for the end user to decide on a value.  However, even
> the timeout approach has this sort of problem.  If your parameter is set to
> 1 minute, but the current archive takes 5 minutes to recover, you won't
> really be testing streaming replication once a minute.  That would likely
> need to be documented.

Added a note in the docs.

On Fri, Sep 9, 2022 at 10:46 AM Kyotaro Horiguchi
 wrote:
>
> Being late for the party.

Thanks for reviewing this.

> It seems to me that the function is getting too long.  I think we
> might want to move the core part of the patch into another function.

Yeah, the WaitForWALToBecomeAvailable() (without this patch) has
around 460 LOC out of which WAL fetching from the chosen source is of
240 LOC, IMO, this code will be a candidate for a new function. I
think that part can be discussed separately.

Having said that, I moved the new code to a new function.

> I think it might be better if intentionalSourceSwitch doesn't need
> lastSourceFailed set. It would look like this:
>
> >  if (lastSourceFailed || switchSource)
> >  {
> > if (nonblocking && lastSourceFailed)
> >return XLREAD_WOULDBLOCK;

I think the above looks good, done that way in the latest patch.

> I don't think the flag first_time is needed.

Addressed this in the v4 patch.

Please review the attached v4 patch addressing above review comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v4-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2022-09-09 Thread Nathan Bossart
On Fri, Sep 09, 2022 at 11:07:00PM +0530, Bharath Rupireddy wrote:
> On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart  
> wrote:
>> IMO the timeout approach would be more intuitive for users.  When it comes
>> to archive recovery, "WAL segment" isn't a standard unit of measure.  WAL
>> segment size can differ between clusters, and WAL files can have different
>> amounts of data or take different amounts of time to replay.
> 
> How about the amount of WAL bytes fetched from the archive after which
> a standby attempts to connect to primary or enter streaming mode? Of
> late, we've changed some GUCs to represent bytes instead of WAL
> files/segments, see [1].

Well, for wal_keep_size, using bytes makes sense.  Given you know how much
disk space you have, you can set this parameter accordingly to avoid
retaining too much of it for standby servers.  For your proposed parameter,
it's not so simple.  The same setting could have wildly different timing
behavior depending on the server.  I still think that a timeout is the most
intuitive.

>> So I think it
>> would be difficult for the end user to decide on a value.  However, even
>> the timeout approach has this sort of problem.  If your parameter is set to
>> 1 minute, but the current archive takes 5 minutes to recover, you won't
>> really be testing streaming replication once a minute.  That would likely
>> need to be documented.
> 
> If we have configurable WAL bytes instead of timeout for standby WAL
> source switch from archive to primary, we don't have the above problem
> right?

If you are going to stop replaying in the middle of a WAL archive, then
maybe.  But I don't think I'd recommend that.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-09 Thread Bharath Rupireddy
On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart  wrote:
>
> On Fri, Sep 09, 2022 at 12:14:25PM +0530, Bharath Rupireddy wrote:
> > On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi
> >  wrote:
> >> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart 
> >>  wrote in
> >> > My general point is that we should probably offer some basic preventative
> >> > measure against flipping back and forth between streaming and archive
> >> > recovery while making zero progress.  As I noted, maybe that's as simple 
> >> > as
> >> > having WaitForWALToBecomeAvailable() attempt to restore a file from 
> >> > archive
> >> > at least once before the new parameter forces us to switch to streaming
> >> > replication.  There might be other ways to handle this.
> >>
> >> +1.
> >
> > Hm. In that case, I think we can get rid of timeout based switching
> > mechanism and have this behaviour - the standby can attempt to switch
> > to streaming mode from archive, say, after fetching 1, 2 or a
> > configurable number of WAL files. In fact, this is the original idea
> > proposed by Satya in this thread.
>
> IMO the timeout approach would be more intuitive for users.  When it comes
> to archive recovery, "WAL segment" isn't a standard unit of measure.  WAL
> segment size can differ between clusters, and WAL files can have different
> amounts of data or take different amounts of time to replay.

How about the amount of WAL bytes fetched from the archive after which
a standby attempts to connect to primary or enter streaming mode? Of
late, we've changed some GUCs to represent bytes instead of WAL
files/segments, see [1].

> So I think it
> would be difficult for the end user to decide on a value.  However, even
> the timeout approach has this sort of problem.  If your parameter is set to
> 1 minute, but the current archive takes 5 minutes to recover, you won't
> really be testing streaming replication once a minute.  That would likely
> need to be documented.

If we have configurable WAL bytes instead of timeout for standby WAL
source switch from archive to primary, we don't have the above problem
right?

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3fe108c025e4a080315562d4c15ecbe3f00405e

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-09 Thread Nathan Bossart
On Fri, Sep 09, 2022 at 12:14:25PM +0530, Bharath Rupireddy wrote:
> On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi
>  wrote:
>> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart  
>> wrote in
>> > My general point is that we should probably offer some basic preventative
>> > measure against flipping back and forth between streaming and archive
>> > recovery while making zero progress.  As I noted, maybe that's as simple as
>> > having WaitForWALToBecomeAvailable() attempt to restore a file from archive
>> > at least once before the new parameter forces us to switch to streaming
>> > replication.  There might be other ways to handle this.
>>
>> +1.
> 
> Hm. In that case, I think we can get rid of timeout based switching
> mechanism and have this behaviour - the standby can attempt to switch
> to streaming mode from archive, say, after fetching 1, 2 or a
> configurable number of WAL files. In fact, this is the original idea
> proposed by Satya in this thread.

IMO the timeout approach would be more intuitive for users.  When it comes
to archive recovery, "WAL segment" isn't a standard unit of measure.  WAL
segment size can differ between clusters, and WAL files can have different
amounts of data or take different amounts of time to replay.  So I think it
would be difficult for the end user to decide on a value.  However, even
the timeout approach has this sort of problem.  If your parameter is set to
1 minute, but the current archive takes 5 minutes to recover, you won't
really be testing streaming replication once a minute.  That would likely
need to be documented.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-09 Thread Bharath Rupireddy
On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi
 wrote:
>
> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart  
> wrote in
> > On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote:
> > > I'm attaching the v3 patch with the review comments addressed, please
> > > review it further.
> >
> > My general point is that we should probably offer some basic preventative
> > measure against flipping back and forth between streaming and archive
> > recovery while making zero progress.  As I noted, maybe that's as simple as
> > having WaitForWALToBecomeAvailable() attempt to restore a file from archive
> > at least once before the new parameter forces us to switch to streaming
> > replication.  There might be other ways to handle this.
>
> +1.

Hm. In that case, I think we can get rid of timeout based switching
mechanism and have this behaviour - the standby can attempt to switch
to streaming mode from archive, say, after fetching 1, 2 or a
configurable number of WAL files. In fact, this is the original idea
proposed by Satya in this thread.

If okay, I can code on that. Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-08 Thread Kyotaro Horiguchi
At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart  
wrote in 
> On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote:
> > I'm attaching the v3 patch with the review comments addressed, please
> > review it further.
> 
> My general point is that we should probably offer some basic preventative
> measure against flipping back and forth between streaming and archive
> recovery while making zero progress.  As I noted, maybe that's as simple as
> having WaitForWALToBecomeAvailable() attempt to restore a file from archive
> at least once before the new parameter forces us to switch to streaming
> replication.  There might be other ways to handle this.

+1.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Switching XLog source from archive to streaming when primary available

2022-09-08 Thread Kyotaro Horiguchi
Being late for the party.

It seems to me that the function is getting too long.  I think we
might want to move the core part of the patch into another function.

I think it might be better if intentionalSourceSwitch doesn't need
lastSourceFailed set. It would look like this:

>  if (lastSourceFailed || switchSource)
>  {
> if (nonblocking && lastSourceFailed)
>return XLREAD_WOULDBLOCK;


+   if (first_time)
+   last_switch_time = curr_time;
..
+   if (!first_time &&
+   
TimestampDifferenceExceeds(last_switch_time, curr_time,
..
+   /* We're not here for the first time 
any more */
+   if (first_time)
+   first_time = false;

I don't think the flag first_time is needed.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Switching XLog source from archive to streaming when primary available

2022-09-08 Thread Nathan Bossart
On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote:
> On Wed, Sep 7, 2022 at 3:27 AM Nathan Bossart  
> wrote:
>> It's not clear to me how this is expected to interact with the pg_wal phase
>> of standby recovery.  As the docs note [0], standby servers loop through
>> archive recovery, recovery from pg_wal, and streaming replication.  Does
>> this cause the pg_wal phase to be skipped (i.e., the standby goes straight
>> from archive recovery to streaming replication)?  I wonder if it'd be
>> better for this mechanism to simply move the standby to the pg_wal phase so
>> that the usual ordering isn't changed.
> 
> It doesn't change any behaviour as such for XLOG_FROM_PG_WAL. In
> standby mode when recovery_command is specified, the initial value of
> currentSource would be XLOG_FROM_ARCHIVE (see [1]). If the archive is
> exhausted of WAL or the standby fails to fetch from the archive, then
> it switches to XLOG_FROM_STREAM. If the standby fails to receive WAL
> from primary, it switches back to XLOG_FROM_ARCHIVE. This continues
> unless the standby gets promoted. With the patch, we enable the
> standby to try fetching from the primary, instead of waiting for WAL
> in the archive to get exhausted or for an error to occur in the
> standby while receiving from the archive.

Okay.  I see that you are checking for XLOG_FROM_ARCHIVE.

>> Shouldn't the last_switch_time be set when the state machine first enters
>> XLOG_FROM_ARCHIVE?  IIUC this logic is currently counting time spent
>> elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
>> source switch.  This would mean that a standby that has spent a lot of time
>> in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
>> immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
>> XLOG_FROM_ARCHIVE when it failed again.  Given the standby will wait for
>> wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
>> seems like we could end up rapidly looping between sources.  Perhaps I am
>> misunderstanding how this is meant to work.
> 
> last_switch_time indicates the time when the standby last attempted to
> switch to primary. For instance, a standby:
> 1) for the first time, sets last_switch_time = current_time when in archive 
> mode
> 2) if current_time < last_switch_time + interval, continues to be in
> archive mode
> 3) if current_time >= last_switch_time + interval, attempts to switch
> to primary and sets last_switch_time = current_time
> 3.1) if successfully switches to primary, continues in there and for
> any reason fails to fetch from primary, then enters archive mode and
> loops from step (2)
> 3.2) if fails to switch to primary, then enters archive mode and loops
> from step (2)

Let's say I have this new parameter set to 5 minutes, and I have a standby
that's been at step 3.1 for 5 days before failing and going back to step 2.
Won't the standby immediately jump back to step 3.1?  I think we should
place the limit on how long the server stays in XLOG_FROM_ARCHIVE, not how
long it's been since we last tried XLOG_FROM_STREAM.

>> I wonder if the lower bound should be higher to avoid switching
>> unnecessarily rapidly between WAL sources.  I see that
>> WaitForWALToBecomeAvailable() ensures that standbys do not switch from
>> XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per
>> wal_retrieve_retry_interval.  Perhaps wal_retrieve_retry_interval should be
>> the lower bound for this GUC, too.  Or maybe WaitForWALToBecomeAvailable()
>> should make sure that the standby makes at least once attempt to restore
>> the file from archive before switching to streaming replication.
> 
> No, we need a way to disable the feature, so I'm not changing the
> lower bound. And let's not make this GUC dependent on any other GUC, I
> would like to keep it simple for better usability. However, I've
> increased the default value to 5min and added a note in the docs about
> the lower values.
> 
> I'm attaching the v3 patch with the review comments addressed, please
> review it further.

My general point is that we should probably offer some basic preventative
measure against flipping back and forth between streaming and archive
recovery while making zero progress.  As I noted, maybe that's as simple as
having WaitForWALToBecomeAvailable() attempt to restore a file from archive
at least once before the new parameter forces us to switch to streaming
replication.  There might be other ways to handle this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-08 Thread Bharath Rupireddy
On Wed, Sep 7, 2022 at 3:27 AM Nathan Bossart  wrote:
>
> +  
> +   wal_source_switch_interval configuration 
> parameter
> +  
>
> I don't want to bikeshed on the name too much, but I do think we need
> something more descriptive.  I'm thinking of something like
> streaming_replication_attempt_interval or
> streaming_replication_retry_interval.

I could come up with wal_source_switch_interval after a log of
bikeshedding myself :). However, streaming_replication_retry_interval
looks much better, I've used it in the latest patch. Thanks.

> +Specifies how long the standby server should wait before switching 
> WAL
> +source from WAL archive to primary (streaming replication). This can
> +happen either during the standby initial recovery or after a previous
> +failed attempt to stream WAL from the primary.
>
> I'm not sure what the second sentence means.  In general, I think the
> explanation in your commit message is much clearer:

I polished the comments, docs and commit message a bit, please check now.

> 5 seconds seems low.  I would expect the default to be 1-5 minutes.  I
> think it's important to strike a balance between interrupting archive
> recovery to attempt streaming replication and letting archive recovery make
> progress.

+1 for a default value of 5 minutes to avoid frequent interruptions
for archive mode when primary is really down for a long time. I've
also added a cautionary note in the docs about the lower values.

> +* Try reading WAL from primary after every wal_source_switch_interval
> +* milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If
> +* successful, the state machine moves to XLOG_FROM_STREAM state, 
> otherwise
> +* it falls back to XLOG_FROM_ARCHIVE state.
>
> It's not clear to me how this is expected to interact with the pg_wal phase
> of standby recovery.  As the docs note [0], standby servers loop through
> archive recovery, recovery from pg_wal, and streaming replication.  Does
> this cause the pg_wal phase to be skipped (i.e., the standby goes straight
> from archive recovery to streaming replication)?  I wonder if it'd be
> better for this mechanism to simply move the standby to the pg_wal phase so
> that the usual ordering isn't changed.
>
> [0] 
> https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION

It doesn't change any behaviour as such for XLOG_FROM_PG_WAL. In
standby mode when recovery_command is specified, the initial value of
currentSource would be XLOG_FROM_ARCHIVE (see [1]). If the archive is
exhausted of WAL or the standby fails to fetch from the archive, then
it switches to XLOG_FROM_STREAM. If the standby fails to receive WAL
from primary, it switches back to XLOG_FROM_ARCHIVE. This continues
unless the standby gets promoted. With the patch, we enable the
standby to try fetching from the primary, instead of waiting for WAL
in the archive to get exhausted or for an error to occur in the
standby while receiving from the archive.

> +   if (!first_time &&
> +   
> TimestampDifferenceExceeds(last_switch_time, curr_time,
> + 
>  wal_source_switch_interval))
>
> Shouldn't this also check that wal_source_switch_interval is not set to 0?

Corrected.

> +   elog(DEBUG2,
> +"trying to switch 
> WAL source to %s after fetching WAL from %s for %d milliseconds",
> +
> xlogSourceNames[XLOG_FROM_STREAM],
> +
> xlogSourceNames[currentSource],
> +
> wal_source_switch_interval);
> +
> +   last_switch_time = curr_time;
>
> Shouldn't the last_switch_time be set when the state machine first enters
> XLOG_FROM_ARCHIVE?  IIUC this logic is currently counting time spent
> elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
> source switch.  This would mean that a standby that has spent a lot of time
> in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
> immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
> XLOG_FROM_ARCHIVE when it failed again.  Given the standby will wait for
> wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
> seems like we could end up rapidly looping between sources.  Perhaps I am
> misunderstanding how this is meant to work.

last_switch_time indicates the time when the standby last attempted to
switch to primary. For instance, a standby:
1) for the first time, sets last_switch_time = current_time when in archive mode
2) if current_time < last_switch_time + interval, continues 

Re: Switching XLog source from archive to streaming when primary available

2022-09-06 Thread Nathan Bossart
+  
+   wal_source_switch_interval configuration 
parameter
+  

I don't want to bikeshed on the name too much, but I do think we need
something more descriptive.  I'm thinking of something like
streaming_replication_attempt_interval or
streaming_replication_retry_interval.

+Specifies how long the standby server should wait before switching WAL
+source from WAL archive to primary (streaming replication). This can
+happen either during the standby initial recovery or after a previous
+failed attempt to stream WAL from the primary.

I'm not sure what the second sentence means.  In general, I think the
explanation in your commit message is much clearer:

The standby makes an attempt to read WAL from primary after
wal_retrieve_retry_interval milliseconds reading from archive.

+If this value is specified without units, it is taken as milliseconds.
+The default value is 5 seconds. A setting of 0
+disables the feature.

5 seconds seems low.  I would expect the default to be 1-5 minutes.  I
think it's important to strike a balance between interrupting archive
recovery to attempt streaming replication and letting archive recovery make
progress.

+* Try reading WAL from primary after every wal_source_switch_interval
+* milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If
+* successful, the state machine moves to XLOG_FROM_STREAM state, 
otherwise
+* it falls back to XLOG_FROM_ARCHIVE state.

It's not clear to me how this is expected to interact with the pg_wal phase
of standby recovery.  As the docs note [0], standby servers loop through
archive recovery, recovery from pg_wal, and streaming replication.  Does
this cause the pg_wal phase to be skipped (i.e., the standby goes straight
from archive recovery to streaming replication)?  I wonder if it'd be
better for this mechanism to simply move the standby to the pg_wal phase so
that the usual ordering isn't changed.

+   if (!first_time &&
+   
TimestampDifferenceExceeds(last_switch_time, curr_time,
+   
   wal_source_switch_interval))

Shouldn't this also check that wal_source_switch_interval is not set to 0?

+   elog(DEBUG2,
+"trying to switch WAL 
source to %s after fetching WAL from %s for %d milliseconds",
+
xlogSourceNames[XLOG_FROM_STREAM],
+
xlogSourceNames[currentSource],
+
wal_source_switch_interval);
+
+   last_switch_time = curr_time;

Shouldn't the last_switch_time be set when the state machine first enters
XLOG_FROM_ARCHIVE?  IIUC this logic is currently counting time spent
elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
source switch.  This would mean that a standby that has spent a lot of time
in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
XLOG_FROM_ARCHIVE when it failed again.  Given the standby will wait for
wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
seems like we could end up rapidly looping between sources.  Perhaps I am
misunderstanding how this is meant to work.

+   {
+   {"wal_source_switch_interval", PGC_SIGHUP, REPLICATION_STANDBY,
+   gettext_noop("Sets the time to wait before switching 
WAL source from archive to primary"),
+   gettext_noop("0 turns this feature off."),
+   GUC_UNIT_MS
+   },
+   _source_switch_interval,
+   5000, 0, INT_MAX,
+   NULL, NULL, NULL
+   },

I wonder if the lower bound should be higher to avoid switching
unnecessarily rapidly between WAL sources.  I see that
WaitForWALToBecomeAvailable() ensures that standbys do not switch from
XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per
wal_retrieve_retry_interval.  Perhaps wal_retrieve_retry_interval should be
the lower bound for this GUC, too.  Or maybe WaitForWALToBecomeAvailable()
should make sure that the standby makes at least once attempt to restore
the file from archive before switching to streaming replication.

[0] 
https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Switching XLog source from archive to streaming when primary available

2022-08-11 Thread Bharath Rupireddy
On Fri, Jul 8, 2022 at 9:16 PM Bharath Rupireddy
 wrote:
>
> On Sat, Jun 25, 2022 at 1:31 AM Cary Huang  wrote:
> >
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   tested, passed
> > Spec compliant:   not tested
> > Documentation:not tested
> >
> > Hello
> >
> > I tested this patch in a setup where the standby is in the middle of 
> > replicating and REDOing primary's WAL files during a very large data 
> > insertion. During this time, I keep killing the walreceiver process to 
> > cause a stream failure and force standby to read from archive. The system 
> > will restore from archive for "wal_retrieve_retry_interval" seconds before 
> > it attempts to steam again. Without this patch, once the streaming is 
> > interrupted, it keeps reading from archive until standby reaches the same 
> > consistent state of primary and then it will switch back to streaming 
> > again. So it seems that the patch does the job as described and does bring 
> > some benefit during a very large REDO job where it will try to re-stream 
> > after restoring some WALs from archive to speed up this "catch up" process. 
> > But if the recovery job is not a large one, PG is already switching back to 
> > streaming once it hits consistent state.
>
> Thanks a lot Cary for testing the patch.
>
> > Here's a v1 patch that I've come up with. I'm right now using the
> > existing GUC wal_retrieve_retry_interval to switch to stream mode from
> > archive mode as opposed to switching only after the failure to get WAL
> > from archive mode. If okay with the approach, I can add tests, change
> > the docs and add a new GUC to control this behaviour. I'm open to
> > thoughts and ideas here.
>
> It will be great if I can hear some thoughts on the above points (as
> posted upthread).

Here's the v2 patch with a separate GUC, new GUC was necessary as the
existing GUC wal_retrieve_retry_interval is loaded with multiple
usages. When the feature is enabled, it will let standby to switch to
stream mode i.e. fetch WAL from primary before even fetching from
archive fails. The switching to stream mode from archive happens in 2
scenarios: 1) when standby is in initial recovery 2) when there was a
failure in receiving from primary (walreceiver got killed or crashed
or timed out, or connectivity to primary was broken - for whatever
reasons).

I also added test cases to the v2 patch.

Please review the patch.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/


v2-0001-Switch-WAL-source-to-stream-from-archive.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2022-07-08 Thread Bharath Rupireddy
On Sat, Jun 25, 2022 at 1:31 AM Cary Huang  wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> Hello
>
> I tested this patch in a setup where the standby is in the middle of 
> replicating and REDOing primary's WAL files during a very large data 
> insertion. During this time, I keep killing the walreceiver process to cause 
> a stream failure and force standby to read from archive. The system will 
> restore from archive for "wal_retrieve_retry_interval" seconds before it 
> attempts to steam again. Without this patch, once the streaming is 
> interrupted, it keeps reading from archive until standby reaches the same 
> consistent state of primary and then it will switch back to streaming again. 
> So it seems that the patch does the job as described and does bring some 
> benefit during a very large REDO job where it will try to re-stream after 
> restoring some WALs from archive to speed up this "catch up" process. But if 
> the recovery job is not a large one, PG is already switching back to 
> streaming once it hits consistent state.

Thanks a lot Cary for testing the patch.

> Here's a v1 patch that I've come up with. I'm right now using the
> existing GUC wal_retrieve_retry_interval to switch to stream mode from
> archive mode as opposed to switching only after the failure to get WAL
> from archive mode. If okay with the approach, I can add tests, change
> the docs and add a new GUC to control this behaviour. I'm open to
> thoughts and ideas here.

It will be great if I can hear some thoughts on the above points (as
posted upthread).

Regards,
Bharath Rupireddy.




Re: Switching XLog source from archive to streaming when primary available

2022-06-24 Thread Cary Huang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Hello

I tested this patch in a setup where the standby is in the middle of 
replicating and REDOing primary's WAL files during a very large data insertion. 
During this time, I keep killing the walreceiver process to cause a stream 
failure and force standby to read from archive. The system will restore from 
archive for "wal_retrieve_retry_interval" seconds before it attempts to steam 
again. Without this patch, once the streaming is interrupted, it keeps reading 
from archive until standby reaches the same consistent state of primary and 
then it will switch back to streaming again. So it seems that the patch does 
the job as described and does bring some benefit during a very large REDO job 
where it will try to re-stream after restoring some WALs from archive to speed 
up this "catch up" process. But if the recovery job is not a large one, PG is 
already switching back to streaming once it hits consistent state.

thank you

Cary Huang
HighGo Software Canada

Re: Switching XLog source from archive to streaming when primary available

2022-05-24 Thread Bharath Rupireddy
On Sat, Apr 30, 2022 at 6:19 PM Bharath Rupireddy
 wrote:
>
> On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM
>  wrote:
> >
> > Hi Hackers,
> >
> > When the standby couldn't connect to the primary it switches the XLog 
> > source from streaming to archive and continues in that state until it can 
> > get the WAL from the archive location. On a server with high WAL activity, 
> > typically getting the WAL from the archive is slower than streaming it from 
> > the primary and couldn't exit from that state. This not only increases the 
> > lag on the standby but also adversely impacts the primary as the WAL gets 
> > accumulated, and vacuum is not able to collect the dead tuples. DBAs as a 
> > mitigation can however remove/advance the slot or remove the 
> > restore_command on the standby but this is a manual work I am trying to 
> > avoid. I would like to propose the following, please let me know your 
> > thoughts.
> >
> > Automatically attempt to switch the source from Archive to streaming when 
> > the primary_conninfo is set after replaying 'N' wal segment governed by the 
> > GUC retry_primary_conn_after_wal_segments
> > when  retry_primary_conn_after_wal_segments is set to -1 then the feature 
> > is disabled
> > When the retry attempt fails, then switch back to the archive
>
> I've gone through the state machine in WaitForWALToBecomeAvailable and
> I understand it this way: failed to receive WAL records from the
> primary causes the current source to switch to archive and the standby
> continues to get WAL records from archive location unless some failure
> occurs there the current source is never going to switch back to
> stream. Given the fact that getting WAL from archive location causes
> delay in production environments, we miss to take the advantage of the
> reconnection to primary after previous failed attempt.
>
> So basically, we try to attempt to switch to streaming from archive
> (even though fetching from archive can succeed) after a certain amount
> of time or WAL segments. I prefer timing-based switch to streaming
> from archive instead of after a number of WAL segments fetched from
> archive. Right now, wal_retrieve_retry_interval is being used to wait
> before switching to archive after failed attempt from streaming, IMO,
> a similar GUC (that gets set once the source switched from streaming
> to archive and on timeout it switches to streaming again) can be used
> to switch from archive to streaming after the specified amount of
> time.
>
> Thoughts?

Here's a v1 patch that I've come up with. I'm right now using the
existing GUC wal_retrieve_retry_interval to switch to stream mode from
archive mode as opposed to switching only after the failure to get WAL
from archive mode. If okay with the approach, I can add tests, change
the docs and add a new GUC to control this behaviour. I'm open to
thoughts and ideas here.

Regards,
Bharath Rupireddy.


v1-0001-Switch-to-stream-mode-from-archive-occasionally.patch
Description: Binary data


Re: Switching XLog source from archive to streaming when primary available

2022-04-30 Thread Bharath Rupireddy
On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM
 wrote:
>
> Hi Hackers,
>
> When the standby couldn't connect to the primary it switches the XLog source 
> from streaming to archive and continues in that state until it can get the 
> WAL from the archive location. On a server with high WAL activity, typically 
> getting the WAL from the archive is slower than streaming it from the primary 
> and couldn't exit from that state. This not only increases the lag on the 
> standby but also adversely impacts the primary as the WAL gets accumulated, 
> and vacuum is not able to collect the dead tuples. DBAs as a mitigation can 
> however remove/advance the slot or remove the restore_command on the standby 
> but this is a manual work I am trying to avoid. I would like to propose the 
> following, please let me know your thoughts.
>
> Automatically attempt to switch the source from Archive to streaming when the 
> primary_conninfo is set after replaying 'N' wal segment governed by the GUC 
> retry_primary_conn_after_wal_segments
> when  retry_primary_conn_after_wal_segments is set to -1 then the feature is 
> disabled
> When the retry attempt fails, then switch back to the archive

I've gone through the state machine in WaitForWALToBecomeAvailable and
I understand it this way: failed to receive WAL records from the
primary causes the current source to switch to archive and the standby
continues to get WAL records from archive location unless some failure
occurs there the current source is never going to switch back to
stream. Given the fact that getting WAL from archive location causes
delay in production environments, we miss to take the advantage of the
reconnection to primary after previous failed attempt.

So basically, we try to attempt to switch to streaming from archive
(even though fetching from archive can succeed) after a certain amount
of time or WAL segments. I prefer timing-based switch to streaming
from archive instead of after a number of WAL segments fetched from
archive. Right now, wal_retrieve_retry_interval is being used to wait
before switching to archive after failed attempt from streaming, IMO,
a similar GUC (that gets set once the source switched from streaming
to archive and on timeout it switches to streaming again) can be used
to switch from archive to streaming after the specified amount of
time.

Thoughts?

Regards,
Bharath Rupireddy.




Re: Switching XLog source from archive to streaming when primary available

2021-11-29 Thread Dilip Kumar
On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM
 wrote:
>
> Hi Hackers,
>
> When the standby couldn't connect to the primary it switches the XLog source 
> from streaming to archive and continues in that state until it can get the 
> WAL from the archive location. On a server with high WAL activity, typically 
> getting the WAL from the archive is slower than streaming it from the primary 
> and couldn't exit from that state. This not only increases the lag on the 
> standby but also adversely impacts the primary as the WAL gets accumulated, 
> and vacuum is not able to collect the dead tuples. DBAs as a mitigation can 
> however remove/advance the slot or remove the restore_command on the standby 
> but this is a manual work I am trying to avoid. I would like to propose the 
> following, please let me know your thoughts.
>
> Automatically attempt to switch the source from Archive to streaming when the 
> primary_conninfo is set after replaying 'N' wal segment governed by the GUC 
> retry_primary_conn_after_wal_segments
> when  retry_primary_conn_after_wal_segments is set to -1 then the feature is 
> disabled
> When the retry attempt fails, then switch back to the archive

I think there is another thread [1] that is logically trying to solve
a similar issue, basically, in the main recovery apply loop is the
walreceiver does not exist then it is launching the walreceiver.
However, in that patch, it is not changing the current Xlog source but
I think that is not a good idea because with that it will restore from
the archive as well as stream from the primary so I have given that
review comment on that thread as well.  One big difference is that
patch is launching the walreceiver even if the WAL is locally
available and we don't really need more WAL but that is controlled by
a GUC.

[1] 
https://www.postgresql.org/message-id/CAKYtNApe05WmeRo92gTePEmhOM4myMpCK_%2BceSJtC7-AWLw1qw%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com