Re: Switching XLog source from archive to streaming when primary available
On Mon, Mar 18, 2024 at 11:38 AM Michael Paquier wrote: > > On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote: > > Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a > > conflict in meson.build. Please see the attached v23 patch. > > I've been reading this patch, and this is a very tricky one. Please > be *very* cautious. Thanks for looking into this. > +#streaming_replication_retry_interval = 0# time after which standby > +# attempts to switch WAL source from archive to > +# streaming replication in seconds; 0 disables > > This stuff allows a minimal retry interval of 1s. Could it be useful > to have more responsiveness here and allow lower values than that? > Why not switching the units to be milliseconds? Nathan had a different view on this to have it on the order of seconds - https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13. If set to a too low value, the frequency of standby trying to connect to primary increases. IMO, the order of seconds seems fine. > +if (streaming_replication_retry_interval <= 0 || > +!StandbyMode || > +currentSource != XLOG_FROM_ARCHIVE) > +return SWITCH_TO_STREAMING_NONE; > > Hmm. Perhaps this should mention why we don't care about the > consistent point. Are you asking why we don't care whether the standby reached a consistent point when switching to streaming mode due to this new parameter? If this is the ask, the same applies when a standby typically switches to streaming replication (get WAL from primary) today, that is when receive from WAL archive finishes (no more WAL left there) or fails for any reason. The standby doesn't care about the consistent point even today, it just trusts the WAL source and makes the switch. > +/* See if we can switch WAL source to streaming */ > +if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE) > +wal_source_switch_state = SwitchWALSourceToPrimary(); > > Rather than a routine that returns as result the value to use for the > GUC, I'd suggest to let this routine set the GUC as there is only one > caller of SwitchWALSourceToPrimary(). This can also include a check > on SWITCH_TO_STREAMING_NONE, based on what I'm reading that. Firstly, wal_source_switch_state is not a GUC, it's a static variable to be used across WaitForWALToBecomeAvailable calls. And, if you are suggesting to turn SwitchWALSourceToPrimary so that it sets wal_source_switch_state directly, I'd not do that because when wal_source_switch_state is not SWITCH_TO_STREAMING_NONE, the function gets called unnecessarily. > - if (lastSourceFailed) > + if (lastSourceFailed || > + wal_source_switch_state == SWITCH_TO_STREAMING) > > Hmm. This one may be tricky. I'd recommend a separation between the > failure in reading from a source and the switch to a new "forced" > source. Separation would just add duplicate code. Moreover, the code wrapped within if (lastSourceFailed) doesn't do any error handling or such, it just resets a few stuff from the previous source and sets the next source. FWIW, please check [1] (and the discussion thereon) for how the lastSourceFailed flag is being used to consume all the streamed WAL in pg_wal directly upon detecting promotion trigger file. Therefore, I see no problem with the way it is right now for this new feature. [1] /* * Data not here yet. Check for trigger, then wait for * walreceiver to wake us up when new WAL arrives. */ if (CheckForStandbyTrigger()) { /* * Note that we don't return XLREAD_FAIL immediately * here. After being triggered, we still want to * replay all the WAL that was already streamed. It's * in pg_wal now, so we just treat this as a failure, * and the state machine will move on to replay the * streamed WAL from pg_wal, and then recheck the * trigger and exit replay. */ lastSourceFailed = true; > +if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING) > +readFrom = XLOG_FROM_PG_WAL; > +else > +readFrom = currentSource == XLOG_FROM_ARCHIVE ? > +XLOG_FROM_ANY : currentSource; > > WALSourceSwitchState looks confusing here, and are you sure that this > is actualy correct? Shouldn't we still try a READ_FROM_ANY or a read > from the archives even with a streaming pending. Please see the discussion starting from https://www.postgresql.org/message-id/20221008215221.GA894639%40nathanxps13. We wanted to keep the existing behaviour the same when we
Re: Switching XLog source from archive to streaming when primary available
On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote: > Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a > conflict in meson.build. Please see the attached v23 patch. I've been reading this patch, and this is a very tricky one. Please be *very* cautious. +#streaming_replication_retry_interval = 0# time after which standby +# attempts to switch WAL source from archive to +# streaming replication in seconds; 0 disables This stuff allows a minimal retry interval of 1s. Could it be useful to have more responsiveness here and allow lower values than that? Why not switching the units to be milliseconds? +if (streaming_replication_retry_interval <= 0 || +!StandbyMode || +currentSource != XLOG_FROM_ARCHIVE) +return SWITCH_TO_STREAMING_NONE; Hmm. Perhaps this should mention why we don't care about the consistent point. +/* See if we can switch WAL source to streaming */ +if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE) +wal_source_switch_state = SwitchWALSourceToPrimary(); Rather than a routine that returns as result the value to use for the GUC, I'd suggest to let this routine set the GUC as there is only one caller of SwitchWALSourceToPrimary(). This can also include a check on SWITCH_TO_STREAMING_NONE, based on what I'm reading that. - if (lastSourceFailed) + if (lastSourceFailed || + wal_source_switch_state == SWITCH_TO_STREAMING) Hmm. This one may be tricky. I'd recommend a separation between the failure in reading from a source and the switch to a new "forced" source. +if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING) +readFrom = XLOG_FROM_PG_WAL; +else +readFrom = currentSource == XLOG_FROM_ARCHIVE ? +XLOG_FROM_ANY : currentSource; WALSourceSwitchState looks confusing here, and are you sure that this is actualy correct? Shouldn't we still try a READ_FROM_ANY or a read from the archives even with a streaming pending. By the way, I am not convinced that what you have is the best interface ever. This assumes that we'd always want to switch to streaming more aggressively. Could there be a point in also controlling if we should switch to pg_wal/ or just to archiving more aggressively as well, aka be able to do the opposite switch of WAL source? This design looks somewhat limited to me. The origin of the issue is that we don't have a way to control the order of the sources consumed by WAL replay. Perhaps something like a replay_source_order that uses a list would be better, with elements settable to archive, pg_wal and streaming? -- Michael signature.asc Description: PGP signature
Re: Switching XLog source from archive to streaming when primary available
On Wed, Mar 6, 2024 at 9:49 PM Nathan Bossart wrote: > > > I played with that idea and it came out very nice. Please see the > > attached v22 patch. Note that personally I didn't like "FORCE" being > > there in the names, so I've simplified them a bit. > > Thanks. I'd like to spend some time testing this, but from a glance, the > code appears to be in decent shape. Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a conflict in meson.build. Please see the attached v23 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From b451e18b7b1f91eb3a6857efe552e7fe97cc7f39 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sun, 17 Mar 2024 05:48:38 + Subject: [PATCH v23] Allow standby to switch WAL source from archive to streaming A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be as efficient and fast as reading from primary. This can be due to the differences in disk types, IO costs, network latencies etc.. All of these can impact the recovery performance on standby and increase the replication lag on primary. In addition, the primary keeps accumulating WAL needed for the standby while the standby reads WAL from archive because the standby replication slot stays inactive. To avoid these problems, one can use this parameter to make standby switch to stream mode sooner. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication (getting WAL from primary). However, standby exhausts all the WAL present in pg_wal before switching. If standby fails to switch to stream mode, it falls back to archive mode. Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 49 doc/src/sgml/high-availability.sgml | 15 ++- src/backend/access/transam/xlogrecovery.c | 117 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 3 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/043_wal_source_switch.pl | 113 + 8 files changed, 296 insertions(+), 15 deletions(-) create mode 100644 src/test/recovery/t/043_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 65a6e6c408..40c2ae93d3 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5050,6 +5050,55 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from archive to streaming replication (i.e., getting WAL from +primary). However, the standby exhausts all the WAL present in +pg_wal directory before switching. If the standby +fails to switch to stream mode, it falls back to archive mode. If this +parameter value is specified without units, it is taken as seconds. +With a lower value for this parameter, the standby makes frequent WAL +source switch attempts. To avoid this, it is recommended to set a +value depending on the rate of WAL generation on the primary. If the +WAL files grow faster, the disk runs out of space sooner, so setting a +value to make frequent WAL source switch attempts can help. The default +is zero, disabling this feature. When disabled, the standby typically +switches to stream mode only after receiving WAL from archive finishes +(i.e., no more WAL left there) or fails for any reason. This parameter +can only be set in the postgresql.conf file or on +the server command line. + + + + Standby may not always attempt to switch source from WAL archive to + streaming replication at exact + streaming_replication_retry_interval intervals. For + example, if the parameter is set to 1min and + fetching WAL file from archive takes about 2min, + then the source switch attempt happens for the next WAL file after + the current WAL file fetched from archive is fully applied. + + + +Reading WAL from archive may not always be as efficient and fast as +reading from primary. This can be due to the differences in disk types, +IO costs, network latencies etc. All of these can impact the recovery +
Re: Switching XLog source from archive to streaming when primary available
On Wed, Mar 06, 2024 at 10:02:43AM +0530, Bharath Rupireddy wrote: > On Wed, Mar 6, 2024 at 1:22 AM Nathan Bossart > wrote: >> I was thinking of something more like >> >> typedef enum >> { >> NO_FORCE_SWITCH_TO_STREAMING, /* no switch >> necessary */ >> FORCE_SWITCH_TO_STREAMING_PENDING, /* exhausting pg_wal >> */ >> FORCE_SWITCH_TO_STREAMING, /* switch to >> streaming now */ >> } WALSourceSwitchState; >> >> At least, that illustrates my mental model of the process here. IMHO >> that's easier to follow than two similarly-named bool variables. > > I played with that idea and it came out very nice. Please see the > attached v22 patch. Note that personally I didn't like "FORCE" being > there in the names, so I've simplified them a bit. Thanks. I'd like to spend some time testing this, but from a glance, the code appears to be in decent shape. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Wed, Mar 6, 2024 at 1:22 AM Nathan Bossart wrote: > > I was thinking of something more like > > typedef enum > { > NO_FORCE_SWITCH_TO_STREAMING, /* no switch > necessary */ > FORCE_SWITCH_TO_STREAMING_PENDING, /* exhausting pg_wal > */ > FORCE_SWITCH_TO_STREAMING, /* switch to > streaming now */ > } WALSourceSwitchState; > > At least, that illustrates my mental model of the process here. IMHO > that's easier to follow than two similarly-named bool variables. I played with that idea and it came out very nice. Please see the attached v22 patch. Note that personally I didn't like "FORCE" being there in the names, so I've simplified them a bit. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From f2c9d1a170ce4cc536ce9139d7a2e0fdc268be69 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Wed, 6 Mar 2024 04:12:26 + Subject: [PATCH v22] Allow standby to switch WAL source from archive to streaming A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be as efficient and fast as reading from primary. This can be due to the differences in disk types, IO costs, network latencies etc.. All of these can impact the recovery performance on standby and increase the replication lag on primary. In addition, the primary keeps accumulating WAL needed for the standby while the standby reads WAL from archive because the standby replication slot stays inactive. To avoid these problems, one can use this parameter to make standby switch to stream mode sooner. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication (getting WAL from primary). However, standby exhausts all the WAL present in pg_wal before switching. If standby fails to switch to stream mode, it falls back to archive mode. Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 49 doc/src/sgml/high-availability.sgml | 15 ++- src/backend/access/transam/xlogrecovery.c | 117 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 3 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/042_wal_source_switch.pl | 113 + 8 files changed, 296 insertions(+), 15 deletions(-) create mode 100644 src/test/recovery/t/042_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b38cbd714a..02e79f32fb 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5011,6 +5011,55 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from archive to streaming replication (i.e., getting WAL from +primary). However, the standby exhausts all the WAL present in +pg_wal directory before switching. If the standby +fails to switch to stream mode, it falls back to archive mode. If this +parameter value is specified without units, it is taken as seconds. +With a lower value for this parameter, the standby makes frequent WAL +source switch attempts. To avoid this, it is recommended to set a +value depending on the rate of WAL generation on the primary. If the +WAL files grow faster, the disk runs out of space sooner, so setting a +value to make frequent WAL source switch attempts can help. The default +is zero, disabling this feature. When disabled, the standby typically +switches to stream mode only after receiving WAL from archive finishes +(i.e., no more WAL left there) or fails for any reason. This parameter +can only be set in the postgresql.conf file or on +the server command line. + + + + Standby may not always attempt to switch source from WAL archive to + streaming replication at exact + streaming_replication_retry_interval intervals. For + example, if the parameter is set to 1min and + fetching WAL file from archive takes about 2min, + then the source switch attempt happens for the next WAL file after + the current WAL file fetched from archive is fully applied. +
Re: Switching XLog source from archive to streaming when primary available
On Tue, Mar 05, 2024 at 11:38:37PM +0530, Bharath Rupireddy wrote: > On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart > wrote: >> Is there any way to simplify this? For >> example, would it be possible to make an enum that tracks the >> streaming_replication_retry_interval state? > > I guess the way it is right now looks simple IMHO. If the suggestion > is to have an enum like below; it looks overkill for just two states. > > typedef enum > { > CAN_SWITCH_SOURCE, > SWITCH_SOURCE > } XLogSourceSwitchState; I was thinking of something more like typedef enum { NO_FORCE_SWITCH_TO_STREAMING, /* no switch necessary */ FORCE_SWITCH_TO_STREAMING_PENDING, /* exhausting pg_wal */ FORCE_SWITCH_TO_STREAMING, /* switch to streaming now */ } WALSourceSwitchState; At least, that illustrates my mental model of the process here. IMHO that's easier to follow than two similarly-named bool variables. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart wrote: > > cfbot claims that this one needs another rebase. Yeah, the conflict was with the new TAP test file name in src/test/recovery/meson.build. > I've spent some time thinking about this one. I'll admit I'm a bit worried > about adding more complexity to this state machine, but I also haven't > thought of any other viable approaches, Right. I understand that the WaitForWALToBecomeAvailable()'s state machine is a complex piece. > and this still seems like a useful > feature. So, for now, I think we should continue with the current > approach. Yes, the feature is useful like mentioned in the docs as below: +Reading WAL from archive may not always be as efficient and fast as +reading from primary. This can be due to the differences in disk types, +IO costs, network latencies etc. All of these can impact the recovery +performance on standby, and can increase the replication lag on +primary. In addition, the primary keeps accumulating WAL needed for the +standby while the standby reads WAL from archive, because the standby +replication slot stays inactive. To avoid these problems, one can use +this parameter to make standby switch to stream mode sooner. > +fails to switch to stream mode, it falls back to archive mode. If > this > +parameter value is specified without units, it is taken as > +milliseconds. Default is 5min. With a lower value > > Does this really need to be milliseconds? I would think that any > reasonable setting would at least on the order of seconds. Agreed. Done that way. > +attempts. To avoid this, it is recommended to set a reasonable value. > > I think we might want to suggest what a "reasonable value" is. It really depends on the WAL generation rate on the primary. If the WAL files grow faster, the disk runs out of space sooner, so setting a value to make frequent WAL source switch attempts can help. It's hard to suggest a one-size-fits-all value. Therefore, I've tweaked the docs a bit to reflect the fact that it depends on the WAL generation rate. > + static bool canSwitchSource = false; > + boolswitchSource = false; > > IIUC "canSwitchSource" indicates that we are trying to force a switch to > streaming, but we are currently exhausting anything that's present in the > pg_wal directory, Right. > while "switchSource" indicates that we should force a > switch to streaming right now. It's not indicating force switch, it says "previously I was asked to switch source via canSwitchSource, now that I've exhausted all the WAL from the pg_wal directory, I'll make a source switch attempt now". > Furthermore, "canSwitchSource" is static > while "switchSource" is not. This is because the WaitForWALToBecomeAvailable() has to remember the decision (that streaming_replication_retry_interval has occurred) across the calls. And, switchSource is decided within WaitForWALToBecomeAvailable() for every function call. > Is there any way to simplify this? For > example, would it be possible to make an enum that tracks the > streaming_replication_retry_interval state? I guess the way it is right now looks simple IMHO. If the suggestion is to have an enum like below; it looks overkill for just two states. typedef enum { CAN_SWITCH_SOURCE, SWITCH_SOURCE } XLogSourceSwitchState; > /* > * Don't allow any retry loops to occur during > nonblocking > -* readahead. Let the caller process everything that > has been > -* decoded already first. > +* readahead if we failed to read from the current > source. Let the > +* caller process everything that has been decoded > already first. > */ > - if (nonblocking) > + if (nonblocking && lastSourceFailed) > return XLREAD_WOULDBLOCK; > > Why do we skip this when "switchSource" is set? It was leftover from the initial version of the patch - I was then encountering some issue and had that piece there. Removed it now. > + /* Reset the WAL source switch state */ > + if (switchSource) > + { > + Assert(canSwitchSource); > + Assert(currentSource == XLOG_FROM_STREAM); > + Assert(oldSource == XLOG_FROM_ARCHIVE); > + switchSource = false; > + canSwitchSource = false; > + } > > How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE? Is > there no way it could be XLOG_FROM_PG_WAL? No. switchSource is set to true only when canSwitchSource is set to true, which happens only when currentSource
Re: Switching XLog source from archive to streaming when primary available
cfbot claims that this one needs another rebase. I've spent some time thinking about this one. I'll admit I'm a bit worried about adding more complexity to this state machine, but I also haven't thought of any other viable approaches, and this still seems like a useful feature. So, for now, I think we should continue with the current approach. +fails to switch to stream mode, it falls back to archive mode. If this +parameter value is specified without units, it is taken as +milliseconds. Default is 5min. With a lower value Does this really need to be milliseconds? I would think that any reasonable setting would at least on the order of seconds. +attempts. To avoid this, it is recommended to set a reasonable value. I think we might want to suggest what a "reasonable value" is. + static bool canSwitchSource = false; + boolswitchSource = false; IIUC "canSwitchSource" indicates that we are trying to force a switch to streaming, but we are currently exhausting anything that's present in the pg_wal directory, while "switchSource" indicates that we should force a switch to streaming right now. Furthermore, "canSwitchSource" is static while "switchSource" is not. Is there any way to simplify this? For example, would it be possible to make an enum that tracks the streaming_replication_retry_interval state? /* * Don't allow any retry loops to occur during nonblocking -* readahead. Let the caller process everything that has been -* decoded already first. +* readahead if we failed to read from the current source. Let the +* caller process everything that has been decoded already first. */ - if (nonblocking) + if (nonblocking && lastSourceFailed) return XLREAD_WOULDBLOCK; Why do we skip this when "switchSource" is set? + /* Reset the WAL source switch state */ + if (switchSource) + { + Assert(canSwitchSource); + Assert(currentSource == XLOG_FROM_STREAM); + Assert(oldSource == XLOG_FROM_ARCHIVE); + switchSource = false; + canSwitchSource = false; + } How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE? Is there no way it could be XLOG_FROM_PG_WAL? +#streaming_replication_retry_interval = 5min # time after which standby + # attempts to switch WAL source from archive to + # streaming replication + # in milliseconds; 0 disables I think we might want to turn this feature off by default, at least for the first release. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Tue, Feb 20, 2024 at 11:54 AM Japin Li wrote: > > > On Tue, 20 Feb 2024 at 13:40, Bharath Rupireddy > wrote: > > On Mon, Feb 19, 2024 at 8:25 PM Japin Li wrote: > >> [2] > >> +# Ensure checkpoint doesn't come in our way > >> +$primary->append_conf('postgresql.conf', qq( > >> +min_wal_size = 2MB > >> +max_wal_size = 1GB > >> +checkpoint_timeout = 1h > >> + autovacuum = off > >> +)); > >> > >> Keeping the same indentation might be better. > > > > The autovacuum line looks mis-indented in the patch file. However, I > > now ran src/tools/pgindent/perltidyrc > > src/test/recovery/t/041_wal_source_switch.pl on it. > > > > Thanks for updating the patch. It seems still with the wrong indent. Thanks. perltidyrc didn't complain about anything on v19. However, I kept the alignment same as other TAP tests for multi-line append_conf. If that's not correct, I'll leave it to the committer to decide. PSA v20 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From beea9f0b8bbc76bb48dd1a5d64a5b52bafd09e6f Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 20 Feb 2024 07:31:29 + Subject: [PATCH v20] Allow standby to switch WAL source from archive to streaming A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be as efficient and fast as reading from primary. This can be due to the differences in disk types, IO costs, network latencies etc.. All of these can impact the recovery performance on standby and increase the replication lag on primary. In addition, the primary keeps accumulating WAL needed for the standby while the standby reads WAL from archive because the standby replication slot stays inactive. To avoid these problems, one can use this parameter to make standby switch to stream mode sooner. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication (getting WAL from primary). However, standby exhausts all the WAL present in pg_wal before switching. If standby fails to switch to stream mode, it falls back to archive mode. Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 48 doc/src/sgml/high-availability.sgml | 15 ++- src/backend/access/transam/xlogrecovery.c | 115 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/041_wal_source_switch.pl | 110 + 8 files changed, 287 insertions(+), 19 deletions(-) create mode 100644 src/test/recovery/t/041_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index ffd711b7f2..2fbf1ad6e1 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4872,6 +4872,54 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from archive to streaming replication (i.e., getting WAL from +primary). However, the standby exhausts all the WAL present in +pg_wal directory before switching. If the standby +fails to switch to stream mode, it falls back to archive mode. If this +parameter value is specified without units, it is taken as +milliseconds. Default is 5min. With a lower value +for this parameter, the standby makes frequent WAL source switch +attempts. To avoid this, it is recommended to set a reasonable value. +A setting of 0 disables the feature. When disabled, +the standby typically switches to stream mode only after receiving WAL +from archive finishes (i.e., no more WAL left there) or fails for any +reason. This parameter can only be set in the +postgresql.conf file or on the server command +line. + + + + Standby may not always attempt to switch source from WAL archive to + streaming replication at exact + streaming_replication_retry_interval intervals. For + example, if the parameter is set to 1min and + fetching WAL file from archive takes about 2min, + then the source switch attempt happens for the next WAL file after + current WAL file fetched from archive is fully applied. +
Re: Switching XLog source from archive to streaming when primary available
On Tue, 20 Feb 2024 at 13:40, Bharath Rupireddy wrote: > On Mon, Feb 19, 2024 at 8:25 PM Japin Li wrote: >> [2] >> +# Ensure checkpoint doesn't come in our way >> +$primary->append_conf('postgresql.conf', qq( >> +min_wal_size = 2MB >> +max_wal_size = 1GB >> +checkpoint_timeout = 1h >> + autovacuum = off >> +)); >> >> Keeping the same indentation might be better. > > The autovacuum line looks mis-indented in the patch file. However, I > now ran src/tools/pgindent/perltidyrc > src/test/recovery/t/041_wal_source_switch.pl on it. > Thanks for updating the patch. It seems still with the wrong indent. diff --git a/src/test/recovery/t/041_wal_source_switch.pl b/src/test/recovery/t/041_wal_source_switch.pl index 082680bf4a..b5eddba1d5 100644 --- a/src/test/recovery/t/041_wal_source_switch.pl +++ b/src/test/recovery/t/041_wal_source_switch.pl @@ -18,9 +18,9 @@ $primary->init( # Ensure checkpoint doesn't come in our way $primary->append_conf( 'postgresql.conf', qq( -min_wal_size = 2MB -max_wal_size = 1GB -checkpoint_timeout = 1h + min_wal_size = 2MB + max_wal_size = 1GB + checkpoint_timeout = 1h autovacuum = off )); $primary->start; @@ -85,7 +85,7 @@ my $offset = -s $standby->logfile; my $apply_delay = $retry_interval * 5; $standby->append_conf( 'postgresql.conf', qq( -recovery_min_apply_delay = '${apply_delay}ms' + recovery_min_apply_delay = '${apply_delay}ms' )); $standby->start;
Re: Switching XLog source from archive to streaming when primary available
On Mon, Feb 19, 2024 at 8:25 PM Japin Li wrote: > > > Strengthened tests a bit by using recovery_min_apply_delay to mimic > > standby spending some time fetching from archive. PSA v18 patch. > > Here are some minor comments: Thanks for taking a look at it. > [1] > +primary). However, the standby exhausts all the WAL present in pg_wal > > s|pg_wal|pg_wal|g Done. > [2] > +# Ensure checkpoint doesn't come in our way > +$primary->append_conf('postgresql.conf', qq( > +min_wal_size = 2MB > +max_wal_size = 1GB > +checkpoint_timeout = 1h > + autovacuum = off > +)); > > Keeping the same indentation might be better. The autovacuum line looks mis-indented in the patch file. However, I now ran src/tools/pgindent/perltidyrc src/test/recovery/t/041_wal_source_switch.pl on it. Please see the attached v19 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v19-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Mon, 19 Feb 2024 at 18:36, Bharath Rupireddy wrote: > On Wed, Jan 31, 2024 at 6:30 PM Bharath Rupireddy > wrote: >> >> Needed a rebase due to commit 776621a (conflict in >> src/test/recovery/meson.build for new TAP test file added). Please >> find the attached v17 patch. > > Strengthened tests a bit by using recovery_min_apply_delay to mimic > standby spending some time fetching from archive. PSA v18 patch. Here are some minor comments: [1] +primary). However, the standby exhausts all the WAL present in pg_wal s|pg_wal|pg_wal|g [2] +# Ensure checkpoint doesn't come in our way +$primary->append_conf('postgresql.conf', qq( +min_wal_size = 2MB +max_wal_size = 1GB +checkpoint_timeout = 1h + autovacuum = off +)); Keeping the same indentation might be better.
Re: Switching XLog source from archive to streaming when primary available
On Wed, Jan 31, 2024 at 6:30 PM Bharath Rupireddy wrote: > > Needed a rebase due to commit 776621a (conflict in > src/test/recovery/meson.build for new TAP test file added). Please > find the attached v17 patch. Strengthened tests a bit by using recovery_min_apply_delay to mimic standby spending some time fetching from archive. PSA v18 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v18-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Wed, Jan 3, 2024 at 4:58 PM Bharath Rupireddy wrote: > > On Thu, Dec 28, 2023 at 5:26 PM Bharath Rupireddy > wrote: > > > > I took a closer look at v14 and came up with the following changes: > > > > 1. Used advance_wal introduced by commit c161ab74f7. > > 2. Simplified the core logic and new TAP tests. > > 3. Reworded the comments and docs. > > 4. Simplified new DEBUG messages. > > > > I've attached the v15 patch for further review. > > Per a recent commit c538592, FATAL-ized perl warnings in the newly > added TAP test and attached the v16 patch. Needed a rebase due to commit 776621a (conflict in src/test/recovery/meson.build for new TAP test file added). Please find the attached v17 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 95a6f6c484dcff68f01ff90d9011abff2f15ad89 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Wed, 31 Jan 2024 11:59:02 + Subject: [PATCH v17] Allow standby to switch WAL source from archive to streaming A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be as efficient and fast as reading from primary. This can be due to the differences in disk types, IO costs, network latencies etc.. All of these can impact the recovery performance on standby and increase the replication lag on primary. In addition, the primary keeps accumulating WAL needed for the standby while the standby reads WAL from archive because the standby replication slot stays inactive. To avoid these problems, one can use this parameter to make standby switch to stream mode sooner. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication (getting WAL from primary). However, standby exhausts all the WAL present in pg_wal before switching. If standby fails to switch to stream mode, it falls back to archive mode. Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 47 +++ doc/src/sgml/high-availability.sgml | 15 ++- src/backend/access/transam/xlogrecovery.c | 115 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/041_wal_source_switch.pl | 93 ++ 8 files changed, 269 insertions(+), 19 deletions(-) create mode 100644 src/test/recovery/t/041_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 61038472c5..f0e45cf49d 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4867,6 +4867,53 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from archive to streaming replication (i.e., getting WAL from +primary). However, the standby exhausts all the WAL present in pg_wal +before switching. If the standby fails to switch to stream mode, it +falls back to archive mode. If this parameter value is specified +without units, it is taken as milliseconds. Default is +5min. With a lower value for this parameter, the +standby makes frequent WAL source switch attempts. To avoid this, it is +recommended to set a reasonable value. A setting of 0 +disables the feature. When disabled, the standby typically switches to +stream mode only after receiving WAL from archive finishes (i.e., no +more WAL left there) or fails for any reason. This parameter can only +be set in the postgresql.conf file or on the +server command line. + + + + Standby may not always attempt to switch source from WAL archive to + streaming replication at exact + streaming_replication_retry_interval intervals. For + example, if the parameter is set to 1min and + fetching WAL file from archive takes about 2min, + then the source switch attempt happens for the next WAL file after + current WAL file fetched from archive is fully applied. + + + +Reading WAL from archive may not always be as efficient and fast as +reading from primary. This can be due to the differences in disk types, +IO costs, network latencies etc.. All of these can impact the
Re: Switching XLog source from archive to streaming when primary available
On Thu, Dec 28, 2023 at 5:26 PM Bharath Rupireddy wrote: > > I took a closer look at v14 and came up with the following changes: > > 1. Used advance_wal introduced by commit c161ab74f7. > 2. Simplified the core logic and new TAP tests. > 3. Reworded the comments and docs. > 4. Simplified new DEBUG messages. > > I've attached the v15 patch for further review. Per a recent commit c538592, FATAL-ized perl warnings in the newly added TAP test and attached the v16 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v16-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Sat, Oct 21, 2023 at 11:59 PM Bharath Rupireddy wrote: > > On Fri, Jul 21, 2023 at 12:38 PM Bharath Rupireddy > wrote: > > > > Needed a rebase. I'm attaching the v13 patch for further consideration. > > Needed a rebase. I'm attaching the v14 patch. It also has the following > changes: > > - Ran pgindent on the new source code. > - Ran pgperltidy on the new TAP test. > - Improved the newly added TAP test a bit. Used the new wait_for_log > core TAP function in place of custom find_in_log. > > Thoughts? I took a closer look at v14 and came up with the following changes: 1. Used advance_wal introduced by commit c161ab74f7. 2. Simplified the core logic and new TAP tests. 3. Reworded the comments and docs. 4. Simplified new DEBUG messages. I've attached the v15 patch for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 8bdd3b999343c02ae01a7d15ef7fdd0be25623bd Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Thu, 28 Dec 2023 11:14:05 + Subject: [PATCH v15] Allow standby to switch WAL source from archive to streaming A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be as efficient and fast as reading from primary. This can be due to the differences in disk types, IO costs, network latencies etc.. All of these can impact the recovery performance on standby and increase the replication lag on primary. In addition, the primary keeps accumulating WAL needed for the standby while the standby reads WAL from archive because the standby replication slot stays inactive. To avoid these problems, one can use this parameter to make standby switch to stream mode sooner. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication (getting WAL from primary). However, standby exhausts all the WAL present in pg_wal before switching. If standby fails to switch to stream mode, it falls back to archive mode. Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi, SATYANARAYANA NARLAPURAM Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 47 +++ doc/src/sgml/high-availability.sgml | 15 ++- src/backend/access/transam/xlogrecovery.c | 115 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/040_wal_source_switch.pl | 93 ++ 8 files changed, 269 insertions(+), 19 deletions(-) create mode 100644 src/test/recovery/t/040_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b5624ca884..04aa2fa8d2 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4866,6 +4866,53 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from archive to streaming replication (i.e., getting WAL from +primary). However, the standby exhausts all the WAL present in pg_wal +before switching. If the standby fails to switch to stream mode, it +falls back to archive mode. If this parameter value is specified +without units, it is taken as milliseconds. Default is +5min. With a lower value for this parameter, the +standby makes frequent WAL source switch attempts. To avoid this, it is +recommended to set a reasonable value. A setting of 0 +disables the feature. When disabled, the standby typically switches to +stream mode only after receiving WAL from archive finishes (i.e., no +more WAL left there) or fails for any reason. This parameter can only +be set in the postgresql.conf file or on the +server command line. + + + + Standby may not always attempt to switch source from WAL archive to + streaming replication at exact + streaming_replication_retry_interval intervals. For + example, if the parameter is set to 1min and + fetching WAL file from archive takes about 2min, + then the source switch attempt happens for the next WAL file after + current WAL file fetched from archive is fully applied. + + + +Reading WAL from archive may not always be as efficient and fast as +reading from primary. This can be due to the differences
Re: Switching XLog source from archive to streaming when primary available
On Fri, Jul 21, 2023 at 12:38 PM Bharath Rupireddy wrote: > > Needed a rebase. I'm attaching the v13 patch for further consideration. Needed a rebase. I'm attaching the v14 patch. It also has the following changes: - Ran pgindent on the new source code. - Ran pgperltidy on the new TAP test. - Improved the newly added TAP test a bit. Used the new wait_for_log core TAP function in place of custom find_in_log. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From e6010e3b2e4c52d32a5d2f3eb2d59954617b221b Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sat, 21 Oct 2023 13:41:46 + Subject: [PATCH v14] Allow standby to switch WAL source from archive to streaming A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be as efficient and fast as reading from primary. This can be due to the differences in disk types, IO costs, network latencies etc., all impacting recovery performance on standby. And, while standby is reading WAL from archive, primary accumulates WAL because the standby's replication slot stays inactive. To avoid these problems, one can use this parameter to make standby switch to stream mode sooner. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication (getting WAL from primary). However, standby exhausts all the WAL present in pg_wal before switching. If standby fails to switch to stream mode, it falls back to archive mode. Reported-by: SATYANARAYANA NARLAPURAM Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 45 ++ doc/src/sgml/high-availability.sgml | 15 +- src/backend/access/transam/xlogrecovery.c | 146 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/040_wal_source_switch.pl | 130 8 files changed, 333 insertions(+), 21 deletions(-) create mode 100644 src/test/recovery/t/040_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 3839c72c86..3a18ba9b26 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4791,6 +4791,51 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from WAL archive to streaming replication (getting WAL from +primary). However, standby exhausts all the WAL present in pg_wal +before switching. If standby fails to switch to stream mode, it falls +back to archive mode. If this parameter's value is specified without +units, it is taken as milliseconds. Default is five minutes +(5min). With a lower value for this parameter, +standby makes frequent WAL source switch attempts. To avoid this, it is +recommended to set a reasonable value. A setting of 0 +disables the feature. When disabled, standby typically switches to +stream mode only when receive from WAL archive finishes (no more WAL +left there) or fails for any reason. This parameter can only be set in +the postgresql.conf file or on the server command +line. + + + + Standby may not always attempt to switch source from WAL archive to + streaming replication at exact streaming_replication_retry_interval + intervals. For example, if the parameter is set to 1min + and fetching WAL file from archive takes about 2min, + then the source switch attempt happens for the next WAL file after + current WAL file fetched from archive is fully applied. + + + +Reading WAL from archive may not always be as efficient and fast as +reading from primary. This can be due to the differences in disk types, +IO costs, network latencies etc., all impacting recovery performance on +standby. And, while standby is reading WAL from archive, primary +accumulates WAL because the standby's replication slot stays inactive. +To avoid these problems, one can use this parameter to make standby +switch to stream mode sooner. + + + + recovery_min_apply_delay (integer) diff --git
Re: Switching XLog source from archive to streaming when primary available
On Tue, Apr 25, 2023 at 9:27 PM Bharath Rupireddy wrote: > > > Needed a rebase. I'm attaching the v11 patch for further review. > > Needed a rebase, so attaching the v12 patch. I word-smithed comments > and docs a bit. Needed a rebase. I'm attaching the v13 patch for further consideration. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v13-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Fri, Feb 24, 2023 at 10:26 AM Bharath Rupireddy wrote: > > On Wed, Nov 16, 2022 at 11:39 AM Bharath Rupireddy > wrote: > > > > I'm attaching the v10 patch for further review. > > Needed a rebase. I'm attaching the v11 patch for further review. Needed a rebase, so attaching the v12 patch. I word-smithed comments and docs a bit. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 80a982fb1c0d0737c5144d8ef50bb5e3ff845d07 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 25 Apr 2023 08:36:48 + Subject: [PATCH v12] Allow standby to switch WAL source from archive to streaming replication A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be as efficient and fast as reading from primary. This can be due to the differences in disk types, IO costs, network latencies etc., all impacting recovery performance on standby. And, while standby is reading WAL from archive, primary accumulates WAL because the standby's replication slot stays inactive. To avoid these problems, one can use this parameter to make standby switch to stream mode sooner. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication (getting WAL from primary). However, standby exhausts all the WAL present in pg_wal before switching. If standby fails to switch to stream mode, it falls back to archive mode. Reported-by: SATYANARAYANA NARLAPURAM Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 45 ++ doc/src/sgml/high-availability.sgml | 15 +- src/backend/access/transam/xlogrecovery.c | 144 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/037_wal_source_switch.pl | 126 +++ 8 files changed, 328 insertions(+), 20 deletions(-) create mode 100644 src/test/recovery/t/037_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b56f073a91..8050f981e9 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4923,6 +4923,51 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from WAL archive to streaming replication (getting WAL from +primary). However, standby exhausts all the WAL present in pg_wal +before switching. If standby fails to switch to stream mode, it falls +back to archive mode. If this parameter's value is specified without +units, it is taken as milliseconds. Default is five minutes +(5min). With a lower value for this parameter, +standby makes frequent WAL source switch attempts. To avoid this, it is +recommended to set a reasonable value. A setting of 0 +disables the feature. When disabled, standby typically switches to +stream mode only when receive from WAL archive finishes (no more WAL +left there) or fails for any reason. This parameter can only be set in +the postgresql.conf file or on the server command +line. + + + + Standby may not always attempt to switch source from WAL archive to + streaming replication at exact streaming_replication_retry_interval + intervals. For example, if the parameter is set to 1min + and fetching WAL file from archive takes about 2min, + then the source switch attempt happens for the next WAL file after + current WAL file fetched from archive is fully applied. + + + +Reading WAL from archive may not always be as efficient and fast as +reading from primary. This can be due to the differences in disk types, +IO costs, network latencies etc., all impacting recovery performance on +standby. And, while standby is reading WAL from archive, primary +accumulates WAL because the standby's replication slot stays inactive. +To avoid these problems, one can use this parameter to make standby +switch to stream mode sooner. + + + + recovery_min_apply_delay (integer) diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index
Re: Switching XLog source from archive to streaming when primary available
On Wed, Nov 16, 2022 at 11:39 AM Bharath Rupireddy wrote: > > I'm attaching the v10 patch for further review. Needed a rebase. I'm attaching the v11 patch for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 6cf6c9bc0a6d6c4325459cddc22fd6fcdc970a03 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Fri, 24 Feb 2023 04:38:13 + Subject: [PATCH v11] Allow standby to switch WAL source from archive to streaming replication A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be efficient and cheaper because network latencies, disk IO cost might differ on the archive as compared to primary and often the archive may sit far from standby impacting recovery performance on the standby. Hence reading WAL from the primary, by setting this parameter, as opposed to the archive enables the standby to catch up with the primary sooner thus reducing replication lag and avoiding WAL files accumulation on the primary. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication. However, exhaust all the WAL present in pg_wal before switching. If the standby fails to switch to stream mode, it falls back to archive mode. Reported-by: SATYANARAYANA NARLAPURAM Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 46 + doc/src/sgml/high-availability.sgml | 7 + src/backend/access/transam/xlogrecovery.c | 158 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/035_wal_source_switch.pl | 126 ++ 8 files changed, 338 insertions(+), 17 deletions(-) create mode 100644 src/test/recovery/t/035_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e5c41cc6c6..fd400ac662 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4884,6 +4884,52 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from WAL archive to streaming replication (get WAL from +primary). However, exhaust all the WAL present in pg_wal before +switching. If the standby fails to switch to stream mode, it falls +back to archive mode. +If this value is specified without units, it is taken as milliseconds. +The default is five minutes (5min). +With a lower setting of this parameter, the standby makes frequent +WAL source switch attempts when the primary is lost for quite longer. +To avoid this, set a reasonable value. +A setting of 0 disables the feature. When disabled, +the standby typically switches to stream mode, only when receive from +WAL archive finishes (no more WAL left there) or fails for any reason. +This parameter can only be set in +the postgresql.conf file or on the server +command line. + + +Reading WAL from archive may not always be efficient and cheaper +because network latencies, disk IO cost might differ on the archive as +compared to primary and often the archive may sit far from standby +impacting recovery performance on the standby. Hence reading WAL +from the primary, by setting this parameter, as opposed to the archive +enables the standby to catch up with the primary sooner thus reducing +replication lag and avoiding WAL files accumulation on the primary. + + +Note that the standby may not always attempt to switch source from +WAL archive to streaming replication at exact +streaming_replication_retry_interval intervals. +For example, if the parameter is set to 1min and +fetching from WAL archive takes 5min, then the +source switch attempt happens for the next WAL after current WAL is +fetched from WAL archive and applied. + + + + recovery_min_apply_delay (integer) diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index f180607528..0323f4cc43 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -636,6 +636,13 @@
Re: Switching XLog source from archive to streaming when primary available
On Thu, Jan 19, 2023 at 6:20 AM Nathan Bossart wrote: > > On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote: > > On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart > > wrote: > >> With your patch, we might replay one of these "old" files in pg_wal instead > >> of the complete version of the file from the archives, > > > > That's true even today, without the patch, no? We're not changing the > > existing behaviour of the state machine. Can you explain how it > > happens with the patch? > > My point is that on HEAD, we will always prefer a complete archive file. > With your patch, we might instead choose to replay an old file in pg_wal > because we are artificially advancing the state machine. IOW even if > there's a complete archive available, we might not use it. This is a > behavior change, but I think it is okay. Oh, yeah, I too agree that it's okay because manually copying WAL files directly to pg_wal (which eventually get replayed before switching to streaming) isn't recommended anyway for production level servers. I think, we covered it in the documentation that it exhausts all the WAL present in pg_wal before switching. Isn't that enough? +Specifies amount of time after which standby attempts to switch WAL +source from WAL archive to streaming replication (get WAL from +primary). However, exhaust all the WAL present in pg_wal before +switching. If the standby fails to switch to stream mode, it falls +back to archive mode. > >> Would you mind testing this scenario? ndby should receive f6 via archive and replay it (check the > > replay lsn an> > > > I meant testing the scenario where there's an old file in pg_wal, a > complete file in the archives, and your new GUC forces replay of the > former. This might be difficult to do in a TAP test. Ultimately, I just > want to validate the assumptions discussed above. I think testing the scenario [1] is achievable. I could write a TAP test for it - https://github.com/BRupireddy/postgres/tree/prefer_archived_wal_v1. It's a bit flaky and needs a little more work (1 - writing a custom script for restore_command that sleeps only after fetching an existing WAL file from archive, not sleeping for a history file or a non-existent WAL file. 2- finding a command-line way to sleep on Windows.) to stabilize it, but it seems doable. I can spend some more time, if one thinks that the test is worth adding to the core, perhaps discussing it separately from this thread. [1] RestoreArchivedFile(): /* * When doing archive recovery, we always prefer an archived log file even * if a file of the same name exists in XLOGDIR. The reason is that the * file in XLOGDIR could be an old, un-filled or partly-filled version * that was copied and restored as part of backing up $PGDATA. * -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote: > On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart > wrote: >> With your patch, we might replay one of these "old" files in pg_wal instead >> of the complete version of the file from the archives, > > That's true even today, without the patch, no? We're not changing the > existing behaviour of the state machine. Can you explain how it > happens with the patch? My point is that on HEAD, we will always prefer a complete archive file. With your patch, we might instead choose to replay an old file in pg_wal because we are artificially advancing the state machine. IOW even if there's a complete archive available, we might not use it. This is a behavior change, but I think it is okay. >> Would you mind testing this scenario? > > How about something like below for testing the above scenario? If it > looks okay, I can add it as a new TAP test file. > > 1. Generate WAL files f1 and f2 and archive them. > 2. Check the replay lsn and WAL file name on the standby, when it > replays upto f2, stop the standby. > 3. Set recovery to fail on the standby, and stop the standby. > 4. Generate f3, f4 (partially filled) on the primary. > 5. Manually copy f3, f4 to the standby's pg_wal. > 6. Start the standby, since recovery is set to fail, and there're new > WAL files (f3, f4) under its pg_wal, it must replay those WAL files > (check the replay lsn and WAL file name, it must be f4) before > switching to streaming. > 7. Generate f5 on the primary. > 8. The standby should receive f5 and replay it (check the replay lsn > and WAL file name, it must be f5). > 9. Set streaming to fail on the standby and set recovery to succeed. > 10. Generate f6 on the primary. > 11. The standby should receive f6 via archive and replay it (check the > replay lsn and WAL file name, it must be f6). I meant testing the scenario where there's an old file in pg_wal, a complete file in the archives, and your new GUC forces replay of the former. This might be difficult to do in a TAP test. Ultimately, I just want to validate the assumptions discussed above. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart wrote: > > On Tue, Oct 18, 2022 at 07:31:33AM +0530, Bharath Rupireddy wrote: > > In summary, the standby state machine in WaitForWALToBecomeAvailable() > > exhausts all the WAL in pg_wal before switching to streaming after > > failing to fetch from archive. The v8 patch proposed upthread deviates > > from this behaviour. Hence, attaching v9 patch that keeps the > > behaviour as-is, that means, the standby exhausts all the WAL in > > pg_wal before switching to streaming after fetching WAL from archive > > for at least streaming_replication_retry_interval milliseconds. > > I think this is okay. The following comment explains why archives are > preferred over existing files in pg_wal: > > * When doing archive recovery, we always prefer an archived log file > even > * if a file of the same name exists in XLOGDIR. The reason is that > the > * file in XLOGDIR could be an old, un-filled or partly-filled version > * that was copied and restored as part of backing up $PGDATA. > > With your patch, we might replay one of these "old" files in pg_wal instead > of the complete version of the file from the archives, That's true even today, without the patch, no? We're not changing the existing behaviour of the state machine. Can you explain how it happens with the patch? On HEAD, after failing to read from the archive, exhaust all wal from pg_wal and then switch to streaming mode. With the patch, after reading from the archive for at least streaming_replication_retry_interval milliseconds, exhaust all wal from pg_wal and then switch to streaming mode. > but I think that is > still correct. We'll just replay whatever exists in pg_wal (which may be > un-filled or partly-filled) before attempting streaming. If that fails, > we'll go back to trying the archives again. > > Would you mind testing this scenario? How about something like below for testing the above scenario? If it looks okay, I can add it as a new TAP test file. 1. Generate WAL files f1 and f2 and archive them. 2. Check the replay lsn and WAL file name on the standby, when it replays upto f2, stop the standby. 3. Set recovery to fail on the standby, and stop the standby. 4. Generate f3, f4 (partially filled) on the primary. 5. Manually copy f3, f4 to the standby's pg_wal. 6. Start the standby, since recovery is set to fail, and there're new WAL files (f3, f4) under its pg_wal, it must replay those WAL files (check the replay lsn and WAL file name, it must be f4) before switching to streaming. 7. Generate f5 on the primary. 8. The standby should receive f5 and replay it (check the replay lsn and WAL file name, it must be f5). 9. Set streaming to fail on the standby and set recovery to succeed. 10. Generate f6 on the primary. 11. The standby should receive f6 via archive and replay it (check the replay lsn and WAL file name, it must be f6). If needed, we can look out for these messages to confirm it works as expected: elog(DEBUG2, "switched WAL source from %s to %s after %s", xlogSourceNames[oldSource], xlogSourceNames[currentSource], lastSourceFailed ? "failure" : "success"); ereport(LOG, (errmsg("restored log file \"%s\" from archive", xlogfname))); Essentially, it covers what the documentation https://www.postgresql.org/docs/devel/warm-standby.html says: "In standby mode, the server continuously applies WAL received from the primary server. The standby server can read WAL from a WAL archive (see restore_command) or directly from the primary over a TCP connection (streaming replication). The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the primary before the restart, but you can also manually copy files to pg_wal at any time to have them replayed." Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Tue, Oct 18, 2022 at 07:31:33AM +0530, Bharath Rupireddy wrote: > In summary, the standby state machine in WaitForWALToBecomeAvailable() > exhausts all the WAL in pg_wal before switching to streaming after > failing to fetch from archive. The v8 patch proposed upthread deviates > from this behaviour. Hence, attaching v9 patch that keeps the > behaviour as-is, that means, the standby exhausts all the WAL in > pg_wal before switching to streaming after fetching WAL from archive > for at least streaming_replication_retry_interval milliseconds. I think this is okay. The following comment explains why archives are preferred over existing files in pg_wal: * When doing archive recovery, we always prefer an archived log file even * if a file of the same name exists in XLOGDIR. The reason is that the * file in XLOGDIR could be an old, un-filled or partly-filled version * that was copied and restored as part of backing up $PGDATA. With your patch, we might replay one of these "old" files in pg_wal instead of the complete version of the file from the archives, but I think that is still correct. We'll just replay whatever exists in pg_wal (which may be un-filled or partly-filled) before attempting streaming. If that fails, we'll go back to trying the archives again. Would you mind testing this scenario? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Wed, Nov 16, 2022 at 9:38 AM Ian Lawrence Barwick wrote: > > While reviewing the patch backlog, we have determined that this patch adds > one or more TAP tests but has not added the test to the "meson.build" file. Thanks for pointing it out. Yeah, the test wasn't picking up on meson builds. I added the new test file name in src/test/recovery/meson.build. I'm attaching the v10 patch for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 70323a5c96a889a21a5339c531193b6440d2e270 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Wed, 16 Nov 2022 05:34:41 + Subject: [PATCH v10] Allow standby to switch WAL source from archive to streaming replication A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be efficient and cheaper because network latencies, disk IO cost might differ on the archive as compared to primary and often the archive may sit far from standby impacting recovery performance on the standby. Hence reading WAL from the primary, by setting this parameter, as opposed to the archive enables the standby to catch up with the primary sooner thus reducing replication lag and avoiding WAL files accumulation on the primary. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication. However, exhaust all the WAL present in pg_wal before switching. If the standby fails to switch to stream mode, it falls back to archive mode. Reported-by: SATYANARAYANA NARLAPURAM Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 46 + doc/src/sgml/high-availability.sgml | 7 + src/backend/access/transam/xlogrecovery.c | 158 -- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/034_wal_source_switch.pl | 126 ++ 8 files changed, 338 insertions(+), 17 deletions(-) create mode 100644 src/test/recovery/t/034_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index bd50ea8e48..8a3a9ff296 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4852,6 +4852,52 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from WAL archive to streaming replication (get WAL from +primary). However, exhaust all the WAL present in pg_wal before +switching. If the standby fails to switch to stream mode, it falls +back to archive mode. +If this value is specified without units, it is taken as milliseconds. +The default is five minutes (5min). +With a lower setting of this parameter, the standby makes frequent +WAL source switch attempts when the primary is lost for quite longer. +To avoid this, set a reasonable value. +A setting of 0 disables the feature. When disabled, +the standby typically switches to stream mode, only when receive from +WAL archive finishes (no more WAL left there) or fails for any reason. +This parameter can only be set in +the postgresql.conf file or on the server +command line. + + +Reading WAL from archive may not always be efficient and cheaper +because network latencies, disk IO cost might differ on the archive as +compared to primary and often the archive may sit far from standby +impacting recovery performance on the standby. Hence reading WAL +from the primary, by setting this parameter, as opposed to the archive +enables the standby to catch up with the primary sooner thus reducing +replication lag and avoiding WAL files accumulation on the primary. + + +Note that the standby may not always attempt to switch source from +WAL archive to streaming replication at exact +streaming_replication_retry_interval intervals. +For example, if the parameter is set to 1min and +fetching from WAL archive takes 5min, then the +source switch attempt happens for the next WAL after current WAL is +fetched from WAL archive and applied. + + + + recovery_min_apply_delay (integer)
Re: Switching XLog source from archive to streaming when primary available
2022年10月18日(火) 11:02 Bharath Rupireddy : > > On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart > wrote: > > > > On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote: > > > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart > > > wrote: > > >> I wonder if it would be better to simply remove this extra polling of > > >> pg_wal as a prerequisite to your patch. The existing commentary leads me > > >> to think there might not be a strong reason for this behavior, so it > > >> could > > >> be a nice way to simplify your patch. > > > > > > I don't think it's a good idea to remove that completely. As said > > > above, it might help someone, we never know. > > > > It would be great to hear whether anyone is using this functionality. If > > no one is aware of existing usage and there is no interest in keeping it > > around, I don't think it would be unreasonable to remove it in v16. > > It seems like exhausting all the WAL in pg_wal before switching to > streaming after failing to fetch from archive is unremovable. I found > this after experimenting with it, here are my findings: > 1. The standby has to recover initial WAL files in the pg_wal > directory even for the normal post-restart/first-time-start case, I > mean, in non-crash recovery case. > 2. The standby received WAL files from primary (walreceiver just > writes and flushes the received WAL to WAL files under pg_wal) > pretty-fast and/or standby recovery is slow, say both the standby > connection to primary and archive connection are broken for whatever > reasons, then it has WAL files to recover in pg_wal directory. > > I think the fundamental behaviour for the standy is that it has to > fully recover to the end of WAL under pg_wal no matter who copies WAL > files there. I fully understand the consequences of manually copying > WAL files into pg_wal, for that matter, manually copying/tinkering any > other files into/under the data directory is something we don't > recommend and encourage. > > In summary, the standby state machine in WaitForWALToBecomeAvailable() > exhausts all the WAL in pg_wal before switching to streaming after > failing to fetch from archive. The v8 patch proposed upthread deviates > from this behaviour. Hence, attaching v9 patch that keeps the > behaviour as-is, that means, the standby exhausts all the WAL in > pg_wal before switching to streaming after fetching WAL from archive > for at least streaming_replication_retry_interval milliseconds. > > Please review the v9 patch further. Thanks for the updated patch. While reviewing the patch backlog, we have determined that this patch adds one or more TAP tests but has not added the test to the "meson.build" file. To do this, locate the relevant "meson.build" file for each test and add it in the 'tests' dictionary, which will look something like this: 'tap': { 'tests': [ 't/001_basic.pl', ], }, For some additional details please see this Wiki article: https://wiki.postgresql.org/wiki/Meson_for_patch_authors For more information on the meson build system for PostgreSQL see: https://wiki.postgresql.org/wiki/Meson Regards Ian Barwick
Re: Switching XLog source from archive to streaming when primary available
On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart wrote: > > On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote: > > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart > > wrote: > >> I wonder if it would be better to simply remove this extra polling of > >> pg_wal as a prerequisite to your patch. The existing commentary leads me > >> to think there might not be a strong reason for this behavior, so it could > >> be a nice way to simplify your patch. > > > > I don't think it's a good idea to remove that completely. As said > > above, it might help someone, we never know. > > It would be great to hear whether anyone is using this functionality. If > no one is aware of existing usage and there is no interest in keeping it > around, I don't think it would be unreasonable to remove it in v16. It seems like exhausting all the WAL in pg_wal before switching to streaming after failing to fetch from archive is unremovable. I found this after experimenting with it, here are my findings: 1. The standby has to recover initial WAL files in the pg_wal directory even for the normal post-restart/first-time-start case, I mean, in non-crash recovery case. 2. The standby received WAL files from primary (walreceiver just writes and flushes the received WAL to WAL files under pg_wal) pretty-fast and/or standby recovery is slow, say both the standby connection to primary and archive connection are broken for whatever reasons, then it has WAL files to recover in pg_wal directory. I think the fundamental behaviour for the standy is that it has to fully recover to the end of WAL under pg_wal no matter who copies WAL files there. I fully understand the consequences of manually copying WAL files into pg_wal, for that matter, manually copying/tinkering any other files into/under the data directory is something we don't recommend and encourage. In summary, the standby state machine in WaitForWALToBecomeAvailable() exhausts all the WAL in pg_wal before switching to streaming after failing to fetch from archive. The v8 patch proposed upthread deviates from this behaviour. Hence, attaching v9 patch that keeps the behaviour as-is, that means, the standby exhausts all the WAL in pg_wal before switching to streaming after fetching WAL from archive for at least streaming_replication_retry_interval milliseconds. Please review the v9 patch further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v9-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote: > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart > wrote: >> I wonder if it would be better to simply remove this extra polling of >> pg_wal as a prerequisite to your patch. The existing commentary leads me >> to think there might not be a strong reason for this behavior, so it could >> be a nice way to simplify your patch. > > I don't think it's a good idea to remove that completely. As said > above, it might help someone, we never know. It would be great to hear whether anyone is using this functionality. If no one is aware of existing usage and there is no interest in keeping it around, I don't think it would be unreasonable to remove it in v16. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart wrote: > > > Instead, the simplest would be to just pass XLOG_FROM_WAL to > > XLogFileReadAnyTLI() when we're about to switch the source to stream > > mode. This doesn't change the existing behaviour. > > It might be more consistent with existing behavior, but one thing I hadn't > considered is that it might make your proposed feature ineffective when > users are copying files straight into pg_wal. IIUC as long as the files > are present in pg_wal, the source-switch logic won't kick in. It happens even now, that is, the server will not switch to streaming mode from the archive after a failure if there's someone continuously copying WAL files to the pg_wal directory. I have not personally seen anyone or any service doing that. It doesn't mean that can't happen. They might do it for some purpose such as 1) to bring back in sync quickly a standby that's lagging behind the primary after the archive connection and/or streaming replication connection are/is broken but many WAL files leftover on the primary 2) before promoting a standby that's lagging behind the primary for failover or other purposes. However, I'm not sure if someone does these things on production servers. > > Unrelated to this patch, the fact that the standby polls pg_wal is not > > documented or recommended, is not true, it is actually documented [2]. > > Whether or not we change the docs to be something like [3], is a > > separate discussion. > > I wonder if it would be better to simply remove this extra polling of > pg_wal as a prerequisite to your patch. The existing commentary leads me > to think there might not be a strong reason for this behavior, so it could > be a nice way to simplify your patch. I don't think it's a good idea to remove that completely. As said above, it might help someone, we never know. I think for this feature, we just need to decide on whether or not we'd allow pg_wal polling before switching to streaming mode. If we allow it like in the v8 patch, we can document the behavior. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Sun, Oct 09, 2022 at 02:39:47PM +0530, Bharath Rupireddy wrote: > We can give it a chance to restore from pg_wal before switching to > streaming to not change any behaviour of the state machine. But, not > definitely by setting currentSource to XLOG_FROM_WAL, we basically > never explicitly set currentSource to XLOG_FROM_WAL, other than when > not in archive recovery i.e. InArchiveRecovery is false. Also, see the > comment [1]. > > Instead, the simplest would be to just pass XLOG_FROM_WAL to > XLogFileReadAnyTLI() when we're about to switch the source to stream > mode. This doesn't change the existing behaviour. It might be more consistent with existing behavior, but one thing I hadn't considered is that it might make your proposed feature ineffective when users are copying files straight into pg_wal. IIUC as long as the files are present in pg_wal, the source-switch logic won't kick in. > Unrelated to this patch, the fact that the standby polls pg_wal is not > documented or recommended, is not true, it is actually documented [2]. > Whether or not we change the docs to be something like [3], is a > separate discussion. I wonder if it would be better to simply remove this extra polling of pg_wal as a prerequisite to your patch. The existing commentary leads me to think there might not be a strong reason for this behavior, so it could be a nice way to simplify your patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Sun, Oct 9, 2022 at 3:22 AM Nathan Bossart wrote: > > As I mentioned upthread [0], I'm still a little concerned that this patch > will cause the state machine to go straight from archive recovery to > streaming replication, skipping recovery from pg_wal. > > [0] https://postgr.es/m/20220906215704.GA2084086%40nathanxps13 Yes, it goes straight to streaming replication skipping recovery from pg_wal with the patch. > I wonder if this > could be resolved by moving the standby to the pg_wal phase instead. > Concretely, this line > > + if (switchSource) > + break; > > would instead change currentSource from XLOG_FROM_ARCHIVE to > XLOG_FROM_PG_WAL before the call to XLogFileReadAnyTLI(). I suspect the > behavior would be basically the same, but it would maintain the existing > ordering. We can give it a chance to restore from pg_wal before switching to streaming to not change any behaviour of the state machine. But, not definitely by setting currentSource to XLOG_FROM_WAL, we basically never explicitly set currentSource to XLOG_FROM_WAL, other than when not in archive recovery i.e. InArchiveRecovery is false. Also, see the comment [1]. Instead, the simplest would be to just pass XLOG_FROM_WAL to XLogFileReadAnyTLI() when we're about to switch the source to stream mode. This doesn't change the existing behaviour. > However, I do see the following note elsewhere in xlogrecovery.c: > > * The segment can be fetched via restore_command, or via walreceiver having > * streamed the record, or it can already be present in pg_wal. Checking > * pg_wal is mainly for crash recovery, but it will be polled in standby mode > * too, in case someone copies a new segment directly to pg_wal. That is not > * documented or recommended, though. > > Given this information, the present behavior might not be too important, > but I don't see a point in changing it without good reason. Yeah, with the attached patch we don't skip pg_wal before switching to streaming mode. I've also added a note in the 'Standby Server Operation' section about the new feature. Please review the v8 patch further. Unrelated to this patch, the fact that the standby polls pg_wal is not documented or recommended, is not true, it is actually documented [2]. Whether or not we change the docs to be something like [3], is a separate discussion. [1] /* * We just successfully read a file in pg_wal. We prefer files in * the archive over ones in pg_wal, so try the next file again * from the archive first. */ [2] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the primary before the restart, but you can also manually copy files to pg_wal at any time to have them replayed. [3] The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the primary before the restart, but you can also manually copy files to pg_wal at any time to have them replayed. However, copying of WAL files manually is not recommended. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 171e11088cca63e99726b49b1b9b408eed81f299 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sun, 9 Oct 2022 08:11:10 + Subject: [PATCH v8] Allow standby to switch WAL source from archive to streaming replication A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be efficient and cheaper because network latencies, disk IO cost might differ on the archive as compared to primary and often the archive may sit far from standby impacting recovery performance on the standby. Hence reading WAL from the primary, by setting this parameter, as opposed to the archive enables the standby to catch up with the primary sooner thus reducing replication lag and avoiding WAL files accumulation on the primary. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication. If the standby fails to switch to stream mode, it falls back to archive mode. Reported-by: SATYANARAYANA NARLAPURAM Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 45 ++
Re: Switching XLog source from archive to streaming when primary available
On Mon, Sep 19, 2022 at 07:49:21PM +0530, Bharath Rupireddy wrote: > SwitchFromArchiveToStreamEnabled() seemed better at this point. I'm > attaching the v7 patch with that change. Please review it further. As I mentioned upthread [0], I'm still a little concerned that this patch will cause the state machine to go straight from archive recovery to streaming replication, skipping recovery from pg_wal. I wonder if this could be resolved by moving the standby to the pg_wal phase instead. Concretely, this line + if (switchSource) + break; would instead change currentSource from XLOG_FROM_ARCHIVE to XLOG_FROM_PG_WAL before the call to XLogFileReadAnyTLI(). I suspect the behavior would be basically the same, but it would maintain the existing ordering. However, I do see the following note elsewhere in xlogrecovery.c: * The segment can be fetched via restore_command, or via walreceiver having * streamed the record, or it can already be present in pg_wal. Checking * pg_wal is mainly for crash recovery, but it will be polled in standby mode * too, in case someone copies a new segment directly to pg_wal. That is not * documented or recommended, though. Given this information, the present behavior might not be too important, but I don't see a point in changing it without good reason. [0] https://postgr.es/m/20220906215704.GA2084086%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Fri, Sep 16, 2022 at 4:58 PM Bharath Rupireddy wrote: > > On Fri, Sep 16, 2022 at 12:06 PM Kyotaro Horiguchi > wrote: > > > > In other words, it seems to me that the macro name doesn't manifest > > the condition correctly. > > > > I don't think we don't particularly want to do that unconditionally. > > I wanted just to get rid of the macro from the usage site. Even if > > the same condition is used elsewhere, I see it better to write out the > > condition directly there.. > > I wanted to avoid a bit of duplicate code there. How about naming that > macro IsXLOGSourceSwitchToStreamEnabled() or > SwitchFromArchiveToStreamEnabled() or just SwitchFromArchiveToStream() > or any other better name? SwitchFromArchiveToStreamEnabled() seemed better at this point. I'm attaching the v7 patch with that change. Please review it further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From b94327d9af60a44f252251be97ee1efabc964f97 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Mon, 19 Sep 2022 14:04:49 + Subject: [PATCH v7] Allow standby to switch WAL source from archive to streaming replication A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be efficient and cheaper because network latencies, disk IO cost might differ on the archive as compared to primary and often the archive may sit far from standby impacting recovery performance on the standby. Hence reading WAL from the primary, by setting this parameter, as opposed to the archive enables the standby to catch up with the primary sooner thus reducing replication lag and avoiding WAL files accumulation on the primary. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication. If the standby fails to switch to stream mode, it falls back to archive mode. Reported-by: SATYANARAYANA NARLAPURAM Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 45 +++ src/backend/access/transam/xlogrecovery.c | 124 +++-- src/backend/utils/misc/guc_tables.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/t/034_wal_source_switch.pl | 126 ++ 6 files changed, 301 insertions(+), 11 deletions(-) create mode 100644 src/test/recovery/t/034_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 700914684d..892442a053 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4840,6 +4840,51 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from WAL archive to streaming replication (get WAL from +primary). If the standby fails to switch to stream mode, it falls back +to archive mode. +If this value is specified without units, it is taken as milliseconds. +The default is five minutes (5min). +With a lower setting of this parameter, the standby makes frequent +WAL source switch attempts when the primary is lost for quite longer. +To avoid this, set a reasonable value. +A setting of 0 disables the feature. When disabled, +the standby typically switches to stream mode, only when receive from +WAL archive finishes (no more WAL left there) or fails for any reason. +This parameter can only be set in +the postgresql.conf file or on the server +command line. + + +Reading WAL from archive may not always be efficient and cheaper +because network latencies, disk IO cost might differ on the archive as +compared to primary and often the archive may sit far from standby +impacting recovery performance on the standby. Hence reading WAL +from the primary, by setting this parameter, as opposed to the archive +enables the standby to catch up with the primary sooner thus reducing +replication lag and avoiding WAL files accumulation on the primary. + + +Note that the standby may not always attempt to switch source from +WAL archive to streaming replication at exact +streaming_replication_retry_interval intervals. +For example, if the parameter is set to 1min and +fetching from WAL archive takes 5min, then the +source
Re: Switching XLog source from archive to streaming when primary available
On Fri, Sep 16, 2022 at 12:06 PM Kyotaro Horiguchi wrote: > > In other words, it seems to me that the macro name doesn't manifest > the condition correctly. > > I don't think we don't particularly want to do that unconditionally. > I wanted just to get rid of the macro from the usage site. Even if > the same condition is used elsewhere, I see it better to write out the > condition directly there.. I wanted to avoid a bit of duplicate code there. How about naming that macro IsXLOGSourceSwitchToStreamEnabled() or SwitchFromArchiveToStreamEnabled() or just SwitchFromArchiveToStream() or any other better name? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
At Fri, 16 Sep 2022 09:15:58 +0530, Bharath Rupireddy wrote in > On Thu, Sep 15, 2022 at 1:52 PM Kyotaro Horiguchi > wrote: > > > > At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy > > wrote in > > > I'm attaching the v6 patch that's rebased on to the latest HEAD. > > > Please consider this for review. > > > > Thaks for the new version! > > > > +#define StreamingReplRetryEnabled() \ > > + (streaming_replication_retry_interval > 0 && \ > > +StandbyMode && \ > > +currentSource == XLOG_FROM_ARCHIVE) > > > > It seems to me a bit too complex.. In other words, it seems to me that the macro name doesn't manifest the condition correctly. > I don't think so, it just tells whether a standby is allowed to switch > source to stream from archive. > > > + /* Save the timestamp at which we're switching to > > archive. */ > > + if (StreamingReplRetryEnabled()) > > + switched_to_archive_at = > > GetCurrentTimestamp(); > > > > Anyway we are going to open a file just after this so > > GetCurrentTimestamp() cannot cause a perceptible degradation. > > Coulnd't we do that unconditionally, to get rid of the macro? > > Do we really need to do it unconditionally? I don't think so. And, we > can't get rid of the macro, as we need to check for the current > source, GUC and standby mode. When this feature is disabled, it > mustn't execute any extra code IMO. I don't think we don't particularly want to do that unconditionally. I wanted just to get rid of the macro from the usage site. Even if the same condition is used elsewhere, I see it better to write out the condition directly there.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Switching XLog source from archive to streaming when primary available
On Thu, Sep 15, 2022 at 1:52 PM Kyotaro Horiguchi wrote: > > At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy > wrote in > > I'm attaching the v6 patch that's rebased on to the latest HEAD. > > Please consider this for review. > > Thaks for the new version! > > +#define StreamingReplRetryEnabled() \ > + (streaming_replication_retry_interval > 0 && \ > +StandbyMode && \ > +currentSource == XLOG_FROM_ARCHIVE) > > It seems to me a bit too complex.. I don't think so, it just tells whether a standby is allowed to switch source to stream from archive. > + /* Save the timestamp at which we're switching to > archive. */ > + if (StreamingReplRetryEnabled()) > + switched_to_archive_at = > GetCurrentTimestamp(); > > Anyway we are going to open a file just after this so > GetCurrentTimestamp() cannot cause a perceptible degradation. > Coulnd't we do that unconditionally, to get rid of the macro? Do we really need to do it unconditionally? I don't think so. And, we can't get rid of the macro, as we need to check for the current source, GUC and standby mode. When this feature is disabled, it mustn't execute any extra code IMO. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy wrote in > I'm attaching the v6 patch that's rebased on to the latest HEAD. > Please consider this for review. Thaks for the new version! +#define StreamingReplRetryEnabled() \ + (streaming_replication_retry_interval > 0 && \ +StandbyMode && \ +currentSource == XLOG_FROM_ARCHIVE) It seems to me a bit too complex.. + /* Save the timestamp at which we're switching to archive. */ + if (StreamingReplRetryEnabled()) + switched_to_archive_at = GetCurrentTimestamp(); Anyway we are going to open a file just after this so GetCurrentTimestamp() cannot cause a perceptible degradation. Coulnd't we do that unconditionally, to get rid of the macro? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Switching XLog source from archive to streaming when primary available
On Mon, Sep 12, 2022 at 11:56 AM Bharath Rupireddy wrote: > > Please review the attached v5 patch. I'm attaching the v6 patch that's rebased on to the latest HEAD. Please consider this for review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v6-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Mon, Sep 12, 2022 at 9:03 AM Bharath Rupireddy wrote: > > Please review the attached v4 patch addressing above review comments. Oops, there's a compiler warning [1] with the v4 patch, fixed it. Please review the attached v5 patch. [1] https://cirrus-ci.com/task/5730076611313664?logs=gcc_warning#L450 -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 857415757ec21bd8b0195c17694618a1cbf22a57 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Mon, 12 Sep 2022 04:44:35 + Subject: [PATCH v5] Allow standby to switch WAL source from archive to streaming replication A standby typically switches to streaming replication (get WAL from primary), only when receive from WAL archive finishes (no more WAL left there) or fails for any reason. Reading WAL from archive may not always be efficient and cheaper because network latencies, disk IO cost might differ on the archive as compared to primary and often the archive may sit far from standby impacting recovery performance on the standby. Hence reading WAL from the primary, by setting this parameter, as opposed to the archive enables the standby to catch up with the primary sooner thus reducing replication lag and avoiding WAL files accumulation on the primary. This feature adds a new GUC that specifies amount of time after which standby attempts to switch WAL source from WAL archive to streaming replication. If the standby fails to switch to stream mode, it falls back to archive mode. Reported-by: SATYANARAYANA NARLAPURAM Author: Bharath Rupireddy Reviewed-by: Cary Huang, Nathan Bossart Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com --- doc/src/sgml/config.sgml | 45 +++ src/backend/access/transam/xlogrecovery.c | 124 +++-- src/backend/utils/misc/guc.c | 12 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/include/access/xlogrecovery.h | 1 + src/test/recovery/t/034_wal_source_switch.pl | 126 ++ 6 files changed, 301 insertions(+), 11 deletions(-) create mode 100644 src/test/recovery/t/034_wal_source_switch.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index a5cd4e44c7..694da93b8c 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4840,6 +4840,51 @@ ANY num_sync ( + streaming_replication_retry_interval (integer) + + streaming_replication_retry_interval configuration parameter + + + + +Specifies amount of time after which standby attempts to switch WAL +source from WAL archive to streaming replication (get WAL from +primary). If the standby fails to switch to stream mode, it falls back +to archive mode. +If this value is specified without units, it is taken as milliseconds. +The default is five minutes (5min). +With a lower setting of this parameter, the standby makes frequent +WAL source switch attempts when the primary is lost for quite longer. +To avoid this, set a reasonable value. +A setting of 0 disables the feature. When disabled, +the standby typically switches to stream mode, only when receive from +WAL archive finishes (no more WAL left there) or fails for any reason. +This parameter can only be set in +the postgresql.conf file or on the server +command line. + + +Reading WAL from archive may not always be efficient and cheaper +because network latencies, disk IO cost might differ on the archive as +compared to primary and often the archive may sit far from standby +impacting recovery performance on the standby. Hence reading WAL +from the primary, by setting this parameter, as opposed to the archive +enables the standby to catch up with the primary sooner thus reducing +replication lag and avoiding WAL files accumulation on the primary. + + +Note that the standby may not always attempt to switch source from +WAL archive to streaming replication at exact +streaming_replication_retry_interval intervals. +For example, if the parameter is set to 1min and +fetching from WAL archive takes 5min, then the +source switch attempt happens for the next WAL after current WAL is +fetched from WAL archive and applied. + + + + recovery_min_apply_delay (integer) diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 9a80084a68..35f7985e65 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -65,6 +65,12 @@ #define RECOVERY_COMMAND_FILE "recovery.conf" #define RECOVERY_COMMAND_DONE "recovery.done" +
Re: Switching XLog source from archive to streaming when primary available
On Sat, Sep 10, 2022 at 3:35 AM Nathan Bossart wrote: > > Well, for wal_keep_size, using bytes makes sense. Given you know how much > disk space you have, you can set this parameter accordingly to avoid > retaining too much of it for standby servers. For your proposed parameter, > it's not so simple. The same setting could have wildly different timing > behavior depending on the server. I still think that a timeout is the most > intuitive. Hm. In v3 patch, I've used the timeout approach, but tracking the duration server spends in XLOG_FROM_ARCHIVE as opposed to tracking last failed time in streaming from primary. > So I think it > would be difficult for the end user to decide on a value. However, even > the timeout approach has this sort of problem. If your parameter is set to > 1 minute, but the current archive takes 5 minutes to recover, you won't > really be testing streaming replication once a minute. That would likely > need to be documented. Added a note in the docs. On Fri, Sep 9, 2022 at 10:46 AM Kyotaro Horiguchi wrote: > > Being late for the party. Thanks for reviewing this. > It seems to me that the function is getting too long. I think we > might want to move the core part of the patch into another function. Yeah, the WaitForWALToBecomeAvailable() (without this patch) has around 460 LOC out of which WAL fetching from the chosen source is of 240 LOC, IMO, this code will be a candidate for a new function. I think that part can be discussed separately. Having said that, I moved the new code to a new function. > I think it might be better if intentionalSourceSwitch doesn't need > lastSourceFailed set. It would look like this: > > > if (lastSourceFailed || switchSource) > > { > > if (nonblocking && lastSourceFailed) > >return XLREAD_WOULDBLOCK; I think the above looks good, done that way in the latest patch. > I don't think the flag first_time is needed. Addressed this in the v4 patch. Please review the attached v4 patch addressing above review comments. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v4-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Fri, Sep 09, 2022 at 11:07:00PM +0530, Bharath Rupireddy wrote: > On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart > wrote: >> IMO the timeout approach would be more intuitive for users. When it comes >> to archive recovery, "WAL segment" isn't a standard unit of measure. WAL >> segment size can differ between clusters, and WAL files can have different >> amounts of data or take different amounts of time to replay. > > How about the amount of WAL bytes fetched from the archive after which > a standby attempts to connect to primary or enter streaming mode? Of > late, we've changed some GUCs to represent bytes instead of WAL > files/segments, see [1]. Well, for wal_keep_size, using bytes makes sense. Given you know how much disk space you have, you can set this parameter accordingly to avoid retaining too much of it for standby servers. For your proposed parameter, it's not so simple. The same setting could have wildly different timing behavior depending on the server. I still think that a timeout is the most intuitive. >> So I think it >> would be difficult for the end user to decide on a value. However, even >> the timeout approach has this sort of problem. If your parameter is set to >> 1 minute, but the current archive takes 5 minutes to recover, you won't >> really be testing streaming replication once a minute. That would likely >> need to be documented. > > If we have configurable WAL bytes instead of timeout for standby WAL > source switch from archive to primary, we don't have the above problem > right? If you are going to stop replaying in the middle of a WAL archive, then maybe. But I don't think I'd recommend that. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart wrote: > > On Fri, Sep 09, 2022 at 12:14:25PM +0530, Bharath Rupireddy wrote: > > On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi > > wrote: > >> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart > >> wrote in > >> > My general point is that we should probably offer some basic preventative > >> > measure against flipping back and forth between streaming and archive > >> > recovery while making zero progress. As I noted, maybe that's as simple > >> > as > >> > having WaitForWALToBecomeAvailable() attempt to restore a file from > >> > archive > >> > at least once before the new parameter forces us to switch to streaming > >> > replication. There might be other ways to handle this. > >> > >> +1. > > > > Hm. In that case, I think we can get rid of timeout based switching > > mechanism and have this behaviour - the standby can attempt to switch > > to streaming mode from archive, say, after fetching 1, 2 or a > > configurable number of WAL files. In fact, this is the original idea > > proposed by Satya in this thread. > > IMO the timeout approach would be more intuitive for users. When it comes > to archive recovery, "WAL segment" isn't a standard unit of measure. WAL > segment size can differ between clusters, and WAL files can have different > amounts of data or take different amounts of time to replay. How about the amount of WAL bytes fetched from the archive after which a standby attempts to connect to primary or enter streaming mode? Of late, we've changed some GUCs to represent bytes instead of WAL files/segments, see [1]. > So I think it > would be difficult for the end user to decide on a value. However, even > the timeout approach has this sort of problem. If your parameter is set to > 1 minute, but the current archive takes 5 minutes to recover, you won't > really be testing streaming replication once a minute. That would likely > need to be documented. If we have configurable WAL bytes instead of timeout for standby WAL source switch from archive to primary, we don't have the above problem right? [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3fe108c025e4a080315562d4c15ecbe3f00405e -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Fri, Sep 09, 2022 at 12:14:25PM +0530, Bharath Rupireddy wrote: > On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi > wrote: >> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart >> wrote in >> > My general point is that we should probably offer some basic preventative >> > measure against flipping back and forth between streaming and archive >> > recovery while making zero progress. As I noted, maybe that's as simple as >> > having WaitForWALToBecomeAvailable() attempt to restore a file from archive >> > at least once before the new parameter forces us to switch to streaming >> > replication. There might be other ways to handle this. >> >> +1. > > Hm. In that case, I think we can get rid of timeout based switching > mechanism and have this behaviour - the standby can attempt to switch > to streaming mode from archive, say, after fetching 1, 2 or a > configurable number of WAL files. In fact, this is the original idea > proposed by Satya in this thread. IMO the timeout approach would be more intuitive for users. When it comes to archive recovery, "WAL segment" isn't a standard unit of measure. WAL segment size can differ between clusters, and WAL files can have different amounts of data or take different amounts of time to replay. So I think it would be difficult for the end user to decide on a value. However, even the timeout approach has this sort of problem. If your parameter is set to 1 minute, but the current archive takes 5 minutes to recover, you won't really be testing streaming replication once a minute. That would likely need to be documented. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi wrote: > > At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart > wrote in > > On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote: > > > I'm attaching the v3 patch with the review comments addressed, please > > > review it further. > > > > My general point is that we should probably offer some basic preventative > > measure against flipping back and forth between streaming and archive > > recovery while making zero progress. As I noted, maybe that's as simple as > > having WaitForWALToBecomeAvailable() attempt to restore a file from archive > > at least once before the new parameter forces us to switch to streaming > > replication. There might be other ways to handle this. > > +1. Hm. In that case, I think we can get rid of timeout based switching mechanism and have this behaviour - the standby can attempt to switch to streaming mode from archive, say, after fetching 1, 2 or a configurable number of WAL files. In fact, this is the original idea proposed by Satya in this thread. If okay, I can code on that. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart wrote in > On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote: > > I'm attaching the v3 patch with the review comments addressed, please > > review it further. > > My general point is that we should probably offer some basic preventative > measure against flipping back and forth between streaming and archive > recovery while making zero progress. As I noted, maybe that's as simple as > having WaitForWALToBecomeAvailable() attempt to restore a file from archive > at least once before the new parameter forces us to switch to streaming > replication. There might be other ways to handle this. +1. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Switching XLog source from archive to streaming when primary available
Being late for the party. It seems to me that the function is getting too long. I think we might want to move the core part of the patch into another function. I think it might be better if intentionalSourceSwitch doesn't need lastSourceFailed set. It would look like this: > if (lastSourceFailed || switchSource) > { > if (nonblocking && lastSourceFailed) >return XLREAD_WOULDBLOCK; + if (first_time) + last_switch_time = curr_time; .. + if (!first_time && + TimestampDifferenceExceeds(last_switch_time, curr_time, .. + /* We're not here for the first time any more */ + if (first_time) + first_time = false; I don't think the flag first_time is needed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Switching XLog source from archive to streaming when primary available
On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote: > On Wed, Sep 7, 2022 at 3:27 AM Nathan Bossart > wrote: >> It's not clear to me how this is expected to interact with the pg_wal phase >> of standby recovery. As the docs note [0], standby servers loop through >> archive recovery, recovery from pg_wal, and streaming replication. Does >> this cause the pg_wal phase to be skipped (i.e., the standby goes straight >> from archive recovery to streaming replication)? I wonder if it'd be >> better for this mechanism to simply move the standby to the pg_wal phase so >> that the usual ordering isn't changed. > > It doesn't change any behaviour as such for XLOG_FROM_PG_WAL. In > standby mode when recovery_command is specified, the initial value of > currentSource would be XLOG_FROM_ARCHIVE (see [1]). If the archive is > exhausted of WAL or the standby fails to fetch from the archive, then > it switches to XLOG_FROM_STREAM. If the standby fails to receive WAL > from primary, it switches back to XLOG_FROM_ARCHIVE. This continues > unless the standby gets promoted. With the patch, we enable the > standby to try fetching from the primary, instead of waiting for WAL > in the archive to get exhausted or for an error to occur in the > standby while receiving from the archive. Okay. I see that you are checking for XLOG_FROM_ARCHIVE. >> Shouldn't the last_switch_time be set when the state machine first enters >> XLOG_FROM_ARCHIVE? IIUC this logic is currently counting time spent >> elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a >> source switch. This would mean that a standby that has spent a lot of time >> in streaming replication before failing would flip to XLOG_FROM_ARCHIVE, >> immediately flip back to XLOG_FROM_STREAM, and then likely flip back to >> XLOG_FROM_ARCHIVE when it failed again. Given the standby will wait for >> wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it >> seems like we could end up rapidly looping between sources. Perhaps I am >> misunderstanding how this is meant to work. > > last_switch_time indicates the time when the standby last attempted to > switch to primary. For instance, a standby: > 1) for the first time, sets last_switch_time = current_time when in archive > mode > 2) if current_time < last_switch_time + interval, continues to be in > archive mode > 3) if current_time >= last_switch_time + interval, attempts to switch > to primary and sets last_switch_time = current_time > 3.1) if successfully switches to primary, continues in there and for > any reason fails to fetch from primary, then enters archive mode and > loops from step (2) > 3.2) if fails to switch to primary, then enters archive mode and loops > from step (2) Let's say I have this new parameter set to 5 minutes, and I have a standby that's been at step 3.1 for 5 days before failing and going back to step 2. Won't the standby immediately jump back to step 3.1? I think we should place the limit on how long the server stays in XLOG_FROM_ARCHIVE, not how long it's been since we last tried XLOG_FROM_STREAM. >> I wonder if the lower bound should be higher to avoid switching >> unnecessarily rapidly between WAL sources. I see that >> WaitForWALToBecomeAvailable() ensures that standbys do not switch from >> XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per >> wal_retrieve_retry_interval. Perhaps wal_retrieve_retry_interval should be >> the lower bound for this GUC, too. Or maybe WaitForWALToBecomeAvailable() >> should make sure that the standby makes at least once attempt to restore >> the file from archive before switching to streaming replication. > > No, we need a way to disable the feature, so I'm not changing the > lower bound. And let's not make this GUC dependent on any other GUC, I > would like to keep it simple for better usability. However, I've > increased the default value to 5min and added a note in the docs about > the lower values. > > I'm attaching the v3 patch with the review comments addressed, please > review it further. My general point is that we should probably offer some basic preventative measure against flipping back and forth between streaming and archive recovery while making zero progress. As I noted, maybe that's as simple as having WaitForWALToBecomeAvailable() attempt to restore a file from archive at least once before the new parameter forces us to switch to streaming replication. There might be other ways to handle this. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Wed, Sep 7, 2022 at 3:27 AM Nathan Bossart wrote: > > + > + wal_source_switch_interval configuration > parameter > + > > I don't want to bikeshed on the name too much, but I do think we need > something more descriptive. I'm thinking of something like > streaming_replication_attempt_interval or > streaming_replication_retry_interval. I could come up with wal_source_switch_interval after a log of bikeshedding myself :). However, streaming_replication_retry_interval looks much better, I've used it in the latest patch. Thanks. > +Specifies how long the standby server should wait before switching > WAL > +source from WAL archive to primary (streaming replication). This can > +happen either during the standby initial recovery or after a previous > +failed attempt to stream WAL from the primary. > > I'm not sure what the second sentence means. In general, I think the > explanation in your commit message is much clearer: I polished the comments, docs and commit message a bit, please check now. > 5 seconds seems low. I would expect the default to be 1-5 minutes. I > think it's important to strike a balance between interrupting archive > recovery to attempt streaming replication and letting archive recovery make > progress. +1 for a default value of 5 minutes to avoid frequent interruptions for archive mode when primary is really down for a long time. I've also added a cautionary note in the docs about the lower values. > +* Try reading WAL from primary after every wal_source_switch_interval > +* milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If > +* successful, the state machine moves to XLOG_FROM_STREAM state, > otherwise > +* it falls back to XLOG_FROM_ARCHIVE state. > > It's not clear to me how this is expected to interact with the pg_wal phase > of standby recovery. As the docs note [0], standby servers loop through > archive recovery, recovery from pg_wal, and streaming replication. Does > this cause the pg_wal phase to be skipped (i.e., the standby goes straight > from archive recovery to streaming replication)? I wonder if it'd be > better for this mechanism to simply move the standby to the pg_wal phase so > that the usual ordering isn't changed. > > [0] > https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION It doesn't change any behaviour as such for XLOG_FROM_PG_WAL. In standby mode when recovery_command is specified, the initial value of currentSource would be XLOG_FROM_ARCHIVE (see [1]). If the archive is exhausted of WAL or the standby fails to fetch from the archive, then it switches to XLOG_FROM_STREAM. If the standby fails to receive WAL from primary, it switches back to XLOG_FROM_ARCHIVE. This continues unless the standby gets promoted. With the patch, we enable the standby to try fetching from the primary, instead of waiting for WAL in the archive to get exhausted or for an error to occur in the standby while receiving from the archive. > + if (!first_time && > + > TimestampDifferenceExceeds(last_switch_time, curr_time, > + > wal_source_switch_interval)) > > Shouldn't this also check that wal_source_switch_interval is not set to 0? Corrected. > + elog(DEBUG2, > +"trying to switch > WAL source to %s after fetching WAL from %s for %d milliseconds", > + > xlogSourceNames[XLOG_FROM_STREAM], > + > xlogSourceNames[currentSource], > + > wal_source_switch_interval); > + > + last_switch_time = curr_time; > > Shouldn't the last_switch_time be set when the state machine first enters > XLOG_FROM_ARCHIVE? IIUC this logic is currently counting time spent > elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a > source switch. This would mean that a standby that has spent a lot of time > in streaming replication before failing would flip to XLOG_FROM_ARCHIVE, > immediately flip back to XLOG_FROM_STREAM, and then likely flip back to > XLOG_FROM_ARCHIVE when it failed again. Given the standby will wait for > wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it > seems like we could end up rapidly looping between sources. Perhaps I am > misunderstanding how this is meant to work. last_switch_time indicates the time when the standby last attempted to switch to primary. For instance, a standby: 1) for the first time, sets last_switch_time = current_time when in archive mode 2) if current_time < last_switch_time + interval, continues
Re: Switching XLog source from archive to streaming when primary available
+ + wal_source_switch_interval configuration parameter + I don't want to bikeshed on the name too much, but I do think we need something more descriptive. I'm thinking of something like streaming_replication_attempt_interval or streaming_replication_retry_interval. +Specifies how long the standby server should wait before switching WAL +source from WAL archive to primary (streaming replication). This can +happen either during the standby initial recovery or after a previous +failed attempt to stream WAL from the primary. I'm not sure what the second sentence means. In general, I think the explanation in your commit message is much clearer: The standby makes an attempt to read WAL from primary after wal_retrieve_retry_interval milliseconds reading from archive. +If this value is specified without units, it is taken as milliseconds. +The default value is 5 seconds. A setting of 0 +disables the feature. 5 seconds seems low. I would expect the default to be 1-5 minutes. I think it's important to strike a balance between interrupting archive recovery to attempt streaming replication and letting archive recovery make progress. +* Try reading WAL from primary after every wal_source_switch_interval +* milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If +* successful, the state machine moves to XLOG_FROM_STREAM state, otherwise +* it falls back to XLOG_FROM_ARCHIVE state. It's not clear to me how this is expected to interact with the pg_wal phase of standby recovery. As the docs note [0], standby servers loop through archive recovery, recovery from pg_wal, and streaming replication. Does this cause the pg_wal phase to be skipped (i.e., the standby goes straight from archive recovery to streaming replication)? I wonder if it'd be better for this mechanism to simply move the standby to the pg_wal phase so that the usual ordering isn't changed. + if (!first_time && + TimestampDifferenceExceeds(last_switch_time, curr_time, + wal_source_switch_interval)) Shouldn't this also check that wal_source_switch_interval is not set to 0? + elog(DEBUG2, +"trying to switch WAL source to %s after fetching WAL from %s for %d milliseconds", + xlogSourceNames[XLOG_FROM_STREAM], + xlogSourceNames[currentSource], + wal_source_switch_interval); + + last_switch_time = curr_time; Shouldn't the last_switch_time be set when the state machine first enters XLOG_FROM_ARCHIVE? IIUC this logic is currently counting time spent elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a source switch. This would mean that a standby that has spent a lot of time in streaming replication before failing would flip to XLOG_FROM_ARCHIVE, immediately flip back to XLOG_FROM_STREAM, and then likely flip back to XLOG_FROM_ARCHIVE when it failed again. Given the standby will wait for wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it seems like we could end up rapidly looping between sources. Perhaps I am misunderstanding how this is meant to work. + { + {"wal_source_switch_interval", PGC_SIGHUP, REPLICATION_STANDBY, + gettext_noop("Sets the time to wait before switching WAL source from archive to primary"), + gettext_noop("0 turns this feature off."), + GUC_UNIT_MS + }, + _source_switch_interval, + 5000, 0, INT_MAX, + NULL, NULL, NULL + }, I wonder if the lower bound should be higher to avoid switching unnecessarily rapidly between WAL sources. I see that WaitForWALToBecomeAvailable() ensures that standbys do not switch from XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per wal_retrieve_retry_interval. Perhaps wal_retrieve_retry_interval should be the lower bound for this GUC, too. Or maybe WaitForWALToBecomeAvailable() should make sure that the standby makes at least once attempt to restore the file from archive before switching to streaming replication. [0] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Fri, Jul 8, 2022 at 9:16 PM Bharath Rupireddy wrote: > > On Sat, Jun 25, 2022 at 1:31 AM Cary Huang wrote: > > > > The following review has been posted through the commitfest application: > > make installcheck-world: tested, passed > > Implements feature: tested, passed > > Spec compliant: not tested > > Documentation:not tested > > > > Hello > > > > I tested this patch in a setup where the standby is in the middle of > > replicating and REDOing primary's WAL files during a very large data > > insertion. During this time, I keep killing the walreceiver process to > > cause a stream failure and force standby to read from archive. The system > > will restore from archive for "wal_retrieve_retry_interval" seconds before > > it attempts to steam again. Without this patch, once the streaming is > > interrupted, it keeps reading from archive until standby reaches the same > > consistent state of primary and then it will switch back to streaming > > again. So it seems that the patch does the job as described and does bring > > some benefit during a very large REDO job where it will try to re-stream > > after restoring some WALs from archive to speed up this "catch up" process. > > But if the recovery job is not a large one, PG is already switching back to > > streaming once it hits consistent state. > > Thanks a lot Cary for testing the patch. > > > Here's a v1 patch that I've come up with. I'm right now using the > > existing GUC wal_retrieve_retry_interval to switch to stream mode from > > archive mode as opposed to switching only after the failure to get WAL > > from archive mode. If okay with the approach, I can add tests, change > > the docs and add a new GUC to control this behaviour. I'm open to > > thoughts and ideas here. > > It will be great if I can hear some thoughts on the above points (as > posted upthread). Here's the v2 patch with a separate GUC, new GUC was necessary as the existing GUC wal_retrieve_retry_interval is loaded with multiple usages. When the feature is enabled, it will let standby to switch to stream mode i.e. fetch WAL from primary before even fetching from archive fails. The switching to stream mode from archive happens in 2 scenarios: 1) when standby is in initial recovery 2) when there was a failure in receiving from primary (walreceiver got killed or crashed or timed out, or connectivity to primary was broken - for whatever reasons). I also added test cases to the v2 patch. Please review the patch. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/ v2-0001-Switch-WAL-source-to-stream-from-archive.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Sat, Jun 25, 2022 at 1:31 AM Cary Huang wrote: > > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: not tested > Documentation:not tested > > Hello > > I tested this patch in a setup where the standby is in the middle of > replicating and REDOing primary's WAL files during a very large data > insertion. During this time, I keep killing the walreceiver process to cause > a stream failure and force standby to read from archive. The system will > restore from archive for "wal_retrieve_retry_interval" seconds before it > attempts to steam again. Without this patch, once the streaming is > interrupted, it keeps reading from archive until standby reaches the same > consistent state of primary and then it will switch back to streaming again. > So it seems that the patch does the job as described and does bring some > benefit during a very large REDO job where it will try to re-stream after > restoring some WALs from archive to speed up this "catch up" process. But if > the recovery job is not a large one, PG is already switching back to > streaming once it hits consistent state. Thanks a lot Cary for testing the patch. > Here's a v1 patch that I've come up with. I'm right now using the > existing GUC wal_retrieve_retry_interval to switch to stream mode from > archive mode as opposed to switching only after the failure to get WAL > from archive mode. If okay with the approach, I can add tests, change > the docs and add a new GUC to control this behaviour. I'm open to > thoughts and ideas here. It will be great if I can hear some thoughts on the above points (as posted upthread). Regards, Bharath Rupireddy.
Re: Switching XLog source from archive to streaming when primary available
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested Hello I tested this patch in a setup where the standby is in the middle of replicating and REDOing primary's WAL files during a very large data insertion. During this time, I keep killing the walreceiver process to cause a stream failure and force standby to read from archive. The system will restore from archive for "wal_retrieve_retry_interval" seconds before it attempts to steam again. Without this patch, once the streaming is interrupted, it keeps reading from archive until standby reaches the same consistent state of primary and then it will switch back to streaming again. So it seems that the patch does the job as described and does bring some benefit during a very large REDO job where it will try to re-stream after restoring some WALs from archive to speed up this "catch up" process. But if the recovery job is not a large one, PG is already switching back to streaming once it hits consistent state. thank you Cary Huang HighGo Software Canada
Re: Switching XLog source from archive to streaming when primary available
On Sat, Apr 30, 2022 at 6:19 PM Bharath Rupireddy wrote: > > On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM > wrote: > > > > Hi Hackers, > > > > When the standby couldn't connect to the primary it switches the XLog > > source from streaming to archive and continues in that state until it can > > get the WAL from the archive location. On a server with high WAL activity, > > typically getting the WAL from the archive is slower than streaming it from > > the primary and couldn't exit from that state. This not only increases the > > lag on the standby but also adversely impacts the primary as the WAL gets > > accumulated, and vacuum is not able to collect the dead tuples. DBAs as a > > mitigation can however remove/advance the slot or remove the > > restore_command on the standby but this is a manual work I am trying to > > avoid. I would like to propose the following, please let me know your > > thoughts. > > > > Automatically attempt to switch the source from Archive to streaming when > > the primary_conninfo is set after replaying 'N' wal segment governed by the > > GUC retry_primary_conn_after_wal_segments > > when retry_primary_conn_after_wal_segments is set to -1 then the feature > > is disabled > > When the retry attempt fails, then switch back to the archive > > I've gone through the state machine in WaitForWALToBecomeAvailable and > I understand it this way: failed to receive WAL records from the > primary causes the current source to switch to archive and the standby > continues to get WAL records from archive location unless some failure > occurs there the current source is never going to switch back to > stream. Given the fact that getting WAL from archive location causes > delay in production environments, we miss to take the advantage of the > reconnection to primary after previous failed attempt. > > So basically, we try to attempt to switch to streaming from archive > (even though fetching from archive can succeed) after a certain amount > of time or WAL segments. I prefer timing-based switch to streaming > from archive instead of after a number of WAL segments fetched from > archive. Right now, wal_retrieve_retry_interval is being used to wait > before switching to archive after failed attempt from streaming, IMO, > a similar GUC (that gets set once the source switched from streaming > to archive and on timeout it switches to streaming again) can be used > to switch from archive to streaming after the specified amount of > time. > > Thoughts? Here's a v1 patch that I've come up with. I'm right now using the existing GUC wal_retrieve_retry_interval to switch to stream mode from archive mode as opposed to switching only after the failure to get WAL from archive mode. If okay with the approach, I can add tests, change the docs and add a new GUC to control this behaviour. I'm open to thoughts and ideas here. Regards, Bharath Rupireddy. v1-0001-Switch-to-stream-mode-from-archive-occasionally.patch Description: Binary data
Re: Switching XLog source from archive to streaming when primary available
On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM wrote: > > Hi Hackers, > > When the standby couldn't connect to the primary it switches the XLog source > from streaming to archive and continues in that state until it can get the > WAL from the archive location. On a server with high WAL activity, typically > getting the WAL from the archive is slower than streaming it from the primary > and couldn't exit from that state. This not only increases the lag on the > standby but also adversely impacts the primary as the WAL gets accumulated, > and vacuum is not able to collect the dead tuples. DBAs as a mitigation can > however remove/advance the slot or remove the restore_command on the standby > but this is a manual work I am trying to avoid. I would like to propose the > following, please let me know your thoughts. > > Automatically attempt to switch the source from Archive to streaming when the > primary_conninfo is set after replaying 'N' wal segment governed by the GUC > retry_primary_conn_after_wal_segments > when retry_primary_conn_after_wal_segments is set to -1 then the feature is > disabled > When the retry attempt fails, then switch back to the archive I've gone through the state machine in WaitForWALToBecomeAvailable and I understand it this way: failed to receive WAL records from the primary causes the current source to switch to archive and the standby continues to get WAL records from archive location unless some failure occurs there the current source is never going to switch back to stream. Given the fact that getting WAL from archive location causes delay in production environments, we miss to take the advantage of the reconnection to primary after previous failed attempt. So basically, we try to attempt to switch to streaming from archive (even though fetching from archive can succeed) after a certain amount of time or WAL segments. I prefer timing-based switch to streaming from archive instead of after a number of WAL segments fetched from archive. Right now, wal_retrieve_retry_interval is being used to wait before switching to archive after failed attempt from streaming, IMO, a similar GUC (that gets set once the source switched from streaming to archive and on timeout it switches to streaming again) can be used to switch from archive to streaming after the specified amount of time. Thoughts? Regards, Bharath Rupireddy.
Re: Switching XLog source from archive to streaming when primary available
On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM wrote: > > Hi Hackers, > > When the standby couldn't connect to the primary it switches the XLog source > from streaming to archive and continues in that state until it can get the > WAL from the archive location. On a server with high WAL activity, typically > getting the WAL from the archive is slower than streaming it from the primary > and couldn't exit from that state. This not only increases the lag on the > standby but also adversely impacts the primary as the WAL gets accumulated, > and vacuum is not able to collect the dead tuples. DBAs as a mitigation can > however remove/advance the slot or remove the restore_command on the standby > but this is a manual work I am trying to avoid. I would like to propose the > following, please let me know your thoughts. > > Automatically attempt to switch the source from Archive to streaming when the > primary_conninfo is set after replaying 'N' wal segment governed by the GUC > retry_primary_conn_after_wal_segments > when retry_primary_conn_after_wal_segments is set to -1 then the feature is > disabled > When the retry attempt fails, then switch back to the archive I think there is another thread [1] that is logically trying to solve a similar issue, basically, in the main recovery apply loop is the walreceiver does not exist then it is launching the walreceiver. However, in that patch, it is not changing the current Xlog source but I think that is not a good idea because with that it will restore from the archive as well as stream from the primary so I have given that review comment on that thread as well. One big difference is that patch is launching the walreceiver even if the WAL is locally available and we don't really need more WAL but that is controlled by a GUC. [1] https://www.postgresql.org/message-id/CAKYtNApe05WmeRo92gTePEmhOM4myMpCK_%2BceSJtC7-AWLw1qw%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com