Re: Unnecessary delay in streaming replication due to replay lag

2021-12-16 Thread Kyotaro Horiguchi
At Wed, 15 Dec 2021 17:01:24 -0800, Soumyadeep Chakraborty 
 wrote in 
> Sure, that makes more sense. Fixed.

I played with this briefly.  I started a standby from a backup that
has access to an archive, and I steadily got the following log lines.


[139535:postmaster] LOG:  database system is ready to accept read-only 
connections
[139542:walreceiver] LOG:  started streaming WAL from primary at 0/200 on 
timeline 1
cp: cannot stat '/home/horiguti/data/arc_work/00010003': No 
such file or directory
[139542:walreceiver] FATAL:  could not open file 
"pg_wal/00010003": No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/0002.history': No such file 
or directory
cp: cannot stat '/home/horiguti/data/arc_work/00010003': No 
such file or directory
[139548:walreceiver] LOG:  started streaming WAL from primary at 0/300 on 
timeline 1

The "FATAL:  could not open file" message from walreceiver means that
the walreceiver was operationally prohibited to install a new wal
segment at the time.  Thus the walreceiver ended as soon as started.
In short, the eager replication is not working at all.


I have a comment on the behavior and objective of this feature.

In the case where archive recovery is started from a backup, this
feature lets walreceiver start while the archive recovery is ongoing.
If walreceiver (or the eager replication) worked as expected, it would
write WAL files while archive recovery writes the same set of WAL
segments to the same directory. I don't think that is sane behavior.
Or, to put it more modestly, it is an unintended behavior.

In common cases, I believe archive recovery is faster than streaming
replication.  If a segment is available from the archive, we don't
need to prefetch it via the stream.

If this feature is intended to be used only for crash recovery of a
standby, it should fire only when it is needed.

If not, that is, if it is intended to work also for archive recovery,
I think the eager replication should start from the segment following
the last WAL segment in the archive, but that would invite more
complex problems.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Unnecessary delay in streaming replication due to replay lag

2021-12-15 Thread Soumyadeep Chakraborty
Hi Bharath,

Thanks for the review!

On Sat, Nov 27, 2021 at 6:36 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> 1) A memory leak: add FreeDir(dir); before returning.
> + ereport(LOG,
> + (errmsg("Could not start streaming WAL eagerly"),
> + errdetail("There are timeline changes in the locally available WAL
files."),
> + errhint("WAL streaming will begin once all local WAL and archives
> are exhausted.")));
> + return;
> + }
>

Thanks for catching that. Fixed.

>
>
> 2) Is there a guarantee that, while we traverse the pg_wal directory
> to find startsegno and endsegno, no new wal files arrive from the
> primary or archive location and no old wal files get removed/recycled
> by the standby? Especially when wal_receiver_start_condition=consistency?
> + startsegno = (startsegno == -1) ? logSegNo : Min(startsegno, logSegNo);
> + endsegno = (endsegno == -1) ? logSegNo : Max(endsegno, logSegNo);
> + }
>

Even if newer wal files arrive after the snapshot of the dir listing
taken by AllocateDir()/ReadDir(), we will in effect start from a
slightly older location, which should be fine. It shouldn't matter if
an older file is recycled. If the last valid WAL segment is recycled,
we will ERROR out in StartWALReceiverEagerlyIfPossible() and the eager
start can be retried by the startup process when
CheckRecoveryConsistency() is called again.

>
>
> 3) I think the errmsg text format isn't correct. Note that the errmsg
> text starts with lowercase and doesn't end with "." whereas errdetail
> or errhint starts with uppercase and ends with ".". Please check other
> messages for reference.
> The following should be changed.
> + errmsg("Requesting stream from beginning of: %s",
> + errmsg("Invalid WAL segment found while calculating stream start:
> %s. Skipping.",
> + (errmsg("Could not start streaming WAL eagerly"),

Fixed.

> 4) I think you also need to have the wal file names in double quotes,
> something like below:
> errmsg("could not close file \"%s\": %m", xlogfname)));

Fixed.

>
> 5) That is, it should read "...stream start: \"%s\", skipping..":
> + errmsg("Invalid WAL segment found while calculating stream start:
> %s. Skipping.",

Fixed.

> 6) I think the patch can make the startup process significantly slow,
> especially when there are lots of wal files in the standby's pg_wal
> directory. This is because of the overhead
> StartWALReceiverEagerlyIfPossible adds, i.e. doing two while loops to
> figure out the start position of the streaming in advance. This might
> end up with the startup process looping over the directory rather than
> doing the important work of crash recovery or standby recovery.

Well, 99% of the time we can expect that the second loop finishes after
1 or 2 iterations, as the last valid WAL segment would most likely be
the highest numbered WAL file or thereabouts. I don't think that the
overhead will be significant as we are just looking up a directory
listing and not reading any files.
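
To make the shape of that scan concrete, here is a minimal standalone
sketch (plain C, outside of PostgreSQL; find_stream_start and
segment_is_valid are hypothetical names, 16MB segments are assumed, and
the validity probe that the patch delegates to the xlogreader is
stubbed out):

#include <dirent.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 0x100000000 / wal_segment_size, assuming 16MB segments */
#define SEGMENTS_PER_XLOGID 256

/* Stub: the patch asks the xlogreader to validate the segment's first page. */
static bool
segment_is_valid(uint64_t segno)
{
	(void) segno;
	return true;
}

/* Return the segment number to stream from, or 0 if none was found. */
static uint64_t
find_stream_start(const char *waldir)
{
	DIR		   *dir = opendir(waldir);
	struct dirent *de;
	uint64_t	startsegno = 0;
	uint64_t	endsegno = 0;
	bool		found = false;

	if (dir == NULL)
		return 0;

	/* Pass 1: find the earliest and latest WAL segments in the listing. */
	while ((de = readdir(dir)) != NULL)
	{
		unsigned int tli, log, seg;
		uint64_t	segno;

		/* WAL segment names are exactly 24 hex digits: TLI, xlogid, segment. */
		if (strlen(de->d_name) != 24 ||
			strspn(de->d_name, "0123456789ABCDEF") != 24 ||
			sscanf(de->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3)
			continue;

		(void) tli;				/* timeline checks omitted in this sketch */
		segno = (uint64_t) log * SEGMENTS_PER_XLOGID + seg;
		if (!found || segno < startsegno)
			startsegno = segno;
		if (!found || segno > endsegno)
			endsegno = segno;
		found = true;
	}
	closedir(dir);

	/*
	 * Pass 2: walk down from the latest segment.  In the common case the
	 * first or second probe succeeds, so this loop stays short.
	 */
	while (found && endsegno >= startsegno)
	{
		if (segment_is_valid(endsegno))
			return endsegno;	/* stream from the start of this segment */
		if (endsegno == startsegno)
			break;
		endsegno--;
	}
	return 0;
}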

> 7) What happens if this new GUC is enabled in the case of a synchronous
> standby?
> What happens if this new GUC is enabled in the case of crash recovery?
> What happens if this new GUC is enabled in case a restore command is
> set, i.e. a standby performing archive recovery?

The GUC would behave the same way for all of these cases. If we have
chosen 'startup'/'consistency', we would be starting the WAL receiver
eagerly. There might be certain race conditions when one combines this
GUC with archive recovery, which was discussed upthread. [1]

> 8) How about the bgwriter/checkpointer, which get started even before
> the startup process (or a new bg worker? of course that would be
> overkill), finding out the new start position for the startup process,
> so that we could get rid of the startup behaviour of the patch? This
> avoids an extra burden on the startup process. Otherwise, users may
> often complain that recovery takes more time after setting
> wal_receiver_start_condition=startup.

Hmm, then we would need additional synchronization. There would also
be an added dependency on checkpoint_timeout. I don't think that the
performance hit is significant enough to warrant this change.

> 10) Can we have a better GUC name than wal_receiver_start_condition?
> Something like wal_receiver_start_at, wal_receiver_start, or some
> other?

Sure, that makes more sense. Fixed.

Regards,
Soumyadeep (VMware)

[1]
https://www.postgresql.org/message-id/CAE-ML%2B-8KnuJqXKHz0mrC7-qFMQJ3ArDC78X3-AjGKos7Ceocw%40mail.gmail.com
From e6ffb5400fd841e4285939133739692c9cb4ba17 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Fri, 19 Nov 2021 00:33:17 -0800
Subject: [PATCH v5 1/1] Introduce feature to start WAL receiver eagerly

This commit introduces a new GUC wal_receiver_start_condition which can
enable the standby to start its WAL receiver at an earlier stage. The
GUC will default to starting the WAL receiver after WAL from archives
and pg_wal has been exhausted, designated by the value 'exhaust'.

Re: Unnecessary delay in streaming replication due to replay lag

2021-11-27 Thread Bharath Rupireddy
On Tue, Nov 23, 2021 at 1:39 AM Soumyadeep Chakraborty
 wrote:
>
> Hi Bharath,
>
> Yes, that thread has been discussed here. Asim had x-posted the patch to [1]. 
> This thread
> was more recent when Ashwin and I picked up the patch in Aug 2021, so we 
> continued here.
> The patch has been significantly updated by us, addressing Michael's long 
> outstanding feedback.

Thanks for the patch. I reviewed it a bit; here are some comments:

1) A memory leak: add FreeDir(dir); before returning.
+ ereport(LOG,
+ (errmsg("Could not start streaming WAL eagerly"),
+ errdetail("There are timeline changes in the locally available WAL files."),
+ errhint("WAL streaming will begin once all local WAL and archives
are exhausted.")));
+ return;
+ }

2) Is there a guarantee that, while we traverse the pg_wal directory
to find startsegno and endsegno, no new wal files arrive from the
primary or archive location and no old wal files get removed/recycled
by the standby? Especially when wal_receiver_start_condition=consistency?
+ startsegno = (startsegno == -1) ? logSegNo : Min(startsegno, logSegNo);
+ endsegno = (endsegno == -1) ? logSegNo : Max(endsegno, logSegNo);
+ }

3) I think the errmsg text format isn't correct. Note that the errmsg
text starts with lowercase and doesn't end with "." whereas errdetail
or errhint starts with uppercase and ends with ".". Please check other
messages for reference.
The following should be changed.
+ errmsg("Requesting stream from beginning of: %s",
+ errmsg("Invalid WAL segment found while calculating stream start:
%s. Skipping.",
+ (errmsg("Could not start streaming WAL eagerly"),

4) I think you also need to have the wal file names in double quotes,
something like below:
errmsg("could not close file \"%s\": %m", xlogfname)));

5) That is, it should read "...stream start: \"%s\", skipping..":
+ errmsg("Invalid WAL segment found while calculating stream start:
%s. Skipping.",

6) I think the patch can make the startup process significantly slow,
especially when there are lots of wal files in the standby's pg_wal
directory. This is because of the overhead
StartWALReceiverEagerlyIfPossible adds, i.e. doing two while loops to
figure out the start position of the streaming in advance. This might
end up with the startup process looping over the directory rather than
doing the important work of crash recovery or standby recovery.

7) What happens if this new GUC is enabled in the case of a synchronous standby?
What happens if this new GUC is enabled in the case of crash recovery?
What happens if this new GUC is enabled in case a restore command is
set, i.e. a standby performing archive recovery?

8) How about the bgwriter/checkpointer, which get started even before
the startup process (or a new bg worker? of course that would be
overkill), finding out the new start position for the startup process,
so that we could get rid of the startup behaviour of the patch? This
avoids an extra burden on the startup process. Otherwise, users may
often complain that recovery takes more time after setting
wal_receiver_start_condition=startup.

9) I think we can have just the 'consistency' and 'exhaust' behaviours
and let the bgwriter or checkpointer find out the start position for
the startup process; whenever the startup process reaches a consistent
point, it checks whether the other process has calculated a start
position for it. If yes, it starts the WAL receiver; otherwise it goes
on with its usual recovery. I'm not sure if this will be a good idea.

10) Can we have a better GUC name than wal_receiver_start_condition?
Something like wal_receiver_start_at, wal_receiver_start, or some
other?

Regards,
Bharath Rupireddy.




Re: Unnecessary delay in streaming replication due to replay lag

2021-11-22 Thread Soumyadeep Chakraborty
Hi Bharath,

Yes, that thread has been discussed here. Asim had x-posted the patch to
[1]. This thread
was more recent when Ashwin and I picked up the patch in Aug 2021, so we
continued here.
The patch has been significantly updated by us, addressing Michael's long
outstanding feedback.

Regards,
Soumyadeep (VMware)

[1]
https://www.postgresql.org/message-id/CANXE4TeinQdw%2BM2Or0kTR24eRgWCOg479N8%3DgRvj9Ouki-tZFg%40mail.gmail.com


Re: Unnecessary delay in streaming replication due to replay lag

2021-11-22 Thread Bharath Rupireddy
On Fri, Nov 19, 2021 at 2:05 PM Soumyadeep Chakraborty
 wrote:
>
> Hi Daniel,
>
> Thanks for checking in on this patch.
> Attached rebased version.

Hi, I've not gone through the patch or this thread entirely yet; can
you please confirm whether there's any relation between this thread
and another one at [1]?

[1] 
https://www.postgresql.org/message-id/CAFiTN-vzbcSM_qZ%2B-mhS3OWecxupDCR5DkhQUTy%2BTKfrCMQLKQ%40mail.gmail.com




Re: Unnecessary delay in streaming replication due to replay lag

2021-11-19 Thread Soumyadeep Chakraborty
Hi Daniel,

Thanks for checking in on this patch.
Attached rebased version.

Regards,
Soumyadeep (VMware)
From 51149e4f877dc2f8bf47a1356fc8b0ec2512ac6d Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Fri, 19 Nov 2021 00:33:17 -0800
Subject: [PATCH v4 1/1] Introduce feature to start WAL receiver eagerly

This commit introduces a new GUC wal_receiver_start_condition which can
enable the standby to start its WAL receiver at an earlier stage. The
GUC will default to starting the WAL receiver after WAL from archives
and pg_wal has been exhausted, designated by the value 'exhaust'.
The value of 'startup' indicates that the WAL receiver will be started
immediately on standby startup. Finally, the value of 'consistency'
indicates that the WAL receiver will be started after the standby has
replayed up to the consistency point.

If 'startup' or 'consistency' is specified, the starting point for the
WAL receiver will always be the end of all locally available WAL in
pg_wal. The end is determined by finding the latest WAL segment in
pg_wal and then iterating to the earliest segment. The iteration is
terminated as soon as a valid WAL segment is found. Streaming can then
commence from the start of that segment.

Co-authors: Ashwin Agrawal, Asim Praveen, Wu Hao, Konstantin Knizhnik
Discussion:
https://www.postgresql.org/message-id/flat/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |  33 
 src/backend/access/transam/xlog.c | 170 +-
 src/backend/replication/walreceiver.c |   1 +
 src/backend/utils/misc/guc.c  |  18 ++
 src/backend/utils/misc/postgresql.conf.sample |   3 +-
 src/include/replication/walreceiver.h |  10 ++
 src/test/recovery/t/027_walreceiver_start.pl  |  96 ++
 7 files changed, 321 insertions(+), 10 deletions(-)
 create mode 100644 src/test/recovery/t/027_walreceiver_start.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3f806740d5d..b911615bf04 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4604,6 +4604,39 @@ ANY num_sync ( 
+  wal_receiver_start_condition (enum)
+  
+   wal_receiver_start_condition configuration parameter
+  
+  
+  
+   
+Specifies when the WAL receiver process will be started for a standby
+server.
+The allowed values of wal_receiver_start_condition
+are startup (start immediately when the standby starts),
+consistency (start only after reaching consistency), and
exhaust (start only after all WAL from the archive and
pg_wal has been replayed).
The default setting is exhaust.
+   
+
+   
+Traditionally, the WAL receiver process is started only after the
+standby server has exhausted all WAL from the WAL archive and the local
+pg_wal directory. In some environments there can be a significant volume
+of local WAL left to replay, along with a large volume of yet to be
+streamed WAL. Such environments can benefit from setting
+wal_receiver_start_condition to
+startup or consistency. These
+values will lead to the WAL receiver starting much earlier, and from
+the end of locally available WAL. The network will be utilized to stream
+WAL concurrently with replay, improving performance significantly.
+   
+  
+ 
+
  
   wal_receiver_status_interval (integer)
   
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 16164483688..9c1243a2f85 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -983,6 +983,8 @@ static XLogRecPtr XLogBytePosToRecPtr(uint64 bytepos);
 static XLogRecPtr XLogBytePosToEndRecPtr(uint64 bytepos);
 static uint64 XLogRecPtrToBytePos(XLogRecPtr ptr);
 static void checkXLogConsistency(XLogReaderState *record);
+static void StartWALReceiverEagerlyIfPossible(WalRcvStartCondition startPoint,
+			  TimeLineID currentTLI);
 
 static void WALInsertLockAcquire(void);
 static void WALInsertLockAcquireExclusive(void);
@@ -1538,6 +1540,152 @@ checkXLogConsistency(XLogReaderState *record)
 	}
 }
 
+/*
+ * Start WAL receiver eagerly without waiting to play all WAL from the archive
+ * and pg_wal. First, find the last valid WAL segment in pg_wal and then request
+ * streaming to commence from it's beginning. startPoint signifies whether we
+ * are trying the eager start right at startup or once we have reached
+ * consistency.
+ */
+static void
+StartWALReceiverEagerlyIfPossible(WalRcvStartCondition startPoint,
+  TimeLineID currentTLI)
+{
+	DIR		   *dir;
+	struct dirent *de;
+	XLogSegNo	startsegno = -1;
+	XLogSegNo	endsegno = -1;
+
+	/*
+	 * We should not be starting the walreceiver during bootstrap/init
+	 * processing.
+	 */
+	if (!IsNormalProcessingMode())
+		return;
+
+	/* Only the startup process 
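
As a usage sketch, the eager start described in the documentation hunk
above would be enabled on a standby roughly like this (the host and
user values in primary_conninfo are placeholders):

# postgresql.conf on the standby
primary_conninfo = 'host=primary.example.com user=replicator'
wal_receiver_start_condition = 'consistency'	# or 'startup'; default is 'exhaust'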

Re: Unnecessary delay in streaming replication due to replay lag

2021-11-15 Thread Daniel Gustafsson
> On 10 Nov 2021, at 00:41, Soumyadeep Chakraborty  
> wrote:

> Thanks for the detailed review! Attached is a rebased patch that addresses
> most of the feedback.

This patch no longer applies after e997a0c64 and associated follow-up
commits; please submit a rebased version.

--
Daniel Gustafsson   https://vmware.com/





Re: Unnecessary delay in streaming replication due to replay lag

2021-11-09 Thread Soumyadeep Chakraborty
Hi Michael,

Thanks for the detailed review! Attached is a rebased patch that addresses
most of the feedback.

On Mon, Nov 8, 2021 at 1:41 AM Michael Paquier  wrote:

> +static void
> +StartWALReceiverEagerly()
> +{
> The patch fails to apply because of the recent changes from Robert to
> eliminate ThisTimeLineID.  The correct thing to do would be to add one
> TimeLineID argument, passing down the local ThisTimeLineID in
> StartupXLOG() and using XLogCtl->lastReplayedTLI in
> CheckRecoveryConsistency().

Rebased.

> +   /*
> +* We should never reach here. We should have at least one valid
WAL
> +* segment in our pg_wal, for the standby to have started.
> +*/
> +   Assert(false);
> The reason behind that is not that we have a standby, but that we read
> at least the segment that included the checkpoint record we are
> replaying from, at least (it is possible for a standby to start
> without any contents in pg_wal/ as long as recovery is configured),
> and because StartWALReceiverEagerly() is called just after that.

Fair, comment updated.

> It would be better to make sure that StartWALReceiverEagerly() gets
> only called from the startup process, perhaps?

Added Assert(AmStartupProcess()) at the beginning of
StartWALReceiverEagerly().

>
> +   RequestXLogStreaming(ThisTimeLineID, startptr, PrimaryConnInfo,
> +PrimarySlotName,
wal_receiver_create_temp_slot);
> +   XLogReaderFree(state);
> XLogReaderFree() should happen before RequestXLogStreaming().  The
> tipping point of the patch is here, where the WAL receiver is started
> based on the location of the first valid WAL record found.

Done.

> wal_receiver_start_condition is missing in postgresql.conf.sample.

Fixed.

> +   /*
> +* Start WAL receiver eagerly if requested.
> +*/
> +   if (StandbyModeRequested && !WalRcvStreaming() &&
> +   PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0 &&
> +   wal_receiver_start_condition == WAL_RCV_START_AT_STARTUP)
> +   StartWALReceiverEagerly();
> [...]
> +   if (StandbyModeRequested && !WalRcvStreaming() &&
reachedConsistency &&
> +   PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0 &&
> +   wal_receiver_start_condition ==
WAL_RCV_START_AT_CONSISTENCY)
> +   StartWALReceiverEagerly();
> This repeats two times the same set of conditions, which does not look
> like a good idea to me.  I think that you'd better add an extra
> argument to StartWALReceiverEagerly to track the start timing expected
> in this code path, that will be matched with the GUC in the routine.
> It would be better to document the reasons behind each check done, as
> well.

Done.

> So, this reads the contents of pg_wal/ for any files that exist, then
> goes down to the first segment found with a valid beginning.  That's
> going to be expensive with a large max_wal_size. When searching for a
> point like that, a dichotomy method would be better to calculate an LSN
> you'd like to start from.

Even if there is a large max_wal_size, do we expect that there will be
a lot of invalid high-numbered WAL files? If that is not the case, most
of the time we would be looking at only the last one or two WAL files
to determine the start point, which should make it efficient.
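
For reference, the dichotomy could look like the following standalone
sketch (hypothetical names; it assumes the valid segments form one
unbroken run from lo upward, with lo >= 1, which is exactly the
assumption that the holes discussed below would break):

#include <stdbool.h>
#include <stdint.h>

static bool
segment_is_valid(uint64_t segno)
{
	(void) segno;
	return true;				/* stub: probe the segment's first page */
}

/* Binary-search the highest valid segment in [lo, hi]; 0 if none. */
static uint64_t
last_valid_segment(uint64_t lo, uint64_t hi)
{
	uint64_t	result = 0;

	while (lo <= hi)
	{
		uint64_t	mid = lo + (hi - lo) / 2;

		if (segment_is_valid(mid))
		{
			result = mid;		/* valid; look for a later one */
			lo = mid + 1;
		}
		else
			hi = mid - 1;		/* mid >= lo >= 1, so this cannot wrap */
	}
	return result;
}

This takes O(log n) probes instead of a linear walk, at the price of
the monotonicity assumption.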

> Anyway, I think that there is a problem
> with the approach: what should we do if there are holes in the
> segments present in pg_wal/?  As of HEAD, or
> wal_receiver_start_condition = 'exhaust' in this patch, we would
> switch across local pg_wal/, archive and stream in a linear way,
> thanks to WaitForWALToBecomeAvailable().  For example, imagine that we
> have a standby with the following set of valid segments, because of
> the buggy way a base backup has been taken:
> 00010001
> 00010003
> 00010005
> What the patch would do is starting a WAL receiver from segment 5,
> which is in contradiction with the existing logic where we should try
> to look for the segment once we are waiting for something in segment
> 2.  This would be dangerous once the startup process waits for some
> WAL to become available, because we have a WAL receiver started, but
> we cannot fetch the segment we have.  Perhaps a deployment has
> archiving, in which case it would be able to grab segment 2 (if no
> archiving, recovery would not be able to move on, so that would be
> game over).

We could easily check for holes while we are doing the ReadDir() and
bail from the early start if there are holes, just like we do if there
is a timeline jump in any of the WAL segments.
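
A sketch of that hole check, under the same standalone assumptions as
the earlier sketches: count the WAL segment files during the ReadDir()
scan and compare the count against the span of segment numbers.

#include <stdbool.h>
#include <stdint.h>

/* nsegs: number of WAL segment files counted during the ReadDir() scan */
static bool
pg_wal_has_holes(uint64_t startsegno, uint64_t endsegno, uint64_t nsegs)
{
	/* a contiguous range [startsegno, endsegno] holds exactly this many files */
	return nsegs != endsegno - startsegno + 1;
}

If holes are detected, the eager start would simply be skipped and
recovery would fall back to the usual linear pg_wal/archive/stream
behavior.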

>  /*
>   * Move to XLOG_FROM_STREAM state, and set to start a
> - * walreceiver if necessary.
> + * walreceiver if necessary. The WAL receiver may have
> + * already started (if it was configured to start
> + * eagerly).
>   */
>  currentSource = XLOG_FROM_STREAM;
> -

Re: Unnecessary delay in streaming replication due to replay lag

2021-11-08 Thread Michael Paquier
On Tue, Aug 24, 2021 at 09:51:25PM -0700, Soumyadeep Chakraborty wrote:
> Ashwin and I recently got a chance to work on this and we addressed all
> outstanding feedback and suggestions. PFA a significantly reworked patch.

+static void
+StartWALReceiverEagerly()
+{
The patch fails to apply because of the recent changes from Robert to
eliminate ThisTimeLineID.  The correct thing to do would be to add one
TimeLineID argument, passing down the local ThisTimeLineID in
StartupXLOG() and using XLogCtl->lastReplayedTLI in
CheckRecoveryConsistency().

+   /*
+* We should never reach here. We should have at least one valid WAL
+* segment in our pg_wal, for the standby to have started.
+*/
+   Assert(false);
The reason behind that is not that we have a standby, but that we read
at least the segment that included the checkpoint record we are
replaying from, at least (it is possible for a standby to start
without any contents in pg_wal/ as long as recovery is configured),
and because StartWALReceiverEagerly() is called just after that.

It would be better to make sure that StartWALReceiverEagerly() gets
only called from the startup process, perhaps?

+   RequestXLogStreaming(ThisTimeLineID, startptr, PrimaryConnInfo,
+PrimarySlotName, wal_receiver_create_temp_slot);
+   XLogReaderFree(state);
XLogReaderFree() should happen before RequestXLogStreaming().  The
tipping point of the patch is here, where the WAL receiver is started
based on the location of the first valid WAL record found.

wal_receiver_start_condition is missing in postgresql.conf.sample.

+   /*
+* Start WAL receiver eagerly if requested.
+*/
+   if (StandbyModeRequested && !WalRcvStreaming() &&
+   PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0 &&
+   wal_receiver_start_condition == WAL_RCV_START_AT_STARTUP)
+   StartWALReceiverEagerly();
[...]
+   if (StandbyModeRequested && !WalRcvStreaming() && reachedConsistency &&
+   PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0 &&
+   wal_receiver_start_condition == WAL_RCV_START_AT_CONSISTENCY)
+   StartWALReceiverEagerly();
This repeats two times the same set of conditions, which does not look
like a good idea to me.  I think that you'd better add an extra
argument to StartWALReceiverEagerly to track the start timing expected
in this code path, that will be matched with the GUC in the routine.
It would be better to document the reasons behind each check done, as
well.

+   /* Find the latest and earliest WAL segments in pg_wal */
+   dir = AllocateDir("pg_wal");
+   while ((de = ReadDir(dir, "pg_wal")) != NULL)
+   {
[ ... ]
+   /* Find the latest valid WAL segment and request streaming from its 
start */
+   while (endsegno >= startsegno)
+   {
[...]
+   XLogReaderFree(state);
+   endsegno--;
+   }
So, this reads the contents of pg_wal/ for any files that exist, then
goes down to the first segment found with a valid beginning.  That's
going to be expensive with a large max_wal_size.  When searching for a
point like that, a dichotomy method would be better to calculate an LSN
you'd like to start from.  Anyway, I think that there is a problem
with the approach: what should we do if there are holes in the
segments present in pg_wal/?  As of HEAD, or
wal_receiver_start_condition = 'exhaust' in this patch, we would
switch across local pg_wal/, archive and stream in a linear way,
thanks to WaitForWALToBecomeAvailable().  For example, imagine that we
have a standby with the following set of valid segments, because of
the buggy way a base backup has been taken:
00010001
00010003
00010005
What the patch would do is starting a WAL receiver from segment 5,
which is in contradiction with the existing logic where we should try
to look for the segment once we are waiting for something in segment
2.  This would be dangerous once the startup process waits for some
WAL to become available, because we have a WAL receiver started, but
we cannot fetch the segment we have.  Perhaps a deployment has
archiving, in which case it would be able to grab segment 2 (if no
archiving, recovery would not be able to move on, so that would be
game over).
 
 /*
  * Move to XLOG_FROM_STREAM state, and set to start a
- * walreceiver if necessary.
+ * walreceiver if necessary. The WAL receiver may have
+ * already started (if it was configured to start
+ * eagerly).
  */
 currentSource = XLOG_FROM_STREAM;
-startWalReceiver = true;
+startWalReceiver = !WalRcvStreaming();
 break;
 case XLOG_FROM_ARCHIVE:
 case XLOG_FROM_PG_WAL:
 
-/*
- * WAL receiver must not be running when reading WAL from
- * archive or pg_wal.
- */
-

Re: Unnecessary delay in streaming replication due to replay lag

2021-10-25 Thread Soumyadeep Chakraborty
Rebased and added a CF entry for Nov CF:
https://commitfest.postgresql.org/35/3376/.


Re: Unnecessary delay in streaming replication due to replay lag

2021-08-24 Thread Soumyadeep Chakraborty
Hello,

Ashwin and I recently got a chance to work on this and we addressed all
outstanding feedback and suggestions. PFA a significantly reworked patch.

On 20.11.2020 11:21, Michael Paquier wrote:

> This patch thinks that it is fine to request streaming even if
> PrimaryConnInfo is not set, but that's not fine.

We introduced a check to ensure that PrimaryConnInfo is set up before we
request the WAL stream eagerly.

> Anyway, I don't quite understand what you are trying to achieve here.
> "startpoint" is used to request the beginning of streaming.  It is
> roughly the consistency LSN + some alpha with some checks on WAL
> pages (those WAL page checks are not acceptable as they make
> maintenance harder).  What about the case where consistency is
> reached but there are many segments still ahead that need to be
> replayed?  Your patch would cause streaming to begin too early, and
> a manual copy of segments is not a rare thing as in some environments
> a bulk copy of segments can make the catchup of a standby faster than
> streaming.
>
> It seems to me that what you are looking for here is some kind of
> pre-processing before entering the redo loop to determine the LSN
> that could be reused for the fast streaming start, which should match
> the end of the WAL present locally.  In short, you would need a
> XLogReaderState that begins a scan of WAL from the redo point until it
> cannot find anything more, and use the last LSN found as a base to
> begin requesting streaming.  The question of timeline jumps can also
> be very tricky, but it could also be possible to not allow this option
> if a timeline jump happens while attempting to guess the end of WAL
> ahead of time.  Another thing: could it be useful to have an extra
> mode to begin streaming without waiting for consistency to finish?

1. When wal_receiver_start_condition='consistency', we feel that the
stream start point calculation should be done only when we reach
consistency. Imagine the situation where consistency is reached 2 hours
after start, and within that 2 hours a lot of WAL has been manually
copied over into the standby's pg_wal. If we pre-calculated the stream
start location before we entered the main redo apply loop, we would be
starting the stream from a much earlier location (minus the 2 hours
worth of WAL), leading to wasted work.

2. We have significantly changed the code to calculate the WAL stream
start location. We now traverse pg_wal, find the latest valid WAL
segment and start the stream from the segment's start. This is much
more performant than reading from the beginning of the locally available
WAL.

3. To perform the validation check, we no longer have duplicate code -
as we can now rely on the XLogReaderState(), XLogReaderValidatePageHeader()
and friends.

4. We have an extra mode: wal_receiver_start_condition='startup', which
will start the WAL receiver before the startup process reaches
consistency. We don't fully understand the utility of having 'startup' over
'consistency' though.

5. During the traversal of pg_wal, if we find WAL segments on differing
timelines, we bail out and abandon attempting to start the WAL stream
eagerly.

6. To handle the cases where a lot of WAL is copied over after the
WAL receiver has started at consistency:
i) Don't recommend wal_receiver_start_condition='startup|consistency'.

ii) Copy over the WAL files and then start the standby, so that the WAL
stream starts from a fresher point.

iii) Have an LSN/segment# target to start the WAL receiver from?

7. We have significantly changed the test. It is much simpler and more
focused.

8. We did not test wal_receiver_start_condition='startup' in the test.
It's actually hard to assert that the walreceiver has started at
startup. recovery_min_apply_delay only kicks in once we reach
consistency, and thus there is no way I could think of to reliably halt
the startup process and check: "Has the wal receiver started even
though the standby hasn't reached consistency?" The only way we could think
of is to generate a large workload during the course of the backup so
that the standby has significant WAL to replay before it reaches
consistency. But that will make the test flaky as we will have no
absolutely precise wait condition. That said, we felt that checking
for 'consistency' is enough as it covers the majority of the added
code.

9. We added a documentation section describing the GUC.


Regards,
Ashwin and Soumyadeep (VMware)
From b3703fb16a352bd9166ed75de7b68599c735ac63 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Fri, 30 Jul 2021 18:21:55 -0700
Subject: [PATCH v1 1/1] Introduce feature to start WAL receiver eagerly

This commit introduces a new GUC wal_receiver_start_condition which can
enable the standby to start its WAL receiver at an earlier stage. The
GUC will default to starting the WAL receiver after WAL from archives
and pg_wal has been exhausted, designated by the value 'exhaust'.
The value of 'startup' indicates that the WAL receiver will be started
immediately on standby startup.

Re: Unnecessary delay in streaming replication due to replay lag

2020-12-01 Thread Anastasia Lubennikova


Status update for a commitfest entry.

This entry was "Waiting On Author" during this CF, so I've marked it as 
returned with feedback. Feel free to resubmit an updated version to a 
future commitfest.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Unnecessary delay in streaming replication due to replay lag

2020-11-20 Thread Michael Paquier
On Tue, Sep 15, 2020 at 05:30:22PM +0800, lchch1...@sina.cn wrote:
> I read the code and tested the patch; it runs well on my side, and I
> have several issues with the patch.

+   RequestXLogStreaming(ThisTimeLineID,
+startpoint,
+PrimaryConnInfo,
+PrimarySlotName,
+wal_receiver_create_temp_slot);

This patch thinks that it is fine to request streaming even if
PrimaryConnInfo is not set, but that's not fine.

Anyway, I don't quite understand what you are trying to achieve here.
"startpoint" is used to request the beginning of streaming.  It is
roughly the consistency LSN + some alpha with some checks on WAL
pages (those WAL page checks are not acceptable as they make
maintenance harder).  What about the case where consistency is
reached but there are many segments still ahead that need to be
replayed?  Your patch would cause streaming to begin too early, and
a manual copy of segments is not a rare thing as in some environments
a bulk copy of segments can make the catchup of a standby faster than
streaming.

It seems to me that what you are looking for here is some kind of
pre-processing before entering the redo loop to determine the LSN
that could be reused for the fast streaming start, which should match
the end of the WAL present locally.  In short, you would need a
XLogReaderState that begins a scan of WAL from the redo point until it
cannot find anything more, and use the last LSN found as a base to
begin requesting streaming.  The question of timeline jumps can also
be very tricky, but it could also be possible to not allow this option
if a timeline jump happens while attempting to guess the end of WAL
ahead of time.  Another thing: could it be useful to have an extra
mode to begin streaming without waiting for consistency to finish?
--
Michael




Re: Unnecessary delay in streaming replication due to replay lag

2020-09-15 Thread lchch1...@sina.cn
Hello

I read the code and tested the patch; it runs well on my side, and I
have several issues with the patch.

1. When calling RequestXLogStreaming() during replay, you pick the
timeline straight from the control file; do you think it should pick
the timeline from the timeline history file?

2. In archive recovery mode, which will never switch to streaming
mode, I think the current code will call RequestXLogStreaming() too,
which could be avoided.

3. I found two 018_x.pl tests when I ran make check; maybe rename the
new one?




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: Unnecessary delay in streaming replication due to replay lag

2020-08-10 Thread Asim Praveen


> On 10-Aug-2020, at 12:27 PM, Masahiko Sawada 
>  wrote:
> 
> The patch can be applied cleanly to the current HEAD but I got the
> error on building the code with this patch:
> 
> xlog.c: In function 'StartupXLOG':
> xlog.c:7315:6: error: too few arguments to function 'RequestXLogStreaming'
> 7315 |  RequestXLogStreaming(ThisTimeLineID,
>  |  ^~~~
> In file included from xlog.c:59:
> ../../../../src/include/replication/walreceiver.h:463:13: note: declared here
>  463 | extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr,
>  | ^~~~
> 
> cfbot also complains about this.
> 
> Could you please update the patch?
> 

Thank you for trying the patch and apologies for the compiler error.  I missed
adding a hunk earlier; it should be fixed in the version attached here.



v3-0001-Start-WAL-receiver-before-startup-process-replays.patch
Description:  v3-0001-Start-WAL-receiver-before-startup-process-replays.patch


Re: Unnecessary delay in streaming replication due to replay lag

2020-08-10 Thread Masahiko Sawada
On Sun, 9 Aug 2020 at 14:54, Asim Praveen  wrote:
>
> I would like to revive this thread by submitting a rebased patch to start 
> streaming replication without waiting for the startup process to finish replaying 
> all WAL.  The start LSN for streaming is determined to be the LSN that points 
> to the beginning of the most recently flushed WAL segment.
>
> The patch passes tests under src/test/recovery and top level “make check”.
>

The patch can be applied cleanly to the current HEAD but I got the
error on building the code with this patch:

xlog.c: In function 'StartupXLOG':
xlog.c:7315:6: error: too few arguments to function 'RequestXLogStreaming'
 7315 |  RequestXLogStreaming(ThisTimeLineID,
  |  ^~~~
In file included from xlog.c:59:
../../../../src/include/replication/walreceiver.h:463:13: note: declared here
  463 | extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr,
  | ^~~~

cfbot also complains about this.

Could you please update the patch?

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Unnecessary delay in streaming replication due to replay lag

2020-08-09 Thread Asim Praveen


> On 09-Aug-2020, at 2:11 PM, Michael Paquier  wrote:
> 
> I have not really looked at the proposed patch, but it would be good
> to have some documentation.
> 

Ah, right.  The basic idea is to reuse the logic that allows read-only
connections to also start WAL streaming.  The patch borrows a new GUC
“wal_receiver_start_condition” introduced by another patch alluded to
upthread.  It affects when to start the WAL receiver process on a
standby.  By default, the GUC is set to “replay”, which means no change
in current behavior - the WAL receiver is started only after replaying
all WAL already available in pg_wal.  When set to “consistency”, the
WAL receiver process is started earlier, as soon as a consistent state
is reached during WAL replay.

The LSN to start streaming from is determined to be the LSN that
points at the beginning of the WAL segment file that was most recently
flushed in pg_wal.  To find the most recently flushed WAL segment, the
first blocks of all WAL segment files in pg_wal, starting from the
segment that contains the currently replayed record, are inspected.
The search stops at the first page with no valid header.
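
As a rough standalone illustration of that first-page probe (plain C;
the struct mirrors the field layout of PostgreSQL's XLogPageHeaderData
under typical alignment, 16MB segments are assumed, and the comparison
against XLOG_PAGE_MAGIC, whose value varies by release, is reduced to a
non-zero check):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)	/* assumed 16MB */

/* Mirrors the fields of PostgreSQL's XLogPageHeaderData. */
typedef struct
{
	uint16_t	xlp_magic;
	uint16_t	xlp_info;
	uint32_t	xlp_tli;
	uint64_t	xlp_pageaddr;
	uint32_t	xlp_rem_len;
} WalPageHeader;

/*
 * Return true if the first page of the given segment file carries a
 * plausible header: the page address must equal the segment's start LSN.
 */
static bool
first_page_is_valid(const char *path, uint64_t segno)
{
	WalPageHeader hdr;
	FILE	   *f = fopen(path, "rb");
	bool		ok;

	if (f == NULL)
		return false;
	ok = fread(&hdr, sizeof(hdr), 1, f) == 1 &&
		hdr.xlp_magic != 0 &&	/* the real check compares against XLOG_PAGE_MAGIC */
		hdr.xlp_pageaddr == segno * WAL_SEGMENT_SIZE;
	fclose(f);
	return ok;
}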

The benefits of starting WAL receiver early are mentioned upthread but allow me 
to reiterate: as WAL streaming starts, any commits that are waiting for 
synchronous replication on the master are unblocked.  The benefit of this is 
apparent in situations where significant replay lag has been built up and the 
replication is configured to be synchronous.

Asim

Re: Unnecessary delay in streaming replication due to replay lag

2020-08-09 Thread Michael Paquier
On Sun, Aug 09, 2020 at 05:54:32AM +, Asim Praveen wrote:
> I would like to revive this thread by submitting a rebased patch to
> start streaming replication without waiting for the startup process to
> finish replaying all WAL.  The start LSN for streaming is determined
> to be the LSN that points to the beginning of the most recently
> flushed WAL segment.
> 
> The patch passes tests under src/test/recovery and top level “make check”.

I have not really looked at the proposed patch, but it would be good
to have some documentation.
--
Michael




Re: Unnecessary delay in streaming replication due to replay lag

2020-08-08 Thread Asim Praveen
I would like to revive this thread by submitting a rebased patch to start 
streaming replication without waiting for the startup process to finish replaying 
all WAL.  The start LSN for streaming is determined to be the LSN that points 
to the beginning of the most recently flushed WAL segment.

The patch passes tests under src/test/recovery and top level “make check”.



v2-0001-Start-WAL-receiver-before-startup-process-replays.patch
Description:  v2-0001-Start-WAL-receiver-before-startup-process-replays.patch


Re: Unnecessary delay in streaming replication due to replay lag

2020-01-17 Thread Asim R P
On Fri, Jan 17, 2020 at 11:08 AM Michael Paquier 
wrote:
>
> On Fri, Jan 17, 2020 at 09:34:05AM +0530, Asim R P wrote:
> >
> > 0001 - TAP test to demonstrate the problem.
>
> There is no real need for debug_replay_delay because we have already
> recovery_min_apply_delay, no?  That would count only after consistency
> has been reached, and only for COMMIT records, but your test would be
> enough with that.
>

Indeed, we didn't know about recovery_min_apply_delay.  Thank you for
the suggestion, the updated test is attached.

>
> > This is a POC, we are looking for early feedback on whether the
> > problem is worth solving and if it makes sense to solve it along this
> > route.
>
> You are not the first person interested in this problem, we have a
> patch registered in this CF to control the timing when a WAL receiver
> is started at recovery:
> https://commitfest.postgresql.org/26/1995/
>
https://www.postgresql.org/message-id/b271715f-f945-35b0-d1f5-c9de3e56f...@postgrespro.ru
>

Great to know about this patch and the discussion.  The test case and
the part of our patch that saves the next start point in the control
file can be combined with Konstantin's patch to solve this problem.
Let me work on that.

> I am pretty sure that we should not change the default behavior to
> start the WAL receiver after replaying everything from the archives to
> avoid copying some WAL segments for nothing, so being able to use a
> GUC switch should be the way to go, and Konstantin's latest patch was
> using this approach.  Your patch 0002 adds visibly a third mode: start
> immediately on top of the two ones already proposed:
> - Start after replaying all WAL available locally and in the
> archives.
> - Start after reaching a consistent point.

The consistent point should be reached fairly quickly, in spite of a
large replay lag.  The min recovery point is updated during XLOG flush,
and that happens when a commit record is replayed.  Commits should
occur frequently in the WAL stream.  So I do not see much value in
starting the WAL receiver immediately as compared to starting it after
reaching a consistent point.  Does that make sense?

That said, is there anything obviously wrong with starting the WAL
receiver immediately, even before reaching a consistent state?  A
consequence is that the WAL receiver may overwrite a WAL segment while
the startup process is reading and replaying WAL from it.  But that
doesn't appear to be a problem, because the overwrite should happen
with identical content as before.

Asim


v1-0001-Test-that-replay-of-WAL-logs-on-standby-does-not-.patch
Description: Binary data


v1-0003-Start-WAL-receiver-when-it-is-found-not-running.patch
Description: Binary data


v1-0002-Start-WAL-receiver-before-startup-process-replays.patch
Description: Binary data


Re: Unnecessary delay in streaming replication due to replay lag

2020-01-16 Thread Michael Paquier
On Fri, Jan 17, 2020 at 09:34:05AM +0530, Asim R P wrote:
> Standby does not start the walreceiver process until the startup process
> finishes WAL replay.  The more WAL there is to replay, the longer the
> delay in starting streaming replication.  If the replication connection
> is temporarily disconnected, this delay becomes a major problem, and we
> are proposing a solution to avoid it.

Yeah, that's documented:
https://www.postgresql.org/message-id/20190910062325.gd11...@paquier.xyz

> We propose to address this by starting walreceiver without waiting for
> the startup process to finish replay of WAL.  Please see the attached
> patchset.  It can be summarized as follows:
> 
> 0001 - TAP test to demonstrate the problem.

There is no real need for debug_replay_delay because we have already
recovery_min_apply_delay, no?  That would count only after consistency
has been reached, and only for COMMIT records, but your test would be
enough with that.

> 0002 - The standby startup sequence is changed such that
>walreceiver is started by startup process before it begins
>to replay WAL.

See below.

> 0003 - Postmaster starts walreceiver if it finds that a
>walreceiver process is no longer running and the state
>indicates that it is operating as a standby.

I have not checked in details, but I smell some race conditions
between the postmaster and the startup process here.

> This is a POC, we are looking for early feedback on whether the
> problem is worth solving and if it makes sense to solve it along this
> route.

You are not the first person interested in this problem, we have a
patch registered in this CF to control the timing when a WAL receiver
is started at recovery:
https://commitfest.postgresql.org/26/1995/
https://www.postgresql.org/message-id/b271715f-f945-35b0-d1f5-c9de3e56f...@postgrespro.ru

I am pretty sure that we should not change the default behavior to
start the WAL receiver after replaying everything from the archives to
avoid copying some WAL segments for nothing, so being able to use a
GUC switch should be the way to go, and Konstantin's latest patch was
using this approach.  Your patch 0002 adds visibly a third mode: start
immediately on top of the two ones already proposed:
- Start after replaying all WAL available locally and in the
archives.
- Start after reaching a consistent point.
--
Michael




Unnecessary delay in streaming replication due to replay lag

2020-01-16 Thread Asim R P
Hi

Standby does not start the walreceiver process until the startup process
finishes WAL replay.  The more WAL there is to replay, the longer the
delay in starting streaming replication.  If the replication connection
is temporarily disconnected, this delay becomes a major problem, and we
are proposing a solution to avoid it.

WAL replay is likely to fall behind when the master is processing a
write-heavy workload, because WAL is generated by concurrently running
backends on the master, while only one startup process on the standby
replays WAL records in sequence as new WAL is received from the master.

The replication connection between walsender and walreceiver may break
for reasons such as a transient network issue, the standby going
through a restart, etc.  The delay in resuming the replication
connection leads to a lack of high availability - only one copy of the
WAL is available during this period.

The problem worsens when the replication is configured to be
synchronous.  Commits on the master must wait until WAL replay is
finished on the standby, the walreceiver is started, and it confirms
flush of WAL up to the commit LSN.  If the synchronous_commit GUC is
set to remote_write, this behavior is equivalent to tacitly changing it
to remote_apply until the replication connection is re-established!

Has anyone encountered such a problem with streaming replication?

We propose to address this by starting walreceiver without waiting for
the startup process to finish replay of WAL.  Please see the attached
patchset.  It can be summarized as follows:

0001 - TAP test to demonstrate the problem.

0002 - The standby startup sequence is changed such that
   walreceiver is started by startup process before it begins
   to replay WAL.

0003 - Postmaster starts walreceiver if it finds that a
   walreceiver process is no longer running and the state
   indicates that it is operating as a standby.

This is a POC, we are looking for early feedback on whether the
problem is worth solving and if it makes sense to solve it along this
route.

Hao and Asim


0001-Test-that-replay-of-WAL-logs-on-standby-does-not-aff.patch
Description: Binary data


0003-Start-WAL-receiver-when-it-is-found-not-running.patch
Description: Binary data


0002-Start-WAL-receiver-before-startup-process-replays-ex.patch
Description: Binary data