Re: Crash by targetted recovery

2020-03-10 Thread Kyotaro Horiguchi
(Hmm. My sight must be as short as 2 word length..)

At Tue, 10 Mar 2020 14:59:00 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> In the first place the "seg" is "fileno". Honestly I don't think the
> test doesn't reach to fileno boundary but I did in the attached. Since

Of course it is a mistake of "Honestly I don't think the test reaches
to fileno boudary".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Crash by targetted recovery

2020-03-10 Thread Kyotaro Horiguchi
At Tue, 10 Mar 2020 10:50:52 +0900, Fujii Masao  
wrote in 
> Pushed the v5-0001-Tidy-up-XLogSource-usage.patch!

Thanks!

> Regarding the remaining patch adding the regression test,

I didn't seriously inteneded it to be in the tree.

> +$result =
> + $node_standby->safe_psql('postgres', "SELECT
> pg_last_wal_replay_lsn()");
> +my ($seg, $off) = split('/', $result);
> +my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) *
> $segsize);
> 
> What happens if "off" part gets the upper limit and "seg" part needs
> to be incremented? What happens if pg_last_wal_replay_lsn() advances
> very much (e.g., because of autovacuum) beyond the segment boundary
> until the standby restarts? Of course, these situations very rarely
> happen,
> but I'd like to avoid adding such not stable test if possible.

In the first place the "seg" is "fileno". Honestly I don't think the
test doesn't reach to fileno boundary but I did in the attached. Since
perl complains over-32bit integer arithmetic as incomptible, the
calculation gets a bit odd shape to avoid over-32bit arithmetic.

For the second point, which seems more likely to happen, I added the
VACUUM/pg_switch_wal() sequence then wait standby for catch up, before
doing the test.

Does it make sense?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 89621911e09df73f0f3a63501bb41ffb37edef05 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 10 Mar 2020 14:43:29 +0900
Subject: [PATCH v5] TAP test for a crash bug

Add a test for targetted promotion happend at segment boundary.
---
 src/test/recovery/t/003_recovery_targets.pl | 50 -
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..6b319bc891 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,51 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## make sure replication stays at the beginning of a segment
+$node_master->safe_psql('postgres', "VACUUM; SELECT pg_switch_wal();");
+my $catch_up_lsn =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+my $caughtup_query =
+  "SELECT '$catch_up_lsn'::pg_lsn <= pg_last_wal_replay_lsn()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to catch up";
+## calculate the next segment boundary
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($fileno, $off) = split('/', $result);
+my ($newoff, $newfileno) = (hex($off), hex($fileno));
+my $newsegnum = int($newoff / $segsize) + 1;
+## treat segment-overflow, avoiding over-32bit arithmetic, segsize is
+## a power of 2 larger than 1MB.
+if ($newsegnum * ($segsize >> 1) >= 0x8000)
+{
+	$newfileno += ($newsegnum * ($segsize >> 1)) >> 31;
+	$newsegnum %= (0x8000 / ($segsize >> 1));
+}
+my $target = sprintf("%X/%08X", $newfileno, $newsegnum * $segsize);
+## the standby stops just after the next segment boundary
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+$caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2



Re: Crash by targetted recovery

2020-03-09 Thread Fujii Masao




On 2020/03/09 15:46, Fujii Masao wrote:



On 2020/03/09 13:49, Kyotaro Horiguchi wrote:

At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao  
wrote in

(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.


Thanks for updating the patch that way.
Not sure which patch you're mentioning, though.


That meant 0003.


Regarding 0003 patch, I added a bit more detail comments into
the patch so that we can understand the code more easily.
Updated version of 0003 patch attached. Barring any objection,
at first, I plan to commit this patch.


Looks good to me. Thanks for writing the detailed comments.


Thanks for the review! Pushed.

I will review other two patches later.


Pushed the v5-0001-Tidy-up-XLogSource-usage.patch!

Regarding the remaining patch adding the regression test,

+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);

What happens if "off" part gets the upper limit and "seg" part needs
to be incremented? What happens if pg_last_wal_replay_lsn() advances
very much (e.g., because of autovacuum) beyond the segment boundary
until the standby restarts? Of course, these situations very rarely happen,
but I'd like to avoid adding such not stable test if possible.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Crash by targetted recovery

2020-03-09 Thread Kyotaro Horiguchi
At Mon, 9 Mar 2020 15:46:49 +0900, Fujii Masao  
wrote in 
> On 2020/03/09 13:49, Kyotaro Horiguchi wrote:
> > At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao
> >> Regarding 0003 patch, I added a bit more detail comments into
> >> the patch so that we can understand the code more easily.
> >> Updated version of 0003 patch attached. Barring any objection,
> >> at first, I plan to commit this patch.
> > Looks good to me. Thanks for writing the detailed comments.
> 
> Thanks for the review! Pushed.

Thanks for commiting.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Crash by targetted recovery

2020-03-09 Thread Fujii Masao




On 2020/03/09 13:49, Kyotaro Horiguchi wrote:

At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao  
wrote in

(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.


Thanks for updating the patch that way.
Not sure which patch you're mentioning, though.


That meant 0003.


Regarding 0003 patch, I added a bit more detail comments into
the patch so that we can understand the code more easily.
Updated version of 0003 patch attached. Barring any objection,
at first, I plan to commit this patch.


Looks good to me. Thanks for writing the detailed comments.


Thanks for the review! Pushed.

I will review other two patches later.


There seems to be more other places where XLogSource and
XLOG_FROM_XXX are not used yet. For example, the initial values
of readSource and XLogReceiptSource, the type of argument
"source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
These also should be updated?


Right. I checked through the file and AFAICS that's all.  The attachec
v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current
master.


Thanks for the patch!

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Crash by targetted recovery

2020-03-08 Thread Kyotaro Horiguchi
At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao  
wrote in 
> > (It seems retroverting to the first patch when I started this...)
> > The second place covers wider cases so I reverted the first place.
> 
> Thanks for updating the patch that way.
> Not sure which patch you're mentioning, though.

That meant 0003.

> Regarding 0003 patch, I added a bit more detail comments into
> the patch so that we can understand the code more easily.
> Updated version of 0003 patch attached. Barring any objection,
> at first, I plan to commit this patch.

Looks good to me. Thanks for writing the detailed comments.

> >> + lastSourceFailed = false; /* We haven't failed on the new source */
> >>
> >> Is this really necessary? Since ReadRecord() always reset
> >> lastSourceFailed to false, it seems not necessary.
> > It's just to make sure.  Actually lastSourceFailed is always false
> > when we get there.  But when the source is switched, lastSourceFailed
> > should be changed to false as a matter of design.  I'd like to do that
> > unless that harms.
> 
> OK.

Thanks.

> >> -  else if (currentSource == 0)
> >> +  else if (currentSource == 0 ||
> >>
> >> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
> >> There are some places where 0 is used as the value of currentSource.
> >> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.
> > Yeah, I've thought that many times but have neglected since it is not
> > critical and trivial as a separate patch.  I'd take the chance to do
> > that now.  Another minor glitch is "int oldSource = currentSource;" it
> > is not debugger-friendly so I changed it to XLogSource.  It is added
> > as a new patch file before the main patch.
> 
> There seems to be more other places where XLogSource and
> XLOG_FROM_XXX are not used yet. For example, the initial values
> of readSource and XLogReceiptSource, the type of argument
> "source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
> These also should be updated?

Right. I checked through the file and AFAICS that's all.  The attachec
v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current
master.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 358f37899a02a8ae6f803fe32b97c2cbe302786f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Mar 2020 10:11:59 +0900
Subject: [PATCH v5] Tidy up XLogSource usage.

We used interger 0 instead of XLOG_FROM_ANY and defined a variable to
store the type with int. Tidy them up for readability and
debugger-friendliness.
---
 src/backend/access/transam/xlog.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 21c0acb740..9ebfbf31c5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -795,7 +795,7 @@ static int	readFile = -1;
 static XLogSegNo readSegNo = 0;
 static uint32 readOff = 0;
 static uint32 readLen = 0;
-static XLogSource readSource = 0;	/* XLOG_FROM_* code */
+static XLogSource readSource = XLOG_FROM_ANY;
 
 /*
  * Keeps track of which source we're currently reading from. This is
@@ -804,7 +804,7 @@ static XLogSource readSource = 0;	/* XLOG_FROM_* code */
  * attempt to read from currentSource failed, and we should try another source
  * next.
  */
-static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
+static XLogSource currentSource = XLOG_FROM_ANY;
 static bool lastSourceFailed = false;
 
 typedef struct XLogPageReadPrivate
@@ -823,7 +823,7 @@ typedef struct XLogPageReadPrivate
  * XLogReceiptSource tracks where we last successfully read some WAL.)
  */
 static TimestampTz XLogReceiptTime = 0;
-static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
+static XLogSource XLogReceiptSource = XLOG_FROM_ANY;
 
 /* State information for XLOG reading */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
@@ -886,8 +886,8 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
    bool find_free, XLogSegNo max_segno,
    bool use_lock);
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
-		 int source, bool notfoundOk);
-static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
+		 XLogSource source, bool notfoundOk);
+static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
@@ -3633,7 +3633,7 @@ XLogFileOpen(XLogSegNo segno)
  */
 static int
 XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
-			 int source, bool notfoundOk)
+			 XLogSource source, bool notfoundOk)
 {
 	char		xlogfname[MAXFNAMELEN];
 	char		activitymsg[MAXFNAMELEN + 16];
@@ -3715,7 +3715,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
  * This version searches 

Re: Crash by targetted recovery

2020-03-06 Thread Fujii Masao



On 2020/03/06 10:29, Kyotaro Horiguchi wrote:

At Thu, 5 Mar 2020 19:51:11 +0900, Fujii Masao  
wrote in



On 2020/03/05 12:08, Kyotaro Horiguchi wrote:

I understand that the reconnection for REDO record is useless. Ok I
take the !StandbyMode way.
The attached is the test script that is changed to count the added
test, and the slight revised main patch.


Thanks for the patch!

+ /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
+   Assert(!WalRcvStreaming());

+1 to add this assertion check.

Isn't it better to always check this while trying to read WAL from
archive or pg_wal? So, what about the following change?

 {
 case XLOG_FROM_ARCHIVE:
 case XLOG_FROM_PG_WAL:
+   /*
+ * WAL receiver should not be running while trying to
+* read WAL from archive or pg_wal.
+*/
+   Assert(!WalRcvStreaming());
+
 /* Close any old file we might have open. */
 if (readFile >= 0)


(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.


Thanks for updating the patch that way.
Not sure which patch you're mentioning, though.

Regarding 0003 patch, I added a bit more detail comments into
the patch so that we can understand the code more easily.
Updated version of 0003 patch attached. Barring any objection,
at first, I plan to commit this patch.


+ lastSourceFailed = false; /* We haven't failed on the new source */

Is this really necessary? Since ReadRecord() always reset
lastSourceFailed to false, it seems not necessary.


It's just to make sure.  Actually lastSourceFailed is always false
when we get there.  But when the source is switched, lastSourceFailed
should be changed to false as a matter of design.  I'd like to do that
unless that harms.


OK.
 

-   else if (currentSource == 0)
+   else if (currentSource == 0 ||

Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
There are some places where 0 is used as the value of currentSource.
IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.


Yeah, I've thought that many times but have neglected since it is not
critical and trivial as a separate patch.  I'd take the chance to do
that now.  Another minor glitch is "int oldSource = currentSource;" it
is not debugger-friendly so I changed it to XLogSource.  It is added
as a new patch file before the main patch.


There seems to be more other places where XLogSource and
XLOG_FROM_XXX are not used yet. For example, the initial values
of readSource and XLogReceiptSource, the type of argument
"source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
These also should be updated?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 4361568882..b0e953f894 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7392,7 +7392,11 @@ StartupXLOG(void)
 * We are now done reading the xlog from stream. Turn off streaming
 * recovery to force fetching the files (which would be required at end 
of
 * recovery, e.g., timeline history file) from archive or pg_wal.
+*
+* Note that standby mode must be turned off after killing WAL receiver,
+* i.e., calling ShutdownWalRcv().
 */
+   Assert(!WalRcvStreaming());
StandbyMode = false;
 
/*
@@ -11855,12 +11859,23 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
randAccess,
 * values for "check trigger", "rescan timelines", and "sleep" states,
 * those actions are taken when reading from the previous source fails, 
as
 * part of advancing to the next state.
+*
+* If standby mode is turned off while reading WAL from stream, we move
+* to XLOG_FROM_ARCHIVE and reset lastSourceFailed, to force fetching
+* the files (which would be required at end of recovery, e.g., timeline
+* history file) from archive or pg_wal. We don't need to kill WAL 
receiver
+* here because it's already stopped when standby mode is turned off at
+* the end of recovery.
 *---
 */
if (!InArchiveRecovery)
currentSource = XLOG_FROM_PG_WAL;
-   else if (currentSource == 0)
+   else if (currentSource == 0 ||
+(!StandbyMode && currentSource == XLOG_FROM_STREAM))
+   {
+   lastSourceFailed = false;
currentSource = XLOG_FROM_ARCHIVE;
+   }
 
for (;;)
{
@@ -12054,6 +12069,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
randAccess,
{
  

Re: Crash by targetted recovery

2020-03-05 Thread Kyotaro Horiguchi
At Thu, 5 Mar 2020 19:51:11 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2020/03/05 12:08, Kyotaro Horiguchi wrote:
> > I understand that the reconnection for REDO record is useless. Ok I
> > take the !StandbyMode way.
> > The attached is the test script that is changed to count the added
> > test, and the slight revised main patch.
> 
> Thanks for the patch!
> 
> + /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
> + Assert(!WalRcvStreaming());
> 
> +1 to add this assertion check.
> 
> Isn't it better to always check this while trying to read WAL from
> archive or pg_wal? So, what about the following change?
> 
> {
> case XLOG_FROM_ARCHIVE:
> case XLOG_FROM_PG_WAL:
> +   /*
> + * WAL receiver should not be running while trying to
> +* read WAL from archive or pg_wal.
> +*/
> +   Assert(!WalRcvStreaming());
> +
> /* Close any old file we might have open. */
> if (readFile >= 0)

(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.

> + lastSourceFailed = false; /* We haven't failed on the new source */
> 
> Is this really necessary? Since ReadRecord() always reset
> lastSourceFailed to false, it seems not necessary.

It's just to make sure.  Actually lastSourceFailed is always false
when we get there.  But when the source is switched, lastSourceFailed
should be changed to false as a matter of design.  I'd like to do that
unless that harms.

> - else if (currentSource == 0)
> + else if (currentSource == 0 ||
> 
> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
> There are some places where 0 is used as the value of currentSource.
> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.

Yeah, I've thought that many times but have neglected since it is not
critical and trivial as a separate patch.  I'd take the chance to do
that now.  Another minor glitch is "int oldSource = currentSource;" it
is not debugger-friendly so I changed it to XLogSource.  It is added
as a new patch file before the main patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 839edcba7e47156e77d5dc9ec7dfb9babc9ba846 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v4 1/3] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 34 -
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..5b23ceb3c4 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2

>From 82274c41cb529d87270d4e2a78dbf06cd139b8f6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Mar 2020 10:11:59 +0900
Subject: [PATCH v4 2/3] Tidy up XLogSource usage.

We used interger 0 instead of XLOG_FROM_ANY and 

Re: Crash by targetted recovery

2020-03-05 Thread Fujii Masao




On 2020/03/05 12:08, Kyotaro Horiguchi wrote:

At Mon, 2 Mar 2020 20:54:04 +0900, Fujii Masao  
wrote in

And random access during StandbyMode ususally (always?) lets RecPtr go
back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
don't have a file in pg_wal and the REDO point is far back by more
than a segment from the initial checkpoint record.


It works correctly. This is why WaitForWALToBecomeAvailable() uses
fetching_ckpt argument.


Hmm. You're right. We start streaming from RedoStartLSN when
fetching_ckpt.  So that doesn't happen.


If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
re-connects to the primary for the past segment.


But this can lead to unnecessary restart of walreceiver. Since
fetching_ckpt ensures that the WAL file containing the REDO
starting record exists in pg_wal, there is no need to reconnect
to the primary when reading the REDO starting record.

Is there other case where we need to go back to XLOG_FROM_ARCHIVE
by random access?


I understand that the reconnection for REDO record is useless. Ok I
take the !StandbyMode way.

The attached is the test script that is changed to count the added
test, and the slight revised main patch.


Thanks for the patch!

+   /* Wal receiver should not active when entring 
XLOG_FROM_ARCHIVE */
+   Assert(!WalRcvStreaming());

+1 to add this assertion check.

Isn't it better to always check this while trying to read WAL from
archive or pg_wal? So, what about the following change?

{
case XLOG_FROM_ARCHIVE:
case XLOG_FROM_PG_WAL:
+   /*
+* WAL receiver should not be running while 
trying to
+* read WAL from archive or pg_wal.
+*/
+   Assert(!WalRcvStreaming());
+
/* Close any old file we might have open. */
if (readFile >= 0)


+   lastSourceFailed = false; /* We haven't failed on the new 
source */

Is this really necessary? Since ReadRecord() always reset
lastSourceFailed to false, it seems not necessary.


-   else if (currentSource == 0)
+   else if (currentSource == 0 ||

Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
There are some places where 0 is used as the value of currentSource.
IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Crash by targetted recovery

2020-03-04 Thread Kyotaro Horiguchi
At Mon, 2 Mar 2020 20:54:04 +0900, Fujii Masao  
wrote in 
> > And random access during StandbyMode ususally (always?) lets RecPtr go
> > back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
> > don't have a file in pg_wal and the REDO point is far back by more
> > than a segment from the initial checkpoint record.
> 
> It works correctly. This is why WaitForWALToBecomeAvailable() uses
> fetching_ckpt argument.

Hmm. You're right. We start streaming from RedoStartLSN when
fetching_ckpt.  So that doesn't happen.

> > If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
> > re-connects to the primary for the past segment.
> 
> But this can lead to unnecessary restart of walreceiver. Since
> fetching_ckpt ensures that the WAL file containing the REDO
> starting record exists in pg_wal, there is no need to reconnect
> to the primary when reading the REDO starting record.
> 
> Is there other case where we need to go back to XLOG_FROM_ARCHIVE
> by random access?

I understand that the reconnection for REDO record is useless. Ok I
take the !StandbyMode way.

The attached is the test script that is changed to count the added
test, and the slight revised main patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 5165a55c11aff5f18a8d39029b9291070a908744 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v3 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 34 -
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..5b23ceb3c4 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2

>From 246a7894684434dbfcba8c11545458f1633e76b5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 5 Mar 2020 11:54:07 +0900
Subject: [PATCH v3 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..50ad5079b6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11831,7 +11831,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
 	 *	  pg_wal (XLOG_FROM_PG_WAL)
 	 * 2. Check trigger file
-	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+	 *If StandbyMode is turned off, the state machine goes back to 1.
 	 * 4. Rescan timelines
 	 * 5. Sleep wal_retrieve_retry_interval 

Re: Crash by targetted recovery

2020-03-02 Thread Fujii Masao




On 2020/02/28 12:13, Kyotaro Horiguchi wrote:

At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao  
wrote in



On 2020/02/27 17:05, Kyotaro Horiguchi wrote:

Thank you for the comment.
At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao
 wrote in

On 2020/02/27 15:23, Kyotaro Horiguchi wrote:

I failed to understand why random access while reading from
stream is bad idea. Could you elaborate why?

It seems to me the word "streaming" suggests that WAL record should be
read sequentially. Random access, which means reading from arbitrary
location, breaks a stream.  (But the patch doesn't try to stop wal
sender if randAccess.)


Isn't it sufficient to set currentSource to 0 when disabling
StandbyMode?

I thought that and it should work, but I hesitated to manipulate on
currentSource in StartupXLOG. currentSource is basically a private
state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
think it's not good to modify it out of the the logic in
WaitForWALToBecomeAvailable.


If so, what about adding the following at the top of
WaitForWALToBecomeAvailable()?

  if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
   currentSource = 0;

It works virtually the same way. I'm happy to do that if you don't
agree to using randAccess. But I'd rather do that in the 'if
(!InArchiveRecovery)' section.


The approach using randAccess seems unsafe. Please imagine
the case where currentSource is changed to XLOG_FROM_ARCHIVE
because randAccess is true, while walreceiver is still running.
For example, this case can occur when the record at REDO
starting point is fetched with randAccess = true after walreceiver
is invoked to fetch the last checkpoint record. The situation
"currentSource != XLOG_FROM_STREAM while walreceiver is
  running" seems invalid. No?


When I mentioned an possibility of changing ReadRecord so that it
modifies randAccess instead of currentSource, I thought that
WaitForWALToBecomeAvailable should shutdown wal receiver as
needed.

At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi 
 wrote in
me> location, breaks a stream.  (But the patch doesn't try to stop wal
me> sender if randAccess.)


Sorry, I failed to notice that.


And random access during StandbyMode ususally (always?) lets RecPtr go
back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
don't have a file in pg_wal and the REDO point is far back by more
than a segment from the initial checkpoint record.


It works correctly. This is why WaitForWALToBecomeAvailable() uses
fetching_ckpt argument.


If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
re-connects to the primary for the past segment.


But this can lead to unnecessary restart of walreceiver. Since
fetching_ckpt ensures that the WAL file containing the REDO
starting record exists in pg_wal, there is no need to reconnect
to the primary when reading the REDO starting record.

Is there other case where we need to go back to XLOG_FROM_ARCHIVE
by random access?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Crash by targetted recovery

2020-02-27 Thread Kyotaro Horiguchi
At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2020/02/27 17:05, Kyotaro Horiguchi wrote:
> > Thank you for the comment.
> > At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao
> >  wrote in
> >> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
>  I failed to understand why random access while reading from
>  stream is bad idea. Could you elaborate why?
> >>> It seems to me the word "streaming" suggests that WAL record should be
> >>> read sequentially. Random access, which means reading from arbitrary
> >>> location, breaks a stream.  (But the patch doesn't try to stop wal
> >>> sender if randAccess.)
> >>>
>  Isn't it sufficient to set currentSource to 0 when disabling
>  StandbyMode?
> >>> I thought that and it should work, but I hesitated to manipulate on
> >>> currentSource in StartupXLOG. currentSource is basically a private
> >>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> >>> think it's not good to modify it out of the the logic in
> >>> WaitForWALToBecomeAvailable.
> >>
> >> If so, what about adding the following at the top of
> >> WaitForWALToBecomeAvailable()?
> >>
> >>  if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
> >>   currentSource = 0;
> > It works virtually the same way. I'm happy to do that if you don't
> > agree to using randAccess. But I'd rather do that in the 'if
> > (!InArchiveRecovery)' section.
> 
> The approach using randAccess seems unsafe. Please imagine
> the case where currentSource is changed to XLOG_FROM_ARCHIVE
> because randAccess is true, while walreceiver is still running.
> For example, this case can occur when the record at REDO
> starting point is fetched with randAccess = true after walreceiver
> is invoked to fetch the last checkpoint record. The situation
> "currentSource != XLOG_FROM_STREAM while walreceiver is
>  running" seems invalid. No?

When I mentioned an possibility of changing ReadRecord so that it
modifies randAccess instead of currentSource, I thought that
WaitForWALToBecomeAvailable should shutdown wal receiver as
needed.

At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> location, breaks a stream.  (But the patch doesn't try to stop wal
me> sender if randAccess.)

And random access during StandbyMode ususally (always?) lets RecPtr go
back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
don't have a file in pg_wal and the REDO point is far back by more
than a segment from the initial checkpoint record.  (It seems to cause
assertion failure, but I haven't checked that.)

If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
re-connects to the primary for the past segment.

> So I think that the approach that I proposed is better.

It depends on how far we assume RecPtr go back.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Crash by targetted recovery

2020-02-27 Thread Fujii Masao




On 2020/02/27 17:05, Kyotaro Horiguchi wrote:

Thank you for the comment.
At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao  
wrote in

On 2020/02/27 15:23, Kyotaro Horiguchi wrote:

I failed to understand why random access while reading from
stream is bad idea. Could you elaborate why?

It seems to me the word "streaming" suggests that WAL record should be
read sequentially. Random access, which means reading from arbitrary
location, breaks a stream.  (But the patch doesn't try to stop wal
sender if randAccess.)


Isn't it sufficient to set currentSource to 0 when disabling
StandbyMode?

I thought that and it should work, but I hesitated to manipulate on
currentSource in StartupXLOG. currentSource is basically a private
state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
think it's not good to modify it out of the the logic in
WaitForWALToBecomeAvailable.


If so, what about adding the following at the top of
WaitForWALToBecomeAvailable()?

 if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
  currentSource = 0;


It works virtually the same way. I'm happy to do that if you don't
agree to using randAccess. But I'd rather do that in the 'if
(!InArchiveRecovery)' section.


The approach using randAccess seems unsafe. Please imagine
the case where currentSource is changed to XLOG_FROM_ARCHIVE
because randAccess is true, while walreceiver is still running.
For example, this case can occur when the record at REDO
starting point is fetched with randAccess = true after walreceiver
is invoked to fetch the last checkpoint record. The situation
"currentSource != XLOG_FROM_STREAM while walreceiver is
 running" seems invalid. No?

So I think that the approach that I proposed is better.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Crash by targetted recovery

2020-02-27 Thread Kyotaro Horiguchi
Thank you for the comment.
At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao  
wrote in 
> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> >> I failed to understand why random access while reading from
> >> stream is bad idea. Could you elaborate why?
> > It seems to me the word "streaming" suggests that WAL record should be
> > read sequentially. Random access, which means reading from arbitrary
> > location, breaks a stream.  (But the patch doesn't try to stop wal
> > sender if randAccess.)
> > 
> >> Isn't it sufficient to set currentSource to 0 when disabling
> >> StandbyMode?
> > I thought that and it should work, but I hesitated to manipulate on
> > currentSource in StartupXLOG. currentSource is basically a private
> > state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> > think it's not good to modify it out of the the logic in
> > WaitForWALToBecomeAvailable.
> 
> If so, what about adding the following at the top of
> WaitForWALToBecomeAvailable()?
> 
> if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
>  currentSource = 0;

It works virtually the same way. I'm happy to do that if you don't
agree to using randAccess. But I'd rather do that in the 'if
(!InArchiveRecovery)' section.

> > Come to think of that I got to think the
> > following part in ReadRecord should use randAccess instead..
> > xlog.c:4384
> >>  /*
> > -  * Before we retry, reset lastSourceFailed and currentSource
> > -  * so that we will check the archive next.
> > +  * Streaming has broken, we retry from the same LSN.
> >>   */
> >>  lastSourceFailed = false;
> > - currentSource = 0;
> > + private->randAccess = true;
> 
> Sorry, I failed to understand why this change is necessary...

It's not necessary, just for being tidy about the responsibility on
currentSource.

> At least the comment that you added seems incorrect
> because WAL streaming should not have started yet when
> we reach the above point.

Oops, right.

-  * Streaming has broken, we retry from the same LSN.
+  * Restart recovery from the current LSN.

For clarity, I don't insist on the change at all. If it were
necessary, it's another topic, anyway. Please forget it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 8cb817a43226a0d60dd62c6205219a2f0c807b9e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v2 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 32 +
 1 file changed, 32 insertions(+)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..8e71788981 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+
+# Edge case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
-- 
2.18.2

>From d917638b787b9d1393cbc0d77586168da844756b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:37 +0900
Subject: [PATCH v2 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 13 +++--
 1 file changed, 11 insertions(+), 2 

Re: Crash by targetted recovery

2020-02-26 Thread Fujii Masao




On 2020/02/27 15:23, Kyotaro Horiguchi wrote:

At Thu, 27 Feb 2020 14:40:55 +0900, Fujii Masao  
wrote in



On 2020/02/27 12:48, Kyotaro Horiguchi wrote:

Hello.
We found that targetted promotion can cause an assertion failure.  The
attached TAP test causes that.


TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous WAL segment, the assertion failure is triggered.


Good catch!


The wrong point is that StartupXLOG does random access fetching while
WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
think if it is called with random access mode,
WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
though it is thinking that it is still reading from stream.


I failed to understand why random access while reading from
stream is bad idea. Could you elaborate why?


It seems to me the word "streaming" suggests that WAL record should be
read sequentially. Random access, which means reading from arbitrary
location, breaks a stream.  (But the patch doesn't try to stop wal
sender if randAccess.)


Isn't it sufficient to set currentSource to 0 when disabling
StandbyMode?


I thought that and it should work, but I hesitated to manipulate on
currentSource in StartupXLOG. currentSource is basically a private
state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
think it's not good to modify it out of the the logic in
WaitForWALToBecomeAvailable.


If so, what about adding the following at the top of
WaitForWALToBecomeAvailable()?

if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
 currentSource = 0;


Come to think of that I got to think the
following part in ReadRecord should use randAccess instead..

xlog.c:4384

 /*

-  * Before we retry, reset lastSourceFailed and currentSource
-  * so that we will check the archive next.
+  * Streaming has broken, we retry from the same LSN.

  */
 lastSourceFailed = false;

- currentSource = 0;
+ private->randAccess = true;


Sorry, I failed to understand why this change is necessary...
At least the comment that you added seems incorrect
because WAL streaming should not have started yet when
we reach the above point.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Crash by targetted recovery

2020-02-26 Thread Kyotaro Horiguchi
At Thu, 27 Feb 2020 14:40:55 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2020/02/27 12:48, Kyotaro Horiguchi wrote:
> > Hello.
> > We found that targetted promotion can cause an assertion failure.  The
> > attached TAP test causes that.
> > 
> >> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)
> > After recovery target is reached, StartupXLOG turns off standby mode
> > then refetches the last record. If the last record starts from the
> > previous WAL segment, the assertion failure is triggered.
> 
> Good catch!
> 
> > The wrong point is that StartupXLOG does random access fetching while
> > WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
> > think if it is called with random access mode,
> > WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
> > though it is thinking that it is still reading from stream.
> 
> I failed to understand why random access while reading from
> stream is bad idea. Could you elaborate why?

It seems to me the word "streaming" suggests that WAL record should be
read sequentially. Random access, which means reading from arbitrary
location, breaks a stream.  (But the patch doesn't try to stop wal
sender if randAccess.)

> Isn't it sufficient to set currentSource to 0 when disabling
> StandbyMode?

I thought that and it should work, but I hesitated to manipulate on
currentSource in StartupXLOG. currentSource is basically a private
state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
think it's not good to modify it out of the the logic in
WaitForWALToBecomeAvailable.  Come to think of that I got to think the
following part in ReadRecord should use randAccess instead..

xlog.c:4384
> /*
-  * Before we retry, reset lastSourceFailed and currentSource
-  * so that we will check the archive next.
+  * Streaming has broken, we retry from the same LSN.
>  */
> lastSourceFailed = false;
- currentSource = 0;
+ private->randAccess = true;

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Crash by targetted recovery

2020-02-26 Thread Fujii Masao




On 2020/02/27 12:48, Kyotaro Horiguchi wrote:

Hello.

We found that targetted promotion can cause an assertion failure.  The
attached TAP test causes that.


TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)


After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous WAL segment, the assertion failure is triggered.


Good catch!


The wrong point is that StartupXLOG does random access fetching while
WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
think if it is called with random access mode,
WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
though it is thinking that it is still reading from stream.


I failed to understand why random access while reading from
stream is bad idea. Could you elaborate why?

Isn't it sufficient to set currentSource to 0 when disabling
StandbyMode?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Crash by targetted recovery

2020-02-26 Thread Kyotaro Horiguchi
Hello.

We found that targetted promotion can cause an assertion failure.  The
attached TAP test causes that.

> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous WAL segment, the assertion failure is triggered.

The wrong point is that StartupXLOG does random access fetching while
WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
think if it is called with random access mode,
WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
though it is thinking that it is still reading from stream.

regards.

-- Kyotaro Horiguchi
NTT Open Source Software Center
>From 8cb817a43226a0d60dd62c6205219a2f0c807b9e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 32 +
 1 file changed, 32 insertions(+)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..8e71788981 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+
+# Edge case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
-- 
2.18.2

>From 424729eafeeab8c96074f4c8d5b87d3a23cbf73a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:37 +0900
Subject: [PATCH 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if random access is specified while streaming.
---
 src/backend/access/transam/xlog.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..41ed916342 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11831,7 +11831,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
 	 *	  pg_wal (XLOG_FROM_PG_WAL)
 	 * 2. Check trigger file
-	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+	 *Random access mode rewinds the state machine to 1.
 	 * 4. Rescan timelines
 	 * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.
 	 *
@@ -11846,7 +11847,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 */
 	if (!InArchiveRecovery)
 		currentSource = XLOG_FROM_PG_WAL;
-	else if (currentSource == 0)
+	else if (currentSource == 0 || randAccess)
 		currentSource = XLOG_FROM_ARCHIVE;
 
 	for (;;)
-- 
2.18.2