At Wed, 17 Jan 2024 14:32:00 +0900, Michael Paquier <mich...@paquier.xyz> wrote 
in 
> On Tue, Jan 16, 2024 at 02:46:02PM +0300, Aleksander Alekseev wrote:
> >> For now, let me explain the basis for this patch. The fundamental
> >> issue is that these warnings that always appear are, in practice, not
> >> a problem in almost all cases. Some of those who encounter them for
> >> the first time may feel uneasy and reach out with inquiries. On the
> >> other hand, those familiar with these warnings tend to ignore them and
> >> only pay attention to details when actual issues arise. Therefore, the
> >> intention of this patch is to label them as "no issue" unless a
> >> problem is blatantly evident, in order to prevent unnecessary concern.
> > 
> > I agree and don't mind affecting the error message per se.
> > 
> > However I see that the actual logic of how WAL is processed is being
> > changed. If we do this, at very least it requires thorough thinking. I
> > strongly suspect that the proposed code is wrong and/or not safe
> > and/or less safe than it is now for the reasons named above.
> 
> FWIW, that pretty much sums up my feeling regarding this patch,
> because an error, basically any error, would hurt back very badly.
> Sure, the error messages we generate now when reaching the end of WAL
> can sound scary, and they are (I suspect that's not really the case
> for anybody who has history doing support with PostgreSQL because a
> bunch of these messages are old enough to vote, but I can understand
> that anybody would freak out the first time they see that).
> 
> However, per the recent issues we've had in this area, like
> cd7f19da3468 but I'm more thinking about 6b18b3fe2c2f and
> bae868caf222, I am of the opinion that the header validation, the
> empty page case in XLogReaderValidatePageHeader() and the record read
> changes are risky enough that I am not convinced that the gains are
> worth the risks taken.
> 
> The error stack in the WAL reader is complicated enough that making it
> more complicated as the patch proposes does not sound like not a good
> tradeoff to me to make the reports related to the end of WAL cleaner
> for the end-user.  I agree that we should do something, but the patch
> does not seem like a good step towards this goal.  Perhaps somebody
> would be more excited about this proposal than I am, of course.

Thank you both for the comments. The criticism seems valid. The
approach to identifying the end-of-WAL state in this patch is quite
heuristic, and its validity or safety can certainly be contested. On
the other hand, if we seek perfection in this area of judgment, we may
need to have the WAL format itself more robust. In any case, since the
majority of the feedback on this patch seems to be negative, I am
going to withdraw it if no supportive opinions emerge during this
commit-fest.

The attached patch addresses the errors reported by CF-bot.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 933d10fa6c7b71e4684f5ba38e85177afaa56f58 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Tue, 7 Mar 2023 14:55:58 +0900
Subject: [PATCH v29] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is
happening. Actually if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary as "reached end of WAL". Instead, raise
the error level for other kind of WAL failure to WARNING.  To make
sure that the detection is correct, this patch checks if all trailing
bytes in the same page are zeroed in that case.
---
 src/backend/access/transam/xlogreader.c   | 147 ++++++++++++++++++----
 src/backend/access/transam/xlogrecovery.c |  96 ++++++++++----
 src/backend/replication/walreceiver.c     |   7 +-
 src/bin/pg_waldump/pg_waldump.c           |  13 +-
 src/bin/pg_waldump/t/001_basic.pl         |   5 +-
 src/include/access/xlogreader.h           |   1 +
 src/test/recovery/t/035_recovery.pl       | 130 +++++++++++++++++++
 src/test/recovery/t/039_end_of_wal.pl     |  47 +++++--
 8 files changed, 383 insertions(+), 63 deletions(-)
 create mode 100644 src/test/recovery/t/035_recovery.pl

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 7190156f2f..94861969eb 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -48,6 +48,8 @@ static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
 							 int reqLen);
 static void XLogReaderInvalReadState(XLogReaderState *state);
 static XLogPageReadResult XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking);
+static bool ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+								  XLogRecord *record);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 								  XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -149,6 +151,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
 		pfree(state);
 		return NULL;
 	}
+	state->EndOfWAL = false;
 	state->errormsg_buf[0] = '\0';
 
 	/*
@@ -553,6 +556,7 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 	/* reset error state */
 	state->errormsg_buf[0] = '\0';
 	decoded = NULL;
+	state->EndOfWAL = false;
 
 	state->abortedRecPtr = InvalidXLogRecPtr;
 	state->missingContrecPtr = InvalidXLogRecPtr;
@@ -636,16 +640,12 @@ restart:
 	Assert(pageHeaderSize <= readOff);
 
 	/*
-	 * Read the record length.
+	 * Verify the record header.
 	 *
 	 * NB: Even though we use an XLogRecord pointer here, the whole record
-	 * header might not fit on this page. xl_tot_len is the first field of the
-	 * struct, so it must be on this page (the records are MAXALIGNed), but we
-	 * cannot access any other fields until we've verified that we got the
-	 * whole header.
+	 * header might not fit on this page.
 	 */
 	record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
-	total_len = record->xl_tot_len;
 
 	/*
 	 * If the whole record header is on this page, validate it immediately.
@@ -664,19 +664,19 @@ restart:
 	}
 	else
 	{
-		/* There may be no next page if it's too small. */
-		if (total_len < SizeOfXLogRecord)
-		{
-			report_invalid_record(state,
-								  "invalid record length at %X/%X: expected at least %u, got %u",
-								  LSN_FORMAT_ARGS(RecPtr),
-								  (uint32) SizeOfXLogRecord, total_len);
+		/*
+		 * xl_tot_len is the first field of the struct, so it must be on this
+		 * page (the records are MAXALIGNed), but we cannot access any other
+		 * fields until we've verified that we got the whole header.
+		 */
+		if (!ValidXLogRecordLength(state, RecPtr, record))
 			goto err;
-		}
-		/* We'll validate the header once we have the next page. */
+
 		gotheader = false;
 	}
 
+	total_len = record->xl_tot_len;
+
 	/*
 	 * Try to find space to decode this record, if we can do so without
 	 * calling palloc.  If we can't, we'll try again below after we've
@@ -1119,25 +1119,80 @@ XLogReaderInvalReadState(XLogReaderState *state)
 	state->readLen = 0;
 }
 
+/*
+ * Validate record length of an XLOG record header.
+ *
+ * This is substantially a part of ValidXLogRecordHeader.  But XLogReadRecord
+ * needs this separate from the function in case of a partial record header.
+ *
+ * Returns true if the xl_tot_len header field has a seemingly valid value,
+ * which means the caller can proceed reading to the following part of the
+ * record.
+ */
+static bool
+ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+					  XLogRecord *record)
+{
+	if (record->xl_tot_len == 0)
+	{
+		char	   *p;
+		char	   *pe;
+
+		/*
+		 * We are almost sure reaching the end of WAL, make sure that the
+		 * whole page after the record is filled with zeroes.
+		 */
+		p = (char *) record;
+		pe = p + XLOG_BLCKSZ - (RecPtr & (XLOG_BLCKSZ - 1));
+
+		while (p < pe && *p == 0)
+			p++;
+
+		if (p == pe)
+		{
+			/*
+			 * Consider it as end-of-WAL if all subsequent bytes of this page
+			 * are zero. We don't bother checking the subsequent pages since
+			 * they are not zeroed in the case of recycled segments.
+			 */
+			report_invalid_record(state, "empty record at %X/%X",
+								  LSN_FORMAT_ARGS(RecPtr));
+
+			/* notify end-of-wal to callers */
+			state->EndOfWAL = true;
+			return false;
+		}
+	}
+
+	if (record->xl_tot_len < SizeOfXLogRecord)
+	{
+		report_invalid_record(state,
+							  "invalid record length at %X/%X: expected at least %u, got %u",
+							  LSN_FORMAT_ARGS(RecPtr),
+							  (uint32) SizeOfXLogRecord, record->xl_tot_len);
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * Validate an XLOG record header.
  *
- * This is just a convenience subroutine to avoid duplicated code in
+ * This is just a convenience subroutine to avoid duplicate code in
  * XLogReadRecord.  It's not intended for use from anywhere else.
+ *
+ * Returns true if the header fields have the valid values and the caller can
+ * proceed reading to the following part of the record.
  */
 static bool
 ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 					  XLogRecPtr PrevRecPtr, XLogRecord *record,
 					  bool randAccess)
 {
-	if (record->xl_tot_len < SizeOfXLogRecord)
-	{
-		report_invalid_record(state,
-							  "invalid record length at %X/%X: expected at least %u, got %u",
-							  LSN_FORMAT_ARGS(RecPtr),
-							  (uint32) SizeOfXLogRecord, record->xl_tot_len);
+	if (!ValidXLogRecordLength(state, RecPtr, record))
 		return false;
-	}
+
 	if (!RmgrIdIsValid(record->xl_rmid))
 	{
 		report_invalid_record(state,
@@ -1235,6 +1290,44 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 	XLByteToSeg(recptr, segno, state->segcxt.ws_segsize);
 	offset = XLogSegmentOffset(recptr, state->segcxt.ws_segsize);
 
+	StaticAssertStmt(XLOG_PAGE_MAGIC != 0, "XLOG_PAGE_MAGIC is zero");
+
+	if (hdr->xlp_magic == 0)
+	{
+		/* Regard an empty page as End-Of-WAL */
+		int			i;
+
+		for (i = 0; i < XLOG_BLCKSZ && phdr[i] == 0; i++);
+		if (i == XLOG_BLCKSZ)
+		{
+			char		fname[MAXFNAMELEN];
+
+			XLogFileName(fname, state->seg.ws_tli, segno,
+						 state->segcxt.ws_segsize);
+
+			/*
+			 * Consider an empty page as end-of-WAL only when reading the first
+			 * part of a record.
+			 */
+			if (state->currRecPtr / XLOG_BLCKSZ == recptr / XLOG_BLCKSZ)
+			{
+				report_invalid_record(state,
+									  "empty page in WAL segment %s, offset %u",
+									  fname, offset);
+				state->EndOfWAL = true;
+			}
+			else
+				report_invalid_record(state,
+									  "empty page in WAL segment %s, offset %u while reading continuation record at %X/%X",
+									  fname, offset,
+									  LSN_FORMAT_ARGS(state->currRecPtr));
+				
+			return false;
+		}
+
+		/* The same condition will be caught as invalid magic number */
+	}
+
 	if (hdr->xlp_magic != XLOG_PAGE_MAGIC)
 	{
 		char		fname[MAXFNAMELEN];
@@ -1324,6 +1417,14 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 							  fname,
 							  LSN_FORMAT_ARGS(recptr),
 							  offset);
+
+		/*
+		 * If the page address is less than expected we assume it is an unused
+		 * page in a recycled segment.
+		 */
+		if (hdr->xlp_pageaddr < recptr)
+			state->EndOfWAL = true;
+
 		return false;
 	}
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 1b48d7171a..3b56bd04a6 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1678,7 +1678,7 @@ PerformWalRecovery(void)
 		/* just have to read next record after CheckPoint */
 		Assert(xlogreader->ReadRecPtr == CheckPointLoc);
 		replayTLI = CheckPointTLI;
-		record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
+		record = ReadRecord(xlogprefetcher, WARNING, false, replayTLI);
 	}
 
 	if (record != NULL)
@@ -1785,7 +1785,7 @@ PerformWalRecovery(void)
 			}
 
 			/* Else, try to fetch the next WAL record */
-			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
+			record = ReadRecord(xlogprefetcher, WARNING, false, replayTLI);
 		} while (record != NULL);
 
 		/*
@@ -1839,12 +1839,19 @@ PerformWalRecovery(void)
 
 		InRedo = false;
 	}
-	else
+	else if (xlogreader->EndOfWAL)
 	{
 		/* there are no WAL records following the checkpoint */
 		ereport(LOG,
 				(errmsg("redo is not required")));
 	}
+	else
+	{
+		/* broken record found */
+		ereport(WARNING,
+				errmsg("redo is skipped"),
+				errhint("This suggests WAL file corruption. You might need to check the database."));
+	}
 
 	/*
 	 * This check is intentionally after the above log messages that indicate
@@ -3095,6 +3102,7 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 	for (;;)
 	{
 		char	   *errormsg;
+		XLogRecPtr	ErrRecPtr = InvalidXLogRecPtr;
 
 		record = XLogPrefetcherReadRecord(xlogprefetcher, &errormsg);
 		if (record == NULL)
@@ -3116,6 +3124,18 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 			{
 				abortedRecPtr = xlogreader->abortedRecPtr;
 				missingContrecPtr = xlogreader->missingContrecPtr;
+				ErrRecPtr = abortedRecPtr;
+			}
+			else
+			{
+				/*
+				 * EndRecPtr is the LSN we tried to read but failed. In the
+				 * case of decoding error, it is at the end of the failed
+				 * record but we don't have a means for now to know EndRecPtr
+				 * is pointing to which of the beginning or ending of the
+				 * failed record.
+				 */
+				ErrRecPtr = xlogreader->EndRecPtr;
 			}
 
 			if (readFile >= 0)
@@ -3126,13 +3146,15 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 
 			/*
 			 * We only end up here without a message when XLogPageRead()
-			 * failed - in that case we already logged something. In
-			 * StandbyMode that only happens if we have been triggered, so we
-			 * shouldn't loop anymore in that case.
+			 * failed- in that case we already logged something. In StandbyMode
+			 * that only happens if we have been triggered, so we shouldn't
+			 * loop anymore in that case. When EndOfWAL is true, we don't emit
+			 * the message immediately and instead will show it as a part of a
+			 * decent end-of-wal message later.
 			 */
-			if (errormsg)
-				ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
-						(errmsg_internal("%s", errormsg) /* already translated */ ));
+			if (!xlogreader->EndOfWAL && errormsg)
+				ereport(emode_for_corrupt_record(emode, ErrRecPtr),
+						errmsg_internal("%s", errormsg) /* already translated */ );
 		}
 
 		/*
@@ -3163,11 +3185,14 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 			/* Great, got a record */
 			return record;
 		}
-		else
+
+		Assert(ErrRecPtr != InvalidXLogRecPtr);
+
+		/* No valid record available from this source */
+		lastSourceFailed = true;
+
+		if (!fetching_ckpt)
 		{
-			/* No valid record available from this source */
-			lastSourceFailed = true;
-
 			/*
 			 * If archive recovery was requested, but we were still doing
 			 * crash recovery, switch to archive recovery and retry using the
@@ -3180,11 +3205,16 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 			 * we'd have no idea how far we'd have to replay to reach
 			 * consistency.  So err on the safe side and give up.
 			 */
-			if (!InArchiveRecovery && ArchiveRecoveryRequested &&
-				!fetching_ckpt)
+			if (!InArchiveRecovery && ArchiveRecoveryRequested)
 			{
+				/*
+				 * We don't report this as LOG, since we don't stop recovery
+				 * here
+				 */
 				ereport(DEBUG1,
-						(errmsg_internal("reached end of WAL in pg_wal, entering archive recovery")));
+						errmsg_internal("reached end of WAL at %X/%X on timeline %u in pg_wal during crash recovery, entering archive recovery",
+										LSN_FORMAT_ARGS(ErrRecPtr),
+										replayTLI));
 				InArchiveRecovery = true;
 				if (StandbyModeRequested)
 					EnableStandbyMode();
@@ -3205,12 +3235,24 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 				continue;
 			}
 
-			/* In standby mode, loop back to retry. Otherwise, give up. */
-			if (StandbyMode && !CheckForStandbyTrigger())
-				continue;
-			else
-				return NULL;
+			/*
+			 * recovery ended.
+			 *
+			 * Emit a decent message if we met end-of-WAL. Otherwise we should
+			 * have already emitted an error message.
+			 */
+			if (xlogreader->EndOfWAL)
+				ereport(LOG,
+						errmsg("reached end of WAL at %X/%X on timeline %u",
+							   LSN_FORMAT_ARGS(ErrRecPtr), replayTLI),
+						(errormsg ? errdetail_internal("%s", errormsg) : 0));
 		}
+
+		/* In standby mode, loop back to retry. Otherwise, give up. */
+		if (StandbyMode && !CheckForStandbyTrigger())
+			continue;
+		else
+			return NULL;
 	}
 }
 
@@ -3307,11 +3349,17 @@ retry:
 			case XLREAD_WOULDBLOCK:
 				return XLREAD_WOULDBLOCK;
 			case XLREAD_FAIL:
+				/* make sure we didn't exit standby mode without trigger */
+				Assert(!StandbyMode || CheckForStandbyTrigger());
+
 				if (readFile >= 0)
 					close(readFile);
 				readFile = -1;
 				readLen = 0;
 				readSource = XLOG_FROM_ANY;
+
+				/* exit by promotion is not end-of-WAL */
+				xlogreader->EndOfWAL = !StandbyMode;
 				return XLREAD_FAIL;
 			case XLREAD_SUCCESS:
 				break;
@@ -3979,7 +4027,11 @@ emode_for_corrupt_record(int emode, XLogRecPtr RecPtr)
 {
 	static XLogRecPtr lastComplaint = 0;
 
-	if (readSource == XLOG_FROM_PG_WAL && emode == LOG)
+	/*
+	 * readSource cannot be used in place of currentSource because readSource
+	 * is reset on failure
+	 */
+	if (currentSource == XLOG_FROM_PG_WAL && emode <= WARNING)
 	{
 		if (RecPtr == lastComplaint)
 			emode = DEBUG1;
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 728059518e..c69e33dbe7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -497,10 +497,9 @@ WalReceiverMain(void)
 						else if (len < 0)
 						{
 							ereport(LOG,
-									(errmsg("replication terminated by primary server"),
-									 errdetail("End of WAL reached on timeline %u at %X/%X.",
-											   startpointTLI,
-											   LSN_FORMAT_ARGS(LogstreamResult.Write))));
+									errmsg("replication terminated by primary server at %X/%X on timeline %u.",
+										   LSN_FORMAT_ARGS(LogstreamResult.Write),
+										   startpointTLI));
 							endofwal = true;
 							break;
 						}
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 1f9403fc5c..1c45a25313 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -1309,9 +1309,16 @@ main(int argc, char **argv)
 		exit(0);
 
 	if (errormsg)
-		pg_fatal("error in WAL record at %X/%X: %s",
-				 LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
-				 errormsg);
+	{
+		if (xlogreader_state->EndOfWAL)
+			pg_log_info("end of WAL at %X/%X: %s",
+						LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr),
+						errormsg);
+		else
+			pg_fatal("error in WAL record at %X/%X: %s",
+					 LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr),
+					 errormsg);
+	}
 
 	XLogReaderFree(xlogreader_state);
 
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 082cb8a589..b15e7a2ca3 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -147,10 +147,11 @@ command_fails_like([ 'pg_waldump', $node->data_dir . '/pg_wal/' . $start_walfile
 command_like([ 'pg_waldump', $node->data_dir . '/pg_wal/' . $start_walfile, $node->data_dir . '/pg_wal/' . $end_walfile ], qr/./, 'runs with start and end segment specified');
 command_fails_like([ 'pg_waldump', '-p', $node->data_dir ], qr/error: no start WAL location given/, 'path option requires start location');
 command_like([ 'pg_waldump', '-p', $node->data_dir, '--start', $start_lsn, '--end', $end_lsn ], qr/./, 'runs with path option and start and end locations');
-command_fails_like([ 'pg_waldump', '-p', $node->data_dir, '--start', $start_lsn ], qr/error: error in WAL record at/, 'falling off the end of the WAL results in an error');
+
+command_checks_all([ 'pg_waldump', '-p', $node->data_dir, '--start', $start_lsn ], 0, [qr/./], [qr/pg_waldump: end of WAL at.*: empty record at/], 'the end of the WAL doesn\'t result in an error');
 
 command_like([ 'pg_waldump', '--quiet', $node->data_dir . '/pg_wal/' . $start_walfile ], qr/^$/, 'no output with --quiet option');
-command_fails_like([ 'pg_waldump', '--quiet', '-p', $node->data_dir, '--start', $start_lsn ], qr/error: error in WAL record at/, 'errors are shown with --quiet');
+command_checks_all([ 'pg_waldump', '--quiet', '-p', $node->data_dir, '--start', $start_lsn ], 0, [qr/^$/], [qr/pg_waldump: end of WAL at .*: empty record at .*/], 'end-status is shown with --quiet');
 
 
 # Test for: Display a message that we're skipping data if `from`
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 2e9e5f43eb..7f879281f5 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -205,6 +205,7 @@ struct XLogReaderState
 	 */
 	XLogRecPtr	ReadRecPtr;		/* start of last record read */
 	XLogRecPtr	EndRecPtr;		/* end+1 of last record read */
+	bool		EndOfWAL;		/* was the last attempt EOW? */
 
 	/*
 	 * Set at the end of recovery: the start point of a partial record at the
diff --git a/src/test/recovery/t/035_recovery.pl b/src/test/recovery/t/035_recovery.pl
new file mode 100644
index 0000000000..7107df6509
--- /dev/null
+++ b/src/test/recovery/t/035_recovery.pl
@@ -0,0 +1,130 @@
+
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Minimal test testing recovery process
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use IPC::Run;
+
+my $reached_eow_pat = "reached end of WAL at ";
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init(allows_streaming => 1);
+$node->start;
+
+my ($stdout, $stderr) = ('', '');
+
+my $segsize = $node->safe_psql('postgres',
+	   qq[SELECT setting FROM pg_settings WHERE name = 'wal_segment_size';]);
+
+# make sure no records afterwards go to the next segment
+$node->safe_psql('postgres', qq[
+				 SELECT pg_switch_wal();
+				 CHECKPOINT;
+				 CREATE TABLE t();
+]);
+$node->stop('immediate');
+
+# identify REDO WAL file
+my $cmd = "pg_controldata -D " . $node->data_dir();
+$cmd = ['pg_controldata', '-D', $node->data_dir()];
+$stdout = '';
+$stderr = '';
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ok($stdout =~ /^Latest checkpoint's REDO WAL file:[ \t] *(.+)$/m,
+   "checkpoint file is identified");
+my $chkptfile = $1;
+
+# identify the last record
+my $walfile = $node->data_dir() . "/pg_wal/$chkptfile";
+$cmd = ['pg_waldump', $walfile];
+$stdout = '';
+$stderr = '';
+my $lastrec;
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+foreach my $l (split(/\r?\n/, $stdout))
+{
+	$lastrec = $l;
+}
+ok(defined $lastrec, "last WAL record is extracted");
+ok($stderr =~ /end of WAL at ([0-9A-F\/]+): .* at \g1/,
+   "pg_waldump emits the correct ending message");
+
+# read the last record LSN excluding leading zeroes
+ok ($lastrec =~ /, lsn: 0\/0*([1-9A-F][0-9A-F]+),/,
+	"LSN of the last record identified");
+my $lastlsn = $1;
+
+# corrupt the last record
+my $offset = hex($lastlsn) % $segsize;
+open(my $segf, '+<', $walfile) or die "failed to open $walfile\n";
+seek($segf, $offset, 0);  # zero xl_tot_len, leaving following bytes alone.
+print $segf "\0\0\0\0";
+close($segf);
+
+# pg_waldump complains about the corrupted record
+$stdout = '';
+$stderr = '';
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ok($stderr =~ /error: error in WAL record at 0\/$lastlsn: .* at 0\/$lastlsn/,
+   "pg_waldump emits the correct error message");
+
+# also server complains for the same reason
+my $logstart = -s $node->logfile;
+$node->start;
+ok($node->wait_for_log(
+	   "WARNING:  invalid record length at 0/$lastlsn: expected at least 24, got 0",
+	   $logstart),
+   "header error is correctly logged at $lastlsn");
+
+# no end-of-wal message should be seen this time
+ok(!find_in_log($node, $reached_eow_pat, $logstart),
+   "false log message is not emitted");
+
+# Create streaming standby linking to primary
+my $backup_name = 'my_backup';
+$node->backup($backup_name);
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node, $backup_name, has_streaming => 1);
+$node_standby->start;
+$node->safe_psql('postgres', 'CREATE TABLE t ()');
+my $primary_lsn = $node->lsn('write');
+$node->wait_for_catchup($node_standby, 'write', $primary_lsn);
+
+$node_standby->stop();
+$node->stop('immediate');
+
+# crash restart the primary
+$logstart = -s $node->logfile;
+$node->start();
+ok($node->wait_for_log($reached_eow_pat, $logstart),
+   'primary properly emits end-of-WAL message');
+
+# restart the standby
+$logstart = -s $node_standby->logfile;
+$node_standby->start();
+ok($node->wait_for_log($reached_eow_pat, $logstart),
+   'standby properly emits end-of-WAL message');
+
+$node_standby->stop();
+$node->stop();
+
+done_testing();
+
+#### helper routines
+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+	my ($node, $pat, $off) = @_;
+
+	$off = 0 unless defined $off;
+	my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile);
+	return 0 if (length($log) <= $off);
+
+	$log = substr($log, $off);
+
+	return $log =~ m/$pat/;
+}
diff --git a/src/test/recovery/t/039_end_of_wal.pl b/src/test/recovery/t/039_end_of_wal.pl
index f9acc83c7d..5a17f68512 100644
--- a/src/test/recovery/t/039_end_of_wal.pl
+++ b/src/test/recovery/t/039_end_of_wal.pl
@@ -258,16 +258,29 @@ my $prev_lsn;
 note "Single-page end-of-WAL detection";
 ###########################################################################
 
-# xl_tot_len is 0 (a common case, we hit trailing zeroes).
-emit_message($node, 0);
-$end_lsn = advance_out_of_record_splitting_zone($node);
+# empty record without trailing garbage bytes until the page end - not error
 $node->stop('immediate');
 my $log_size = -s $node->logfile;
 $node->start;
+ok( $node->log_contains(
+		"LOG:  reached end of WAL at.*\n.* DETAIL:  empty record at",
+		$log_size),
+	"end-of-WAL by empty record");
+
+# xl_tot_len is 0 with following garbage bytes in the page
+emit_message($node, 0);
+$end_lsn = advance_out_of_record_splitting_zone($node);
+$node->stop('immediate');
+write_wal($node, $TLI,
+		  # last byte in the page at $end_lsn
+		  $end_lsn - ($end_lsn % $WAL_BLOCK_SIZE) + $WAL_BLOCK_SIZE - 1,
+		  pack("c", 1)); # garbage byte
+$log_size = -s $node->logfile;
+$node->start;
 ok( $node->log_contains(
 		"invalid record length at .*: expected at least 24, got 0", $log_size
 	),
-	"xl_tot_len zero");
+	"zero xl_tot_len followed by garbage bytes");
 
 # xl_tot_len is < 24 (presumably recycled garbage).
 emit_message($node, 0);
@@ -328,7 +341,7 @@ note "Multi-page end-of-WAL detection, header is not split";
 # This series of tests requires a valid xl_prev set in the record header
 # written to WAL.
 
-# Good xl_prev, we hit zero page next (zero magic).
+# Good xl_prev, we hit zero page next
 emit_message($node, 0);
 $prev_lsn = advance_out_of_record_splitting_zone($node);
 $end_lsn = emit_message($node, 0);
@@ -337,8 +350,24 @@ write_wal($node, $TLI, $end_lsn,
 	build_record_header(2 * 1024 * 1024 * 1024, 0, $prev_lsn));
 $log_size = -s $node->logfile;
 $node->start;
-ok($node->log_contains("invalid magic number 0000 .* LSN .*", $log_size),
-	"xlp_magic zero");
+ok( $node->log_contains("WARNING:  empty page in WAL segment .*, offset .* while reading continuation record at .*", $log_size),
+   "empty page");
+
+# Good xl_prev, we hit zero page magic with following garbage bytes.
+emit_message($node, 0);
+$prev_lsn = advance_out_of_record_splitting_zone($node);
+$end_lsn = emit_message($node, 0);
+$node->stop('immediate');
+write_wal($node, $TLI, $end_lsn,
+		  build_record_header(2 * 1024 * 1024 * 1024, 0, $prev_lsn));
+# place garbage at the end of the next page
+write_wal($node, $TLI,
+		  start_of_next_page(start_of_next_page($end_lsn)) - 1,
+		  pack("i", 1));
+$log_size = -s $node->logfile;
+$node->start;
+ok( $node->log_contains("invalid magic number 0000 .* LSN .*", $log_size),
+   "bad magic");
 
 # Good xl_prev, we hit garbage page next (bad magic).
 emit_message($node, 0);
@@ -442,8 +471,8 @@ write_wal($node, $TLI, $end_lsn,
 	build_record_header(2 * 1024 * 1024 * 1024, 0, 0xdeadbeef));
 $log_size = -s $node->logfile;
 $node->start;
-ok($node->log_contains("invalid magic number 0000 .* LSN .*", $log_size),
-	"xlp_magic zero (split record header)");
+ok( $node->log_contains("WARNING:  empty page in WAL segment .*, offset .* while reading continuation record at .*", $log_size),
+	"zero page while reading a record (split record header)");
 
 # And we'll also check xlp_pageaddr before any header checks.
 emit_message($node, 0);
-- 
2.39.3

Reply via email to