Hello. Thank you for looking this.

At Mon, 16 Oct 2017 17:58:03 +0900, Michael Paquier <michael.paqu...@gmail.com> 
wrote in <cab7npqr+j1xw+yzfsrehiq+rh3ac+n5seugp7uoq4_ymfno...@mail.gmail.com>
> On Thu, Sep 7, 2017 at 12:33 PM, Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund <and...@anarazel.de> wrote 
> > in <20170906192353.ufp2dq7wm5fd6...@alap3.anarazel.de>
> >> I'm not following. All we need to use is the beginning of the relevant
> >> records, that's easy enough to keep track of. We don't need to read the
> >> WAL or anything.
> >
> > The beginning is already tracked and nothing more to do.
> 
> I have finally allocated time to look at your newly-proposed patch,
> sorry for the time it took. Patch 0002 forgot to include sys/stat.h to
> allow the debugging tool to compile :)
> 
> > The first *problem* was WaitForWALToBecomeAvailable requests the
> > beginning of a record, which is not on the page the function has
> > been told to fetch. Still tliRecPtr is required to determine the
> > TLI to request, it should request RecPtr to be streamed.
> 
> [...]
> 
> > The rest to do is let XLogPageRead retry other sources
> > immediately. To do this I made ValidXLogPageHeader@xlogreader.c
> > public (and renamed to XLogReaderValidatePageHeader).
> >
> > The patch attached fixes the problem and passes recovery
> > tests. However, the test for this problem is not added. It needs
> > to go to the last page in a segment then put a record continues
> > to the next segment, then kill the standby after receiving the
> > previous segment but before receiving the whole record.
> 
> +typedef struct XLogPageHeaderData *XLogPageHeader;
> [...]
> +/* Validate a page */
> +extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
> +                                       XLogRecPtr recptr, XLogPageHeader 
> hdr);
> Instead of exposing XLogPageHeaderData, I would recommend just using
> char* and remove this declaration. The comment on top of
> XLogReaderValidatePageHeader needs to make clear what caller should
> provide.

Seems reasonable. Added several lines in the comment for the
function.

> +    if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> +                                      (XLogPageHeader) readBuf))
> +        goto next_record_is_invalid;
> +
> [...]
> -                            ptr = tliRecPtr;
> +                            ptr = RecPtr;
>                              tli = tliOfPointInHistory(tliRecPtr, 
> expectedTLEs);
> 
>                              if (curFileTLI > 0 && tli < curFileTLI)
> The core of the patch is here (the patch has no comment so it is hard
> to understand what's the point of what is being done), and if I

Hmm, sorry. Added a brief comment there.

> understand that correctly, you allow the receiver to fetch the
> portions of a record spawned across multiple segments from different
> sources, and I am not sure that we'd want to break that promise.

We are allowing consecutive records at a segment boundary from
different sources are in the same series of xlog records. A
continuation records never spans over two TLIs but I might be
missing something here. (I found that an error message shows an
incorrect record pointer. The error message seems still be
useful.)

> Shouldn't we instead have the receiver side track the beginning of the
> record and send that position for the physical slot's restart_lsn?

The largest obstacle to do that is that walreceiver is not
utterly concerned to record internals. In other words, it doesn't
know what a record is. Teaching that introduces much complexity
and the complexity slows down the walreceiver.

Addition to that, this "problem" occurs also on replication
without a slot. The latest patch also help the case.

> This way the receiver would retain WAL segments from the real
> beginning of a record. restart_lsn for replication slots is set when
> processing the standby message in ProcessStandbyReplyMessage() using
> now the flush LSN, so a more correct value should be provided using
> that. Andres, what's your take on the matter?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From b23c1d69ad86fcbb992cb21c604f587d82441001 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Thu, 7 Sep 2017 12:14:55 +0900
Subject: [PATCH 1/2] Allow switch WAL source midst of record.

The corrent recovery machinary assumes the whole of a record is
avaiable from single source. This prevents a standby from restarting
under a certain condition. This patch allows source switching during
reading a series of continuation records.
---
 src/backend/access/transam/xlog.c       | 14 ++++++++++++--
 src/backend/access/transam/xlogreader.c | 28 ++++++++++++++++++----------
 src/include/access/xlogreader.h         |  4 ++++
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..0d639ec 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11647,6 +11647,10 @@ retry:
 	Assert(reqLen <= readLen);
 
 	*readTLI = curFileTLI;
+
+	if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+		goto next_record_is_invalid;
+
 	return readLen;
 
 next_record_is_invalid:
@@ -11781,12 +11785,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						}
 						else
 						{
-							ptr = tliRecPtr;
+							/*
+							 * Trying from the current RecPtr, not from the
+							 * beginning of the current record. The record may
+							 * be no longer available from the master.
+							 */
+							ptr = RecPtr;
 							tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
 
 							if (curFileTLI > 0 && tli < curFileTLI)
 								elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
-									 (uint32) (ptr >> 32), (uint32) ptr,
+									 (uint32) (tliRecPtr >> 32),
+									 (uint32) tliRecPtr,
 									 tli, curFileTLI);
 						}
 						curFileTLI = tli;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index b1f9b90..78a721a 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -27,8 +27,6 @@
 
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
 
-static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
-					XLogPageHeader hdr);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 					  XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -533,7 +531,6 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	 */
 	if (targetSegNo != state->readSegNo && targetPageOff != 0)
 	{
-		XLogPageHeader hdr;
 		XLogRecPtr	targetSegmentPtr = pageptr - targetPageOff;
 
 		readLen = state->read_page(state, targetSegmentPtr, XLOG_BLCKSZ,
@@ -545,9 +542,8 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 		/* we can be sure to have enough WAL available, we scrolled back */
 		Assert(readLen == XLOG_BLCKSZ);
 
-		hdr = (XLogPageHeader) state->readBuf;
-
-		if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr))
+		if (!XLogReaderValidatePageHeader(state, targetSegmentPtr,
+										  state->readBuf))
 			goto err;
 	}
 
@@ -584,7 +580,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	/*
 	 * Now that we know we have the full header, validate it.
 	 */
-	if (!ValidXLogPageHeader(state, pageptr, hdr))
+	if (!XLogReaderValidatePageHeader(state, pageptr, (char *) hdr))
 		goto err;
 
 	/* update read state information */
@@ -710,14 +706,26 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
 
 /*
  * Validate a page header
+ *
+ * Check if phdr is valid as the XLog page header of the XLog page of the page
+ * pointed by recptr.
+ *
+ * phdr is read as XLogPageHeaderData and check if it has valid magic, has no
+ * usused flag bits set and belongs to correct page pointed by recptr. The
+ * incorrect page address is detected usually when we read a page in a
+ * recycled segment.
+ *
+ * If it is the first page in a segment or detected rewind of state,
+ * additional checks are performed.
  */
-static bool
-ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
-					XLogPageHeader hdr)
+bool
+XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
+							 char *phdr)
 {
 	XLogRecPtr	recaddr;
 	XLogSegNo	segno;
 	int32		offset;
+	XLogPageHeader hdr = (XLogPageHeader) phdr;
 
 	Assert((recptr % XLOG_BLCKSZ) == 0);
 
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 3a9ebd4..758d880 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -205,6 +205,10 @@ extern void XLogReaderFree(XLogReaderState *state);
 extern struct XLogRecord *XLogReadRecord(XLogReaderState *state,
 			   XLogRecPtr recptr, char **errormsg);
 
+/* Validate a page */
+extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
+					XLogRecPtr recptr, char *phdr);
+
 /* Invalidate read state */
 extern void XLogReaderInvalReadState(XLogReaderState *state);
 
-- 
2.9.2

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to