Hello. Thank you for looking this. At Mon, 16 Oct 2017 17:58:03 +0900, Michael Paquier <michael.paqu...@gmail.com> wrote in <cab7npqr+j1xw+yzfsrehiq+rh3ac+n5seugp7uoq4_ymfno...@mail.gmail.com> > On Thu, Sep 7, 2017 at 12:33 PM, Kyotaro HORIGUCHI > <horiguchi.kyot...@lab.ntt.co.jp> wrote: > > At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund <and...@anarazel.de> wrote > > in <20170906192353.ufp2dq7wm5fd6...@alap3.anarazel.de> > >> I'm not following. All we need to use is the beginning of the relevant > >> records, that's easy enough to keep track of. We don't need to read the > >> WAL or anything. > > > > The beginning is already tracked and nothing more to do. > > I have finally allocated time to look at your newly-proposed patch, > sorry for the time it took. Patch 0002 forgot to include sys/stat.h to > allow the debugging tool to compile :) > > > The first *problem* was WaitForWALToBecomeAvailable requests the > > beginning of a record, which is not on the page the function has > > been told to fetch. Still tliRecPtr is required to determine the > > TLI to request, it should request RecPtr to be streamed. > > [...] > > > The rest to do is let XLogPageRead retry other sources > > immediately. To do this I made ValidXLogPageHeader@xlogreader.c > > public (and renamed to XLogReaderValidatePageHeader). > > > > The patch attached fixes the problem and passes recovery > > tests. However, the test for this problem is not added. It needs > > to go to the last page in a segment then put a record continues > > to the next segment, then kill the standby after receiving the > > previous segment but before receiving the whole record. > > +typedef struct XLogPageHeaderData *XLogPageHeader; > [...] > +/* Validate a page */ > +extern bool XLogReaderValidatePageHeader(XLogReaderState *state, > + XLogRecPtr recptr, XLogPageHeader > hdr); > Instead of exposing XLogPageHeaderData, I would recommend just using > char* and remove this declaration. The comment on top of > XLogReaderValidatePageHeader needs to make clear what caller should > provide.
Seems reasonable. Added several lines in the comment for the function. > + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, > + (XLogPageHeader) readBuf)) > + goto next_record_is_invalid; > + > [...] > - ptr = tliRecPtr; > + ptr = RecPtr; > tli = tliOfPointInHistory(tliRecPtr, > expectedTLEs); > > if (curFileTLI > 0 && tli < curFileTLI) > The core of the patch is here (the patch has no comment so it is hard > to understand what's the point of what is being done), and if I Hmm, sorry. Added a brief comment there. > understand that correctly, you allow the receiver to fetch the > portions of a record spawned across multiple segments from different > sources, and I am not sure that we'd want to break that promise. We are allowing consecutive records at a segment boundary from different sources are in the same series of xlog records. A continuation records never spans over two TLIs but I might be missing something here. (I found that an error message shows an incorrect record pointer. The error message seems still be useful.) > Shouldn't we instead have the receiver side track the beginning of the > record and send that position for the physical slot's restart_lsn? The largest obstacle to do that is that walreceiver is not utterly concerned to record internals. In other words, it doesn't know what a record is. Teaching that introduces much complexity and the complexity slows down the walreceiver. Addition to that, this "problem" occurs also on replication without a slot. The latest patch also help the case. > This way the receiver would retain WAL segments from the real > beginning of a record. restart_lsn for replication slots is set when > processing the standby message in ProcessStandbyReplyMessage() using > now the flush LSN, so a more correct value should be provided using > that. Andres, what's your take on the matter? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
>From b23c1d69ad86fcbb992cb21c604f587d82441001 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 7 Sep 2017 12:14:55 +0900 Subject: [PATCH 1/2] Allow switch WAL source midst of record. The corrent recovery machinary assumes the whole of a record is avaiable from single source. This prevents a standby from restarting under a certain condition. This patch allows source switching during reading a series of continuation records. --- src/backend/access/transam/xlog.c | 14 ++++++++++++-- src/backend/access/transam/xlogreader.c | 28 ++++++++++++++++++---------- src/include/access/xlogreader.h | 4 ++++ 3 files changed, 34 insertions(+), 12 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index dd028a1..0d639ec 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -11647,6 +11647,10 @@ retry: Assert(reqLen <= readLen); *readTLI = curFileTLI; + + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf)) + goto next_record_is_invalid; + return readLen; next_record_is_invalid: @@ -11781,12 +11785,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, } else { - ptr = tliRecPtr; + /* + * Trying from the current RecPtr, not from the + * beginning of the current record. The record may + * be no longer available from the master. + */ + ptr = RecPtr; tli = tliOfPointInHistory(tliRecPtr, expectedTLEs); if (curFileTLI > 0 && tli < curFileTLI) elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u", - (uint32) (ptr >> 32), (uint32) ptr, + (uint32) (tliRecPtr >> 32), + (uint32) tliRecPtr, tli, curFileTLI); } curFileTLI = tli; diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index b1f9b90..78a721a 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -27,8 +27,6 @@ static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength); -static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr, - XLogPageHeader hdr); static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess); static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record, @@ -533,7 +531,6 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen) */ if (targetSegNo != state->readSegNo && targetPageOff != 0) { - XLogPageHeader hdr; XLogRecPtr targetSegmentPtr = pageptr - targetPageOff; readLen = state->read_page(state, targetSegmentPtr, XLOG_BLCKSZ, @@ -545,9 +542,8 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen) /* we can be sure to have enough WAL available, we scrolled back */ Assert(readLen == XLOG_BLCKSZ); - hdr = (XLogPageHeader) state->readBuf; - - if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr)) + if (!XLogReaderValidatePageHeader(state, targetSegmentPtr, + state->readBuf)) goto err; } @@ -584,7 +580,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen) /* * Now that we know we have the full header, validate it. */ - if (!ValidXLogPageHeader(state, pageptr, hdr)) + if (!XLogReaderValidatePageHeader(state, pageptr, (char *) hdr)) goto err; /* update read state information */ @@ -710,14 +706,26 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr) /* * Validate a page header + * + * Check if phdr is valid as the XLog page header of the XLog page of the page + * pointed by recptr. + * + * phdr is read as XLogPageHeaderData and check if it has valid magic, has no + * usused flag bits set and belongs to correct page pointed by recptr. The + * incorrect page address is detected usually when we read a page in a + * recycled segment. + * + * If it is the first page in a segment or detected rewind of state, + * additional checks are performed. */ -static bool -ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr, - XLogPageHeader hdr) +bool +XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr, + char *phdr) { XLogRecPtr recaddr; XLogSegNo segno; int32 offset; + XLogPageHeader hdr = (XLogPageHeader) phdr; Assert((recptr % XLOG_BLCKSZ) == 0); diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h index 3a9ebd4..758d880 100644 --- a/src/include/access/xlogreader.h +++ b/src/include/access/xlogreader.h @@ -205,6 +205,10 @@ extern void XLogReaderFree(XLogReaderState *state); extern struct XLogRecord *XLogReadRecord(XLogReaderState *state, XLogRecPtr recptr, char **errormsg); +/* Validate a page */ +extern bool XLogReaderValidatePageHeader(XLogReaderState *state, + XLogRecPtr recptr, char *phdr); + /* Invalidate read state */ extern void XLogReaderInvalReadState(XLogReaderState *state); -- 2.9.2
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers