At Thu, 25 Oct 2018 21:55:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
<horiguchi.kyot...@lab.ntt.co.jp> wrote in 
<20181025.215518.189844649.horiguchi.kyot...@lab.ntt.co.jp>
> > =# alter system set max_slot_wal_keep_size to '64MB'; -- while
> > wal_keep_segments is 0
> > =# select pg_reload_conf();
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> >  slot_name | wal_status |  remain  | remain_pretty
> > -----------+------------+----------+---------------
> >  1         | streaming  | 83885648 | 80 MB
> > (1 row)
> > 
> > ** consume 80MB WAL, and do CHECKPOINT **
> > 
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> >  slot_name | wal_status | remain | remain_pretty
> > -----------+------------+--------+---------------
> >  1         | lost       |      0 | 0 bytes
> > (1 row)
> > =# select count(*) from pg_logical_slot_get_changes('1', NULL, NULL);
> >  count
> > -------
> >     15
> > (1 row)
> 
> Mmm. The function looks into the segment that was already open before
> the segment was lost from the file system (precisely, its directory
> entry has been deleted). So losing just one segment doesn't
> matter. Please try losing one more segment.

I considered this a bit more, and the attached patch makes
XLogReadRecord() check for segment removal every time it is
called and emit the following error in that case.

> =# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
> ERROR:  WAL record at 0/870001B0 no longer available
> DETAIL:  The segment for the record has been removed.
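
The check behind this error can be sketched outside the server as
follows. This is only a minimal standalone illustration, assuming
that the segment number is the WAL position divided by the segment
size (which is what XLByteToSeg computes). The typedefs are local
stand-ins for the server's types, and lsn_to_segno() and
segment_is_removed() are hypothetical helper names, not functions in
the tree. In the patch, the last removed segment number comes from
XLogGetLastRemovedSegno().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the server's types (assumption for this sketch). */
typedef uint64_t XLogRecPtr;	/* byte position in the WAL stream */
typedef uint64_t XLogSegNo;		/* WAL segment number */

/* Segment number that contains the given WAL position. */
static XLogSegNo
lsn_to_segno(XLogRecPtr lsn, uint64_t wal_segment_size)
{
	assert(wal_segment_size > 0);
	return lsn / wal_segment_size;
}

/*
 * Hypothetical helper: true if the segment holding lsn is at or before the
 * last segment that has been removed, i.e. the record is no longer
 * available.
 */
static bool
segment_is_removed(XLogRecPtr lsn, uint64_t wal_segment_size,
				   XLogSegNo last_removed_segno)
{
	return lsn_to_segno(lsn, wal_segment_size) <= last_removed_segno;
}
```

With 16MB segments, the position 0/870001B0 from the error above falls
in segment 0x87, so the check fires once the last removed segment
number reaches 0x87.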

The reason for doing that in the function is that it can also happen
for physical replication when walsender is active but far
behind. The removed (renamed)-but-still-open segment may be
recycled and overwritten while it is being read, and that will be
caught by page/record validation. It is substantially lost in
that sense.  I don't think the strictness is useful for anything.
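
For reference, the removed-but-still-open behavior can be reproduced
with a small POSIX program. This is only a sketch under stated
assumptions: it uses a temporary file instead of a real WAL segment,
read_after_unlink() is a hypothetical name, and it covers only the
removal case, not recycling (rename followed by overwrite).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write data to a temporary file, unlink the file while keeping the
 * descriptor open, then read the contents back through the descriptor.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t
read_after_unlink(const char *data, char *buf, size_t buflen)
{
	char		path[] = "/tmp/walseg_demo_XXXXXX";
	size_t		len = strlen(data);
	ssize_t		nread = -1;
	int			fd = mkstemp(path);

	if (fd < 0)
		return -1;

	if (write(fd, data, len) != (ssize_t) len || unlink(path) != 0)
	{
		close(fd);
		unlink(path);			/* best-effort cleanup */
		return -1;
	}

	/* The name is gone, but the open descriptor still refers to the file. */
	if (access(path, F_OK) != 0 && lseek(fd, 0, SEEK_SET) == 0)
		nread = read(fd, buf, buflen);

	close(fd);
	return nread;
}
```

The read succeeds even though the directory entry no longer exists,
which is why a reader that already has the segment open does not
notice the removal by itself.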

Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 775f6366d78ac6818023cc158e37c70119246e19 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Fri, 26 Oct 2018 10:07:05 +0900
Subject: [PATCH 5/5] Check removal of in-read segment file.

A checkpoint can remove or recycle a segment file while it is being
read by ReadRecord. This patch checks for that case and errors out
immediately.  Reading a recycled file is basically safe, and any
inconsistency caused by overwrites as a new segment will be caught by
page/record validation. So this is only for keeping consistency with
the wal_status shown in pg_replication_slots.
---
 src/backend/access/transam/xlogreader.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 0768ca7822..a6c97cf260 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -217,6 +217,7 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 {
 	XLogRecord *record;
 	XLogRecPtr	targetPagePtr;
+	XLogSegNo	targetSegNo;
 	bool		randAccess;
 	uint32		len,
 				total_len;
@@ -270,6 +271,18 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 	targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ);
 	targetRecOff = RecPtr % XLOG_BLCKSZ;
 
+	/*
+	 * A checkpoint can remove the segment we are about to read.  Make sure
+	 * the segment still exists.  We check this only once per record.
+	 */
+	XLByteToSeg(targetPagePtr, targetSegNo, state->wal_segment_size);
+	if (targetSegNo <= XLogGetLastRemovedSegno())
+		ereport(ERROR,
+				(errcode(ERRCODE_NO_DATA),
+				 errmsg("WAL record at %X/%X no longer available",
+						(uint32)(RecPtr >> 32), (uint32) RecPtr),
+				 errdetail("The segment for the record has been removed.")));
+			
 	/*
 	 * Read the page containing the record into state->readBuf. Request enough
 	 * byte to cover the whole record header, or at least the part of it that
-- 
2.16.3
