Re: [HACKERS] Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

Heikki Linnakangas Fri, 18 Jan 2013 01:21:23 -0800

On 18.01.2013 02:35, Andres Freund wrote:

On 2013-01-18 08:24:31 +0900, Michael Paquier wrote:

On Fri, Jan 18, 2013 at 3:05 AM, Fujii Masao wrote:

I encountered a problem where the timeline switch is not performed as
expected. I set up one master, one standby and one cascading standby. All
the servers share the archive directory, and restore_command is specified
in recovery.conf on both standbys.

I shut down the master, and then promoted the standby. In this case, the
cascading standby should switch to the new timeline and replication should
restart successfully. But the timeline was never changed, and the following
log messages kept being output:

sby2 LOG:  restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG:  replication terminated by primary server
sby2 DETAIL:  End of WAL reached on timeline 1
sby2 LOG:  restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG:  replication terminated by primary server
sby2 DETAIL:  End of WAL reached on timeline 1
sby2 LOG:  restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG:  replication terminated by primary server
sby2 DETAIL:  End of WAL reached on timeline 1

I am seeing similar issues with master at 88228e6.
This is easily reproducible: set up two slaves under a master, then kill
the master. Promote slave 1 and reconnect slave 2 to slave 1; you will
notice that the timeline jump is not done.

I don't know whether Masao tried to make the slave that reconnects to the
promoted slave a synchronous standby, but in that case slave2 gets stuck in
the "potential" state. That is due to the timeline not having changed on
slave2, but better to let you know...


OK, I know what's causing this now. Rather ugly.

Whenever we access a page in a segment we haven't accessed before, we
read the first page to do an extra bit of validation, as the first page
in a segment contains more information.
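To make the page arithmetic concrete, here is a standalone C sketch. It assumes the default 8 kB WAL page (XLOG_BLCKSZ) and 16 MB segment size, and uses plain stand-in types instead of the real PostgreSQL headers; the helper names are hypothetical. It computes the page and segment containing a WAL position and, roughly following the condition described above, whether reading that page first requires reading the segment's first page.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;               /* stand-in for the PostgreSQL type */

#define XLOG_BLCKSZ  8192u                 /* assumed default WAL page size */
#define WAL_SEG_SIZE (16u * 1024u * 1024u) /* assumed default segment size */

/* Start of the WAL page containing 'ptr'. */
static XLogRecPtr
page_start(XLogRecPtr ptr)
{
    return ptr - (ptr % XLOG_BLCKSZ);
}

/* Start of the segment containing 'ptr'; its first page has the long header. */
static XLogRecPtr
segment_start(XLogRecPtr ptr)
{
    assert(WAL_SEG_SIZE % XLOG_BLCKSZ == 0);   /* segments hold whole pages */
    return ptr - (ptr % WAL_SEG_SIZE);
}

/* Segment number of 'ptr'. */
static uint64_t
segment_no(XLogRecPtr ptr)
{
    return ptr / WAL_SEG_SIZE;
}

/*
 * Roughly the condition under which the reader validates the segment's
 * first page before the target page: we are entering a segment we have
 * not read from before, and the target page is not that first page.
 */
static int
needs_segment_header_read(XLogRecPtr ptr, uint64_t last_seg_no)
{
    return segment_no(ptr) != last_seg_no &&
           page_start(ptr) != segment_start(ptr);
}
```

For a record at 0/6087088, the containing page starts at 0/6086000 but the segment starts at 0/6000000, so a reader entering that segment requests 0/6000000 first.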

Suppose timeline 1 ends at 0/6087088. xlog.c notices that WAL ends
there, wants to read the new timeline, and requests record
0/06087088. xlogreader wants to do its validation and goes back to the
first page in the segment, which triggers xlog.c to re-request timeline 1
to be transferred.
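The timeline mismatch can be illustrated with a simplified history lookup. The sketch below is modeled loosely on tliOfPointInHistory() (a timeline covers [begin, end), with an invalid pointer meaning unbounded); the struct, function and example history are hypothetical. With timeline 1 ending at 0/6087088, the record itself lies on timeline 2, but the segment start 0/6000000 still lies on timeline 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types; not the real PostgreSQL definitions. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

#define INVALID_XLOG_REC_PTR ((XLogRecPtr) 0)

/* Simplified history entry: 'tli' covers [begin, end); invalid = unbounded. */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;
    XLogRecPtr  end;
} TimeLineEntry;

/* Hypothetical history: timeline 1 ends at 0/6087088, timeline 2 begins there. */
static const TimeLineEntry example_history[] = {
    {2, 0x6087088, INVALID_XLOG_REC_PTR},
    {1, INVALID_XLOG_REC_PTR, 0x6087088},
};

/* Return the timeline that WAL position 'ptr' belongs to, or 0 if none. */
static TimeLineID
tli_of_point(XLogRecPtr ptr, const TimeLineEntry *history, size_t n)
{
    assert(history != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        const TimeLineEntry *e = &history[i];

        if ((e->begin == INVALID_XLOG_REC_PTR || e->begin <= ptr) &&
            (e->end == INVALID_XLOG_REC_PTR || ptr < e->end))
            return e->tli;
    }
    return 0;
}
```

Looking up 0/6087088 yields timeline 2, while looking up 0/6000000 (the segment header xlogreader goes back to) yields timeline 1.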

Hmm, so it's the same issue I thought I fixed yesterday. My patch only fixed it for the case where the timeline switch is in the first page of the segment. When it's not, you still get two calls for a WAL record: the first one for the first page in the segment, to verify that, and then one for the page that actually contains the record. The first call leads XLogPageRead to think it needs to read from the old timeline.
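A sketch of why the first of those two calls goes wrong, assuming the pre-patch code picks the timeline from the end of the requested range, targetPagePtr + reqLen (which is what the WaitForWALToBecomeAvailable() call in the diff below passes as RecPtr). The switch point and helper names are hypothetical, and the history is reduced to a single switch from timeline 1 to timeline 2.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

#define XLOG_BLCKSZ 8192   /* assumed default WAL page size */

/* Hypothetical history reduced to one switch: TLI 1 ends at 0/6087088. */
static const XLogRecPtr switch_point = 0x6087088;

static TimeLineID
tli_of_point(XLogRecPtr ptr)
{
    return ptr < switch_point ? 1 : 2;
}

/*
 * Pre-patch choice (sketch): the timeline is derived from the end of the
 * byte range being requested, not from the record being read.
 */
static TimeLineID
old_choose_tli(XLogRecPtr targetPagePtr, int reqLen)
{
    assert(reqLen > 0);
    return tli_of_point(targetPagePtr + (XLogRecPtr) reqLen);
}
```

Validating the segment's first page, old_choose_tli(0x6000000, XLOG_BLCKSZ), resolves to timeline 1; only a request whose range reaches 0/6087088 or beyond resolves to timeline 2. That first call is what sends the standby back to streaming timeline 1.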

We didn't have this problem before the xlogreader refactoring because XLogPageRead() was always called with the RecPtr of the record, even when we actually read the segment header from the file first. We'll have to somehow get that same information, the RecPtr of the record we're actually interested in, to XLogPageRead(). We could add a new argument to the callback for that, or we could keep xlogreader.c as it is and pass it through from ReadRecord to XLogPageRead() in the private struct.

An explicit argument to the callback is probably best. That's straightforward, and it might be useful for the callback to know the actual WAL position that xlogreader.c is interested in anyway. See attached.
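The shape of the fix can be sketched in isolation. This is a toy model, not the patched code: the Reader struct, demo_read_page() and read_record() are hypothetical, the history is reduced to one switch at 0/6087088, and full pages are always requested. What it mirrors from the patch is that the reader records the position of the record it is after (currRecPtr) and passes it to every read_page call as targetRecPtr, so the callback can choose the timeline from the record rather than from whichever page is being read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

#define XLOG_BLCKSZ  8192                 /* assumed default page size */
#define WAL_SEG_SIZE (16 * 1024 * 1024)   /* assumed default segment size */

typedef struct Reader Reader;

/* Callback shape after the patch: the record position is passed explicitly. */
typedef int (*PageReadCB) (Reader *r, XLogRecPtr targetPagePtr, int reqLen,
                           XLogRecPtr targetRecPtr, TimeLineID *pageTLI);

struct Reader
{
    PageReadCB  read_page;
    XLogRecPtr  currRecPtr;   /* beginning of the WAL record being read */
    TimeLineID  tlis[2];      /* TLI chosen by the first two callback calls */
    int         ncalls;
};

/* Hypothetical history: timeline 1 ends at 0/6087088. */
static TimeLineID
tli_of_point(XLogRecPtr ptr)
{
    return ptr < 0x6087088 ? 1 : 2;
}

/* Picks the timeline from targetRecPtr, whichever page is being read. */
static int
demo_read_page(Reader *r, XLogRecPtr targetPagePtr, int reqLen,
               XLogRecPtr targetRecPtr, TimeLineID *pageTLI)
{
    (void) targetPagePtr;          /* the page itself does not decide the TLI */
    *pageTLI = tli_of_point(targetRecPtr);
    if (r->ncalls < 2)
        r->tlis[r->ncalls] = *pageTLI;
    r->ncalls++;
    return reqLen;                 /* pretend the bytes are available */
}

/*
 * Reader side: remember the record position, then (assuming this segment
 * has not been read yet) validate the segment's first page before reading
 * the page that holds the record.
 */
static int
read_record(Reader *r, XLogRecPtr recPtr)
{
    XLogRecPtr  pagePtr = recPtr - (recPtr % XLOG_BLCKSZ);
    XLogRecPtr  segPtr = recPtr - (recPtr % WAL_SEG_SIZE);
    TimeLineID  tli;

    assert(r->read_page != NULL);
    r->currRecPtr = recPtr;

    if (pagePtr != segPtr &&
        r->read_page(r, segPtr, XLOG_BLCKSZ, r->currRecPtr, &tli) < 0)
        return -1;
    return r->read_page(r, pagePtr, XLOG_BLCKSZ, r->currRecPtr, &tli);
}
```

In this model both calls, including the one for the segment header at 0/6000000, resolve to timeline 2, because the decision is made from targetRecPtr = 0/6087088. That is the effect the tliRecPtr argument has in WaitForWALToBecomeAvailable() in the patch.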


- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 90ba32e..3ac3b76 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -626,9 +626,10 @@ static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 			 int source, bool notexistOk);
 static int XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
-				 int reqLen, char *readBuf, TimeLineID *readTLI);
+			 int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
+			 TimeLineID *readTLI);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
-							bool fetching_ckpt);
+							bool fetching_ckpt, XLogRecPtr tliRecPtr);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -8832,7 +8833,7 @@ CancelBackup(void)
  */
 static int
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
-			 char *readBuf, TimeLineID *readTLI)
+			 XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
 {
 	XLogPageReadPrivate *private =
 		(XLogPageReadPrivate *) xlogreader->private_data;
@@ -8880,7 +8881,8 @@ retry:
 		{
 			if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
 											 private->randAccess,
-											 private->fetching_ckpt))
+											 private->fetching_ckpt,
+											 targetRecPtr))
 				goto triggered;
 		}
 		/* In archive or crash recovery. */
@@ -8980,11 +8982,19 @@ triggered:
 }
 
 /*
- * In standby mode, wait for the requested record to become available, either
+ * In standby mode, wait for WAL at position 'RecPtr' to become available, either
  * via restore_command succeeding to restore the segment, or via walreceiver
  * having streamed the record (or via someone copying the segment directly to
  * pg_xlog, but that is not documented or recommended).
  *
+ * If 'fetching_ckpt' is true, we're fetching a checkpoint record, and should
+ * prepare to read WAL starting from RedoStartLSN after this.
+ *
+ * 'RecPtr' might not point to the beginning of the record we're interested
+ * in, it might also point to the page or segment header. In that case,
+ * 'tliRecPtr' is the position of the WAL record we're interested in. It is
+ * used to decide which timeline to stream the requested WAL from.
+ *
  * When the requested record becomes available, the function opens the file
  * containing it (if not open already), and returns true. When end of standby
  * mode is triggered by the user, and there is no more WAL available, returns
@@ -8992,7 +9002,7 @@ triggered:
  */
 static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
-							bool fetching_ckpt)
+							bool fetching_ckpt, XLogRecPtr tliRecPtr)
 {
 	static pg_time_t last_fail_time = 0;
 	pg_time_t now;
@@ -9076,7 +9086,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						else
 						{
 							ptr = RecPtr;
-							tli = tliOfPointInHistory(ptr, expectedTLEs);
+							tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
 
 							if (curFileTLI > 0 && tli < curFileTLI)
 								elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 9499f84..a358a3d 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -216,6 +216,8 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 		randAccess = true;		/* allow readPageTLI to go backwards too */
 	}
 
+	state->currRecPtr = RecPtr;
+
 	targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ);
 	targetRecOff = RecPtr % XLOG_BLCKSZ;
 
@@ -503,6 +505,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 		XLogRecPtr	targetSegmentPtr = pageptr - targetPageOff;
 
 		readLen = state->read_page(state, targetSegmentPtr, XLOG_BLCKSZ,
+								   state->currRecPtr,
 								   state->readBuf, &state->readPageTLI);
 		if (readLen < 0)
 			goto err;
@@ -521,6 +524,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	 * so that we can validate it.
 	 */
 	readLen = state->read_page(state, pageptr, Max(reqLen, SizeOfXLogShortPHD),
+							   state->currRecPtr,
 							   state->readBuf, &state->readPageTLI);
 	if (readLen < 0)
 		goto err;
@@ -539,6 +543,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	if (readLen < XLogPageHeaderSize(hdr))
 	{
 		readLen = state->read_page(state, pageptr, XLogPageHeaderSize(hdr),
+								   state->currRecPtr,
 								   state->readBuf, &state->readPageTLI);
 		if (readLen < 0)
 			goto err;
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 36907d6..3829ce2 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -27,6 +27,7 @@ typedef struct XLogReaderState XLogReaderState;
 typedef int (*XLogPageReadCB) (XLogReaderState *xlogreader,
 										   XLogRecPtr targetPagePtr,
 										   int reqLen,
+										   XLogRecPtr targetRecPtr,
 										   char *readBuf,
 										   TimeLineID *pageTLI);
 
@@ -46,11 +47,17 @@ struct XLogReaderState
 	 * -1 on failure.  The callback shall sleep, if necessary, to wait for the
 	 * requested bytes to become available.  The callback will not be invoked
 	 * again for the same page unless more than the returned number of bytes
-	 * are necessary.
+	 * are needed.
 	 *
-	 * *pageTLI should be set to the TLI of the file the page was read from.
-	 * It is currently used only for error reporting purposes, to reconstruct
-	 * the name of the WAL file where an error occurred.
+	 * targetRecPtr is the position of the WAL record we're reading.  Usually
+	 * it is equal to targetPagePtr + reqLen, but sometimes xlogreader needs
+	 * to read and verify the page or segment header, before it reads the
+	 * actual WAL record it's interested in.  In that case, targetRecPtr can
+	 * be used to determine which timeline to read the page from.
+	 *
+	 * The callback shall set *pageTLI to the TLI of the file the page was
+	 * read from.  It is currently used only for error reporting purposes, to
+	 * reconstruct the name of the WAL file where an error occurred.
 	 */
 	XLogPageReadCB read_page;
 
@@ -90,6 +97,9 @@ struct XLogReaderState
 	XLogRecPtr	latestPagePtr;
 	TimeLineID	latestPageTLI;
 
+	/* beginning of the WAL record being read. */
+	XLogRecPtr	currRecPtr;
+
 	/* Buffer for current ReadRecord result (expandable) */
 	char	   *readRecordBuf;
 	uint32		readRecordBufSize;

-- 
Sent via pgsql-hackers mailing list ([email protected])