On 25.04.2013 18:56, Kyotaro HORIGUCHI wrote:
Can you share the modified script, please?

Please find the attached files:
   test.sh : test script. most significant change is the load.
                I used simple insert instead of pgbench.
                It might need some more adjustment for other environment
                as my usual.
   xlog.c.diff : Additional log output I thought to be useful to diagnose.

Ok, thanks, I see what's going on now. The problem is that once XLogFileRead() finds a file with tli X, it immediately sets curFileTLI = X. XLogFileReadAnyTLI() never tries to read files with tli < curFileTLI. So, if recovery finds a file with the right filename, e.g 000000030000000000000008, it never tries to open 000000020000000000000008 anymore, even if the contents of 000000030000000000000008 later turn out to be bogus.

One idea to fix this is to not set curFileTLI, until the page header on the just-opened file has been verified. Another idea is to change the check in XLogFileReadAnyTLI() that currently forbids curFileTLI from moving backwards. We could allow curFileTLI to move backwards, as long as the tli is >= ThisTimeLineID (ThisTimeLineID is the current timeline we're recovering records from).

Attached is a patch for the 2nd approach. With the patch, the test script works for me. Thoughts?

PS. This wasn't caused by the 9.2.4 change to do crash recovery before entering archive recovery. The test script fails in the same way with 9.2.3 as well.

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 690077c..4ca75d7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -251,8 +251,7 @@ static bool recoveryStopAfter;
  * curFileTLI: the TLI appearing in the name of the current input WAL file.
  * (This is not necessarily the same as ThisTimeLineID, because we could
  * be scanning data that was copied from an ancestor timeline when the current
- * file was created.)  During a sequential scan we do not allow this value
- * to decrease.
+ * file was created.)
  */
 static TimeLineID recoveryTargetTLI;
 static bool recoveryTargetIsLatest = false;
@@ -657,7 +656,7 @@ static bool InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
 static int XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 			 int source, bool notexistOk);
 static int XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode,
-				   int sources);
+				   int sources, bool randAccess);
 static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
 			 bool randAccess);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
@@ -2899,7 +2898,8 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
  * This version searches for the segment with any TLI listed in expectedTLIs.
  */
 static int
-XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode, int sources)
+XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode, int sources,
+				   bool randAccess)
 {
 	char		path[MAXPGPATH];
 	ListCell   *cell;
@@ -2909,17 +2909,16 @@ XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode, int sources)
 	 * Loop looking for a suitable timeline ID: we might need to read any of
 	 * the timelines listed in expectedTLIs.
 	 *
-	 * We expect curFileTLI on entry to be the TLI of the preceding file in
-	 * sequence, or 0 if there was no predecessor.	We do not allow curFileTLI
-	 * to go backwards; this prevents us from picking up the wrong file when a
-	 * parent timeline extends to higher segment numbers than the child we
-	 * want to read.
+	 * During a sequential read, do not check TLIs smaller than the timeline
+	 * we're currently recovering (ThisTimeLineID); this prevents us from
+	 * picking up the wrong file when a parent timeline extends to higher
+	 * segment numbers than the child we want to read.
 	 */
 	foreach(cell, expectedTLIs)
 	{
 		TimeLineID	tli = (TimeLineID) lfirst_int(cell);
 
-		if (tli < curFileTLI)
+		if (!randAccess && tli < ThisTimeLineID)
 			break;				/* don't bother looking at too-old TLIs */
 
 		if (sources & XLOG_FROM_ARCHIVE)
@@ -10510,9 +10509,6 @@ retry:
 						close(readFile);
 						readFile = -1;
 					}
-					/* Reset curFileTLI if random fetch. */
-					if (randAccess)
-						curFileTLI = 0;
 
 					/*
 					 * Try to restore the file from archive, or read an
@@ -10573,7 +10569,7 @@ retry:
 					/* Don't try to read from a source that just failed */
 					sources &= ~failedSources;
 					readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
-												  sources);
+												  sources, randAccess);
 					switched_segment = true;
 					if (readFile >= 0)
 						break;
@@ -10607,16 +10603,12 @@ retry:
 			{
 				int			sources;
 
-				/* Reset curFileTLI if random fetch. */
-				if (randAccess)
-					curFileTLI = 0;
-
 				sources = XLOG_FROM_PG_XLOG;
 				if (InArchiveRecovery)
 					sources |= XLOG_FROM_ARCHIVE;
 
 				readFile = XLogFileReadAnyTLI(readId, readSeg, emode,
-											  sources);
+											  sources, randAccess);
 				switched_segment = true;
 				if (readFile < 0)
 					return false;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to