On 25.10.2010 19:04, Jeff Davis wrote:
On Mon, 2010-10-25 at 14:44 +0300, Heikki Linnakangas wrote:
It seems we should use ReadRecord instead of the lower-level
XLogPageRead function. One difference is that ReadRecord performs a
bunch of sanity checks on the record, while XLogPageRead just reads the
raw page. Extra sanity checking before removing backup_label seems like
a good idea. Another difference is that in standby-mode, ReadRecord will
retry until it succeeds. A standby server should keep retrying, even the
very first record, until it succeeds, otherwise we have a change in
behavior.

The reason I didn't use ReadRecord is because it sets a global variable
to point to the next location in the log, so that subsequent calls can
just pass NULL for the location.

True. XLogPageRead is new in 9.0, however. We'll have to use ReadRecord or invent something new for back-branches anyway.

It looks like the patch leaves the global variable pointing just after
the redo location rather than the checkpoint. I haven't tested your
patch yet, but it looks like some of the following code depends on
ReadRecord(NULL,...) fetching the record right after the checkpoint
record; so I think something else is required if you want to use
ReadRecord.

Hmm, the next call to ReadRecord is this:

                /*
                 * Find the first record that logically follows the checkpoint 
--- it
                 * might physically precede it, though.
                 */
                if (XLByteLT(checkPoint.redo, RecPtr))
                {
                        /* back up to find the record */
                        record = ReadRecord(&(checkPoint.redo), PANIC, false);
                }
                else
                {
                        /* just have to read next record after CheckPoint */
                        record = ReadRecord(NULL, LOG, false);
                }

In the first case, the location is given explicitly. In the second case, the redo pointer equals the checkpoint record, so the current position is correct even with the patch. It makes me slightly nervous, though. It's correct today, but if someone adds code between the backup_label check and this that assumes that the current position is the checkpoint record, it'll fail. Then again, any new ReadRecord call in such added code would also break the assumption in the above block that the current position is the checkpoint record.

In the case that the redo pointer is the same as the checkpoint record, we don't need to re-fetch the checkpoint record. I've added a test for that in the attached patch.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 22fd578..6f1fedd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5839,14 +5839,29 @@ StartupXLOG(void)
 		record = ReadCheckpointRecord(checkPointLoc, 0);
 		if (record != NULL)
 		{
+			memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
 			ereport(DEBUG1,
 					(errmsg("checkpoint record is at %X/%X",
 							checkPointLoc.xlogid, checkPointLoc.xrecoff)));
 			InRecovery = true;	/* force recovery even if SHUTDOWNED */
+
+			/*
+			 * Make sure that REDO location exists. This may not be
+			 * the case if there was a crash during an online backup,
+			 * which left a backup_label around that references a WAL
+			 * segment that's already been archived.
+			 */
+			if (XLByteLT(checkPoint.redo, checkPointLoc))
+			{
+				if (!ReadRecord(&(checkPoint.redo), LOG, false))
+					ereport(FATAL,
+							(errmsg("could not find redo location referenced by checkpoint record"),
+							 errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+			}
 		}
 		else
 		{
-			ereport(PANIC,
+			ereport(FATAL,
 					(errmsg("could not locate required checkpoint record"),
 					 errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
 		}
@@ -5892,10 +5907,10 @@ StartupXLOG(void)
 				ereport(PANIC,
 					 (errmsg("could not locate a valid checkpoint record")));
 		}
+		memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
 	}
 
 	LastRec = RecPtr = checkPointLoc;
-	memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
 	wasShutdown = (record->xl_info == XLOG_CHECKPOINT_SHUTDOWN);
 
 	ereport(DEBUG1,
-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to