On 05.12.2012 14:32, Amit Kapila wrote:
On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
After some diversions to fix bugs and refactor existing code, I've
committed a couple of small parts of this patch, which just add some
sanity checks to notice incorrect PITR scenarios. Here's a new version
of the main patch based on current HEAD.

After testing with the new patch, the following problems are observed.

Defect - 1:

     1. start primary A
     2. start standby B following A
     3. start cascade standby C following B.
     4. start another standby D following C.
     5. Promote standby B.
     6. After successful time line switch in cascade standby C&  D, stop D.
     7. Restart D, Startup is successful and connecting to standby C.
     8. Stop C.
     9. Restart C, startup is failing.

Ok, the error I get in that scenario is:

C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1 C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283) exited with exit code 1 C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startup process failure

It seems that the commits I made to master already:

http://archives.postgresql.org/pgsql-committers/2012-12/msg00116.php
http://archives.postgresql.org/pgsql-committers/2012-12/msg00111.php

were a few bricks shy of a load. The problem is that if recovery stops at a checkpoint record that changes timeline, so that minRecoveryPoint points to the end of the checkpoint record, we still record the old TLI as the TLI of minRecoveryPoint. This is because 1) there's a silly bug in the patch; replayEndTLI is not updated along with replayEndRecPtr. But even after fixing that, we're not good.

The problem is that replayEndRecPtr is currently updated *before* calling the redo function. replayEndRecPtr is what becomes minRecoveryPoint when XLogFlush() is called. If the record is a checkpoint record, redoing it will switch recovery to the new timeline, but replayEndTLI will not be updated until the next record.

IOW, as far as minRecoveryPoint is concerned, a checkpoint record that switches timeline is considered to be part of the old timeline. But when a server is promoted and a new timeline is created, the checkpoint record is considered to be part of the new timeline; that's what we write in the page header and in the control file.

That mismatch causes the error. I'd like to fix this by always treating the checkpoint record to be part of the new timeline. That feels more correct. The most straightforward way to implement that would be to peek at the xlog record before updating replayEndRecPtr and replayEndTLI. If it's a checkpoint record that changes TLI, set replayEndTLI to the new timeline before calling the redo-function. But it's a bit of a modularity violation to peek into the record like that.

Or we could just revert the sanity check at beginning of recovery that throws the "requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint record that says "unexpected timeline ID %u in checkpoint record, before reaching minimum recovery point %X/%X on timeline %u" checks basically the same thing, but at a later stage. However, the way minRecoveryPointTLI is updated still seems wrong to me, so I'd like to fix that.

I'm thinking of something like the attached (with some more comments before committing). Thoughts?

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 702ea7c..bdae7a4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5822,6 +5822,7 @@ StartupXLOG(void)
 			 */
 			do
 			{
+				TimeLineID EndTLI;
 #ifdef WAL_DEBUG
 				if (XLOG_DEBUG ||
 				 (rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
@@ -5895,8 +5896,20 @@ StartupXLOG(void)
 				 * Update shared replayEndRecPtr before replaying this record,
 				 * so that XLogFlush will update minRecoveryPoint correctly.
 				 */
+				if (record->xl_rmid == RM_XLOG_ID &&
+					(record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN)
+				{
+					CheckPoint	checkPoint;
+
+					memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
+					EndTLI = checkPoint.ThisTimeLineID;
+				}
+				else
+					EndTLI = ThisTimeLineID;
+
 				SpinLockAcquire(&xlogctl->info_lck);
 				xlogctl->replayEndRecPtr = EndRecPtr;
+				xlogctl->replayEndTLI = EndTLI;
 				recoveryPause = xlogctl->recoveryPause;
 				SpinLockRelease(&xlogctl->info_lck);
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to