Re: [HACKERS] Switching timeline over streaming replication

Heikki Linnakangas Wed, 05 Dec 2012 11:23:17 -0800

On 05.12.2012 14:32, Amit Kapila wrote:

On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:

After some diversions to fix bugs and refactor existing code, I've
committed a couple of small parts of this patch, which just add some
sanity checks to notice incorrect PITR scenarios. Here's a new version
of the main patch based on current HEAD.


After testing with the new patch, the following problems are observed.

Defect - 1:

     1. start primary A
     2. start standby B following A
     3. start cascade standby C following B.
     4. start another standby D following C.
     5. Promote standby B.
     6. After successful time line switch in cascade standby C&  D, stop D.
     7. Restart D, Startup is successful and connecting to standby C.
     8. Stop C.
     9. Restart C, startup is failing.


Ok, the error I get in that scenario is:

C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does notcontain minimum recovery point 0/3023F08 on timeline 1C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283)exited with exit code 1C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startupprocess failure


It seems that the commits I made to master already:

http://archives.postgresql.org/pgsql-committers/2012-12/msg00116.php
http://archives.postgresql.org/pgsql-committers/2012-12/msg00111.php

were a few bricks shy of a load. The problem is that if recovery stopsat a checkpoint record that changes timeline, so that minRecoveryPointpoints to the end of the checkpoint record, we still record the old TLIas the TLI of minRecoveryPoint. This is because 1) there's a silly bugin the patch; replayEndTLI is not updated along with replayEndRecPtr.But even after fixing that, we're not good.

The problem is that replayEndRecPtr is currently updated *before*calling the redo function. replayEndRecPtr is what becomesminRecoveryPoint when XLogFlush() is called. If the record is acheckpoint record, redoing it will switch recovery to the new timeline,but replayEndTLI will not be updated until the next record.

IOW, as far as minRecoveryPoint is concerned, a checkpoint record thatswitches timeline is considered to be part of the old timeline. But whena server is promoted and a new timeline is created, the checkpointrecord is considered to be part of the new timeline; that's what wewrite in the page header and in the control file.

That mismatch causes the error. I'd like to fix this by always treatingthe checkpoint record to be part of the new timeline. That feels morecorrect. The most straightforward way to implement that would be to peekat the xlog record before updating replayEndRecPtr and replayEndTLI. Ifit's a checkpoint record that changes TLI, set replayEndTLI to the newtimeline before calling the redo-function. But it's a bit of amodularity violation to peek into the record like that.

Or we could just revert the sanity check at beginning of recovery thatthrows the "requested timeline 2 does not contain minimum recovery point0/3023F08 on timeline 1" error. The error I added to redo of checkpointrecord that says "unexpected timeline ID %u in checkpoint record, beforereaching minimum recovery point %X/%X on timeline %u" checks basicallythe same thing, but at a later stage. However, the wayminRecoveryPointTLI is updated still seems wrong to me, so I'd like tofix that.

I'm thinking of something like the attached (with some more commentsbefore committing). Thoughts?


- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 702ea7c..bdae7a4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5822,6 +5822,7 @@ StartupXLOG(void)
 			 */
 			do
 			{
+				TimeLineID EndTLI;
 #ifdef WAL_DEBUG
 				if (XLOG_DEBUG ||
 				 (rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
@@ -5895,8 +5896,20 @@ StartupXLOG(void)
 				 * Update shared replayEndRecPtr before replaying this record,
 				 * so that XLogFlush will update minRecoveryPoint correctly.
 				 */
+				if (record->xl_rmid == RM_XLOG_ID &&
+					(record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN)
+				{
+					CheckPoint	checkPoint;
+
+					memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
+					EndTLI = checkPoint.ThisTimeLineID;
+				}
+				else
+					EndTLI = ThisTimeLineID;
+
 				SpinLockAcquire(&xlogctl->info_lck);
 				xlogctl->replayEndRecPtr = EndRecPtr;
+				xlogctl->replayEndTLI = EndTLI;
 				recoveryPause = xlogctl->recoveryPause;
 				SpinLockRelease(&xlogctl->info_lck);

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Switching timeline over streaming replication

Reply via email to