At Tue, 2 Mar 2021 17:56:03 -0800, Soumyadeep Chakraborty 
<soumyadeep2...@gmail.com> wrote in 
> Hello hackers,
> 
> We came across an issue where the checkpointer writes to the older
> timeline while a promotion is ongoing after reaching the recovery point
> in a PITR, when there are prepared transactions before the recovery
> point. We came across this issue first in REL_12_STABLE and saw that it
> also exists in devel.

Good Catch!  I can reproduce that.

> When there are prepared transactions in an older timeline, in the
> checkpointer, a call to CheckPointTwoPhase() and subsequently to
> XlogReadTwoPhaseData() and subsequently to read_local_xlog_page() leads
> to the following line:
> 
> read_upto = GetXLogReplayRecPtr(&ThisTimeLineID);
> 
> GetXLogReplayRecPtr() will change ThisTimeLineID to 1, in order to read
> the two phase WAL records in the older timeline. This variable will
> remain unchanged and the checkpointer ends up writing the checkpoint
> record into the older WAL segment (when XLogBeginInsert() is called
> within CreateCheckPoint(), the value is still 1). The value is not
> synchronized as even if RecoveryInProgress() is called,
> xlogctl->SharedRecoveryState is not RECOVERY_STATE_DONE
> (SharedRecoveryInProgress = true in older versions) as the startup
> process waits for the checkpointer inside RequestCheckpoint() (since
> recovery_target_action='promote' involves a non-fast promotion). Thus,
> InitXLOGAccess() is not called and the value of ThisTimeLineID is not
> updated before the checkpoint record write.
> 
> Since 1148e22a82e, GetXLogReplayRecPtr() is called with ThisTimeLineID
> instead of a local variable, within read_local_xlog_page().
> 
> PFA a small patch that fixes the problem by explicitly calling
> InitXLOGAccess() in CheckPointTwoPhase(), after the two phase state data
> is read, in order to update ThisTimeLineID to the latest timeline. It is
> okay to call InitXLOGAccess() as it is lightweight and would mostly be
> a no-op.

It is correct that read_local_xlog_page() changes ThisTimeLineID, but
InitXLOGAccess() is correctly called in CreateCheckPoint:

|       /*
|        * An end-of-recovery checkpoint is created before anyone is allowed to
|        * write WAL. To allow us to write the checkpoint record, temporarily
|        * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
|        * initialized, which we need here and in AdvanceXLInsertBuffer.)
|        */
|       if (flags & CHECKPOINT_END_OF_RECOVERY)
|               LocalSetXLogInsertAllowed();

It seems to e suficcient to recover ThisTimeLineID from the checkpoint
record to be written, as attached?

regareds.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 377afb8732..06d3ce676e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9088,6 +9088,13 @@ CreateCheckPoint(int flags)
 	if (!shutdown && XLogStandbyInfoActive())
 		LogStandbySnapshot();
 
+	/*
+	 * ThisTimeLineID may have moved so far so we need to restore it so that
+	 * the checkpoint record goes to the right timeline.  (Currently
+	 * CheckPointTwoPhase() is known to have a chance to change it.)
+	 */
+	ThisTimeLineID = checkPoint.ThisTimeLineID;
+
 	START_CRIT_SECTION();
 
 	/*

Reply via email to