At Mon, 07 Feb 2022 10:16:34 +0900 (JST), Kyotaro Horiguchi 
<horikyota....@gmail.com> wrote in 
> At Fri, 4 Feb 2022 10:59:04 +0900, Fujii Masao <masao.fu...@oss.nttdata.com> 
> wrote in 
> > On 2022/02/03 15:50, Kyotaro Horiguchi wrote:
> > > By the way, restart point should start only while recoverying, and at
> > > the timeof the start both checkpoint.redo and checkpoint LSN are
> > > already past. We shouldn't update minRecovery point after promotion,
> > > but is there any reason for not updating the checkPoint and
> > > checkPointCopy?  If we update them after promotion, the
> > > which-LSN-to-show problem would be gone.
> > 
> > I tried to find the reason by reading the past discussion, but have
> > not found that yet.
> > 
> > If we update checkpoint and REDO LSN at pg_control in that case, we
> > also need to update min recovery point at pg_control? Otherwise the
> > min recovery point at pg_control still indicates the old LSN that
> > previous restart point set.
> 
> I had an assuption that the reason I think it shouldn't update
> minRecoveryPoint is that it has been or is going to be reset to
> invalid LSN by promotion and the checkpoint should refrain from
> touching it.

Hmm.. It doesn't seem to be the case. If a server crashes just after
promotion and before requesting post-promtion checkpoint,
minRecoveryPoint stays at a valid LSN.

(Promoted at 0/7000028)
Database cluster state:               in production
Latest checkpoint location:           0/6000060
Latest checkpoint's REDO location:    0/6000028
Latest checkpoint's REDO WAL file:    000000010000000000000006
Minimum recovery ending location:     0/7000090
Min recovery ending loc's timeline:   2

minRecoveryPoint/TLI are ignored in any case where a server in
in-production state is started.  In other words, the values are
useless.  There's no clear or written reason for unrecording the last
ongoing restartpoint after promotion.

Before fast-promotion was introduced, we shouldn't get there after
end-of-recovery checkpoint (but somehow reached sometimes?) but it is
quite normal nowadays.  Or to the contrary, we're expecting it to
happen and it is regarded as a normal checkponit. So we should do
there nowadays are as the follows.

- If any later checkpoint/restartpoint has been established, just skip
  remaining task then return false. (!chkpt_was_latest)
  (I'm not sure this can happen, though.)

- we update control file only when archive recovery is still ongoing.

- Otherwise reset minRecoveryPoint then continue.

Do you have any thoughts or opinions?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 958220c495..ab8a4d9a1b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9658,6 +9658,7 @@ CreateRestartPoint(int flags)
        XLogRecPtr      endptr;
        XLogSegNo       _logSegNo;
        TimestampTz xtime;
+       bool            chkpt_was_latest = false;
 
        /* Get a local copy of the last safe checkpoint record. */
        SpinLockAcquire(&XLogCtl->info_lck);
@@ -9752,44 +9753,69 @@ CreateRestartPoint(int flags)
        PriorRedoPtr = ControlFile->checkPointCopy.redo;
 
        /*
-        * Update pg_control, using current time.  Check that it still shows
-        * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do 
nothing;
-        * this is a quick hack to make sure nothing really bad happens if 
somehow
-        * we get here after the end-of-recovery checkpoint.
+        * Update pg_control, using current time if no later checkpoints have 
been
+        * performed.
         */
        LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-       if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-               ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+       if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
        {
+               chkpt_was_latest = true;
                ControlFile->checkPoint = lastCheckPointRecPtr;
                ControlFile->checkPointCopy = lastCheckPoint;
 
                /*
-                * Ensure minRecoveryPoint is past the checkpoint record.  
Normally,
-                * this will have happened already while writing out dirty 
buffers,
-                * but not necessarily - e.g. because no buffers were dirtied.  
We do
-                * this because a non-exclusive base backup uses 
minRecoveryPoint to
-                * determine which WAL files must be included in the backup, 
and the
-                * file (or files) containing the checkpoint record must be 
included,
-                * at a minimum. Note that for an ordinary restart of recovery 
there's
-                * no value in having the minimum recovery point any earlier 
than this
+                * Ensure minRecoveryPoint is past the checkpoint record while 
archive
+                * recovery is still ongoing.  Normally, this will have happened
+                * already while writing out dirty buffers, but not necessarily 
-
+                * e.g. because no buffers were dirtied.  We do this because a
+                * non-exclusive base backup uses minRecoveryPoint to determine 
which
+                * WAL files must be included in the backup, and the file (or 
files)
+                * containing the checkpoint record must be included, at a
+                * minimum. Note that for an ordinary restart of recovery 
there's no
+                * value in having the minimum recovery point any earlier than 
this
                 * anyway, because redo will begin just after the checkpoint 
record.
                 */
-               if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+               if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
                {
-                       ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
-                       ControlFile->minRecoveryPointTLI = 
lastCheckPoint.ThisTimeLineID;
+                       if (ControlFile->minRecoveryPoint < 
lastCheckPointEndPtr)
+                       {
+                               ControlFile->minRecoveryPoint = 
lastCheckPointEndPtr;
+                               ControlFile->minRecoveryPointTLI = 
lastCheckPoint.ThisTimeLineID;
 
-                       /* update local copy */
-                       minRecoveryPoint = ControlFile->minRecoveryPoint;
-                       minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+                               /* update local copy */
+                               minRecoveryPoint = 
ControlFile->minRecoveryPoint;
+                               minRecoveryPointTLI = 
ControlFile->minRecoveryPointTLI;
+                       }
+                       if (flags & CHECKPOINT_IS_SHUTDOWN)
+                               ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
+               }
+               else
+               {
+                       /*
+                        * We have exited from archive-recovery mode after 
starting. Crash
+                        * recovery ever after should always recover to the end 
of WAL
+                        */
+                       ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
+                       ControlFile->minRecoveryPointTLI = 0;
                }
-               if (flags & CHECKPOINT_IS_SHUTDOWN)
-                       ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                UpdateControlFile();
        }
        LWLockRelease(ControlFileLock);
 
+       /*
+        * Skip all post-checkpoint work if others have done that with later
+        * checkpoints.
+        */
+       if (!chkpt_was_latest)
+       {
+               ereport((log_checkpoints ? LOG : DEBUG2),
+                               (errmsg("post-restartpoint cleanup is skpped at 
%X/%X, because later restartpoints have been already performed",
+                                               
LSN_FORMAT_ARGS(lastCheckPoint.redo))));
+
+               /* this checkpoint has not bee established */
+               return false;
+       }
+
        /*
         * Update the average distance between checkpoints/restartpoints if the
         * prior checkpoint exists.

Reply via email to