Hi hackers,
I ran into this while working on recovery pre-check logic that relies on
pg_controldata to verify whether replay has reached a specific restore point.
Reproducer:
```
-- on primary:
CHECKPOINT;
SELECT pg_create_restore_point('test_rp');
-- recover with:
-- recovery_target_name = 'test_rp'
-- recovery_target_action = 'shutdown'
-- after recovery shuts down:
-- pg_controldata then shows minRecoveryPoint 104 bytes behind
-- the LSN returned by pg_create_restore_point (104 bytes = one
-- RESTORE_POINT WAL record).
```
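For reference, the kind of pre-check that trips over this can be sketched as below. This is only an illustration, not the actual tool: parse_lsn() and reached_restore_point() are hypothetical helper names, and the LSN strings in the usage note are made-up values chosen to be 104 bytes apart. It assumes both LSNs are available in the usual textual hi/lo hex form.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse an LSN written as two hex numbers separated
 * by a slash (e.g. "0/3000190") into a 64-bit byte position.
 * Returns false if the string is not in that form.
 */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;
    int          consumed = 0;

    if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2 ||
        text[consumed] != '\0')
        return false;

    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * Hypothetical pre-check: has "Minimum recovery ending location" reached
 * the LSN returned by pg_create_restore_point()?  On a false result with
 * valid input, *lag holds how many bytes short it is; on a parse error
 * *lag is set to UINT64_MAX.
 */
static bool
reached_restore_point(const char *min_recovery, const char *target,
                      uint64_t *lag)
{
    uint64_t min_lsn;
    uint64_t target_lsn;

    if (!parse_lsn(min_recovery, &min_lsn) ||
        !parse_lsn(target, &target_lsn))
    {
        *lag = UINT64_MAX;
        return false;
    }

    *lag = (min_lsn >= target_lsn) ? 0 : target_lsn - min_lsn;
    return *lag == 0;
}
```

With made-up inputs such as reached_restore_point("0/3000128", "0/3000190", &lag), the check fails with lag set to 104, which is the symptom described above.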
Root cause analysis:
When recovery_target_action=shutdown triggers, the checkpointer performs a
shutdown restartpoint via CreateRestartPoint(). If a CHECKPOINT record was
replayed shortly before the recovery target, CreateRestartPoint advances
minRecoveryPoint to the end of that CHECKPOINT record.
However, any no-op records replayed after the CHECKPOINT (such as
RESTORE_POINT) do not dirty pages, so the lazy minRecoveryPoint update that
normally happens during page flushes never fires for them. As a result,
minRecoveryPoint in pg_control ends up behind the actual replay position.
My Fix:
The attached patch fixes this by reading the current replay position from
shared memory after advancing minRecoveryPoint to the checkpoint end, and
advancing further if replay has progressed past it. This is safe because
CheckPointGuts() has already flushed all dirty buffers and the startup process
has exited, so replayEndRecPtr is stable and all pages are on disk.
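To make the before/after behaviour concrete, here is a toy model of the update logic. It is a standalone sketch and not the code in xlog.c: ToyControlFile and toy_advance_min_recovery_point() are names invented for illustration, the types are simple stand-ins, and locking and the actual pg_control write are left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the PostgreSQL type */
typedef uint32_t TimeLineID;    /* stand-in for the PostgreSQL type */

/* Invented for this sketch: only the two fields that matter here. */
typedef struct ToyControlFile
{
    XLogRecPtr  minRecoveryPoint;
    TimeLineID  minRecoveryPointTLI;
} ToyControlFile;

/*
 * Toy version of the minRecoveryPoint update in a restartpoint.
 * With apply_fix = false it only advances to the end of the CHECKPOINT
 * record; with apply_fix = true it also advances to the replay position
 * if that is further ahead.  It never moves the value backwards.
 */
static void
toy_advance_min_recovery_point(ToyControlFile *ctl,
                               XLogRecPtr checkpoint_end,
                               TimeLineID checkpoint_tli,
                               XLogRecPtr replay_end,
                               TimeLineID replay_tli,
                               bool apply_fix)
{
    /* Existing behaviour: advance to the end of the CHECKPOINT record. */
    if (ctl->minRecoveryPoint < checkpoint_end)
    {
        ctl->minRecoveryPoint = checkpoint_end;
        ctl->minRecoveryPointTLI = checkpoint_tli;
    }

    /* Proposed behaviour: also cover records replayed after it. */
    if (apply_fix && ctl->minRecoveryPoint < replay_end)
    {
        ctl->minRecoveryPoint = replay_end;
        ctl->minRecoveryPointTLI = replay_tli;
    }
}
```

If the CHECKPOINT record ends at some position X and a RESTORE_POINT record is replayed right after it, the unpatched path leaves minRecoveryPoint at X, while the patched path moves it to the end of the RESTORE_POINT record.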
--
Adam
From 8a6b070d860b8241a41057f021a621b7daa55f22 Mon Sep 17 00:00:00 2001
From: Adam Lee <[email protected]>
Date: Tue, 31 Mar 2026 18:43:53 +0800
Subject: [PATCH] Fix minRecoveryPoint not advanced past checkpoint in
CreateRestartPoint
When recovery_target_action=shutdown triggers, the checkpointer performs
a shutdown restartpoint via CreateRestartPoint. If a new CHECKPOINT
record was replayed shortly before the recovery target, the restartpoint
advances minRecoveryPoint to the end of that CHECKPOINT record. Replay
after that point does not advance minRecoveryPoint directly; the update
is assumed to happen as a side effect of flushing buffers.
But no-op records replayed after the CHECKPOINT (such as RESTORE_POINT) do
not dirty any pages, so the minRecoveryPoint is not updated as expected.
As a result, minRecoveryPoint in pg_control ends up behind the actual
replay position. This does not cause a recovery correctness issue, but
the inaccurate pg_controldata "Minimum recovery ending location"
prevents users or tools from using this value to verify that recovery
has reached a specific restore point.
Fix by reading the current replay position from shared memory after
advancing minRecoveryPoint to the checkpoint, and advancing it further
if replay has progressed past the checkpoint.
Reproducer:
CHECKPOINT; SELECT pg_create_restore_point('test_rp');
-- recover with recovery_target_name + recovery_target_action=shutdown
-- pg_controldata shows minRecoveryPoint 104 bytes behind
---
src/backend/access/transam/xlog.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2c1c6f88b74..ec639054620 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7868,11 +7868,34 @@ CreateRestartPoint(int flags)
{
ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
+ }
+
+ /*
+ * Also advance minRecoveryPoint past any WAL replayed after
+ * the checkpoint. Normally this happens as a side effect of
+ * flushing dirty buffers, but during a shutdown restartpoint
+ * there may be records between the checkpoint and the
+ * recovery target that didn't dirty any buffers (e.g. a
+ * RESTORE_POINT record). Without this, a shutdown triggered
+ * by recovery_target_action leaves minRecoveryPoint behind
+ * the actual replay position.
+ */
+ {
+ XLogRecPtr replayPtr;
+ TimeLineID replayTLI;
- /* update local copy */
- LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
- LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+ replayPtr = GetCurrentReplayRecPtr(&replayTLI);
+ if (ControlFile->minRecoveryPoint < replayPtr)
+ {
+ ControlFile->minRecoveryPoint = replayPtr;
+ ControlFile->minRecoveryPointTLI = replayTLI;
+ }
}
+
+ /* update local copy */
+ LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
+ LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+
+
if (flags & CHECKPOINT_IS_SHUTDOWN)
ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
}
--
2.47.3