On 11/20/23 15:08, David Steele wrote:
On 11/20/23 14:27, Andres Freund wrote:
I've wondered whether it's worth also adding an explicit message
just after
ReachedEndOfBackup(), but it seems far less urgent due to the existing
"consistent recovery state reached at %X/%X" message.
I think the current message is sufficient, but what do you have in mind?
Well, the consistency message is emitted after every restart. Whereas
a single
instance only should go through backup recovery once. So it seems
worthwhile
to differentiate the two in log messages.
Ah, right. That works for me, then.
Any status on this patch? If we do back patch it would be nice to see
this in the upcoming minor releases. I'm in favor of a back patch, as I
think this is minimally invasive and would be very useful for debugging
recovery issues.
I like the phrasing you demonstrated in [1] but doesn't seem like
there's a new patch for that, so I have attached one.
Happy to do whatever else I can to get this across the line.
Regards,
-David
---
[1]
https://www.postgresql.org/message-id/20231120183633.c4lhoq4hld4u56dd%40awork3.anarazel.dediff --git a/src/backend/access/transam/xlogrecovery.c
b/src/backend/access/transam/xlogrecovery.c
index 1b48d7171a..c0918e6c75 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -603,6 +603,22 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
if (StandbyModeRequested)
EnableStandbyMode();
+ /*
+ * Omitting backup_label when creating a new replica, PITR node
etc.
+ * unfortunately is a common cause of corruption. Logging that
+ * backup_label was used makes it a bit easier to exclude that
as the
+ * cause of observed corruption.
+ *
+ * Do so before we try to read the checkpoint record (which can
fail),
+ * as otherwise it can be hard to understand why a checkpoint
other
+ * than ControlFile->checkPoint is used.
+ */
+ ereport(LOG,
+ (errmsg("starting backup recovery with redo LSN
%X/%X, checkpoint LSN %X/%X, on timeline ID %u",
+ LSN_FORMAT_ARGS(RedoStartLSN),
+ LSN_FORMAT_ARGS(CheckPointLoc),
+ CheckPointTLI)));
+
/*
* When a backup_label file is present, we want to roll forward
from
* the checkpoint it identifies, rather than using pg_control.
@@ -742,6 +758,16 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
EnableStandbyMode();
}
+ /*
+ * For the same reason as when starting up with backup_label
present,
+ * emit a log message when we continue initializing from a base
+ * backup.
+ */
+ if (ControlFile->backupStartPoint != InvalidXLogRecPtr)
+ ereport(LOG,
+ (errmsg("restarting backup recovery
with redo LSN %X/%X",
+
LSN_FORMAT_ARGS(ControlFile->backupStartPoint))));
+
/* Get the last valid checkpoint record. */
CheckPointLoc = ControlFile->checkPoint;
CheckPointTLI = ControlFile->checkPointCopy.ThisTimeLineID;
@@ -2155,6 +2181,8 @@ CheckRecoveryConsistency(void)
if (!XLogRecPtrIsInvalid(backupEndPoint) &&
backupEndPoint <= lastReplayedEndRecPtr)
{
+ XLogRecPtr oldBackupStartPoint = backupStartPoint;
+
elog(DEBUG1, "end of backup reached");
/*
@@ -2165,6 +2193,10 @@ CheckRecoveryConsistency(void)
backupStartPoint = InvalidXLogRecPtr;
backupEndPoint = InvalidXLogRecPtr;
backupEndRequired = false;
+
+ ereport(LOG,
+ (errmsg("completed backup recovery with redo
LSN %X/%X",
+
LSN_FORMAT_ARGS(oldBackupStartPoint))));
}
/*