Hi hackers,

During replication, when a new timeline is detected, PostgreSQL creates a
new zero-filled WAL segment on the new timeline instead of copying the
partial segment from the previous timeline. This diverges from the behavior
during timeline switches at startup. The discrepancy can cause problems,
especially under slow replication. Consider the following scenario:

last record in TLI | | timeline switch point
v v
|-----TLI N---------------|0000000000000000000
|
|-----TLI N+1--00000000000|0000000000000000000
If a standby is promoted before the WAL segment containing the last record
of the previous timeline has been fully copied to the new timeline, startup
may fail. We have observed this in production, where recovery aborts with
"PANIC: invalid magic number 0000 in WAL segment ..."
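
For orientation, the segment that contains a given record, and the file name
that segment has on each timeline, follow from simple arithmetic. Below is a
minimal standalone sketch of that arithmetic. The sketch_* names are
hypothetical, and the code only mirrors what I understand XLByteToSeg,
XLogSegmentOffset and XLogFileName to compute; it is not taken from the
PostgreSQL sources.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 24 hex digits plus the terminating NUL (hypothetical constant). */
#define SKETCH_FNAME_LEN 25

/* Segment number that contains the given LSN. */
static uint64_t
sketch_segno(uint64_t lsn, uint64_t seg_size)
{
    assert(seg_size > 0);
    return lsn / seg_size;
}

/* Byte offset of the given LSN inside its segment. */
static uint64_t
sketch_seg_offset(uint64_t lsn, uint64_t seg_size)
{
    assert(seg_size > 0);
    return lsn % seg_size;
}

/*
 * Build the WAL file name for (tli, segno): timeline ID, then the
 * segment number split into a high and a low part, each printed as
 * 8 upper-case hex digits.
 */
static void
sketch_wal_file_name(char *buf, uint32_t tli, uint64_t segno,
                     uint64_t seg_size)
{
    uint64_t per_id;

    assert(seg_size > 0 && seg_size <= UINT64_C(0x100000000));
    per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(buf, SKETCH_FNAME_LEN,
             "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / per_id),
             (uint32_t) (segno % per_id));
}
```

With 16 MB segments, LSN 0/3000060 falls in segment 3 at offset 0x60. That
segment is 000000010000000000000003 on TLI 1 and 000000020000000000000003 on
TLI 2, so the same byte range exists under two different file names after
the switch.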
I’ve attached:
* a patch and a TAP test that reproduce the issue;
* a draft patch that, on timeline switch during recovery, copies the
remainder of the current WAL segment from the old timeline — matching the
behavior used after crash recovery at startup.
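
To illustrate both the failure and the idea behind the draft patch, here is a
toy model in standalone C. It is not PostgreSQL code: the page size, segment
size, magic value and all toy_* names are made up for the example. It only
shows why a zero-filled segment fails a page-magic check at the position of
the last replayed record, and why copying the old segment's prefix avoids
that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sizes and magic value only; not PostgreSQL's. */
#define TOY_PAGE_SIZE   16u
#define TOY_SEG_PAGES   8u
#define TOY_SEG_SIZE    (TOY_PAGE_SIZE * TOY_SEG_PAGES)
#define TOY_PAGE_MAGIC  0xD118u

/* Stamp a magic number at the start of every page that begins before upto. */
static void
toy_fill_pages(uint8_t *seg, uint32_t upto)
{
    uint16_t magic = TOY_PAGE_MAGIC;

    assert(upto <= TOY_SEG_SIZE);
    for (uint32_t off = 0; off < upto; off += TOY_PAGE_SIZE)
        memcpy(seg + off, &magic, sizeof(magic));
}

/* Read the magic number of the page that contains byte offset off. */
static uint16_t
toy_page_magic_at(const uint8_t *seg, uint32_t off)
{
    uint16_t magic;

    assert(off < TOY_SEG_SIZE);
    memcpy(&magic, seg + (off / TOY_PAGE_SIZE) * TOY_PAGE_SIZE,
           sizeof(magic));
    return magic;
}

/*
 * Initialize the new-timeline segment from the old one: copy the first
 * upto bytes and zero-fill the rest.
 */
static void
toy_copy_prefix(uint8_t *dst, const uint8_t *src, uint32_t upto)
{
    assert(upto <= TOY_SEG_SIZE);
    memcpy(dst, src, upto);
    memset(dst + upto, 0, TOY_SEG_SIZE - upto);
}
```

In this model, a freshly zero-filled new segment returns magic 0 for the page
holding the last old-timeline record, which corresponds to the "invalid magic
number 0000" report. After toy_copy_prefix() up to the end of that record,
the same lookup returns TOY_PAGE_MAGIC, while pages past the copied prefix
remain zero.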
All existing regression tests pass with the patch applied, but I plan to
add more targeted test cases.
I’d appreciate your feedback. In particular:
* Is this behavior (not copying the segment during replication) intentional?
* Are there edge cases I might be overlooking?
---
Best wishes,
Alena Vinter
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index ac802ae85b4..8bc48b77430 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -463,6 +463,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
WalRcvComputeNextWakeup(WALRCV_WAKEUP_PING, now);
XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1,
startpointTLI);
+ for (;startpointTLI == 2;)
+ {
+ ProcessInterrupts();
+ }
}
else if (len == 0)
break;
recovery_tli_switch_test.pl
Description: Perl program
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 430a38b1a21..2cc37ef0c5b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5262,7 +5262,7 @@ str_time(pg_time_t tnow, char *buf, size_t bufsize)
/*
* Initialize the first WAL segment on new timeline.
*/
-static void
+void
XLogInitNewTimeline(TimeLineID endTLI, XLogRecPtr endOfLog, TimeLineID newTLI)
{
char xlogfname[MAXFNAMELEN];
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 38b594d2170..33da9d68e5c 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4235,6 +4235,10 @@ rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN)
list_free_deep(expectedTLEs);
expectedTLEs = newExpectedTLEs;
+ SetInstallXLogFileSegmentActive();
+ XLogInitNewTimeline(oldtarget, replayLSN, newtarget);
+ ResetInstallXLogFileSegmentActive();
+
/*
* As in StartupXLOG(), try to ensure we have all the history files
* between the old target and new target in pg_wal.
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 605280ed8fb..87cd59d74b9 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -210,6 +210,9 @@ extern bool XLogNeedsFlush(XLogRecPtr record);
extern int XLogFileInit(XLogSegNo logsegno, TimeLineID logtli);
extern int XLogFileOpen(XLogSegNo segno, TimeLineID tli);
+extern void XLogInitNewTimeline(TimeLineID endTLI, XLogRecPtr endOfLog,
+ TimeLineID newTLI);
+
extern void CheckXLogRemoved(XLogSegNo segno, TimeLineID tli);
extern XLogSegNo XLogGetLastRemovedSegno(void);
extern XLogSegNo XLogGetOldestSegno(TimeLineID tli);
