I like the idea of preventing promotion to avoid such failures -- it sounds
reasonable.

However, one problem remains: if the standby is stopped after switching to
TLI 2 but before the WAL of TLI 2 has been fully replicated, it fails to start:
"FATAL: according to history file, WAL location Y belongs to timeline X,
but previous recovered WAL file came from timeline X+1".
This happens even if no promotion is attempted, just on a plain restart of
the standby. So the issue isn’t only about when to allow promotion.
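
To make the failing condition concrete, here is a minimal Python sketch of
the check as it appears from the outside. It is a simplified model under
stated assumptions, not the server's code: the function names and the
two-entry history list are hypothetical, and the LSNs are taken from the
log attached to this message.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/0301FFD0' into an integer byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Assumed content of the history for timeline 2: (timeline, end LSN).
# The switch point comes from the log line
# "End of WAL reached on timeline 1 at 0/030B20E8"; None means "still open".
HISTORY_TLI2 = [(1, parse_lsn("0/030B20E8")), (2, None)]


def timeline_of(lsn, history):
    """Return the timeline that owns the given LSN according to the history."""
    for tli, end in history:
        if end is None or lsn < end:
            return tli
    raise ValueError("LSN is not covered by the history")


def check_wal_source(lsn_text, prev_file_tli, history):
    """Return the error text if the LSN belongs to an older timeline than
    the WAL file that was read previously, otherwise None."""
    tli = timeline_of(parse_lsn(lsn_text), history)
    if prev_file_tli > 0 and tli < prev_file_tli:
        return (
            "according to history file, WAL location %s belongs to "
            "timeline %d, but previous recovered WAL file came from "
            "timeline %d" % (lsn_text, tli, prev_file_tli)
        )
    return None


if __name__ == "__main__":
    # Values from the restart in the attached log: the record at 0/0301FFD0
    # is on timeline 1, while the segment just read was a timeline 2 file.
    print(check_wal_source("0/0301FFD0", 2, HISTORY_TLI2))
```

With the previous file on timeline 1 instead of 2, the same call returns
None, which is the case where the restart would proceed.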

Regarding my proposed solution: could you clarify why it isn’t correct? I’d
appreciate more detail so I can address your concerns.

---
Alena Vinter

2025-12-25 15:44:13.010 +07 postmaster[474660] LOG:  listening on Unix socket "/tmp/QPQwr4NLnl/.s.PGSQL.28826"
2025-12-25 15:44:13.017 +07 startup[474666] LOG:  database system was interrupted; last known up at 2025-12-25 15:44:10 +07
2025-12-25 15:44:14.596 +07 startup[474666] LOG:  starting backup recovery with redo LSN 0/02000028, checkpoint LSN 0/02000080, on timeline ID 1
2025-12-25 15:44:14.597 +07 startup[474666] LOG:  entering standby mode
2025-12-25 15:44:14.603 +07 startup[474666] LOG:  redo starts at 0/02000028 on TLI 1
2025-12-25 15:44:14.605 +07 startup[474666] LOG:  completed backup recovery with redo LSN 0/02000028 and end LSN 0/02000120
2025-12-25 15:44:14.605 +07 startup[474666] LOG:  consistent recovery state reached at 0/02000120
2025-12-25 15:44:14.605 +07 postmaster[474660] LOG:  database system is ready to accept read-only connections
2025-12-25 15:44:14.612 +07 walreceiver[474690] LOG:  fetching timeline history file for timeline 2 from primary server
2025-12-25 15:44:14.617 +07 walreceiver[474690] LOG:  started streaming WAL from primary at 0/03000000 on timeline 1
2025-12-25 15:44:14.639 +07 walreceiver[474690] LOG:  replication terminated by primary server
2025-12-25 15:44:14.639 +07 walreceiver[474690] DETAIL:  End of WAL reached on timeline 1 at 0/030B20E8.
2025-12-25 15:44:14.667 +07 startup[474666] LOG:  new target timeline is 2
2025-12-25 15:44:14.667 +07 startup[474666] LOG:  invalid record length at 0/030B20E8: expected at least 24, got 0
2025-12-25 15:44:14.667 +07 walreceiver[474690] LOG:  restarted WAL streaming at 0/03000000 on timeline 2
2025-12-25 15:44:19.698 +07 postmaster[474660] LOG:  received fast shutdown request
2025-12-25 15:44:19.704 +07 postmaster[474660] LOG:  aborting any active transactions
2025-12-25 15:44:19.704 +07 walreceiver[474690] FATAL:  terminating walreceiver process due to administrator command
2025-12-25 15:44:19.710 +07 checkpointer[474664] LOG:  shutting down
2025-12-25 15:44:19.738 +07 postmaster[474660] LOG:  database system is shut down
2025-12-25 15:44:19.839 +07 postmaster[474716] LOG:  starting PostgreSQL 19devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1 20251111 (Red Hat 15.2.1-4), 64-bit
2025-12-25 15:44:19.839 +07 postmaster[474716] LOG:  listening on Unix socket "/tmp/QPQwr4NLnl/.s.PGSQL.28826"
2025-12-25 15:44:19.845 +07 startup[474722] LOG:  database system was shut down in recovery at 2025-12-25 15:44:19 +07
2025-12-25 15:44:19.845 +07 startup[474722] LOG:  entering standby mode
2025-12-25 15:44:19.848 +07 startup[474722] LOG:  redo starts at 0/02000028 on TLI 1
2025-12-25 15:44:19.850 +07 startup[474722] LOG:  invalid magic number 0000 in WAL segment 000000020000000000000003, LSN 0/03020000, offset 131072
2025-12-25 15:44:19.850 +07 startup[474722] FATAL:  according to history file, WAL location 0/0301FFD0 belongs to timeline 1, but previous recovered WAL file came from timeline 2
2025-12-25 15:44:19.855 +07 postmaster[474716] LOG:  startup process (PID 474722) exited with exit code 1
2025-12-25 15:44:19.855 +07 postmaster[474716] LOG:  terminating any other active server processes
2025-12-25 15:44:19.855 +07 postmaster[474716] LOG:  shutting down due to startup process failure
2025-12-25 15:44:19.856 +07 postmaster[474716] LOG:  database system is shut down

Attachment: recovery_tli_switch_test_without_standby_promotion.pl
Description: Perl program
