On Tue, Jan 27, 2026 at 11:37 AM Andres Freund <[email protected]> wrote:
> > But I wonder if you can't rename("old", "new") where "new" is a file that
> > has already been unlinked (or renamed over) that someone still holds open,
> > or something like that...
>
> I don't see a source of that that would be specific to this test though :(. We
> do wait for pg_basebackup to have shut down, which wrote backup.label (which
> was "manifactured" during streaming by basebackup.c).
>
> Perhaps we should crank up log level in the test? No idea if it'll help, but
> right now I don't even know where to start looking.

I tried inserting a pg_sleep(30) just before starting the standby
node, and that didn't help, so this doesn't seem to be a race
condition.

Here's what the standby log file looks like with log_min_messages=DEBUG2:

2026-01-27 17:19:25.262 GMT postmaster[4932] DEBUG:  registering
background worker "logical replication launcher"
2026-01-27 17:19:25.264 GMT postmaster[4932] DEBUG:  dynamic shared
memory system will support 229 segments
2026-01-27 17:19:25.264 GMT postmaster[4932] DEBUG:  created dynamic
shared memory control segment 3769552926 (9176 bytes)
2026-01-27 17:19:25.266 GMT postmaster[4932] DEBUG:  max_safe_fds =
990, usable_fds = 1000, already_open = 3
2026-01-27 17:19:25.268 GMT postmaster[4932] LOG:  starting PostgreSQL
19devel on x86_64-windows, compiled by msvc-19.29.30159, 64-bit
2026-01-27 17:19:25.271 GMT postmaster[4932] LOG:  listening on Unix
socket "C:/Windows/TEMP/3xesO1s4ba/.s.PGSQL.17575"
2026-01-27 17:19:25.273 GMT postmaster[4932] DEBUG:  updating PMState
from PM_INIT to PM_STARTUP
2026-01-27 17:19:25.273 GMT postmaster[4932] DEBUG:  assigned pm child
slot 57 for io worker
2026-01-27 17:19:25.275 GMT postmaster[4932] DEBUG:  assigned pm child
slot 58 for io worker
2026-01-27 17:19:25.277 GMT postmaster[4932] DEBUG:  assigned pm child
slot 59 for io worker
2026-01-27 17:19:25.278 GMT postmaster[4932] DEBUG:  assigned pm child
slot 56 for checkpointer
2026-01-27 17:19:25.280 GMT postmaster[4932] DEBUG:  assigned pm child
slot 55 for background writer
2026-01-27 17:19:25.281 GMT postmaster[4932] DEBUG:  assigned pm child
slot 89 for startup
2026-01-27 17:19:25.308 GMT checkpointer[6560] DEBUG:  checkpointer
updated shared memory configuration values
2026-01-27 17:19:25.314 GMT startup[2488] LOG:  database system was
interrupted; last known up at 2026-01-27 17:19:21 GMT
2026-01-27 17:19:25.317 GMT startup[2488] DEBUG:  removing all
temporary WAL segments
The system cannot find the file specified.
2026-01-27 17:19:25.336 GMT startup[2488] DEBUG:  could not restore
file "00000002.history" from archive: child process exited with exit
code 1
2026-01-27 17:19:25.337 GMT startup[2488] DEBUG:  backup time
2026-01-27 17:19:21 GMT in file "backup_label"
2026-01-27 17:19:25.337 GMT startup[2488] DEBUG:  backup label
pg_basebackup base backup in file "backup_label"
2026-01-27 17:19:25.337 GMT startup[2488] DEBUG:  backup timeline 1 in
file "backup_label"
2026-01-27 17:19:25.337 GMT startup[2488] LOG:  starting backup
recovery with redo LSN 0/2A000028, checkpoint LSN 0/2A000080, on
timeline ID 1
The system cannot find the file specified.
2026-01-27 17:19:25.352 GMT startup[2488] DEBUG:  could not restore
file "00000001000000000000002A" from archive: child process exited
with exit code 1
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  checkpoint record is
at 0/2A000080
2026-01-27 17:19:25.353 GMT startup[2488] LOG:  entering standby mode
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  redo record is at
0/2A000028; shutdown false
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  next transaction ID:
769; next OID: 24576
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  next MultiXactId: 1;
next MultiXactOffset: 1
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  oldest unfrozen
transaction ID: 760, in database 1
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  oldest MultiXactId:
1, in database 1
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  commit timestamp Xid
oldest/newest: 0/0
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  transaction ID wrap
limit is 2147484407, limited by database with OID 1
2026-01-27 17:19:25.353 GMT startup[2488] DEBUG:  MultiXactId wrap
limit is 2147483648, limited by database with OID 1
2026-01-27 17:19:25.354 GMT startup[2488] DEBUG:  starting up replication slots
2026-01-27 17:19:25.354 GMT startup[2488] DEBUG:  xmin required by
slots: data 0, catalog 0
2026-01-27 17:19:25.354 GMT startup[2488] DEBUG:  starting up
replication origin progress state
2026-01-27 17:19:25.354 GMT startup[2488] DEBUG:  didn't need to
unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2026-01-27 17:19:38.938 GMT startup[2488] FATAL:  could not rename
file "backup_label" to "backup_label.old": Permission denied
2026-01-27 17:19:38.983 GMT postmaster[4932] DEBUG:  releasing pm child slot 89
2026-01-27 17:19:38.983 GMT postmaster[4932] LOG:  startup process
(PID 2488) exited with exit code 1
2026-01-27 17:19:38.983 GMT postmaster[4932] LOG:  aborting startup
due to startup process failure
2026-01-27 17:19:38.983 GMT postmaster[4932] DEBUG:  cleaning up
dynamic shared memory control segment with ID 3769552926
2026-01-27 17:19:38.985 GMT postmaster[4932] LOG:  database system is shut down

Unfortunately, I don't see any clues there. The "The system cannot
find the file specified." messages look like they might be a clue, but
I don't think they are, because they also occur in
040_standby_failover_slots_sync_standby1.log, and that test passes. At
the point where the failing log shows the FATAL error, that passing
log continues thus:

2026-01-27 17:18:36.905 GMT startup[1420] DEBUG:  resetting unlogged
relations: cleanup 1 init 0
2026-01-27 17:18:36.906 GMT startup[1420] DEBUG:  initializing for hot standby
2026-01-27 17:18:36.906 GMT startup[1420] LOG:  redo starts at 0/02000028
2026-01-27 17:18:36.906 GMT startup[1420] DEBUG:  recovery snapshots
are now enabled
2026-01-27 17:18:36.906 GMT startup[1420] CONTEXT:  WAL redo at
0/02000048 for Standby/RUNNING_XACTS: nextXid 769 latestCompletedXid
768 oldestRunningXid 769
2026-01-27 17:18:36.907 GMT startup[1420] DEBUG:  end of backup record reached
2026-01-27 17:18:36.907 GMT startup[1420] CONTEXT:  WAL redo at
0/02000100 for XLOG/BACKUP_END: 0/02000028
2026-01-27 17:18:36.907 GMT startup[1420] DEBUG:  end of backup reached

Which again seems totally normal.
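For what it's worth, my working theory (an assumption, not something
this log proves) is that some other process briefly holds backup_label
open without FILE_SHARE_DELETE, so the rename gets a sharing violation
that surfaces as EACCES. If I'm reading src/port/dirmod.c correctly,
pgrename() retries such failures for about 10 seconds before giving
up, which would roughly fit the ~13-second gap between the last DEBUG
line and the FATAL. A portable sketch of that retry pattern, for
experimenting outside the server (the function name and timeout are
mine, not from the tree):

```python
import os
import time


def rename_with_retry(src, dst, timeout=10.0, interval=0.1):
    """Retry a replacing rename while the target is transiently locked.

    Loosely mirrors my reading of pgrename() in src/port/dirmod.c: on
    Windows, sharing/lock violations (seen here as PermissionError)
    are retried for about 10 seconds before the failure is final.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # os.replace() overwrites an existing dst on both POSIX
            # and Windows, unlike os.rename() which fails on Windows
            # if dst exists.
            os.replace(src, dst)
            return
        except PermissionError:
            # Another process may hold the file open; back off and
            # retry until the timeout expires, then re-raise.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

Of course, a loop like this only papers over a scanner that holds the
file open briefly; if whatever has backup_label open keeps it open for
longer than the retry window, we still end up exactly where this log
does.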

-- 
Robert Haas
EDB: http://www.enterprisedb.com

