On Mon, Feb 21, 2022 at 05:19:48PM -0800, Nathan Bossart wrote:
> I also spent some time investigating whether durably renaming the archive
> status files was even necessary.  In theory, it shouldn't be.  If a crash
> happens before the rename is persisted, we might try to re-archive files,
> but your archive_command/archive_library should handle that.  If the file
> was already recycled or removed, the archiver will skip it (thanks to
> 6d8727f).  But when digging further, I found that WAL file recycling uses
> durable_rename_excl(), which has the following note:
>
>  * Note that a crash in an unfortunate moment can leave you with two links to
>  * the target file.
>
> IIUC this means that in theory, a crash at an unfortunate moment could
> leave you with a .ready file, the file to archive, and another link to that
> file with a "future" WAL filename.  If you re-archive the file after it has
> been reused, you could end up with corrupt WAL archives.  I think this
> might already be possible today.  Syncing the directory after every rename
> probably reduces the likelihood quite substantially, but if
> RemoveOldXlogFiles() quickly picks up the .done file and attempts
> recycling before durable_rename() calls fsync() on the directory,
> presumably the same problem could occur.

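
To make the note quoted above concrete, here is a rough standalone sketch of
the failure mode.  It assumes the exclusive rename is done as link() followed
by unlink(), and it simulates the crash by simply returning between the two
calls.  All function names and paths below are made up for illustration, and
the fsync() calls and error reporting of the real code are left out, so this
is not the PostgreSQL implementation.

```c
/* Illustrative sketch only; helper names and paths are hypothetical. */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the number of hard links to path, or -1 if stat() fails. */
static long
link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long) st.st_nlink;
}

/* Create (or truncate) an empty file; returns 0 on success, -1 on error. */
static int
create_file(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    return close(fd);
}

/*
 * "Rename" oldpath to newpath without overwriting an existing newpath,
 * using link() followed by unlink().  If crash_before_unlink is nonzero,
 * return right after link() to simulate a crash at the unfortunate moment.
 * Returns 0 on success, -1 on error.
 */
static int
rename_excl_sketch(const char *oldpath, const char *newpath,
                   int crash_before_unlink)
{
    assert(oldpath != NULL && newpath != NULL);

    if (link(oldpath, newpath) != 0)
        return -1;              /* e.g. newpath already exists */

    if (crash_before_unlink)
        return 0;               /* both names now point at the same file */

    return unlink(oldpath);
}
```

Calling rename_excl_sketch(old, new, 1) on a freshly created file leaves both
names in place, and link_count() reports 2 for each of them.  With the last
argument set to 0, only the new name remains.  A plain rename() replaces the
name in a single step, so it has no window in which both names exist.
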
In my testing, I found that when I killed the server just before unlink()
during WAL recycling, I ended up with two links to the same file in pg_wal
after restarting.  My latest test produced links to the same file for the
current WAL file and the next one.  Maybe WAL recycling should use
durable_rename() instead of durable_rename_excl().

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com