On Mon, Oct 16, 2023 at 11:45 AM David Steele <da...@pgmasters.net> wrote:
> Hmmm, the reason to back patch this is that it would fix [1], which sure
> looks like a problem to me even if it is not a "bug". We can certainly
> require backup software to retry pg_control until the checksum is valid
> but that seems like a pretty big ask, even considering how complicated
> backup is.

That seems like a problem with pg_control not being written atomically
when the standby server is updating it during recovery, rather than a
problem with backup_label not being used at the start of recovery.
Unless I'm confused.

> If you start from the last checkpoint (which is what will generally be
> stored in pg_control) then the effect is pretty similar.

If the backup didn't span a checkpoint, then restoring from the one in
pg_control actually works fine. Not that I'm encouraging that. But if
you replay WAL from the control file, you at least get the last
checkpoint's worth of WAL; if you use pg_resetwal, you get nothing.

I don't really want to get hung up on this though. My main point here
is that I have trouble believing that an error after you've already
screwed up your backup helps much. I think what we need is to make it
less likely that you will screw up your backup in the first place.

> Right now the user can remove backup_label and get a "successful"
> restore and not realize that they have just corrupted their cluster.
> This is independent of the backup/restore tool doing all the right things.

I don't think it's independent of that at all.

--
Robert Haas
EDB: http://www.enterprisedb.com