>> Send a SIGQUIT to the postmaster to simulate a crash. When you bring it
>> back up, it thinks it is recovering from a backup, so it reads
>> backup_label. The checkpoint for the backup label is in 00...6, so it
>> reads that just fine. But then it tries to read the WAL starting at the
>> redo location from that checkpoint, which is in 00...5 and it doesn't
>> exist and PANICs.
>>
>> Ordinarily you might say that this is just confusion over whether it's
>> recovering from a backup or not, and you just need to remove
>> backup_label and try again. But that doesn't work: at this point
>> StartupXLOG has already done two things:
>>  1. renamed the backup file to .old
>>  2. updated the control file

Good catch!

> I still think it would be nice if postgres knew whether it was restoring
> a backup or recovering from a crash, otherwise it's hard to
> automatically recover from failures. I thought about using the presence
> of recoveryRestoreCommand or PrimaryConnInfo to determine that. But it
> seemed potentially dangerous if the person restoring a backup simply
> forgot to set those, and then it tries restoring from the controldata
> instead (which is unsafe to do during a backup).

Yep, automatically deleting backup_label and continuing recovery seems
dangerous. How about just emitting a FATAL error when neither
restore_command nor primary_conninfo is supplied and backup_label exists?
This seems simpler than your proposed patch (i.e., checking whether the
REDO location exists).

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
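
For illustration, the check suggested above could be sketched roughly as
follows. This is only a standalone sketch of the decision logic, not the
actual StartupXLOG code: the enum, the helper is_set() and the function
decide_startup() are hypothetical names, and the real server would report
the refusal through its own error machinery rather than a return value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome names, used only for this sketch. */
typedef enum
{
    START_CRASH_RECOVERY,   /* no backup_label: recover from the control file */
    START_BACKUP_RECOVERY,  /* backup_label present and a WAL source is set */
    REFUSE_TO_START         /* backup_label present, no WAL source: FATAL */
} StartupDecision;

/* A setting counts as supplied only if it is a non-empty string. */
static bool
is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Decide how to start up, given whether backup_label exists and the
 * values of restore_command and primary_conninfo (NULL or "" = unset).
 */
StartupDecision
decide_startup(bool backup_label_exists,
               const char *restore_command,
               const char *primary_conninfo)
{
    if (!backup_label_exists)
        return START_CRASH_RECOVERY;

    if (is_set(restore_command) || is_set(primary_conninfo))
        return START_BACKUP_RECOVERY;

    /* backup_label without any way to fetch WAL: stop rather than guess. */
    return REFUSE_TO_START;
}
```

With this shape, the crash scenario described in the quoted report
(backup_label present, neither setting supplied) ends in REFUSE_TO_START
instead of a PANIC on the missing WAL segment, and nothing is renamed or
written before the decision is made.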