Re: [BUGS] Recovery bug

Fujii Masao Mon, 18 Oct 2010 01:03:09 -0700

>> Send a SIGQUIT to the postmaster to simulate a crash. When you bring it
>> back up, it thinks it is recovering from a backup, so it reads
>> backup_label. The checkpoint for the backup label is in 00...6, so it
>> reads that just fine. But then it tries to read the WAL starting at the
>> redo location from that checkpoint, which is in 00...5 and it doesn't
>> exist and PANICs.
>>
>> Ordinarily you might say that this is just confusion over whether it's
>> recovering from a backup or not, and you just need to remove
>> backup_label and try again. But that doesn't work: at this point
>> StartupXLOG has already done two things:
>>  1. renamed the backup file to .old
>>  2. updated the control file


Good catch!

> I still think it would be nice if postgres knew whether it was restoring
> a backup or recovering from a crash, otherwise it's hard to
> automatically recover from failures. I thought about using the presence
> of recoveryRestoreCommand or PrimaryConnInfo to determine that. But it
> seemed potentially dangerous if the person restoring a backup simply
> forgot to set those, and then it tries restoring from the controldata
> instead (which is unsafe to do during a backup).

Yep, to automatically delete backup_label and continue recovery seems to be
dangerous. How about just emitting FATAL error when neither restore_command
nor primary_conninfo is supplied and backup_label exists? This seems to be
simpler than your proposed patch (i.e., check whether REDO location exists).

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-bugs mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Recovery bug

Reply via email to