On 12/11/2014 05:45 AM, Andres Freund wrote:
A customer recently reported getting "backup_label contains data inconsistent with control file" after taking a basebackup from a standby and starting it with a typo in primary_conninfo.

When starting postgres from a basebackup, StartupXLOG() has the following code to deal with backup labels:

    if (haveBackupLabel)
    {
        ControlFile->backupStartPoint = checkPoint.redo;
        ControlFile->backupEndRequired = backupEndRequired;

        if (backupFromStandby)
        {
            if (dbstate_at_startup != DB_IN_ARCHIVE_RECOVERY)
                ereport(FATAL,
                        (errmsg("backup_label contains data inconsistent with control file"),
                         errhint("This means that the backup is corrupted and you will "
                                 "have to use another backup for recovery.")));
            ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
        }
    }

While I'm not enthusiastic about the error message, that bit of code looks sane at first glance. We certainly expect the control file to indicate we're in recovery. Since we're unlinking the backup label shortly afterwards, we'd normally not expect to hit that case after a shutdown in recovery.
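The check above boils down to a single predicate on the control-file state. The sketch below is a standalone model, not PostgreSQL source: the enum only mirrors the DB_* state names used in this thread, and the helper name backup_label_check_passes is hypothetical. It shows why a standby backup whose control file says DB_SHUTDOWNED_IN_RECOVERY trips the FATAL error.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the control-file states. The names follow the
 * DB_* constants discussed in this thread; the layout is an assumption. */
typedef enum
{
    DB_STARTUP = 0,
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_SHUTDOWNING,
    DB_IN_CRASH_RECOVERY,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION
} DBState;

/* Hypothetical helper: returns false exactly when the quoted StartupXLOG()
 * logic would raise "backup_label contains data inconsistent with control
 * file". */
static bool
backup_label_check_passes(bool haveBackupLabel, bool backupFromStandby,
                          DBState dbstate_at_startup)
{
    /* Guard against values outside the modelled enum. */
    assert(dbstate_at_startup <= DB_IN_PRODUCTION);

    /* The state check only applies to backups taken from a standby. */
    if (!haveBackupLabel || !backupFromStandby)
        return true;

    /* Only "in archive recovery" is accepted as consistent. */
    return dbstate_at_startup == DB_IN_ARCHIVE_RECOVERY;
}
```

For example, backup_label_check_passes(true, true, DB_SHUTDOWNED_IN_RECOVERY) returns false, which corresponds to the error the customer reported.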
Check.
The problem is that after reading the backup label we also have to read the corresponding checkpoint from pg_xlog. If primary_conninfo and/or restore_command are misconfigured and can't restore files, that can only be fixed by shutting down the cluster and fixing up recovery.conf, which sets DB_SHUTDOWNED_IN_RECOVERY in the control file.
No it doesn't. The state is set to DB_SHUTDOWNED_IN_RECOVERY in CreateRestartPoint(). If you shut down the server before it has even read the initial checkpoint record, it will not attempt to create a restartpoint nor update the control file.
The easiest solution seems to be to simply also allow DB_SHUTDOWNED_IN_RECOVERY as a state in the above check. It might be nicer not to allow ShutdownXLOG() to modify the control file et al. at that stage, but I think that would end up being more invasive. A short search shows that this also looks like a credible explanation for #12128...
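A minimal sketch of the relaxation proposed here, written as a standalone model rather than a patch against xlog.c: the only change is that DB_SHUTDOWNED_IN_RECOVERY is accepted alongside DB_IN_ARCHIVE_RECOVERY. The enum and the helper name relaxed_backup_label_check_passes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the control-file states (names follow the
 * DB_* constants in this thread; the layout is an assumption). */
typedef enum
{
    DB_STARTUP = 0,
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_SHUTDOWNING,
    DB_IN_CRASH_RECOVERY,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION
} DBState;

/* Hypothetical helper modelling the proposed fix: a standby backup is also
 * accepted when the cluster was shut down while still in recovery, as in
 * the misconfigured-recovery.conf scenario described above. */
static bool
relaxed_backup_label_check_passes(bool backupFromStandby,
                                  DBState dbstate_at_startup)
{
    /* Guard against values outside the modelled enum. */
    assert(dbstate_at_startup <= DB_IN_PRODUCTION);

    /* Non-standby backups are not subject to this check. */
    if (!backupFromStandby)
        return true;

    return dbstate_at_startup == DB_IN_ARCHIVE_RECOVERY ||
           dbstate_at_startup == DB_SHUTDOWNED_IN_RECOVERY;
}
```

Every other state, such as DB_IN_PRODUCTION or a plain DB_SHUTDOWNED, still fails the check, so a control file that does not belong to a cluster in recovery is still rejected.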
Yeah. I was not able to reproduce this, but I'm clearly missing something, since both you and Sergey have seen this happening. Can you write a script to reproduce?
- Heikki

--
Sent via pgsql-hackers mailing list
