On Tue, 17 Oct 2006, Tom Lane wrote:

> Dirk Lutzebaeck and I just spent a tense couple of hours trying to
> figure out why a large database Down Under wasn't coming up after being
> reloaded from a base backup plus PITR recovery.  The symptoms were that
> the recovery went fine, but backend processes would fail at startup or
> soon after with "could not open relation XX/XX/XX: No such file" type of
> errors.
>
> The answer that ultimately emerged was that they'd been running a
> nightly maintenance script that did REINDEX SYSTEM (among other things
> I suppose).  The PITR base backup included pg_internal.init files that
> were appropriate when it was taken, and the PITR recovery process did
> nothing whatsoever to update 'em :-(.  So incoming backends picked up
> init files with obsolete relfilenode values.

Ouch.

> We don't actually need to *update* the file, per se, we only need to
> remove it if no longer valid --- the next incoming backend will rebuild
> it.  I could see fixing this by making WAL recovery run around and zap
> all the .init files (only problem is to find 'em), or we could add a new
> kind of WAL record saying "remove the .init file for database XYZ"
> to be emitted whenever someone removes the active one.  Thoughts?

The latter seems the Right Way except, I guess, that the decision to
remove the file is buried deep inside inval.c.
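
For comparison, the first option (zap every init file after recovery) is
easy to sketch from outside the server. This is only an illustration in
Python, not a proposed patch: remove_init_files is a hypothetical helper,
and it assumes the files are all named pg_internal.init and that
tablespaces are reachable through symlinks under the data directory.

```python
import os

INIT_FILE_NAME = "pg_internal.init"  # file name taken from the report above


def remove_init_files(data_dir):
    """Delete every init file found under data_dir and return the removed paths.

    Hypothetical helper for illustration; a real fix would live in the
    server's recovery code, not in an external script.
    """
    removed = []
    # followlinks=True so that symlinked tablespace directories are visited too
    # (assumption: tablespaces are linked from inside data_dir).
    for dirpath, _dirnames, filenames in os.walk(data_dir, followlinks=True):
        if INIT_FILE_NAME in filenames:
            path = os.path.join(dirpath, INIT_FILE_NAME)
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

It would have to run after recovery finishes and before any backend
starts, which is exactly the ordering an external script can't guarantee.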

Thanks,

Gavin
