On Tue, 17 Oct 2006, Tom Lane wrote:

> Dirk Lutzebaeck and I just spent a tense couple of hours trying to
> figure out why a large database Down Under wasn't coming up after being
> reloaded from a base backup plus PITR recovery. The symptoms were that
> the recovery went fine, but backend processes would fail at startup or
> soon after with "could not open relation XX/XX/XX: No such file" type of
> errors.
>
> The answer that ultimately emerged was that they'd been running a
> nightly maintenance script that did REINDEX SYSTEM (among other things
> I suppose). The PITR base backup included pg_internal.init files that
> were appropriate when it was taken, and the PITR recovery process did
> nothing whatsoever to update 'em :-(. So incoming backends picked up
> init files with obsolete relfilenode values.

Ouch.

> We don't actually need to *update* the file, per se, we only need to
> remove it if no longer valid --- the next incoming backend will rebuild
> it. I could see fixing this by making WAL recovery run around and zap
> all the .init files (only problem is to find 'em), or we could add a new
> kind of WAL record saying "remove the .init file for database XYZ"
> to be emitted whenever someone removes the active one. Thoughts?

The latter seems the Right Way except, I guess, that the decision to
remove the file is buried deep inside inval.c.

Thanks,

Gavin
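
For illustration only, the first option in the quoted text (zapping all the .init files) could be approximated outside the server with a small script. This is a rough sketch, not anything that exists in PostgreSQL: the function name remove_init_files is hypothetical, and it sidesteps the "find 'em" problem by walking the entire directory tree it is given, so it assumes nothing about where the files actually live. It relies on the point made above that a missing init file is simply rebuilt by the next incoming backend.

```python
import os

# File name taken from the thread; everything else here is illustrative.
INIT_FILE_NAME = "pg_internal.init"


def remove_init_files(data_dir):
    """Delete every pg_internal.init found under data_dir.

    Returns the sorted list of paths that were removed. Hypothetical
    helper: it walks the whole tree rather than assuming a layout.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        if INIT_FILE_NAME in filenames:
            path = os.path.join(dirpath, INIT_FILE_NAME)
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

Calling remove_init_files on the root of a restored data directory returns the paths it deleted; a second call returns an empty list because nothing is left to remove. The WAL-record approach preferred in the reply would make this kind of sweep unnecessary, since recovery would then remove exactly the files that were invalidated.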