Dirk Lutzebaeck and I just spent a tense couple of hours trying to figure out why a large database Down Under wasn't coming up after being reloaded from a base backup plus PITR recovery. The symptoms were that the recovery went fine, but backend processes would fail at startup or soon after with "could not open relation XX/XX/XX: No such file" type of errors.
The answer that ultimately emerged was that they'd been running a nightly maintenance script that did REINDEX SYSTEM (among other things, I suppose). The PITR base backup included pg_internal.init files that were appropriate when it was taken, and the PITR recovery process did nothing whatsoever to update 'em :-(. So incoming backends picked up init files with obsolete relfilenode values.

We don't actually need to *update* the file, per se, we only need to remove it if no longer valid --- the next incoming backend will rebuild it. I could see fixing this by making WAL recovery run around and zap all the .init files (only problem is to find 'em), or we could add a new kind of WAL record saying "remove the .init file for database XYZ" to be emitted whenever someone removes the active one. Thoughts?

Meanwhile, if you're trying to recover from a PITR backup and it's not working, try removing any pg_internal.init files you can find.

regards, tom lane
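
For anyone who wants to script the "remove any pg_internal.init files you can find" workaround, here is a rough sketch in Python. It is not something from the PostgreSQL tree: the function names and the dry_run flag are made up for illustration, and it assumes the postmaster is shut down and that you pass it the right data directory. It simply walks the directory tree and matches on the file name, so it makes no assumptions about where the per-database directories live.

```python
import os


def find_init_files(data_dir):
    """Yield the path of every pg_internal.init file under data_dir.

    Hypothetical helper: it matches on the file name only and walks
    the whole tree rather than assuming a particular layout.
    """
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name == "pg_internal.init":
                yield os.path.join(root, name)


def remove_init_files(data_dir, dry_run=True):
    """Return the sorted list of init files found under data_dir.

    With dry_run=True (the default) nothing is deleted; with
    dry_run=False each file found is removed. The next backend to
    start is expected to rebuild the file, as described above.
    Assumption: the server is stopped while this runs.
    """
    found = sorted(find_init_files(data_dir))
    if not dry_run:
        for path in found:
            os.remove(path)
    return found
```

Call remove_init_files(path) first to see what would go, then again with dry_run=False once the list looks sane. Note that os.walk does not follow symlinks by default, so anything reached only through a symlinked directory would need to be checked separately.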