We've seen two recent reports:
	http://archives.postgresql.org/pgsql-admin/2005-04/msg00008.php
	http://archives.postgresql.org/pgsql-general/2005-05/msg01143.php
of postmaster restart failing because the WAL contains a reference to a page that no longer exists.
I can think of a couple of possible explanations:

1. filesystem corruption, i.e. the page should exist in the file but the kernel has forgotten about it;

2. we truncated the file subsequent to the WAL record that causes the panic.

However, neither of these theories is entirely satisfying, because the WAL replay logic has always acted like this; why haven't we seen similar reports ever since 7.1? And why are both of these reports connected to btrees, when file truncation probably happens far more often on regular tables?

But, setting those nagging doubts aside, theory #2 seems like a definite bug that we ought to do something about.

The only really clean answer I can see is for file truncation to force a checkpoint just before issuing the ftruncate call. That way, no WAL records referencing the to-be-deleted pages would need to be replayed in a subsequent crash. However, checkpoints are expensive enough to make this solution very unattractive from a performance point of view. And I fear it's not a 100% solution anyway: what about the PITR scenario, where you need to replay a WAL log that was made concurrently with a filesystem backup being taken? The backup might well include the truncated version of the file, but you can't avoid replaying the beginning portion of the WAL log.

Plan B is for WAL replay to always be willing to extend the file to whatever page number is mentioned in the log, even though this may require inventing the contents of empty pages; we trust that their contents won't matter because they'll be truncated again later in the replay sequence. This seems pretty messy though, especially for indexes. The major objection to it is that it gives up error detection in real filesystem-corruption cases: we'll just silently build an invalid index and then try to run with it. (Still, that might be better than refusing to start; at least you can REINDEX afterwards.)

Any thoughts?
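For concreteness, here is a rough toy model of what Plan B amounts to. It is only a sketch and not the actual replay code: the function names, the `extend_missing` flag, and the simplification that every WAL record carries a full page image are all assumptions made for illustration. Only the 8192-byte page size matches the usual default block size.

```python
import os
import tempfile

PAGE_SIZE = 8192  # usual default block size; everything else below is hypothetical


def replay_page_write(path, page_no, payload, extend_missing=True):
    """Apply one toy WAL record that overwrites page `page_no` of `path`.

    If the page lies beyond the current end of file, either fail (the
    behaviour described in the reports) or zero-fill up to it (Plan B).
    Returns the number of gap pages whose contents had to be invented.
    """
    if len(payload) != PAGE_SIZE:
        raise ValueError("payload must be exactly one page")
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        existing_pages = f.tell() // PAGE_SIZE
        if page_no >= existing_pages and not extend_missing:
            raise RuntimeError(f"WAL references nonexistent page {page_no}")
        # Invent zero-filled pages for the gap between EOF and the target page.
        invented = max(0, page_no - existing_pages)
        f.write(b"\0" * (invented * PAGE_SIZE))
        f.seek(page_no * PAGE_SIZE)
        f.write(payload)
    return invented


def replay_truncate(path, n_pages):
    """Apply a toy 'truncate to n_pages' record."""
    with open(path, "r+b") as f:
        f.truncate(n_pages * PAGE_SIZE)


def demo():
    """The file on disk is already truncated to 2 pages, but the log still
    contains a write to page 4 followed by the truncation itself."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "relation")
        with open(path, "wb") as f:
            f.write(b"\x01" * (2 * PAGE_SIZE))
        new_page = b"\x02" * PAGE_SIZE
        try:
            replay_page_write(path, 4, new_page, extend_missing=False)
            strict_failed = False
        except RuntimeError:
            strict_failed = True
        invented = replay_page_write(path, 4, new_page)  # Plan B
        pages_after_extend = os.path.getsize(path) // PAGE_SIZE
        replay_truncate(path, 2)  # the later truncate discards the invented pages
        pages_at_end = os.path.getsize(path) // PAGE_SIZE
        return strict_failed, invented, pages_after_extend, pages_at_end
```

In this model the strict variant refuses to continue, while the Plan B variant invents pages 2 and 3, applies the write, and ends up back at two pages once the truncation is replayed. The sketch also makes the objection visible: if the truncate record never arrives because the file was really damaged, the zero-filled pages simply stay in place and nothing complains.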
			regards, tom lane