I'm reporting this as a PostgreSQL bug because it involves an index corruption. I can't see any other way our application should be able to corrupt an index. I will attach the tail of the log when the corruption was detected (and the postmaster shut itself down), as well as the subsequent attempt to start. Fortunately we run our web site off of a farm of four database servers, so we are taking one of the others out of the mix, stopping postmaster, and copying its data directory over to this machine for recovery, so we don't need advice on that aspect of things; but, we'd like to do what we can to help track down the cause to prevent a recurrence. We have renamed the data directory to make room for recovery at the normal location, but otherwise the failing data directory structure is unmodified.
For context, this is running on Windows 2003 Server. Eight Xeon box, no HT, 6 GB RAM, 13-drive 15,000 RPM RAID5 array through a battery-backed controller for everything. This database is about 180 GB with about 300 tables. We have autovacuum running every ten seconds because of a few very small tables with very high update rates, and we have a scheduled VACUUM ANALYZE VERBOSE every night. It appears that last night's vacuum found the problem, which the previous night's vacuum didn't.

We had some event which started at 14:25 yesterday and persisted until we restarted the middle tier at 15:04. The symptom was that a fraction of the queries which normally run in a few ms were timing out on a 20 second limit. pg_locks showed no blocking. We've been getting episodes with these symptoms occasionally, but they have only lasted a minute or two; this duration was unusual. We haven't identified a cause. One odd thing is that, given the number of queries per second that we run, the number of timeouts during an episode is too small to support the notion that _all_ similar queries are failing.

How best to proceed?

-Kevin
postgresql-2006-04-06_000000-tail.log
postgresql-2006-04-06_072250.log