Testing partial-write crash-recovery in 9.4 (e12d7320ca494fd05134847e30) with foreign keys, I found some btree index corruption.

28807 VACUUM 2014-05-21 15:33:46.878 PDT:ERROR: right sibling 4044 of block 460 is not next child 23513 of block 1264 in index "foo_p_id_idx"
28807 VACUUM 2014-05-21 15:33:46.878 PDT:STATEMENT: VACUUM;

It took ~8 hours on 8 cores to encounter this problem. This is a single occurrence; it has not yet been reproduced.

I don't know that the partial-writes, or the crash recovery, or the foreign key parts of this test are important; it could be a more generic problem that only happened to be observed here. Nor do I know yet if it occurs in 9_3_STABLE.

Below is a link to the testing harness and the data directory (massively bloated at 3.7GB once uncompressed). It is currently in wrap-around shutdown, but that is the effect of persistent vacuum failures, not the cause of them. You can restart the data directory and it will repeat the above sibling error once autovac kicks in. I don't know if the bloat is due to the vacuum failure or if it was already in progress before the failures started. I've cranked up the logging on that front for future efforts.

I'm using some fast-forward code on the xid consumption so that freezing occurs more often. Some people have expressed reservations that the code might be imperfect, and I can't rule that out as the cause (but I've never traced any other problems back to that code). But it did make it through 4 complete wraps before this problem was encountered, so if that is the problem it must be probabilistic rather than deterministic.

https://drive.google.com/folderview?id=0Bzqrh1SO9FcENWd6ZXlwVWpxU0E&usp=sharing

Cheers,

Jeff