Using both 9.2.9 and 9_2_STABLE 9b468bcec15f1ea7433d4, I get a fairly reproducible startup failure.
What I do is restore a database from a base backup and roll it forward with a recovery.conf until recovery completes and the server starts up. Then I truncate an unlogged table and start repopulating it with a large, slow 'insert into ... select from' query. If I reboot the server while that query is running, the server does not come back up when it restarts. This is what I get in the log from the attempted restart:

PST LOG: database system was interrupted; last known up at 2014-11-25 15:40:33 PST
PST LOG: database system was not properly shut down; automatic recovery in progress
PST LOG: redo starts at 84/EF000080
PST LOG: record with zero length at 84/EF09AE18
PST LOG: redo done at 84/EF09AD28
PST LOG: last completed transaction was at log time 2014-11-25 15:42:09.173599-08
PST LOG: checkpoint starting: end-of-recovery immediate
PST LOG: checkpoint complete: wrote 103 buffers (0.2%); 0 transaction log file(s) added, 246 removed, 7 recycled; write=0.002 s, sync=0.020 s, total=0.526 s; sync files=51, longest=0.003 s, average=0.000 s
PST FATAL: could not create file "base/16416/59288": File exists
PST LOG: startup process (PID 2472) exited with exit code 1
PST LOG: aborting startup due to startup process failure

oid2name doesn't show me any 59288, so I think it is the new copy of the unlogged table, which was being created at the moment of the reboot.

A second attempt to start up the server completes successfully.

I can't distribute a tarball of this particular database. How would I go about debugging this? Should I track down the source of the FATAL and convert it to a PANIC so I can get a core dump to look at?

Cheers,

Jeff