[HACKERS] 9.2 recovery/startup problems

Jeff Janes Tue, 25 Nov 2014 21:32:18 -0800

Using both 9.2.9 and 9_2_STABLE 9b468bcec15f1ea7433d4, I get a fairly
reproducible startup failure.


What I was doing was restoring a database from a base backup and rolling
it forward with a recovery.conf until recovery completed and the server
started up.  Then I truncated an unlogged table and started repopulating
it with a large, slow 'insert into ... select from' query.

If I reboot the server while that query is running, PostgreSQL does not
come back up on the first start attempt after the reboot.

This is what I get in the log from the attempted restart:

PST LOG:  database system was interrupted; last known up at 2014-11-25
15:40:33 PST
PST LOG:  database system was not properly shut down; automatic recovery in
progress
PST LOG:  redo starts at 84/EF000080
PST LOG:  record with zero length at 84/EF09AE18
PST LOG:  redo done at 84/EF09AD28
PST LOG:  last completed transaction was at log time 2014-11-25
15:42:09.173599-08
PST LOG:  checkpoint starting: end-of-recovery immediate
PST LOG:  checkpoint complete: wrote 103 buffers (0.2%); 0 transaction log
file(s) added, 246 removed, 7 recycled; write=0.002 s, sync=0.020 s,
total=0.526 s; sync files=51, longest=0.003 s, average=0.000 s
PST FATAL:  could not create file "base/16416/59288": File exists
PST LOG:  startup process (PID 2472) exited with exit code 1
PST LOG:  aborting startup due to startup process failure

oid2name doesn't show me any 59288, so I think it is the new copy of the
unlogged table which is being created at the moment of the reboot.
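
One way to check that guess is to look at which files for that
relfilenode are actually on disk, and to ask pg_class whether any
committed relation points at it.  The following is only a minimal sketch:
the function names are hypothetical, the data directory path has to be
supplied by the caller, and it assumes the usual on-disk naming in which a
relation's files are the relfilenode itself plus optional _init, _fsm and
_vm fork suffixes and .N segment suffixes.

```python
import os
import re

# Relative path as it appears in the FATAL message, e.g. "base/16416/59288".
_RELPATH_RE = re.compile(r"^base/(\d+)/(\d+)$")


def parse_relpath(relpath):
    """Split 'base/<database oid>/<relfilenode>' into two integers."""
    m = _RELPATH_RE.match(relpath)
    if m is None:
        raise ValueError("unexpected path: %r" % relpath)
    return int(m.group(1)), int(m.group(2))


def list_relation_files(datadir, relpath):
    """List the files in the database directory that belong to the
    relfilenode: main fork, _init/_fsm/_vm forks, and .N segments."""
    db_oid, relfilenode = parse_relpath(relpath)
    dbdir = os.path.join(datadir, "base", str(db_oid))
    pattern = re.compile(r"^%d(_(init|fsm|vm))?(\.\d+)?$" % relfilenode)
    return sorted(name for name in os.listdir(dbdir) if pattern.match(name))


def catalog_query(relpath):
    """Build a query to run in the affected database; relpersistence
    is 'u' for unlogged relations."""
    _, relfilenode = parse_relpath(relpath)
    return ("SELECT oid, relname, relpersistence FROM pg_class "
            "WHERE relfilenode = %d;" % relfilenode)
```

Calling list_relation_files(datadir, "base/16416/59288") before the
second start attempt would show whether both a main file and an _init
file exist for 59288.  If the query from catalog_query() returns no rows,
that is consistent with the file belonging to a relfilenode that was
never committed to the catalog, though it does not prove it.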

I can't distribute a tarball of this particular database.  How would I go
about debugging this?  Should I track down the source of the FATAL and
convert it to a PANIC so I can get a core dump to look at?

A second attempt to start up the server completes successfully.

Cheers,

Jeff
