I am running PostgreSQL 8.2.4 on Solaris 10 x86. I have set up WAL file shipping from a primary server to a warm standby. I can start the standby server from a base backup of the primary and have it apply the WAL files as they arrive. My problem comes when I want to trigger the standby server to come out of recovery mode.
Here is the server log, starting from when the server comes up from the backup. Just before the error, I "triggered" the server to exit recovery mode by making my restore_command return 1 instead of 0.

2010-04-27 15:00:58 CDT :LOG: database system was interrupted at 2010-04-27 10:10:08 CDT
2010-04-27 15:00:58 CDT :LOG: starting archive recovery
2010-04-27 15:00:58 CDT :LOG: restore_command = "/opt/data/restore.sh /opt/wal/archwalremote/%f %p"
2010-04-27 15:00:58 CDT :LOG: restored log file "000000010000009F000000BA.00000278.backup" from archive
2010-04-27 15:00:59 CDT :LOG: restored log file "000000010000009F000000BA" from archive
2010-04-27 15:00:59 CDT :LOG: checkpoint record is at 9F/BA000278
2010-04-27 15:00:59 CDT :LOG: redo record is at 9F/BA000278; undo record is at 0/0; shutdown FALSE
2010-04-27 15:00:59 CDT :LOG: next transaction ID: 0/325985316; next OID: 823081
2010-04-27 15:00:59 CDT :LOG: next MultiXactId: 2127; next MultiXactOffset: 4278
2010-04-27 15:00:59 CDT :LOG: automatic recovery in progress
2010-04-27 15:00:59 CDT :LOG: redo starts at 9F/BA0002C0
2010-04-27 15:01:00 CDT :LOG: restored log file "000000010000009F000000BB" from archive
2010-04-27 15:01:02 CDT :LOG: restored log file "000000010000009F000000BC" from archive
<snip, many more files restored>
2010-04-27 15:03:19 CDT :LOG: restored log file "000000010000009F000000FE" from archive
2010-04-27 15:03:20 CDT :LOG: restored log file "00000001000000A000000000" from archive
2010-04-27 15:06:00 CDT :LOG: restored log file "00000001000000A000000001" from archive
2010-04-27 15:09:21 CDT :LOG: could not open file "pg_xlog/00000001000000A000000002" (log file 160, segment 2): No such file or directory
2010-04-27 15:09:21 CDT :LOG: redo done at A0/1000068
2010-04-27 15:09:21 CDT :PANIC: could not open file "pg_xlog/00000001000000A000000001" (log file 160, segment 1): No such file or directory
2010-04-27 15:09:26 CDT :LOG: startup process (PID 22490) was terminated by signal 6
2010-04-27 15:09:26 CDT :LOG: aborting startup due to startup process failure
2010-04-27 15:09:26 CDT :LOG: logger shutting down
--------------------

At this point the server enters a restart loop and repeatedly logs the following:

2010-04-27 15:09:26 CDT :LOG: database system was interrupted while in recovery at log time 2010-04-27 15:05:08 CDT
2010-04-27 15:09:26 CDT :HINT: If this has occurred more than once some data may be corrupted and you may need to choose an earlier recovery target.
2010-04-27 15:09:26 CDT :LOG: starting archive recovery
2010-04-27 15:09:26 CDT :LOG: restore_command = "/opt/data/restore.sh /opt/wal/archwalremote/%f %p"
2010-04-27 15:09:26 CDT :LOG: could not open file "pg_xlog/00000001000000A000000001" (log file 160, segment 1): No such file or directory
2010-04-27 15:09:26 CDT :LOG: invalid primary checkpoint record
2010-04-27 15:09:26 CDT :LOG: could not open file "pg_xlog/000000010000009F000000BA" (log file 159, segment 186): No such file or directory
2010-04-27 15:09:26 CDT :LOG: invalid secondary checkpoint record
2010-04-27 15:09:26 CDT :PANIC: could not locate a valid checkpoint record
2010-04-27 15:09:30 CDT :LOG: startup process (PID 24191) was terminated by signal 6
2010-04-27 15:09:30 CDT :LOG: aborting startup due to startup process failure
2010-04-27 15:09:30 CDT :LOG: logger shutting down
--------------------

Any help is greatly appreciated. Please let me know if I can provide any more information that would be helpful.

-Chris