[GENERAL] Server Panic when trying to stop point in time recovery

Chris Copeland Tue, 27 Apr 2010 13:49:36 -0700

I am running 8.2.4 on Solaris 10 x86.

I have set up WAL file shipping from a primary server to a warm standby.  I
am able to start the standby server using a backup from the primary and get
it to apply the log files as they arrive.  My problem comes when I want to
trigger the standby server to come out of recovery mode.


Here is the log file starting from when the server comes up from the
backup.  Just prior to the error I have "triggered" the server to exit
recovery mode by making my restore_command return 1 instead of 0.
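
For readers unfamiliar with this style of trigger, here is a minimal sketch of
what a restore script of this kind might look like. It is not the actual
/opt/data/restore.sh, whose contents are not shown in this post. The function
name, the trigger-file mechanism, and the polling interval are all assumptions
made for illustration. Note one choice that is not taken from the post: the
sketch checks for the segment before it checks for the trigger, so a file
that is already in the archive is still copied after the trigger appears.

```python
import os
import shutil
import time


def restore_segment(archive_path, dest_path, trigger_path,
                    poll_seconds=1.0, sleep=time.sleep):
    """Hypothetical restore helper for a warm standby.

    Returns 0 once the requested file has been copied to dest_path.
    Returns 1 if the file is absent and the (assumed) trigger file exists.
    Otherwise keeps polling, which is what holds the standby in recovery.
    """
    while True:
        # Serve the file whenever it is available, triggered or not.
        if os.path.isfile(archive_path):
            shutil.copyfile(archive_path, dest_path)
            return 0
        # Only report failure once the operator has asked to end recovery.
        if os.path.exists(trigger_path):
            return 1
        sleep(poll_seconds)
```

A wrapper would call this with the expanded %f and %p arguments and pass the
return value to sys.exit(), so that the server sees it as the exit status of
restore_command.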

2010-04-27 15:00:58 CDT :LOG:  database system was interrupted at 2010-04-27
10:10:08 CDT
2010-04-27 15:00:58 CDT :LOG:  starting archive recovery
2010-04-27 15:00:58 CDT :LOG:  restore_command = "/opt/data/restore.sh
/opt/wal/archwalremote/%f %p"
2010-04-27 15:00:58 CDT :LOG:  restored log file
"000000010000009F000000BA.00000278.backup" from archive
2010-04-27 15:00:59 CDT :LOG:  restored log file "000000010000009F000000BA"
from archive
2010-04-27 15:00:59 CDT :LOG:  checkpoint record is at 9F/BA000278
2010-04-27 15:00:59 CDT :LOG:  redo record is at 9F/BA000278; undo record is
at 0/0; shutdown FALSE
2010-04-27 15:00:59 CDT :LOG:  next transaction ID: 0/325985316; next OID:
823081
2010-04-27 15:00:59 CDT :LOG:  next MultiXactId: 2127; next MultiXactOffset:
4278
2010-04-27 15:00:59 CDT :LOG:  automatic recovery in progress
2010-04-27 15:00:59 CDT :LOG:  redo starts at 9F/BA0002C0
2010-04-27 15:01:00 CDT :LOG:  restored log file "000000010000009F000000BB"
from archive
2010-04-27 15:01:02 CDT :LOG:  restored log file "000000010000009F000000BC"
from archive

<snip, many more files restored>

2010-04-27 15:03:19 CDT :LOG:  restored log file "000000010000009F000000FE"
from archive
2010-04-27 15:03:20 CDT :LOG:  restored log file "00000001000000A000000000"
from archive
2010-04-27 15:06:00 CDT :LOG:  restored log file "00000001000000A000000001"
from archive
2010-04-27 15:09:21 CDT :LOG:  could not open file
"pg_xlog/00000001000000A000000002" (log file 160, segment 2): No such file
or directory
2010-04-27 15:09:21 CDT :LOG:  redo done at A0/1000068
2010-04-27 15:09:21 CDT :PANIC:  could not open file
"pg_xlog/00000001000000A000000001" (log file 160, segment 1): No such file
or directory
2010-04-27 15:09:26 CDT :LOG:  startup process (PID 22490) was terminated by
signal 6
2010-04-27 15:09:26 CDT :LOG:  aborting startup due to startup process
failure
2010-04-27 15:09:26 CDT :LOG:  logger shutting down

--------------------

At this point the server enters a restart loop and keeps generating log
output like this:

2010-04-27 15:09:26 CDT :LOG:  database system was interrupted while in
recovery at log time 2010-04-27 15:05:08 CDT
2010-04-27 15:09:26 CDT :HINT:  If this has occurred more than once some
data may be corrupted and you may need to choose an earlier recovery target.
2010-04-27 15:09:26 CDT :LOG:  starting archive recovery
2010-04-27 15:09:26 CDT :LOG:  restore_command = "/opt/data/restore.sh
/opt/wal/archwalremote/%f %p"
2010-04-27 15:09:26 CDT :LOG:  could not open file
"pg_xlog/00000001000000A000000001" (log file 160, segment 1): No such file
or directory
2010-04-27 15:09:26 CDT :LOG:  invalid primary checkpoint record
2010-04-27 15:09:26 CDT :LOG:  could not open file
"pg_xlog/000000010000009F000000BA" (log file 159, segment 186): No such file
or directory
2010-04-27 15:09:26 CDT :LOG:  invalid secondary checkpoint record
2010-04-27 15:09:26 CDT :PANIC:  could not locate a valid checkpoint record
2010-04-27 15:09:30 CDT :LOG:  startup process (PID 24191) was terminated by
signal 6
2010-04-27 15:09:30 CDT :LOG:  aborting startup due to startup process
failure
2010-04-27 15:09:30 CDT :LOG:  logger shutting down

--------------------
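
As a reading aid for the messages above, the segment file names can be split
into three 8-digit hexadecimal fields: timeline, log file, and segment. The
short sketch below shows how the "log file N, segment M" numbers in the log
follow from the file name. The function name is made up for illustration.

```python
def parse_wal_segment_name(name):
    """Split a 24-character WAL segment file name into its three fields.

    Returns (timeline, log_file, segment) as integers.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character segment name: %r" % name)
    timeline = int(name[0:8], 16)    # first 8 hex digits
    log_file = int(name[8:16], 16)   # middle 8 hex digits
    segment = int(name[16:24], 16)   # last 8 hex digits
    return timeline, log_file, segment


# File names taken from the log output in this post.
print(parse_wal_segment_name("00000001000000A000000002"))  # (1, 160, 2)
print(parse_wal_segment_name("000000010000009F000000BA"))  # (1, 159, 186)
```
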

Any help is greatly appreciated.  Please let me know if I can provide any
more information that will be helpful.

-Chris
