[ADMIN] WAL recovery, stop and resume recovery?

David Wall Fri, 11 Jan 2008 11:47:25 -0800

Using PG 8.2, I have a database in recovery mode using pg_standby to handle the WAL restores.

Is it allowable to have a backup database in recovery mode, then stop recovery (in this case, by putting the trigger file in place to stop pg_standby), check that the backup db appears up to date, stop the now active backup db, and then restart it in recovery mode again to have it resume its backup role?


I have had some success doing this, with the restart in recovery showing:

LOG:  starting archive recovery

LOG: restore_command = "~/postgresql/bin/pg_standby -l -d -s 2 -k 20 -t ~/postgresql/restoreWALs/STOP_RESTORE ~/postgresql/restoreWALs %f %p 2>>~/pg_standby.log"

LOG:  restored log file "000000010000000500000018" from archive
*LOG:  invalid xl_info in primary checkpoint record*
LOG:  using previous checkpoint record at 5/18000020
LOG:  redo record is at 5/18000020; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 0/1535389; next OID: 53990
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
*LOG:  automatic recovery in progress*
LOG:  redo starts at 5/18000068

But there are times when I do this and it cannot resume. Is this because the steps themselves are a problem (after all, I did stop recovery and go active briefly, though I didn't update the db during that time; I just ran \d and select queries to see that DDL and row data were updated on the backup), or is it related to not keeping enough WAL files around to find the 'secondary checkpoint record'? (pg_standby -k 20 was chosen, but it's not clear how to select this value, and it sounds like 8.3 gets rid of that issue entirely.)
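
To check whether the segment holding a given checkpoint is still in the restore directory, it helps to map the locations printed in the logs (such as 5/18000020) to WAL segment file names. Below is a minimal sketch of that mapping, not anything pg_standby itself provides. It assumes the default 16 MB segment size and timeline 1; the function name segment_for_lsn is made up for illustration.

```python
# Sketch only: maps a WAL location as printed in the server log ("5/18000020")
# to the name of the segment file that contains it.
# Assumption: default 16 MB WAL segments (a custom build may differ).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def segment_for_lsn(lsn: str, timeline: int = 1) -> str:
    """Return the 24-hex-digit segment file name holding the given location.

    `segment_for_lsn` is a hypothetical helper name, not a PostgreSQL API.
    """
    log_id_hex, offset_hex = lsn.split("/")
    log_id = int(log_id_hex, 16)      # high part of the location
    offset = int(offset_hex, 16)      # byte offset within that log id
    segment = offset // WAL_SEGMENT_SIZE
    # File name layout: timeline, log id, segment, each as 8 hex digits.
    return f"{timeline:08X}{log_id:08X}{segment:08X}"
```

For the two logs in this message, segment_for_lsn("5/18000020") gives 000000010000000500000018 and segment_for_lsn("5/1D000068") gives 00000001000000050000001D, matching the "restored log file" lines.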


Here's the sort of error I get when it doesn't allow me to restart:

LOG:  database system was shut down at 2008-01-11 11:40:05 PST
LOG:  starting archive recovery

LOG: restore_command = "~/postgresql/bin/pg_standby -l -d -s 2 -k 20 -t ~/postgresql/restoreWALs/STOP_RESTORE ~/postgresql/restoreWALs %f %p 2>>~/pg_standby.log"

*LOG:  restored log file "00000001000000050000001D" from archive
LOG:  invalid record length at 5/1D000068
LOG:  invalid primary checkpoint record
LOG:  restored log file "00000001000000050000001D" from archive
LOG:  invalid resource manager ID in secondary checkpoint record
PANIC:  could not locate a valid checkpoint record*
LOG:  startup process (PID 9219) was terminated by signal 6
LOG:  aborting startup due to startup process failure
LOG:  logger shutting down


Thanks,
David
