Hi

I've now set up a warm-standby machine using WAL archiving. The restore_command on the warm-standby machine loops until the WAL requested by postgres appears, instead of returning 1. Additionally, restore_command checks for two special flag files, "abort" and "take_online". If "take_online" exists, it exits with code 1 when the requested WAL does not exist, which allows me to take the slave online if the master fails.
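For illustration, the waiting restore_command described above could look roughly like the following sketch. It is not the actual script. The function name restore_wal, the script name wait_for_wal.py, the polling interval, and the choice to keep the flag files in the archive directory are all assumptions made for the example. What "abort" does is also assumed: here it simply stops the wait with a distinct nonzero status.

```python
import os
import shutil
import sys
import time


def restore_wal(archive_dir, wal_name, dest_path, poll_seconds=1.0):
    """Copy wal_name from archive_dir to dest_path, waiting until it appears.

    Returns 0 after a successful copy, 1 if the WAL is missing and the
    "take_online" flag file exists, and 2 if the "abort" flag file exists
    (the meaning of "abort" is an assumption of this sketch).
    """
    source = os.path.join(archive_dir, wal_name)
    abort_flag = os.path.join(archive_dir, "abort")
    online_flag = os.path.join(archive_dir, "take_online")

    while True:
        if os.path.exists(abort_flag):
            return 2
        if os.path.exists(source):
            shutil.copy(source, dest_path)
            return 0
        # Only give up on a missing WAL when asked to take the slave online.
        if os.path.exists(online_flag):
            return 1
        time.sleep(poll_seconds)


# Only act as a command when called with exactly three arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(restore_wal(sys.argv[1], sys.argv[2], sys.argv[3]))
```

With a hypothetical archive directory /archive, this would be hooked in as restore_command = 'python3 wait_for_wal.py /archive %f %p'. Note that the sketch does not guard against copying a WAL file that is still being written into the archive.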
This method seems to work, but it is neither particularly fool-proof nor administrator-friendly. It's not possible, for example, to reboot the slave without postgres aborting the recovery and therefore processing all WALs generated since the last backup all over again. Monitoring this system is hard too, since there is no easy way to detect errors while restoring a particular WAL.

I think that all those problems could be solved if postgres provided a standalone application that could restore one WAL into a specified data-dir. It should be possible to call this application repeatedly to restore WALs as they are received from the master. Since "pg_restorelog" would be called separately for every WAL, it'd be easy to detect errors recovering a specific WAL.

Do you think this idea is feasible? How hard would it be to turn the current archived-wal-recovery code into a standalone executable? (That, of course, would need to be called when postgres is _not_ running.)

greetings, Florian Pflug