On Thu, Aug 11, 2016 at 10:39 PM, James Sewell <james.sew...@jirotech.com> wrote:
> Hello, > > We recently experienced a critical failure when failing to a DR > environment. > > This is in the following environment: > > > - 3 x PostgreSQL machines in Prod in a sync replication cluster > - 3 x PostgreSQL machines in DR, with a single machine async and the > other two cascading from the first machine. > > There was network failure which isolated Production from everything else, > Production has no errors during this time (and has now come back OK). > > DR did not tolerate the break, the following appeared in the logs and none > of them can start postgres. There were no queries coming into DR at the > time of the break. > > Please note that the "Host Key verification failed" messages are due to > the scp command not functioning. This means restore_command is not working > to restore from the XLOG archive, but should not effect anything else. > In my experience, PostgreSQL issues its own error messages when restore_command fails. So I see both the error from the command itself, and an error from PostgreSQL. Why don't you see that? Is the restore_command failing, but then reporting that it succeeded? And if you can't get files from the XLOG archive, why do you think that that is OK? Cheers, Jeff