Hello, I'm trying to find out how to rewind a cluster that was not shut down cleanly, in order to implement pg_rewind support in patroni (an automated failover system, https://github.com/zalando/patroni).
At the moment, pg_rewind limits itself to only cleanly shut down clusters. This works nicely in the case of a split brain caused by the network partitioning. However, it doesn't cover the important case of a suddenly crashed master: the crashed cluster cannot be rewound to the new master. One idea to overcome this limitation is to start the former master for a short time, just to let automatic recovery do its job, and stop it cleanly afterwards. There are some indications on the list that it works: http://www.postgresql.org/message-id/79f6ceb4-f519-40fa-9c72-167def1eb...@simply.name However, in our testing we had an issue with a missing WAL segment on a former master, which prevented pg_rewind from bringing it up to date with the current master: Suppose, the current XLOG segment right before we crash the master is: postgres=# select * from pg_xlogfile_name(pg_current_xlog_location()); pg_xlogfile_name -------------------------- 000000010000000000000003 (1 row) (the master is configured to archive all segments into the external directory). The latest checkpoint location right before the crash is: Latest checkpoint's REDO location: 0/2000028 Latest checkpoint's REDO WAL file: 000000010000000000000002 and pg_xlog contains the following data $ ls -R postgresql0/pg_xlog/ 000000010000000000000001 000000010000000000000002.00000028.backup archive_status 000000010000000000000002 000000010000000000000003 postgresql0/pg_xlog//archive_status: 000000010000000000000001.done 000000010000000000000002.done 000000010000000000000002.00000028.backup.done Now, if we crash the master by sending it SIGKILL, and then start it again with: $ postgres -D data/postgresql0 -c "max_replication_slots=5" -c "wal_level=hot_standby" -c "wal_log_hints=on" LOG: database system was interrupted; last known up at 2015-10-05 17:28:04 CEST LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at 0/2000028 LOG: invalid record length at 0/3000060 LOG: redo done at 0/3000028 we'll get the following contents of postgresql0/pg_xlog: $ ls -R postgresql0/pg_xlog/ 000000010000000000000002.00000028.backup 000000010000000000000004 archive_status 000000010000000000000003 000000010000000000000005 postgresql0/pg_xlog//archive_status: 000000010000000000000002.00000028.backup.done Note, that at some moment the master removed the segment 000000010000000000000002 from its pg_xlog. In the pg_controldata, I get: Latest checkpoint's REDO location: 0/3000108 Latest checkpoint's REDO WAL file: 000000010000000000000003 When I try to run pg_rewind, I'm getting: pg_rewind -D postgresql0 --source-server="host=127.0.0.1 port=5433 dbname=postgres" The servers diverged at WAL position 0/3000060 on timeline 1. could not open file "data/postgresql0/pg_xlog/000000010000000000000002": No such file or directory Note that this problem happens not 100% of time during the tests, sometimes pg_rewind can actually rewind the former master. I know I can copy the segment back from the archive, but I'd like to avoid putting this logic into the failover tool if possible. Is there anything we can do to avoid the problem described above, or is there a better way to bring up the former master after the crash with pg_rewind? Kind regards, -- Oleksii Kliukin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers