On Mon, Mar 27, 2017 at 11:26 AM, Tsunakawa, Takayuki < tsunakawa.ta...@jp.fujitsu.com> wrote:
> From: pgsql-hackers-ow...@postgresql.org > > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier > > Moved to CF 2017-03. Both patches still apply. > > Sorry to be late for reviewing this, but done now. The patch applied, > make check passed, and the code looks almost good. I could successfully > perform a simple archive recovery. Finally, I broke the 2pc state file > while the server is down, and could confirm that the server failed to start > as expected, emitting a FATAL message. Worked nicely. > I've tested moving recovery.conf away case which was originally reported by Magnus. When I'm trying to promote standby when recovery.conf is moved away, I get FATAL error. waiting for server to promote....2017-03-27 17:12:38.225 MSK [30307] LOG: received promote request 2017-03-27 17:12:38.225 MSK [30311] FATAL: terminating walreceiver process due to administrator command 2017-03-27 17:12:38.225 MSK [30307] LOG: redo done at 0/3000028 2017-03-27 17:12:38.226 MSK [30307] LOG: selected new timeline ID: 2 2017-03-27 17:12:38.253 MSK [30307] FATAL: could not open file "recovery.conf": No such file or directory 2017-03-27 17:12:38.253 MSK [30306] LOG: startup process (PID 30307) exited with exit code 1 2017-03-27 17:12:38.253 MSK [30306] LOG: terminating any other active server processes 2017-03-27 17:12:38.256 MSK [30306] LOG: database system is shut down ........................................................... stopped waiting server is still promoting If I try to start it after crash, it becomes a master on timeline 1. Just like in case we deleted recovery.conf while server was shut down. waiting for server to start....2017-03-27 17:16:54.186 MSK [30413] LOG: listening on IPv6 address "::1", port 5430 2017-03-27 17:16:54.186 MSK [30413] LOG: listening on IPv6 address "fe80::1%lo0", port 5430 2017-03-27 17:16:54.186 MSK [30413] LOG: listening on IPv4 address "127.0.0.1", port 5430 2017-03-27 17:16:54.187 MSK [30413] LOG: listening on Unix socket "/tmp/.s.PGSQL.5430" 2017-03-27 17:16:54.195 MSK [30414] LOG: database system was interrupted while in recovery at log time 2017-03-27 17:10:23 MSK 2017-03-27 17:16:54.195 MSK [30414] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2017-03-27 17:16:54.218 MSK [30414] LOG: database system was not properly shut down; automatic recovery in progress 2017-03-27 17:16:54.219 MSK [30414] LOG: redo starts at 0/2000060 2017-03-27 17:16:54.219 MSK [30414] LOG: consistent recovery state reached at 0/3000060 2017-03-27 17:16:54.219 MSK [30414] LOG: invalid record length at 0/3000060: wanted 24, got 0 2017-03-27 17:16:54.219 MSK [30414] LOG: redo done at 0/3000028 2017-03-27 17:16:54.224 MSK [30413] LOG: database system is ready to accept connections done server started # select pg_is_in_recovery(); pg_is_in_recovery ─────────────────── f (1 row) # select pg_walfile_name(pg_current_wal_location()); pg_walfile_name ────────────────────────── 000000010000000000000003 (1 row) If instead I put recovery back and start server, then streaming replication continues normally. waiting for server to start....2017-03-27 17:32:16.887 MSK [30783] LOG: listening on IPv6 address "::1", port 5430 2017-03-27 17:32:16.887 MSK [30783] LOG: listening on IPv6 address "fe80::1%lo0", port 5430 2017-03-27 17:32:16.887 MSK [30783] LOG: listening on IPv4 address "127.0.0.1", port 5430 2017-03-27 17:32:16.888 MSK [30783] LOG: listening on Unix socket "/tmp/.s.PGSQL.5430" 2017-03-27 17:32:16.897 MSK [30784] LOG: database system was interrupted while in recovery at log time 2017-03-27 17:28:05 MSK 2017-03-27 17:32:16.897 MSK [30784] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2017-03-27 17:32:16.914 MSK [30784] LOG: entering standby mode 2017-03-27 17:32:16.916 MSK [30784] LOG: redo starts at 0/8000028 2017-03-27 17:32:16.916 MSK [30784] LOG: consistent recovery state reached at 0/9000060 2017-03-27 17:32:16.916 MSK [30784] LOG: invalid record length at 0/9000060: wanted 24, got 0 2017-03-27 17:32:16.916 MSK [30783] LOG: database system is ready to accept read only connections 2017-03-27 17:32:16.920 MSK [30788] LOG: started streaming WAL from primary at 0/9000000 on timeline 1 done server started # select pg_is_in_recovery(); pg_is_in_recovery ─────────────────── t (1 row) Thus, I think patch is working as expected in this case. Also, I'd like to notice that it passes check-world without warnings on my laptop. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company