If it is not already set, could you add recovery_target_timeline = 'latest' to the recovery.conf file? https://www.postgresql.org/docs/devel/static/recovery-target-settings.html
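Why this matters here: in 9.6, recovery_target_timeline defaults to recovering along the timeline that was current when the base backup was taken. So slave2 stays on timeline 5, which the new master reports as ending at 1BB/AB000098 ("End of WAL reached on timeline 5"). With 'latest', the standby looks for a newer timeline history file (00000006.history appears in the archive listing) and follows the switch to timeline 6. A minimal bash sketch that appends the setting only when it is missing. The helper name ensure_latest_timeline is hypothetical, and the data directory is taken from the pg_rewind command in the quoted message. Restart slave2 afterwards, since recovery.conf is only read at startup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add recovery_target_timeline = 'latest' to a
# recovery.conf only if the parameter is not already present.
ensure_latest_timeline() {
  local conf=$1
  # Look for an uncommented "recovery_target_timeline =" line (any spacing).
  if ! grep -Eq '^[[:space:]]*recovery_target_timeline[[:space:]]*=' "$conf" 2>/dev/null; then
    # Make sure the file ends with a newline before appending.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
      printf '\n' >> "$conf"
    fi
    printf '%s\n' "recovery_target_timeline = 'latest'" >> "$conf"
  fi
}

# Usage on slave2 (assumed path), then restart the standby:
#   ensure_latest_timeline /var/lib/pgsql/9.6/data/recovery.conf
```

An existing setting (for example an explicit 'current') is deliberately left untouched, matching the "if not set" suggestion; change it by hand if needed.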
2018-03-08 10:41 GMT+03:00 Dylan Luong <dylan.lu...@unisa.edu.au>:
> Hi Michael,
>
> I tested the failover today and slave 2 failed to resync with the new
> master (old slave1).
>
> After I promoted slave1 to become master, I was able to use pg_rewind
> on the old master and bring it back as a new slave.
>
> I then stopped slave2 and ran pg_rewind on slave2 against the new master; it
> reported that no rewind was required:
>
> $ pg_rewind -D /var/lib/pgsql/9.6/data --source-server="host=xxxxx.xxx.xxxx port=5432 user=postgres"
> servers diverged at WAL position 1BB/AB000098 on timeline 5
> no rewind required
>
> So I then updated the recovery.conf on slave2 with primary_conninfo equal
> to the new master IP.
> When starting up postgres, it failed with the following error in the logs:
>
> database system was shut down in recovery at 2018-03-08 17:52:10 ACDT
> 2018-03-08 17:56:27 ACDT [23026]: [2-1] db=,user= app=,host= LOG: entering standby mode
> cp: cannot stat '/pg_backup/backup/archive /00000005.history': No such file or directory
> cp: cannot stat '/pg_backup/backup/archive /00000005000001BB000000AB': No such file or directory
> 2018-03-08 17:56:27 ACDT [23026]: [3-1] db=,user= app=,host= LOG: consistent recovery state reached at 1BB/AB000098
> 2018-03-08 17:56:27 ACDT [23026]: [4-1] db=,user= app=,host= LOG: record with incorrect prev-link 1B9/73000040 at 1BB/AB000098
> 2018-03-08 17:56:27 ACDT [23024]: [3-1] db=,user= app=,host= LOG: database system is ready to accept read only connections
> 2018-03-08 17:56:27 ACDT [23032]: [1-1] db=,user= app=,host= LOG: started streaming WAL from primary at 1BB/AB000000 on timeline 5
> 2018-03-08 17:56:27 ACDT [23032]: [2-1] db=,user= app=,host= LOG: replication terminated by primary server
> 2018-03-08 17:56:27 ACDT [23032]: [3-1] db=,user= app=,host= DETAIL: End of WAL reached on timeline 5 at 1BB/AB000098.
> cp: cannot stat '/pg_backup/backup/archive_sync/00000005000001BB000000AB': No such file or directory
> 2018-03-08 17:56:27 ACDT [23032]: [4-1] db=,user= app=,host= LOG: restarted WAL streaming at 1BB/AB000000 on timeline 5
> 2018-03-08 17:56:27 ACDT [23032]: [5-1] db=,user= app=,host= LOG: replication terminated by primary server
> 2018-03-08 17:56:27 ACDT [23032]: [6-1] db=,user= app=,host= DETAIL: End of WAL reached on timeline 5 at 1BB/AB000098.
>
> On the new master, in the /pg_backup/backup/archive folder, I can see a file
> 00000005000001BB000000AB.partial
> E.g.:
> ls -l
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:48 00000005000001BB000000AB.partial
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AB
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AC
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AD
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AE
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000AF
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000B0
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000B1
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49 00000006000001BB000000B2
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:50 00000006000001BB000000B3
> -rw-------. 1 postgres postgres 16777216 Mar 8 17:01 00000006000001BB000000B4
> -rw-------. 1 postgres postgres 16777216 Mar 8 17:14 00000006000001BB000000B5
> -rw-------. 1 postgres postgres 218 Mar 8 16:48 00000006.history
>
> Any ideas?
>
> Dylan
>
> -----Original Message-----
> From: Michael Paquier [mailto:mich...@paquier.xyz]
> Sent: Tuesday, 6 March 2018 5:55 PM
> To: Dylan Luong <dylan.lu...@unisa.edu.au>
> Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
> Subject: Re: Resync second slave to new master
>
> On Tue, Mar 06, 2018 at 06:00:40AM +0000, Dylan Luong wrote:
> > So every time after promoting Slave to become master (either manually
> > or automatically), just stop Slave2 and run pg_rewind on slave2 against
> > the new master (old slave1). And when the old master server is available
> > again, use pg_rewind on that server as well against the new master to
> > return to the original configuration.
>
> Yes. That's exactly the idea. Running pg_rewind on the old master will
> be necessary anyway because you need to stop it cleanly once, which will
> cause it to generate WAL records at least for the shutdown checkpoint,
> while doing it on slave 2 may be optional, still safer to do.
> --
> Michael
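For reading the quoted logs and archive listing: a WAL segment file name is three 8-digit hex fields, namely the timeline ID, the upper 32 bits of the LSN, and the segment number within that 4 GB range (the lower 32 bits divided by the segment size). The bash sketch below assumes the default 16 MB segment size; the helper names lsn_to_walfile and walfile_parts are hypothetical. It shows that the divergence point 1BB/AB000098 falls in segment AB. On timeline 5 that segment presumably exists only as the 00000005000001BB000000AB.partial file the new master archived when it was promoted, which would explain why restore_command never found 00000005000001BB000000AB, while timeline 6 has the complete 00000006000001BB000000AB. So slave2 can only move past 1BB/AB000098 by switching to timeline 6.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for decoding WAL file names.
# Assumes the default 16 MB (2^24-byte) WAL segment size.

# lsn_to_walfile TLI LSN: print the segment file name holding LSN on timeline TLI.
lsn_to_walfile() {
  local tli=$1 lsn=$2
  local hi=${lsn%%/*}   # upper 32 bits of the LSN, e.g. 1BB
  local lo=${lsn##*/}   # lower 32 bits of the LSN, e.g. AB000098
  # Segment number within this 4 GB range = lower 32 bits / 16 MB.
  printf '%08X%08X%08X\n' "$tli" "$((16#$hi))" "$((16#$lo >> 24))"
}

# walfile_parts NAME: print "timeline xlogid segment" for a WAL file name,
# ignoring suffixes such as ".partial".
walfile_parts() {
  local name=${1%%.*}
  printf '%d %X %X\n' "$((16#${name:0:8}))" "$((16#${name:8:8}))" "$((16#${name:16:8}))"
}

lsn_to_walfile 5 1BB/AB000098                   # 00000005000001BB000000AB
lsn_to_walfile 6 1BB/AB000098                   # 00000006000001BB000000AB
walfile_parts 00000005000001BB000000AB.partial  # 5 1BB AB
```

The streaming start point in the log, 1BB/AB000000, is simply the first byte of that same segment, so it maps to the same file name.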