RE: Resync second slave to new master

2018-03-08 Thread Dylan Luong
Thanks! That fixed the issue.

-Original Message-
From: Michael Paquier [mailto:mich...@paquier.xyz] 
Sent: Thursday, 8 March 2018 6:41 PM
To: Yavuz Selim Sertoğlu <yavuzselimserto...@gmail.com>
Cc: Dylan Luong <dylan.lu...@unisa.edu.au>; pgsql-generallists.postgresql.org 
<pgsql-general@lists.postgresql.org>
Subject: Re: Resync second slave to new master

On Thu, Mar 08, 2018 at 10:48:29AM +0300, Yavuz Selim Sertoğlu wrote:
> If not set, could you add recovery.conf file
> recovery_target_timeline='latest'
> parameter?
> https://www.postgresql.org/docs/devel/static/recovery-target-settings.html

Yes, that's visibly the issue here.
--
Michael


Re: Resync second slave to new master

2018-03-08 Thread Michael Paquier
On Thu, Mar 08, 2018 at 10:48:29AM +0300, Yavuz Selim Sertoğlu wrote:
> If not set, could you add recovery.conf file
> recovery_target_timeline='latest'
> parameter?
> https://www.postgresql.org/docs/devel/static/recovery-target-settings.html

Yes, that's visibly the issue here.
--
Michael


signature.asc
Description: PGP signature


Re: Resync second slave to new master

2018-03-07 Thread Yavuz Selim Sertoğlu
If not set, could you add recovery.conf file
recovery_target_timeline='latest'
parameter?
https://www.postgresql.org/docs/devel/static/recovery-target-settings.html


2018-03-08 10:41 GMT+03:00 Dylan Luong <dylan.lu...@unisa.edu.au>:

> Hi Michael,
>
> I tested the failover today and the slave 2 failed to resync with the new
> master (old slave1).
>
> After I promoted the slave1 to become master,  I was able to use pg_rewind
> on the old master and bring it back as new slave.
>
> I then stopped slave2 and ran pg_rewind on slave2 against new master, it
> report that no rewind was required:
>
>   $ pg_rewind -D /var/lib/pgsql/9.6/data 
> --source-server="host=x.xxx.
> port=5432 user=postgres"
>   servers diverged at WAL position 1BB/AB98 on timeline 5
>   no rewind required
>
> So I then updated the recovery.conf on slave2 with primary_conninfo equal
> to the new master IP.
> When starting up posgres, it failed with the following error in the logs:
>
> database system was shut down in recovery at 2018-03-08 17:52:10 ACDT
> 2018-03-08 17:56:27 ACDT [23026]: [2-1] db=,user= app=,host= LOG:
> entering standby mode
> cp: cannot stat '/pg_backup/backup/archive /0005.history': No such
> file or directory
> cp: cannot stat '/pg_backup/backup/archive /000501BB00AB': No
> such file or directory
> 2018-03-08 17:56:27 ACDT [23026]: [3-1] db=,user= app=,host= LOG:
> consistent recovery state reached at 1BB/AB98
> 2018-03-08 17:56:27 ACDT [23026]: [4-1] db=,user= app=,host= LOG:  record
> with incorrect prev-link 1B9/7340 at 1BB/AB98
> 2018-03-08 17:56:27 ACDT [23024]: [3-1] db=,user= app=,host= LOG:
> database system is ready to accept read only connections
> 2018-03-08 17:56:27 ACDT [23032]: [1-1] db=,user= app=,host= LOG:  started
> streaming WAL from primary at 1BB/AB00 on timeline 5
> 2018-03-08 17:56:27 ACDT [23032]: [2-1] db=,user= app=,host= LOG:
> replication terminated by primary server
> 2018-03-08 17:56:27 ACDT [23032]: [3-1] db=,user= app=,host= DETAIL:  End
> of WAL reached on timeline 5 at 1BB/AB98.
> cp: cannot stat '/pg_backup/backup/archive_sync/000501BB00AB':
> No such file or directory
> 2018-03-08 17:56:27 ACDT [23032]: [4-1] db=,user= app=,host= LOG:
> restarted WAL streaming at 1BB/AB00 on timeline 5
> 2018-03-08 17:56:27 ACDT [23032]: [5-1] db=,user= app=,host= LOG:
> replication terminated by primary server
> 2018-03-08 17:56:27 ACDT [23032]: [6-1] db=,user= app=,host= DETAIL:  End
> of WAL reached on timeline 5 at 1BB/AB98.
>
>
> On the new master in the /pg_backup/backup/archive folder I can see a file
> 000501BB00AB.partial
> Eg.
> ls -l
> -rw---. 1 postgres postgres 16777216 Mar  8 16:48
> 000501BB00AB.partial
> -rw---. 1 postgres postgres 16777216 Mar  8 16:49
> 000601BB00AB
> -rw---. 1 postgres postgres 16777216 Mar  8 16:49
> 000601BB00AC
> -rw---. 1 postgres postgres 16777216 Mar  8 16:49
> 000601BB00AD
> -rw---. 1 postgres postgres 16777216 Mar  8 16:49
> 000601BB00AE
> -rw---. 1 postgres postgres 16777216 Mar  8 16:49
> 000601BB00AF
> -rw---. 1 postgres postgres 16777216 Mar  8 16:49
> 000601BB00B0
> -rw---. 1 postgres postgres 16777216 Mar  8 16:49
> 000601BB00B1
> -rw---. 1 postgres postgres 16777216 Mar  8 16:49
> 000601BB00B2
> -rw---. 1 postgres postgres 16777216 Mar  8 16:50
> 000601BB00B3
> -rw---. 1 postgres postgres 16777216 Mar  8 17:01
> 000601BB00B4
> -rw---. 1 postgres postgres 16777216 Mar  8 17:14
> 000601BB00B5
> -rw---. 1 postgres postgres  218 Mar  8 16:48 0006.history
>
> Any ideas?
>
> Dylan
>
> -----Original Message-
> From: Michael Paquier [mailto:mich...@paquier.xyz]
> Sent: Tuesday, 6 March 2018 5:55 PM
> To: Dylan Luong <dylan.lu...@unisa.edu.au>
> Cc: pgsql-generallists.postgresql.org <pgsql-general@lists.postgresql.org>
> Subject: Re: Resync second slave to new master
>
> On Tue, Mar 06, 2018 at 06:00:40AM +, Dylan Luong wrote:
> > So everytime after promoting Slave to become master (either manually
> > or automatic), just stop Slave2 and run pg_rewind on slave2 against
> > the new maser (old slave1). And when old master server is available
> > again, use pg_rewind on that serve as well against new master to
> > return to original configuration.
>
> Yes.  That's exactly the idea.  Running pg_rewind on the old master will
> be necessary anyway because you need to stop it cleanly once, which will
> cause it to generate WAL records at least for the shutdown checkpoint,
> while doing it on slave 2 may be optional, still safer to do.
> --
> Michael
>
>


RE: Resync second slave to new master

2018-03-07 Thread Dylan Luong
Hi Michael,

I tested the failover today and the slave 2 failed to resync with the new 
master (old slave1).

After I promoted the slave1 to become master,  I was able to use pg_rewind on 
the old master and bring it back as new slave.

I then stopped slave2 and ran pg_rewind on slave2 against new master, it report 
that no rewind was required:

  $ pg_rewind -D /var/lib/pgsql/9.6/data 
--source-server="host=x.xxx. port=5432 user=postgres"
  servers diverged at WAL position 1BB/AB98 on timeline 5
  no rewind required

So I then updated the recovery.conf on slave2 with primary_conninfo equal to 
the new master IP.
When starting up posgres, it failed with the following error in the logs:

database system was shut down in recovery at 2018-03-08 17:52:10 ACDT
2018-03-08 17:56:27 ACDT [23026]: [2-1] db=,user= app=,host= LOG:  entering 
standby mode
cp: cannot stat '/pg_backup/backup/archive /0005.history': No such file or 
directory
cp: cannot stat '/pg_backup/backup/archive /000501BB00AB': No such 
file or directory
2018-03-08 17:56:27 ACDT [23026]: [3-1] db=,user= app=,host= LOG:  consistent 
recovery state reached at 1BB/AB98
2018-03-08 17:56:27 ACDT [23026]: [4-1] db=,user= app=,host= LOG:  record with 
incorrect prev-link 1B9/7340 at 1BB/AB98
2018-03-08 17:56:27 ACDT [23024]: [3-1] db=,user= app=,host= LOG:  database 
system is ready to accept read only connections
2018-03-08 17:56:27 ACDT [23032]: [1-1] db=,user= app=,host= LOG:  started 
streaming WAL from primary at 1BB/AB00 on timeline 5
2018-03-08 17:56:27 ACDT [23032]: [2-1] db=,user= app=,host= LOG:  replication 
terminated by primary server
2018-03-08 17:56:27 ACDT [23032]: [3-1] db=,user= app=,host= DETAIL:  End of 
WAL reached on timeline 5 at 1BB/AB98.
cp: cannot stat '/pg_backup/backup/archive_sync/000501BB00AB': No 
such file or directory
2018-03-08 17:56:27 ACDT [23032]: [4-1] db=,user= app=,host= LOG:  restarted 
WAL streaming at 1BB/AB00 on timeline 5
2018-03-08 17:56:27 ACDT [23032]: [5-1] db=,user= app=,host= LOG:  replication 
terminated by primary server
2018-03-08 17:56:27 ACDT [23032]: [6-1] db=,user= app=,host= DETAIL:  End of 
WAL reached on timeline 5 at 1BB/AB98.


On the new master in the /pg_backup/backup/archive folder I can see a file 
000501BB00AB.partial
Eg.
ls -l
-rw---. 1 postgres postgres 16777216 Mar  8 16:48 
000501BB00AB.partial
-rw---. 1 postgres postgres 16777216 Mar  8 16:49 000601BB00AB
-rw---. 1 postgres postgres 16777216 Mar  8 16:49 000601BB00AC
-rw---. 1 postgres postgres 16777216 Mar  8 16:49 000601BB00AD
-rw---. 1 postgres postgres 16777216 Mar  8 16:49 000601BB00AE
-rw---. 1 postgres postgres 16777216 Mar  8 16:49 000601BB00AF
-rw---. 1 postgres postgres 16777216 Mar  8 16:49 000601BB00B0
-rw---. 1 postgres postgres 16777216 Mar  8 16:49 000601BB00B1
-rw---. 1 postgres postgres 16777216 Mar  8 16:49 000601BB00B2
-rw---. 1 postgres postgres 16777216 Mar  8 16:50 000601BB00B3
-rw---. 1 postgres postgres 16777216 Mar  8 17:01 000601BB00B4
-rw---. 1 postgres postgres 16777216 Mar  8 17:14 000601BB00B5
-rw---. 1 postgres postgres  218 Mar  8 16:48 0006.history

Any ideas?

Dylan

-Original Message-
From: Michael Paquier [mailto:mich...@paquier.xyz] 
Sent: Tuesday, 6 March 2018 5:55 PM
To: Dylan Luong <dylan.lu...@unisa.edu.au>
Cc: pgsql-generallists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Resync second slave to new master

On Tue, Mar 06, 2018 at 06:00:40AM +, Dylan Luong wrote:
> So everytime after promoting Slave to become master (either manually
> or automatic), just stop Slave2 and run pg_rewind on slave2 against
> the new maser (old slave1). And when old master server is available
> again, use pg_rewind on that serve as well against new master to
> return to original configuration.

Yes.  That's exactly the idea.  Running pg_rewind on the old master will
be necessary anyway because you need to stop it cleanly once, which will
cause it to generate WAL records at least for the shutdown checkpoint,
while doing it on slave 2 may be optional, still safer to do.
--
Michael



Re: Resync second slave to new master

2018-03-05 Thread Michael Paquier
On Tue, Mar 06, 2018 at 06:00:40AM +, Dylan Luong wrote:
> So everytime after promoting Slave to become master (either manually
> or automatic), just stop Slave2 and run pg_rewind on slave2 against
> the new maser (old slave1). And when old master server is available
> again, use pg_rewind on that serve as well against new master to
> return to original configuration.

Yes.  That's exactly the idea.  Running pg_rewind on the old master will
be necessary anyway because you need to stop it cleanly once, which will
cause it to generate WAL records at least for the shutdown checkpoint,
while doing it on slave 2 may be optional, still safer to do.
--
Michael


signature.asc
Description: PGP signature


RE: Resync second slave to new master

2018-03-05 Thread Dylan Luong
Thanks Michael,
So everytime after promoting Slave to become master (either manually or 
automatic), just stop Slave2 and run pg_rewind on slave2 against the new maser 
(old slave1). And when old master server is available again, use pg_rewind on 
that serve as well against new master to return to original configuration.

-Original Message-
From: Michael Paquier [mailto:mich...@paquier.xyz] 
Sent: Tuesday, 6 March 2018 3:54 PM
To: Dylan Luong <dylan.lu...@unisa.edu.au>
Cc: pgsql-generallists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Resync second slave to new master

On Tue, Mar 06, 2018 at 04:45:10AM +, Dylan Luong wrote:
> After a failover (promote) to the Slave1,  is it easily resync the
> Slave2 to the new master (old slave1)? Do we need to do  full rebuild 
> of the Slave2 from new master everytime we failover to Slave1 from 
> Master? Can we use pg_rewind on Slave2 to resyn it with new master 
> (old slave1)?

After promoting slave 1, it could be possible that some records have slipped to 
slave 2 from the primary.  In this case, a rewind would be recommended.  You 
should be careful that your slave 2 has not received WAL to a position newer 
than where WAL has forked because of the promotion.  If that happened, then a 
rewind would be necessary before replugging slave 2 to the newly-promoted 
server.  Be very careful with your failover flow here.  pg_rewind also would 
not run if it finds that the target server does not need a rewind, so you could 
stop the slave 2, and run pg_rewind unconditionally to keep things simple.
--
Michael



Re: Resync second slave to new master

2018-03-05 Thread Michael Paquier
On Tue, Mar 06, 2018 at 04:45:10AM +, Dylan Luong wrote:
> After a failover (promote) to the Slave1,  is it easily resync the
> Slave2 to the new master (old slave1)? Do we need to do  full rebuild
> of the Slave2 from new master everytime we failover to Slave1 from
> Master? Can we use pg_rewind on Slave2 to resyn it with new master
> (old slave1)? 

After promoting slave 1, it could be possible that some records have
slipped to slave 2 from the primary.  In this case, a rewind would be
recommended.  You should be careful that your slave 2 has not received
WAL to a position newer than where WAL has forked because of the
promotion.  If that happened, then a rewind would be necessary before
replugging slave 2 to the newly-promoted server.  Be very careful with
your failover flow here.  pg_rewind also would not run if it finds that
the target server does not need a rewind, so you could stop the slave 2,
and run pg_rewind unconditionally to keep things simple.
--
Michael


signature.asc
Description: PGP signature


Resync second slave to new master

2018-03-05 Thread Dylan Luong
Hi

We are currently on PostgreSQL 9.6 and our HA is configured as  master/slave 
with wal streaming replication.
We would like to setup a second slave which replicates from the same master.

Eg:
Current setup:

Slave <-- Master
Slave is setup for automatic failover and we use pg_rewind after failover to 
rewind the old master as new slave.

New setup:

Slave1   <-- Master --> Slave2

Slave 2 (new) is for reporting and nightly pg_dumps, as we have some very big 
databases.
Slave 1 is for HA and is setup automatic failover.

After a failover (promote) to the Slave1,  is it easily resync the Slave2 to 
the new master (old slave1)? Do we need to do  full rebuild of the Slave2 from 
new master everytime we failover to Slave1 from Master? Can we use pg_rewind on 
Slave2 to resyn it with new master (old slave1)?

Thanks
Dylan