Re: [HACKERS] [GENERAL] Multiple Slave Failover with PITR

Sergey Konoplev Mon, 03 Sep 2012 02:08:26 -0700

On Sun, Sep 2, 2012 at 4:12 PM, Bruce Momjian <br...@momjian.us> wrote:
> Do we ever want to document a way to connect slaves to a new master,
> rather than recreating the slave?


I have written an instruction for myself and I am planning to publish
it on http://code.google.com/p/pgcookbook/. See the attachment.

Hope you will find it useful. If anybody would like to provide any
criticism I will highly appreciate it.

>
> ---------------------------------------------------------------------------
>
> On Tue, Mar 27, 2012 at 10:47:48AM -0700, Ken Brush wrote:
>> Hello everyone,
>>
>> I notice that the documentation at:
>> http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
>>
>> Doesn't contain steps in a Multiple Slave setup for re-establishing
>> them after a slave has become the new master.
>>
>> Based on the documentation, here are the most fail-proof steps I came up 
>> with:
>>
>> 1. Master dies :(
>> 2. Touch the trigger file on the most caught up slave.
>> 3. Slave is now the new master :)
>> 4. use pg_basebackup or other binary replication trick (rsync, tar
>> over ssh, etc...) to bring the other slaves up to speed with the new
>> master.
>> 5. start the other slaves pointing to the new master.
>>
>> But, that can take time (about 1-2 hours) with my medium sized DB
>> (580GB currently).
>>
>> After testing a few different ideas that I gleaned from posts on the
>> mail list, I came up with this alternative method:
>>
>> 1. Master dies :(
>> 2. Touch the trigger file on the most caught up slave
>> 3. Slave is now the new master.
>> 4. On the other slaves do the following:
>> 5. Shutdown postgres on the slave
>> 6. Delete every file in /data/pgsql/data/pg_xlog
>> 7. Modify the recovery.conf file to point to the new master and
>> include the line "recovery_target_timeline='latest'"
>> 8. Copy the history file from the new master to the slave (it's the
>> most recent #.history file in the xlog directory)
>> 9. Startup postgres on the slave and watch it sync up to the new
>> master (about 1-5 minutes usually)
>>
>> My question is this. Is the alternative method adequate? I tested it a
>> bit and couldn't find any problems with data loss or inconsistency.
>>
>> I still use the fail-proof method above to re-incorporate the old
>> master as a new slave.
>>
>> Sincerely,
>> -Ken
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
> --
>   Bruce Momjian  <br...@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + It's impossible for everything to be true. +
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +79160686204

Streaming replication based failover

Let us suppose that there is a hot standby replication set up in a cluster. The 
db1 server is a master and dbX are replicas. Also suppose that we need to do a 
failover on one of our slaves.

First we need to define what replica will be a new master.

In case of the master failure you need to find the most caught up replica. To 
do this compare WAL replay locations on replicas and chose the one with the 
biggest value.

postgres=# SELECT pg_last_xlog_replay_location();
 pg_last_xlog_replay_location 
------------------------------
 BAA/37DA2888
(1 row)

Note that If you chose not the most caught up one than other replicas that have 
replayed later WAL entries must be reconfigured from scratch. Otherwise their 
data can be corrupted and you will not get any warnings about it.

In case of the planned switchover choose one that will be a new master on your 
own.

Then stop all the slaves except the new master. Let say the remaining slave is 
db2.

Use the command below to help the master and the remaining slave get ahead of 
other slaves if you are not sure that they already are. The command creates a 
minimal WAL entry.

postgres=# SELECT txid_current();

Now touch the failover file on the remaining slave to promote it as a new 
master.

postgres@db2: ~ $ touch /path/to/failover

And you will see similar things in logs.

2011-04-05 11:46:48 MSD @ 69971 [4d9ac05d.11153-5]:LOG:  trigger file found: 
/db/data/failover
2011-04-05 11:46:48 MSD @ 69974 [4d9ac05d.11156-2]:FATAL:  terminating 
walreceiver process due to administrator command
2011-04-05 11:46:48 MSD @ 69971 [4d9ac05d.11153-6]:LOG:  redo done at 0/4012A68
2011-04-05 11:46:48 MSD @ 69971 [4d9ac05d.11153-7]:LOG:  last completed 
transaction was at log time 2011-04-05 11:41:29.199406+04
2011-04-05 11:46:48 MSD @ 69971 [4d9ac05d.11153-8]:LOG:  selected new timeline 
ID: 2
2011-04-05 11:46:49 MSD @ 69971 [4d9ac05d.11153-9]:LOG:  archive recovery 
complete
2011-04-05 11:46:49 MSD @ 69972 [4d9ac05d.11154-1]:LOG:  checkpoint starting: 
end-of-recovery immediate wait
2011-04-05 11:46:49 MSD @ 69972 [4d9ac05d.11154-2]:LOG:  checkpoint complete: 
wrote 22 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 3 
recycled; write=0.010 s, sync=0.044 s, total=0.060 s
2011-04-05 11:46:49 MSD @ 69969 [4d9ac05c.11151-2]:LOG:  database system is 
ready to accept connections

The failover file will be removed automatically and recovery.conf will be 
renamed to recovery.done.

On the stopped slaves delete everything in the pg_xlog directory.

postgres@dbX: ~ $ rm -rf /db/data/pg_xlog/*

And copy the history file from the new master.

postgres@dbX: ~ $ scp db2:/db/data/pg_xlog/*.history /db/data/pg_xlog/

Then change the DSN to the new master and add the following instruction in 
recovery.conf. This will make replicas to follow the latest created timeline.

recovery_target_timeline = 'latest'

Now start the postgres service and wait until it is synced up to the new master.

dbX: ~ # /etc/init.d/postgresql start

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Multiple Slave Failover with PITR

Reply via email to