On Tue, 1 Dec 2015 12:34:59 +0300 Andrei Borzenkov <[email protected]> wrote:
> On Tue, Dec 1, 2015 at 12:08 PM, Jehan-Guillaume de Rorthais
> <[email protected]> wrote:
> > On Tue, 1 Dec 2015 06:36:35 +0300
> > Andrei Borzenkov <[email protected]> wrote:
> >
> >> 26.11.2015 03:52, Jehan-Guillaume de Rorthais wrote:
> >> > Hi guys,
> >> >
> >> > While working on our pgsqlms agent[1], we are now studying how to control
> >> > all the steps of a switchover process from the resource agent.
> >> >
> >> > The tricky part here is the 2nd step of a successful switchover with
> >> > PostgreSQL (9.3+):
> >> >   (1) shutdown the master first
> >> >   (2) make sure the designated slave received **everything** from the old
> >> >       master
> >>
> >> I am not familiar with PG, but it sounds backwards. Once master
> >> (replication source) is shut down, there is no way to verify anything on
> >> slave (replication target) side.
> >
> > Once the master is shut down, the slaves are still running, we can check
> > whatever we want on them.
> >
> >> Is there any way to tell PG to "prepare to switch" and wait until it is
> >> complete on demote?
> >
> > Demoting a master in PG is: shutdown -> start as slave.
> >
> >> Or do you mean waiting until slave finished replaying pending
> >> replication stream? In this case I expect it should be possible to check
> >> on slave side (something like "we have 5 files to replay left")?
> >
> > Yes, that is what I mean.
> >
> > In a normal situation, the master (PG 9.3+) will wait for its standbys to
> > receive everything, then do a "shutdown checkpoint" which is streamed to
> > the slaves as well. At this point, slaves are aware the master did a clean
> > shutdown.
> >
> > During a switchover, we **must** check the new master received the old-master
> > "shutdown checkpoint". If promotion occurs before this xlog record, the old
> > master will not be able to replicate from the new master.
> >
>
> If PG waits for standbys to "receive everything", how is it possible
> that slave is promoted too early? Pacemaker should wait for demote to
> complete and demote will wait for slaves to get everything. At least
> that is what follows from your explanation. I am probably missing
> something here.

As explained below, a network issue or moving the master IP address is enough
to break this. I have been bitten by the latter during tests when setting up
colocation without asymmetrical order (i.e. promote/start IP and demote/stop
IP).

> > During this shutdown window, any kind of network issue or just a wrong setup
> > (like the master IP being moved **before** the demote) will prevent a clean
> > switchover and the old master will never catch up with the new one.
>
> What would be the correct action in this case? Block promoting of slave?
>
> I think it may be possible to use notifications here. If demoting was
> announced and master was active at this point, you know pacemaker
> intended to stop master and so should check for completion. Although I
> admit I do not know which notifications are sent for failed resource
> and for failed node.

-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com

_______________________________________________
Developers mailing list
[email protected]
http://clusterlabs.org/mailman/listinfo/developers
