On Fri, Dec 4, 2015 at 4:11 PM, Jehan-Guillaume de Rorthais <[email protected]> wrote:
> On Wed, 2 Dec 2015 14:02:23 +1100
> Andrew Beekhof <[email protected]> wrote:
>
>> > On 26 Nov 2015, at 11:52 AM, Jehan-Guillaume de Rorthais <[email protected]> wrote:
>> >
>> > Hi guys,
>> >
>> > While working on our pgsqlms agent[1], we are now studying how to control
>> > all the steps of a switchover process from the resource agent.
>> >
>> > The tricky part here is the 2nd step of a successful switchover with
>> > PostgreSQL (9.3+):
>> > (1) shut down the master first
>> > (2) make sure the designated slave received **everything** from the old
>> >     master
>>
>> How can you achieve (2) if (1) has already occurred?
>
> This check consists of validating the last transaction log entry the slave
> received: it must be the "shutdown checkpoint" from the old master.
>
>> There's no-one for the designated slave to talk to in the case of errors...
>
> I was explaining the steps for a successful switchover in PostgreSQL, outside
> of Pacemaker. Sorry for the confusion if it wasn't clear enough :/
>
> This is currently done by hand. Should an error occur (the slave did not
> receive the shutdown checkpoint of the master), the human operator simply
> restarts/promotes the master and the slave resumes its replication from it.
>
>> > (3) promote the designated slave as master
>> > (4) start the old master as slave
>>
>> (4) is pretty tricky. Assuming you use master/slave, it's supposed to be in
>> this state already after the demote in step (1).
>
> Back to Pacemaker and our RA. A demote in PostgreSQL is really a stop + start
> as slave. So after a demote, as the master actually did stop and restart as a
> slave, the designated slave to be promoted must have the "shutdown checkpoint"
> from the old master in its transaction log.
>
>> If you're just using clones, then you're in even more trouble because
>> pacemaker either wouldn't have stopped it or won't want to start it again.
>
> We are using stateful clones with the master/slave role.
> During a Pacemaker "move" (what I call a switchover), the resource is demoted
> on the source node and promoted on the destination one. Considering a demote
> in PostgreSQL is a stop/start (as slave), we are fine with (1), (3) and (4):
>
> (1) the demote stops the old master (and restarts it as a slave)
> (3) the designated slave is promoted
> (4) the old master connects to the new master
>
> About (4), as the old master is restarted as a slave in (1), it just waits
> until it can connect to the new master while (2) and (3) occur. The trigger
> might be the "master IP address" finally showing up, some setup done in the
> "post-promote" notification, etc.
>
>> See more below.
>>
>> > As far as we understand Pacemaker, the migrate-to and migrate-from
>> > capabilities allow us to distinguish whether we are moving a resource
>> > because of a failure or for a controlled switchover. Unfortunately, these
>> > capabilities are ignored for cloned and multi-state resources…
>>
>> Yeah, this isn't really the right use-case.
>> You need to be looking more at the promote/demote cycle.
>>
>> If you turn on notifications, then in a graceful switchover (e.g. the node
>> is going into standby) you will get information about which node has been
>> selected to become the new master when calling demote on the old master.
>> Perhaps you could ensure (2) while performing (1).
>
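To make step (2) concrete, here is a minimal sketch of one way to verify it
once the old master is down: compare the "Latest checkpoint location" that
pg_controldata reports for the stopped master with the last WAL location the
designated slave received. The data directory, the slave connection string and
the use of Python are placeholders for the example only; the pgsqlms agent does
not necessarily perform the check this way.

    # Rough sketch (not the pgsqlms implementation): check that the designated
    # slave received WAL at least up to the old master's shutdown checkpoint.
    import re
    import subprocess

    PGDATA = "/var/lib/pgsql/9.4/data"          # placeholder: old master's datadir
    SLAVE_CONN = "host=slave1 dbname=postgres"  # placeholder: designated slave

    def old_master_checkpoint(pgdata):
        """Return (cluster_state, latest_checkpoint_lsn) from pg_controldata."""
        out = subprocess.check_output(["pg_controldata", pgdata]).decode()
        state = re.search(r"Database cluster state:\s+(.+)", out).group(1).strip()
        lsn = re.search(r"Latest checkpoint location:\s+(\S+)", out).group(1)
        return state, lsn

    def slave_received_lsn(conninfo):
        """Last WAL location received by the slave (9.3/9.4 function name)."""
        out = subprocess.check_output(
            ["psql", "-AtX", "-d", conninfo,
             "-c", "SELECT pg_last_xlog_receive_location()"]).decode()
        return out.strip()

    def lsn_to_int(lsn):
        """Convert an 'X/Y' WAL location into a comparable integer."""
        hi, lo = lsn.split("/")
        return (int(hi, 16) << 32) + int(lo, 16)

    state, ckpt = old_master_checkpoint(PGDATA)
    received = slave_received_lsn(SLAVE_CONN)
    if state == "shut down" and lsn_to_int(received) >= lsn_to_int(ckpt):
        print("slave caught up with the shutdown checkpoint; safe to promote")
    else:
        print("slave is missing WAL from the old master; do not promote")

In practice each half would of course run on the right node (pg_controldata on
the old master, the query on the candidate slave) before the agent allows the
promote to proceed.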
> Our RA is already working. It already uses promote/demote notifications. See
>
>   https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/script/pgsqlms
>
> But I fail to understand how I could distinguish, even from notifications, a
> failing scenario from a move/switchover one.
>
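For reference, a notify action learns about the pending transition mainly
through whitespace-separated node lists passed as OCF_RESKEY_CRM_meta_notify_*
environment variables (the ones described in the Pacemaker Explained section
linked as [1] further down). A throwaway sketch that just dumps the relevant
ones:

    # Sketch: what a notify action gets to see.  Each variable holds a
    # whitespace-separated list of node names for the pending transition.
    import os

    NOTIFY_VARS = [
        "OCF_RESKEY_CRM_meta_notify_type",           # "pre" or "post"
        "OCF_RESKEY_CRM_meta_notify_operation",      # "demote", "promote", ...
        "OCF_RESKEY_CRM_meta_notify_promote_uname",  # nodes about to be promoted
        "OCF_RESKEY_CRM_meta_notify_demote_uname",   # nodes about to be demoted
        "OCF_RESKEY_CRM_meta_notify_master_uname",   # nodes currently master
        "OCF_RESKEY_CRM_meta_notify_slave_uname",    # nodes currently slave
        "OCF_RESKEY_CRM_meta_notify_start_uname",    # nodes about to be started
        "OCF_RESKEY_CRM_meta_notify_stop_uname",     # nodes about to be stopped
    ]

    for var in NOTIFY_VARS:
        print("%s = %s" % (var, os.environ.get(var, "").split()))

Which is the difficulty raised above: the variables describe what is about to
happen, not why the demote was scheduled.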
Does it really matter? You have asynchronous replication. In case of an
involuntary failover you are bound to lose some in-flight transactions. If you
accept that, I do not see why you would care in the case of a voluntary
failover. How is the situation worse than a sudden host crash a millisecond
before you were ready to move the master to another host?

> During a failure on the master, Pacemaker will first try to demote it and
> even fence the node if needed. In the notifications, I will receive the same
> information as during a move, won't I?
>
> Or maybe you are thinking about comparing the active/master/slave/stop/inactive
> resources from the notifications between the pre- and post-demote to deduce
> whether the old master is still alive as a slave [1]? In this scenario, I
> suppose we would have to keep the name of the old master in a private
> attribute on the designated slave to be promoted, to compare the states of
> the old master?
>
> [1]
> https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt#L942
>
>> It's not ideal, but you could have (4) happen in the post-promote
>> notification.
>> Notify actions aren't /supposed/ to change resource state but it has been
>> done before.
>
> Step 4 is fine, no problem with it, no need to mess with it; again, sorry
> for the confusion.
>
> I am sure we can probably find a workaround to this problem, but it seems to
> me it requires some struggling and wrestling in the code to bend it to what
> we are trying to achieve.
>
> I thought using migrate-to/migrate-from would have been much cleaner code and
> almost self-documented compared to more conditional blocks with complex
> manipulation and computation (e.g. dealing with arrays of nodes to compare
> states during pre/post-demote).
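For what it's worth, the private-attribute bookkeeping suggested above could
look roughly like this. The attribute name is invented for the example,
attrd_updater's --private option needs a sufficiently recent Pacemaker 1.1,
and this is only an illustration of the idea, not the agent's implementation:

    # Rough sketch of the "private attribute" idea above (not pgsqlms code):
    # during a pre-demote notification, remember which node is being demoted,
    # so the soon-to-be-promoted slave can later check whether that node came
    # back as a slave.
    import os
    import re
    import subprocess

    ATTR = "pgsql-old-master"   # invented attribute name for the example

    def remember_old_master():
        """Pre-demote notification: store the demoted node's name locally."""
        old = os.environ.get("OCF_RESKEY_CRM_meta_notify_demote_uname", "").strip()
        if old:
            subprocess.check_call(
                ["attrd_updater", "--private", "-n", ATTR, "-U", old])

    def recall_old_master():
        """Later (e.g. post-promote): read the remembered name back."""
        out = subprocess.check_output(
            ["attrd_updater", "-Q", "-n", ATTR]).decode()
        m = re.search(r'value="([^"]*)"', out)  # output looks like name=.. host=.. value=..
        return m.group(1) if m else None

The post-promote handler could then compare the remembered name against the
slave/start node lists in its own notification variables to decide whether the
old master survived as a slave.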
_______________________________________________
Developers mailing list
[email protected]
http://clusterlabs.org/mailman/listinfo/developers