On Wed, 2 Dec 2015 14:02:23 +1100 Andrew Beekhof <[email protected]> wrote:
> > On 26 Nov 2015, at 11:52 AM, Jehan-Guillaume de Rorthais <[email protected]>
> > wrote:
> >
> > Hi guys,
> >
> > While working on our pgsqlms agent[1], we are now studying how to control
> > all the steps of a switchover process from the resource agent.
> >
> > The tricky part here is the 2nd step of a successful switchover with
> > PostgreSQL (9.3+):
> >   (1) shutdown the master first
> >   (2) make sure the designated slave received **everything** from the old
> >       master
>
> How can you achieve (2) if (1) has already occurred?

This check consists of validating the last transaction log entry the slave
received. It must be the "shutdown checkpoint" from the old master.

> There’s no-one for the designated slave to talk to in the case of errors...

I was explaining the steps for a successful switchover in PostgreSQL, outside
of Pacemaker. Sorry for the confusion if it wasn't clear enough :/

This is currently done by hand. Should an error occur (the slave did not
receive the shutdown checkpoint of the master), the human operator simply
restarts/promotes the master and the slave resumes replicating from it.

> >   (3) promote the designated slave as master
> >   (4) start the old master as slave
>
> (4) is pretty tricky. Assuming you use master/slave, it's supposed to be in
> this state already after the demote in step (1).

Back to Pacemaker and our RA. A demote in PostgreSQL is really a stop + start
as slave. So after a demote, as the master actually did stop and restart as a
slave, the designated slave to be promoted must have the "shutdown checkpoint"
from the old master in its transaction log.

> If you’re just using clones,
> then you’re in even more trouble because pacemaker either wouldn’t have
> stopped it or won’t want to start it again.

We are using stateful clones with the master/slave role. During a Pacemaker
"move" (what I call a switchover), the resource is demoted on the source node
and promoted on the destination one.
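
To make the "shutdown checkpoint" check from step (2) more concrete, here is a
minimal sketch in Python. It is not taken from pgsqlms. It assumes the slave's
last WAL records have already been dumped to text by a tool such as
pg_xlogdump, and that each record line looks roughly like the 9.3/9.4 output
("rmgr: XLOG ... desc: checkpoint: ...; shutdown"). The function name and the
accepted line shapes are illustrative assumptions; the exact dump format
depends on the PostgreSQL version.

```python
import re

# Assumed record shape, modeled on pg_xlogdump text output. Older releases
# end a shutdown checkpoint description with "; shutdown", newer ones print
# a CHECKPOINT_SHUTDOWN record type instead. Both spellings are accepted.
_SHUTDOWN_RE = re.compile(
    r"rmgr:\s+XLOG\b.*desc:\s+"
    r"(?:checkpoint:.*;\s*shutdown\s*$|CHECKPOINT_SHUTDOWN\b)"
)


def last_record_is_shutdown_checkpoint(dump_text):
    """Return True if the last WAL record in dump_text is a shutdown checkpoint.

    dump_text is the textual dump of the WAL the slave received. Lines that
    do not start with "rmgr:" (blank lines, tool messages) are ignored.
    """
    records = [
        line for line in dump_text.splitlines()
        if line.strip().startswith("rmgr:")
    ]
    if not records:
        # Nothing received at all: certainly not safe to promote.
        return False
    return _SHUTDOWN_RE.search(records[-1]) is not None
```

If this returns False, the slave stopped receiving before the old master's
final record, which is the error case described above where the operator
brings the old master back instead of promoting the slave.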
Considering a demote in PostgreSQL is a stop/start (as slave), we are fine
with (1), (3) and (4):

  (1) the demote did stop the old master (and restarted it as a slave)
  (3) the designated slave is promoted
  (4) the old master connects to the new master

About (4), as the old master is restarted as a slave in (1), it just waits
until it is able to connect to the new master while (2) and (3) occur. What
it waits for might be either the "master IP address" finally appearing or
some setup done in the "post-promote" notification, etc.

> See more below.
>
> > As far as we understand Pacemaker, migrate-to and migrate-from capabilities
> > allows to distinguish if we are moving a resource because of a failure or
> > for a controlled switchover situation. Unfortunately, these capabilities
> > are ignored for cloned and multi-state resources…
>
> Yeah, this isn’t really the right use-case.
> You need to be looking more at the promote/demote cycle.
>
> If you turn on notifications, then in a graceful switchover (eg. the node is
> going into standby) you will get information about which node has been
> selected to become the new master when calling demote on the old master.
> Perhaps you could ensure (2) while performing (1).

Our RA is already working. It already uses promote/demote notifications. See
https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/script/pgsqlms

But I fail to understand how I could distinguish, even from notifications, a
failure scenario from a move/switchover one. During a failure on the master,
Pacemaker will first try to demote it and even fence the node if needed. In
the notification, I will receive the same information as during a move, won't
I?

Or maybe you are thinking about comparing the
active/master/slave/stop/inactive resources from the notifications between
the pre- and post-demote to deduce whether the old master is still alive as a
slave [1]?
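
For illustration only, the kind of comparison asked about above could be
sketched as follows in Python (pgsqlms itself is not written this way). The
sketch reads the OCF_RESKEY_CRM_meta_notify_* variables Pacemaker exports to
notify actions. The heuristic in looks_like_switchover() is purely an
assumption: whether the notification data is actually enough to tell a
failure from a move is exactly the open question of this thread.

```python
import os

PREFIX = "OCF_RESKEY_CRM_meta_notify_"

# Node-name lists exported by Pacemaker for master/slave resources.
FIELDS = ("master_uname", "slave_uname", "promote_uname",
          "demote_uname", "stop_uname", "start_uname")


def parse_notify_env(environ=os.environ):
    """Collect the space-separated node lists of a notify action into sets."""
    data = {f: set(environ.get(PREFIX + f, "").split()) for f in FIELDS}
    data["type"] = environ.get(PREFIX + "type", "")            # "pre" or "post"
    data["operation"] = environ.get(PREFIX + "operation", "")  # e.g. "demote"
    return data


def looks_like_switchover(notify):
    """Guess whether a demote notification belongs to a controlled move.

    Hypothetical heuristic, not documented Pacemaker semantics: exactly one
    node is demoted, exactly one *other* node is promoted in the same
    transition, and the demoted node is not scheduled to be stopped.
    """
    if notify["operation"] != "demote":
        return False
    demoted = notify["demote_uname"]
    promoted = notify["promote_uname"]
    return (len(demoted) == 1 and len(promoted) == 1
            and demoted != promoted
            and not (demoted & notify["stop_uname"]))
```

A real agent would call parse_notify_env() with no argument from its notify
action; passing a plain dict instead makes the logic easy to exercise outside
of a cluster.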
In this scenario, I suppose we would have to keep the name of the old master
in a private attribute on the designated slave to be promoted, so it can
compare the states of the old master?

[1] https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt#L942

> It's not ideal, but you could have (4) happen in the post-promote notification.
> Notify actions aren’t /supposed/ to change resource state but it has been
> done before.

Step (4) is fine: no problem with it, no need to mess with it. Again, sorry
for the confusion.

I am sure we can find a workaround to this problem, but it seems to me it
requires some struggling and wrestling with the code to bend it to what we
are trying to achieve. I thought using migrate-to/migrate-from would have
made for much cleaner, almost self-documenting code compared to more
conditional blocks with complex manipulation and computation (eg. dealing
with arrays of nodes to compare states during pre/post-demote).

_______________________________________________
Developers mailing list
[email protected]
http://clusterlabs.org/mailman/listinfo/developers
