> On 5 Dec 2015, at 12:11 AM, Jehan-Guillaume de Rorthais <[email protected]>
> wrote:
>
> On Wed, 2 Dec 2015 14:02:23 +1100
> Andrew Beekhof <[email protected]> wrote:
>
>>
>>> On 26 Nov 2015, at 11:52 AM, Jehan-Guillaume de Rorthais <[email protected]>
>>> wrote:
>>>
>>> Hi guys,
>>>
>>> While working on our pgsqlms agent[1], we are now studying how to control
>>> all the steps of a switchover process from the resource agent.
>>>
>>> The tricky part here is the 2nd step of a successful switchover with
>>> PostgreSQL (9.3+):
>>> (1) shutdown the master first
>>> (2) make sure the designated slave received **everything** from the old
>>> master
>>
>> How can you achieve (2) if (1) has already occurred?
>
> This check consists of validating the last transaction log entry the slave
> received. It must be the "shutdown checkpoint" from the old master.
>
>> There’s no-one for the designated slave to talk to in the case of errors...
>
> I was explaining the steps for a successful switchover in PostgreSQL, outside
> of Pacemaker. Sorry for the confusion if it wasn't clear enough :/
>
> This is currently done by hand. Should an error occur (the
> slave did not receive the shutdown checkpoint of the master), the human
> operator simply restarts/promotes the master and the slave gets back to its
> replication from it.

Why not do it as part of the promote action?
Loop until you see the checkpoint. That’s what galera does.

You may want on-fail=block for the promote action though.
In galera the datastore usually ends up corrupted if you stop half-way
through an rsync, so we tell pacemaker to leave it alone :-(

>
>>> (3) promote the designated slave as master
>>> (4) start the old master as slave
>>
>> (4) is pretty tricky. Assuming you use master/slave, it’s supposed to be in
>> this state already after the demote in step (1).
>
> Back to Pacemaker and our RA. A demote in PostgreSQL is really a stop + start
> as slave. So after a demote, as the master actually did stop and restart as
> slave, the designated slave to be promoted must have the "shutdown checkpoint"
> in its transaction log from the old master.
>
>> If you’re just using clones,
>> then you’re in even more trouble because pacemaker either wouldn’t have
>> stopped it or won’t want to start it again.
>
> We are using stateful clones with the master/slave role.
> During a Pacemaker "move" (what I call a switchover), the resource is demoted
> in the source node and promoted in destination one. Considering a demote in
> PostgreSQL is a stop/start (as slave), we are fine with (1) (3) and (4):
>
> (1) the demote did stop the old master (and restarted it as slave)
> (3) the designated slave is promoted
> (4) the old master connects to the new master
>
> About (4), as the old master is restarted as a slave in (1), it just waits to
> be able to connect to the new master while (2) and (3) occur. It might be
> either the "master IP address" that finally appears or some setup in the "post
> promote" notification, etc.
>
>> See more below.
>>
>>> As far as we understand Pacemaker, migrate-to and migrate-from capabilities
>>> allow to distinguish if we are moving a resource because of a failure or
>>> for a controlled switchover situation.
>>> Unfortunately, these capabilities
>>> are ignored for cloned and multi-state resources…
>>
>> Yeah, this isn’t really the right use-case.
>> You need to be looking more at the promote/demote cycle.
>>
>> If you turn on notifications, then in a graceful switchover (eg. the node is
>> going into standby) you will get information about which node has been
>> selected to become the new master when calling demote on the old master.
>> Perhaps you could ensure (2) while performing (1).
>
> Our RA is already working. It already uses promote/demote notifications. See
>
> https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/script/pgsqlms
>
> But I fail to understand how I could distinguish, even from notifications, a
> failing scenario from a move/switchover one.
>
> During a failure on master, Pacemaker will first try to demote it and even
> fence the node if needed. In notification, I will receive the same
> information as during a move, isn't it?

Not quite.

>
> Or maybe you think about comparing active/master/slave/stop/inactive resources
> from notification between the pre and post-demote to deduce if the old master
> is still alive as a slave [1]?

Right. If it’s a migration, then the old master will appear in both
$OCF_RESKEY_CRM_meta_notify_master_uname and
$OCF_RESKEY_CRM_meta_notify_demote_uname, but not in
$OCF_RESKEY_CRM_meta_notify_stop_uname.

> In this scenario, I suppose we would have to keep
> the name of the old master in a private attribute in the designated slave to
> be promoted to compare the states of the old master?

OCF_RESKEY_CRM_meta_notify_master_uname should already have it.

>
> [1]
> https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt#L942
>
>> It’s not ideal, but you could have (4) happen in the post-promote
>> notification.
>> Notify actions aren’t /supposed/ to change resource state but it has been
>> done before.
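
To make the notification check above concrete (old master listed in
notify_master_uname and notify_demote_uname, but not in notify_stop_uname),
here is a minimal shell sketch. It is only an illustration, not code from
pgsqlms or from Pacemaker: the helper names in_list and classify_demote and
the "switchover"/"other" labels are made up for this example, and it assumes
the three variables hold whitespace-separated node names.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when word $1 appears in the
# whitespace-separated list $2.
in_list() {
    for node in $2; do
        if [ "$node" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: prints "switchover" when at least one node is being
# demoted and every demoted node is a current master that is not also being
# stopped. Prints "other" in every other case.
# Assumption: the notify variables are whitespace-separated node name lists.
classify_demote() {
    result=other
    for old_master in ${OCF_RESKEY_CRM_meta_notify_demote_uname:-}; do
        if in_list "$old_master" "${OCF_RESKEY_CRM_meta_notify_master_uname:-}" &&
           ! in_list "$old_master" "${OCF_RESKEY_CRM_meta_notify_stop_uname:-}"; then
            result=switchover
        else
            echo other
            return 0
        fi
    done
    echo "$result"
}
```

An agent could call something like classify_demote from its notify action
and only insist on the shutdown checkpoint when the answer is "switchover".
Whether that one-line rule is enough for every failure case would need
testing against a real cluster.
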
>
> The step 4 is fine, no problem with it, no need to mess with it, again, sorry
> for the confusion.
>
> I am sure we can probably find a workaround to this problem, but it seems to
> me it requires some struggling and wrestling in the code to bend it to what we
> try to achieve.
>
> I thought using migrate-to/migrate-from would have been much cleaner code and
> almost self-documented compared to some more conditional blocks with complex
> manipulation and computation (eg. dealing with array of nodes to compare
> states during pre/post demote).

Far from it, migrate-to/migrate-from are incredibly complex inside pacemaker.

_______________________________________________
Developers mailing list
[email protected]
http://clusterlabs.org/mailman/listinfo/developers
