(Sorry for the top post.) Thank you all for your time, answers and advice; they are much appreciated.
I have no bandwidth right now to process your input: for the next few days or weeks I am busy with work and with moving to a new city, and my colleague is overwhelmed as well :( We'll get back to the list soon with some feedback on our attempt to implement your advice.

Season's greetings all!

On Wed, 9 Dec 2015 18:04:47 -0600, Ken Gaillot <[email protected]> wrote:

> On 12/08/2015 05:52 AM, Andrei Borzenkov wrote:
> > On Fri, Dec 4, 2015 at 4:11 PM, Jehan-Guillaume de Rorthais
> > <[email protected]> wrote:
> >
> >>
> >> But I fail to understand how I could distinguish, even from notifications,
> >> a failing scenario from a move/switchover one.
> >>
> >
> > On demote fetch current log position and store it in cluster
> > attribute. On promote fetch previous master position, wait until
> > current instance caught up and delete attribute. If attribute is not
> > present on promote, master was down so do not wait and proceed.
> >
> > If you set transient attribute, cluster will forget about previous
> > master on restart. If you set persistent attribute, it will allow you
> > to ensure no data loss has (automatically) occurred even on cluster
> > restart.
> >
> > Where do you envision problems here?
>
> This is more or less what was suggested in the original post :) and
> after discussing this some more, I tend to agree with this approach
> (using an attribute, as opposed to clone notifications, or the proposed
> migration support for the master role).
>
> The demote action would set an attribute. It would be best to use a
> private attribute (attrd_updater --private --update), so setting it
> doesn't trigger further pacemaker activity. Since the attribute is set
> by demote, it will work whether the move is initiated by the cluster or
> externally (by a sysadmin). To initiate it manually, you can set a
> negative location constraint for the master role on the current master.
>
> The promote action would check for that attribute (attrd_updater
> --private --query --all). If it exists, then it's an orderly handover,
> and it should wait for the replication checkpoint. On success, remove
> the attribute. There should be a timeout on the waiting (less than the
> timeout for the promote operation as a whole), for when there is a
> network issue during the transfer. You could decide whether timeout
> means "grab the master role immediately" or "fail the promote".
>
> I do see the logical appeal of migrate_to/migrate_from for the master
> role, but that would be a long-term project.

_______________________________________________
Developers mailing list
[email protected]
http://clusterlabs.org/mailman/listinfo/developers
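For reference, the demote/promote handover described in the thread could look roughly like the Python sketch below. It is not taken from any existing resource agent. Several things are assumptions: log positions are modeled as plain integers; ATTR_NAME, AttrdStore, MemoryStore, current_position and replayed_position are hypothetical names; and the timeout policy is passed in as a flag. The `attrd_updater --private --update` and `--private --query --all` invocations come from Ken's message. The `--name`, `--delete` and `--node` usage, and the `name="..." host="..." value="..."` query output format, are assumptions to check against `attrd_updater --help` on your Pacemaker version.

```python
import re
import subprocess
import time
from typing import Callable, Dict, Optional, Tuple

ATTR_NAME = "demote-position"  # hypothetical attribute name


class AttrdStore:
    """Private node attributes via attrd_updater (flag usage partly assumed)."""

    def set(self, name: str, value: str) -> None:
        # Private attribute: setting it should not trigger further Pacemaker activity.
        subprocess.run(["attrd_updater", "--private", "--name", name,
                        "--update", value], check=True)

    def find(self, name: str) -> Optional[Tuple[str, str]]:
        """Return (host, value) for the first node carrying the attribute."""
        res = subprocess.run(["attrd_updater", "--private", "--name", name,
                              "--query", "--all"], capture_output=True, text=True)
        # Assumed output format: name="..." host="..." value="..."
        for line in res.stdout.splitlines():
            m = re.search(r'host="([^"]*)" value="([^"]+)"', line)
            if m:
                return m.group(1), m.group(2)
        return None

    def delete(self, name: str, node: str) -> None:
        # The attribute was set on the old master's node, hence --node (assumed).
        subprocess.run(["attrd_updater", "--private", "--name", name,
                        "--delete", "--node", node], check=True)


class MemoryStore:
    """In-memory stand-in with the same interface, for exercising the logic."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.attrs: Dict[str, Tuple[str, str]] = {}

    def set(self, name: str, value: str) -> None:
        self.attrs[name] = (self.host, value)

    def find(self, name: str) -> Optional[Tuple[str, str]]:
        return self.attrs.get(name)

    def delete(self, name: str, node: str) -> None:
        self.attrs.pop(name, None)


def demote(store, current_position: Callable[[], int]) -> None:
    """Demote action: record the log position the old master stopped at."""
    store.set(ATTR_NAME, str(current_position()))


def promote(store, replayed_position: Callable[[], int], timeout: float,
            grab_on_timeout: bool = False, poll: float = 1.0,
            clock: Callable[[], float] = time.monotonic,
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Promote action: return True to go ahead, False to fail the promote.

    `timeout` should be shorter than the promote operation's own timeout.
    """
    found = store.find(ATTR_NAME)
    if found is None:
        return True  # no attribute: the old master was not demoted, do not wait
    host, target = found
    deadline = clock() + timeout
    while replayed_position() < int(target):
        if clock() >= deadline:
            if grab_on_timeout:
                store.delete(ATTR_NAME, host)  # taking the role anyway
            return grab_on_timeout
        sleep(poll)
    store.delete(ATTR_NAME, host)  # orderly handover: caught up, clean up
    return True
```

The two timeout outcomes Ken mentions map to `grab_on_timeout`. Removing the attribute when the role is grabbed anyway is a choice made for this sketch, so that a stale position is not waited on later. Because attrd attributes are transient, a full cluster restart forgets the recorded position, which is the trade-off Andrei points out.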
