On Thu, 2018-05-31 at 22:43 +0200, Jehan-Guillaume de Rorthais wrote:
> On Thu, 31 May 2018 22:52:12 +0300
> Andrei Borzenkov <arvidj...@gmail.com> wrote:
>
> > 31.05.2018 22:18, Jehan-Guillaume de Rorthais wrote:
> > > Sorry for getting back to you so late.
> > >
> > > On Fri, 25 May 2018 11:58:59 -0600
> > > Casey & Gina <caseyandg...@icloud.com> wrote:
> > >
> > > > > On May 25, 2018, at 7:01 AM, Casey Allen Shobe
> > > > > <caseyandgina@icloud.com> wrote:
> > > > > > Actually, why is Pacemaker fencing the standby node just
> > > > > > because a resource fails to start there? I thought only the
> > > > > > master should be fenced if it were assumed to be broken.
> > > >
> > > > This is probably the most important thing to ask outside of the
> > > > PAF resource agent, which many may not be as fluent with as
> > > > pacemaker itself, and perhaps the most indicative of me setting
> > > > something up incorrectly outside of that resource agent.
> > > >
> > > > My understanding of fencing was that pacemaker would only fence a
> > > > node if it was the master but had stopped responding, to avoid a
> > > > split-brain situation. Why would pacemaker ever fence a standby
> > > > node with no resources currently allocated to it?
> > >
> > > So, as discussed on IRC and for the mailing list history, here is
> > > the answer:
> > >
> > > https://clusterlabs.github.io/PAF/administration.html#failover
> > >
> > > In short: after a failure (either on a primary or a standby), you
> > > MUST fix things on the node before starting Pacemaker.
> > >
> > > If you don't, PAF will detect something incoherent and raise an
> > > error, leading Pacemaker to most likely fence your node again.
> >
> > Well, that does not sound very polite to the user :)
>
> Sure :)
>
> But at least it's been documented, as you pointed out earlier.
>
> After a failure and an automatic failover, either you have some
> automatic failback process somewhere... or you have to fix some
> things up.
>
> PAF is not able to do automatic failback.
>
> > Another database RA I mentioned somewhere in this thread has a
> > different approach: it starts the database in its monitor action,
> > and its start action is effectively a dummy.
>
> Mh, I would have to study that. But I'm not thrilled about such
> behavior at first look.
>
> > So start always succeeds from pacemaker's point of view, but the
> > database won't be started until it is manually synchronized again
> > by the administrator.
>
> It seems scary... What about the stop action? What if the monitor
> detects an error? Well, I really should check this RA you are talking
> about to answer my questions.
>
> > The downside is that the pacemaker resource status does not reflect
> > the database status. I wish pacemaker supported something like a
> > "requires manual intervention" resource state that would not be
> > treated like an error (causing all sorts of fatal consequences) but
> > would still be evaluated for dependencies (i.e. dependent resources
> > would not be started). That would be ideal for such a case.
I'm not clear what such a result would mean. Is the goal to stop
dependent resources, but not the resource itself? And/or to block all
further management of the resource?

> Good idea.
>
> I have a couple more:
> * handling errors from notify actions

I could imagine notify supporting on-fail, defaulting to ignore. Would
that do what you want? Should notify errors count toward the resource
fail count?

> * supporting migrate-to/from for multistate RAs
> * having a real infinite master score :)

What behavior isn't supported by the current infinity?

>
> Cheers,
-- 
Ken Gaillot <kgail...@redhat.com>
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org