Hello all,

According to the developer's guide, when demote is called on a stopped resource, the RA should return a soft error:
http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html

«
    foobar_monitor
    rc=$?
    case "$rc" in
    [...]
        "$OCF_NOT_RUNNING")
            # Currently not running. Getting a demote action
            # in this state is unexpected. Exit with an error
            # and let the cluster manager recover.
            ocf_log err "Resource is currently not running"
            exit $OCF_ERR_GENERIC
            ;;
    [...]
»

But to recover a master resource that is found not running, PEngine produces a transition with the following actions: demote -> stop -> start -> promote.

If we follow the dev guide, recovery is not possible on a stopped master, as the first action of the transition will always fail, leading to a migration and a -inf score on the old master node.

My first thought was «why do a demote -> stop that breaks everything when it knows the resource is already stopped?!»

If I understand correctly, I guess PEngine **must** produce such a transition so that the notify actions are triggered, should other living clones need to process them. Is that right?

If this is right, then maybe we should relax a bit what is written in the OCF dev guide?

To be able to deal with this in our RA, if the resource is found stopped during the demote action, we silently start it as a slave and return OCF_ERR_GENERIC if we couldn't start the resource. We return OCF_SUCCESS if it succeeds (I guess we could just return OCF_SUCCESS without starting it if the transition plans to stop it according to the notify variables).

Comments? Advice?

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo
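PS: the demote handling described above could be sketched roughly as follows. This is only an illustration, not our actual RA code: foobar_start, FOOBAR_STATE and FOOBAR_START_OK are hypothetical stand-ins that simulate the resource so the snippet runs on its own, and ocf_log is stubbed here where a real agent would source ocf-shellfuncs instead.

```shell
#!/bin/sh
# OCF return codes, as defined by the OCF resource agent API.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"
: "${OCF_RUNNING_MASTER:=8}"

# Stub for this sketch only; a real agent gets ocf_log from ocf-shellfuncs.
ocf_log() { echo "$@" >&2; }

# Hypothetical simulated resource state: stopped, slave or master.
FOOBAR_STATE=${FOOBAR_STATE:-stopped}
# Hypothetical switch: set to 0 to simulate a failing start.
FOOBAR_START_OK=${FOOBAR_START_OK:-1}

foobar_monitor() {
    case "$FOOBAR_STATE" in
        master) return "$OCF_RUNNING_MASTER" ;;
        slave)  return "$OCF_SUCCESS" ;;
        *)      return "$OCF_NOT_RUNNING" ;;
    esac
}

# Hypothetical helper: start the resource as a slave.
foobar_start() {
    [ "$FOOBAR_START_OK" = 1 ] || return "$OCF_ERR_GENERIC"
    FOOBAR_STATE=slave
    return "$OCF_SUCCESS"
}

foobar_demote() {
    foobar_monitor
    rc=$?
    case "$rc" in
        "$OCF_RUNNING_MASTER")
            # Normal case: running as master, demote it to slave.
            FOOBAR_STATE=slave
            return "$OCF_SUCCESS"
            ;;
        "$OCF_SUCCESS")
            # Already running as a slave, nothing to do.
            return "$OCF_SUCCESS"
            ;;
        "$OCF_NOT_RUNNING")
            # Deviation from the dev guide: instead of failing right
            # away, silently start the resource as a slave so that the
            # demote -> stop -> start -> promote transition can go on.
            ocf_log info "Resource is not running, starting it as a slave"
            if foobar_start; then
                return "$OCF_SUCCESS"
            fi
            ocf_log err "Could not start the resource as a slave"
            return "$OCF_ERR_GENERIC"
            ;;
        *)
            # Any other monitor result is treated as a failure.
            return "$OCF_ERR_GENERIC"
            ;;
    esac
}
```

The variant mentioned in parentheses above (returning OCF_SUCCESS without starting anything when the notify variables show that the transition is about to stop the resource) would only change the "$OCF_NOT_RUNNING" branch; it is left out here.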
