On 04/12/19 14:53 +0900, Ondrej wrote:
> When adding 'LSB' script to pacemaker cluster I can see that
> pacemaker advertises 'restart' and 'force-reload' operations to be
> present - regardless if the LSB script supports it or not. This
> seems to be coming from following piece of code.
>
> https://github.com/ClusterLabs/pacemaker/blob/92b0c1d69ab1feb0b89e141b5007f8792e69655e/lib/services/services_lsb.c#L39-L40
>
> Questions:
> 1. When the 'restart' and 'force-reload' operations are called on
>    the LSB script cluster resource?

[reordered]

> I would have expected that 'restart' operation would be called when
> using 'crm_resource --restart --resource myResource', but I can see
> that 'stop' and 'start' operations are used in that case instead.

This is due to how "crm_resource --restart" is arranged, directly in
the implementation of this CLI tool itself
(see tools/crm_resource_runtime.c:cli_resource_restart):

- first, target-role meta-attribute for resource is set to Stopped
- then, once the activity has settled, it is set back to the
  target-role it was originally at

With the restart performed stepwise like this, there's no reasonably
implementable mapping back to a single step that is the actual
composition (stop, start -> restart), since the plan is not shared in
full in advance with the respective moving parts. And there's plain
common sense that would still preclude it (below).

Hence, it is actually a great discovery that the "restart" triggering
verb/action is in fact completely neglected and bogus when it comes to
handling by pacemaker. If an agent's "restart" implements any
optimizations (thanks to having the intimate knowledge of the resource
at hand, plus knowing the before-after state combo and possibly how to
transition in one go), cluster resource management won't benefit from
that in any way.

Interestingly, such optimizations are exactly what the original OCF
draft had in mind :-)
https://github.com/ClusterLabs/OCF-spec/blob/start/resource_agent/API/02#L225
(even more interestingly, only to be reconsidered again some decades
later: https://github.com/ClusterLabs/OCF-spec/issues/10; yeah, aren't
we masters of following targets moving to the extent they are
sometimes contradictory?
I'd blame a desperate lack of written [and easily obtainable] design
decisions made in the past for that)

These actions are mandated by LSB as well, but hey, in systemd era, we
are now _free_ to call LSB severely broken as it (shamefully, I'd say)
never even tried to accommodate proper dealing with dependency chains
(and actual serializability thereof!), as explained in an example
below. Or put in other words, LSB was never meant to stand for a
holistic resource management, something both systemd and pacemaker
attempt to cover (single/multi-machine wide).

OTOH, this enforced split of state transitions is perhaps what makes
the transaction (comprising perhaps countless other interdependent
resources) serializable and thus feasible at all (think: you cannot
nest any further handling -- so as to satisfy given constraints -- in
between stop and start when that's an atom, otherwise), and that's
exactly how, say, systemd approaches that, likely for that very
reason:
https://github.com/systemd/systemd/commit/6539dd7c42946d9ba5dc43028b8b5785eb2db3c5

So I see room for improvement here as our takeaway:

* resource agents:

  - some agents declare/implement "restart" action when there is no
    practical reason to (AudibleAlarm, Xinetd, dhcpd, etc.)
    [as a side note, there are nonsensical considerations, such as
    when default "start" and "stop" timeouts for dhcpd are 20 seconds
    each, how come, then, that "restart" defined as "stop; start"
    would also make do with 20 seconds altogether, unless there is
    some amortized work I fail to see :-)]

* pacemaker:

  - artificially generated meta-data mention "restart" action when
    there is no good reason to (lib/services/services_lsb.c)

  - there are some correct clues in Pacemaker Explained, but perhaps
    it is worth taking the time to emphasize that whenever "restart"
    is referred to, it is never an atomic step, but always a sequence
    of two steps that may be considered atomic on their own, but
    possibly interleaved with other steps so as to retain soundness
    wrt.
    the imposed constraints and/or changes made in parallel

  - the same gist of "restart" shall be sketched in a help screen of
    crm_resource

> For 'force-reload' I have no idea on how to try trigger it looking
> at 'crm_resource --help' output.

Sorry, that's even more bogus, as there's no relevance whatsoever.
It needs to either be dropped from artificially generated meta-data
as well, or investigated further whether there's any reason to make
such an operation triggerable by users, and if so, how far its impact
is to be expected to spread when implemented (do the dependent
services need to be reloaded or "restarted" as well, since the change
might be non-local? any precedent there? again, hard to analyse in
the lack of written design decisions that would provide an immediate
frame for thinking about this)

[reordered]

> 2. How can I trigger 'restart' and 'force-reload' operation on LSB
>    script cluster resource in pacemaker?
>
> Cluster resource definition looks like this:
> <primitive class="lsb" id="myResource" type="script.sh">
>   <operations>
>     <op id="myResource-force-reload-interval-0s" interval="0s"
>         name="force-reload" timeout="15s"/>
>     <op id="myResource-monitor-interval-15" interval="15" name="monitor"
>         timeout="15"/>
>     <op id="myResource-restart-interval-0s" interval="0s" name="restart"
>         timeout="15"/>
>     <op id="myResource-start-interval-0s" interval="0s" name="start"
>         timeout="15"/>
>     <op id="myResource-stop-interval-0s" interval="0s" name="stop"
>         timeout="15"/>
>   </operations>
>   <instance_attributes id="myResource-instance_attributes"/>
>   <meta_attributes id="myResource-meta_attributes"/>
> </primitive>
>
> [...]
>
> I want to make sure that cluster will not attempt running 'restart'
> nor 'force-reload' on script that is not implementing them.

Understood, I am reasonably sure about the former and definitely
sure about the latter, in the current state of implementation anyway.
That you even need to stress about these bogus circumstances doesn't
put us in a good light, but that makes this feedback loop all the
more important.

> As for now I'm considering to return exit code '3' from script when
> these actions are called to indicate that they are 'unimplemented
> feature' as suggested by LSB specification below. However I would
> like to verify that this works as expected.
> http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

If your resource is solely to be run under pacemaker, I'd prune all
those quirks altogether, to make one's life easier.

-- 
Jan (Poki)
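
P.S. A minimal sketch of what such a pruned script could look like,
assuming it only needs to serve pacemaker. It implements start, stop
and status, and answers restart/force-reload (and the like) with LSB
exit code 3, "unimplemented feature". The STATE_FILE path and the
function names (do_start, do_stop, do_status, handle_action) are
hypothetical placeholders, not anything pacemaker prescribes; a real
script would manage an actual daemon where the comments say so.

```shell
#!/bin/sh
# Hypothetical LSB-style init script skeleton (illustration only).
# STATE_FILE stands in for a real daemon's PID file; the path is made up.
STATE_FILE="${STATE_FILE:-/tmp/myresource-sketch.state}"

do_start() {
    # A real script would launch the daemon here.
    : > "$STATE_FILE"
}

do_stop() {
    # A real script would terminate the daemon here.
    rm -f "$STATE_FILE"
}

do_status() {
    # LSB status codes: 0 = running, 3 = not running.
    if [ -e "$STATE_FILE" ]; then
        return 0
    fi
    return 3
}

handle_action() {
    case "$1" in
        start)  do_start ;;
        stop)   do_stop ;;
        status) do_status ;;
        restart|force-reload|reload|try-restart)
            # Deliberately not implemented: for non-status actions,
            # LSB exit code 3 means "unimplemented feature".
            return 3 ;;
        *)
            # LSB exit code 2: invalid or excess argument(s).
            return 2 ;;
    esac
}

# When executed with an argument, dispatch on it like an init script.
if [ "$#" -gt 0 ]; then
    handle_action "$1"
    exit $?
fi
```

Saved under a name of your choosing, running it with "restart" or
"force-reload" as the argument should exit with 3, while start, stop
and status behave as usual. Whether pacemaker ever gets to call the
pruned actions is a separate matter, covered above.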