On Tue, 2017-10-31 at 18:44 +0100, Ferenc Wágner wrote:
> Ken Gaillot <kgail...@redhat.com> writes:
>
> > The pe-input is indeed entirely sufficient.
> >
> > I forgot to check why the reload was not possible in this case. It
> > turns out it is this:
> >
> >    trace: check_action_definition: Resource vm-alder doesn't know how to reload
> >
> > Does the resource agent implement the "reload" action and advertise it
> > in the <actions> section of its metadata?
>
> Absolutely, I use this operation routinely.
>
> $ /usr/sbin/crm_resource --show-metadata=ocf:niif:TransientDomain
> [...]
>   <actions>
>     <action name="start" timeout="10" />
>     <action name="stop" timeout="60" />
>     <action name="monitor" timeout="10" interval="30" />
>     <action name="migrate_to" timeout="120" />
>     <action name="migrate_from" timeout="5" />
>     <action name="meta-data" timeout="5" />
>     <action name="validate-all" timeout="5" />
>     <action name="reload" timeout="5" />
>   </actions>
> </resource-agent>
>
> And the implementation is just a no-op.
>
> vm-alder is based on a template, just like all other VMs:
>
> <primitive id="vm-alder" class="ocf" provider="niif" type="TransientDomain">
>   <instance_attributes id="vm-template-instance_attributes">
>     <nvpair id="vm-template-instance_attributes-migr_timeout" name="migr_timeout" value="120"/>
>     [...]
>   </instance_attributes>
>   [...]
>   <instance_attributes id="vm-alder-instance_attributes">
>     <nvpair id="vm-alder-instance_attributes-migr_timeout" name="migr_timeout" value="10"/>
>     [...]
>     <nvpair id="vm-alder-instance_attributes-admins" name="admins" value="kissg wferi"/>
>   </instance_attributes>
>   <operations>
>     <op id="vm-alder-migrate_to-0" interval="0" name="migrate_to" timeout="1500" record-pending="true"/>
>     <op id="vm-alder-stop-0" interval="0" name="stop" timeout="120" record-pending="true"/>
>     <op id="vm-template-migrate_from-0" interval="0" name="migrate_from" timeout="20"/>
>     <op id="vm-template-monitor-60" interval="60" name="monitor" timeout="20"/>
>     <op id="vm-template-start-0" interval="0" name="start" timeout="120" record-pending="true"/>
>   </operations>
>   [...]
> </primitive>
>
> I wonder why it wouldn't know how to reload. How is that visible in the
> pe-input file? I'd check the other resources...
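A minimal sketch of the two checks that come up in this thread, assuming Python 3.9+ and only the standard library: whether the agent metadata advertises a "reload" action, and whether each recorded history entry (<lrm_rsc_op>) carries the op-force-restart and op-restart-digest attributes. The metadata string is an abridged copy of the output quoted above; the history XML is a hypothetical, heavily trimmed stand-in with made-up attribute values. The script only inspects XML and does not reproduce Pacemaker's own decision logic in check_action_definition.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the agent metadata quoted above (most actions omitted).
METADATA = """<resource-agent name="TransientDomain">
  <actions>
    <action name="start" timeout="10" />
    <action name="stop" timeout="60" />
    <action name="reload" timeout="5" />
  </actions>
</resource-agent>"""

# HYPOTHETICAL history entries for illustration only: real <lrm_rsc_op>
# elements carry many more attributes, and these values are invented.
HISTORY = """<lrm_resource id="vm-alder">
  <lrm_rsc_op id="vm-alder_last_0" operation="monitor"
              op-force-restart=" " op-restart-digest="example-digest"/>
  <lrm_rsc_op id="vm-alder_last_failure_0" operation="monitor"/>
</lrm_resource>"""


def advertises_reload(metadata_xml: str) -> bool:
    """Return True if any <action> element in the metadata is named 'reload'."""
    root = ET.fromstring(metadata_xml)
    return any(action.get("name") == "reload" for action in root.iter("action"))


def entries_missing_reload_fields(history_xml: str) -> list[str]:
    """Return the ids of history entries lacking either reload-related attribute."""
    root = ET.fromstring(history_xml)
    missing = []
    for op in root.iter("lrm_rsc_op"):
        if op.get("op-force-restart") is None or op.get("op-restart-digest") is None:
            missing.append(op.get("id"))
    return missing


if __name__ == "__main__":
    print("reload advertised:", advertises_reload(METADATA))
    print("entries without reload fields:", entries_missing_reload_fields(HISTORY))
```

Run against the sample data, it reports that reload is advertised and lists only the hypothetical failure entry (vm-alder_last_failure_0) as lacking the reload fields.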
When an operation completes, a history entry (<lrm_rsc_op>) is recorded in the CIB status section, which the pe-input file captures. If the agent supports reload, the entry will include op-force-restart and op-restart-digest fields. I see those are present in the vm-alder_last_0 entry, so agent support isn't the issue.

However, the operation is recorded as a *failed* probe (i.e. the resource was found running where it wasn't expected to be). This gets recorded as a separate vm-alder_last_failure_0 entry, which does not get those special fields. It looks to me like this failure entry is what forces the restart.

That would be a good idea if it were an actual failure: if we find a resource unexpectedly running, we don't know how it was started, so a full restart makes sense. However, I'm guessing it was not a real error but the result of a resource cleanup. A cleanup clears the history so the resource is re-probed, and I suspect that re-probe is what got recorded here as a failure. Does that match what actually happened?

-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org