On 07.06.2022 11:50, Klaus Wenninger wrote:
>>
>> From the documentation is not clear to me whether this would be:
>> a) multiple fencing where ipmi would be first level and sbd would be a
>> second level fencing (where sbd always succeeds)
>> b) or this is considered a single level fencing with a timeout
>
> With b) falling back to watchdog-fencing wouldn't work properly
> although I remember some recent change that might make it fall back
> without issues.

b) works here:

Jun 07 17:35:50 ha2 pacemaker-controld[7069]: notice: Requesting fencing (reboot) of node qnetd
Jun 07 17:35:50 ha2 pacemaker-fenced[7065]: notice: Client pacemaker-controld.7069 wants to fence (reboot) qnetd using any device
Jun 07 17:35:50 ha2 pacemaker-fenced[7065]: notice: Requesting peer fencing (reboot) targeting qnetd
Jun 07 17:35:50 ha2 pacemaker-fenced[7065]: notice: watchdog is not eligible to fence (reboot) qnetd: static-list
Jun 07 17:35:50 ha2 pacemaker-schedulerd[7068]: warning: Calculated transition 14 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-95.bz2
Jun 07 17:35:50 ha2 pacemaker-fenced[7065]: notice: Requesting that ha1 perform 'reboot' action targeting qnetd
Jun 07 17:35:53 ha2 pacemaker-fenced[7065]: notice: Requesting that ha2 perform 'reboot' action targeting qnetd
Jun 07 17:35:53 ha2 pacemaker-fenced[7065]: notice: watchdog is not eligible to fence (reboot) qnetd: static-list
Jun 07 17:35:55 ha2 stonith[11138]: external_reset_req: '_dummy reset' for host qnetd failed with rc 1
Jun 07 17:35:57 ha2 stonith[11142]: external_reset_req: '_dummy reset' for host qnetd failed with rc 1
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: error: Operation 'reboot' [11141] targeting qnetd using dummy_stonith returned 1
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: warning: dummy_stonith[11141] [ Performing: stonith -t external/_dummy -E -T reset qnetd ]
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: warning: dummy_stonith[11141] [ failed: qnetd 5 ]
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: notice: Couldn't find anyone to fence (reboot) qnetd using any device
Jun 07 17:35:57 ha2 pacemaker-fenced[7065]: notice: Waiting 10s for qnetd to self-fence (reboot) for client pacemaker-controld.7069
Jun 07 17:36:07 ha2 pacemaker-fenced[7065]: notice: Self-fencing (reboot) by qnetd for pacemaker-controld.7069 assumed complete
Jun 07 17:36:07 ha2 pacemaker-fenced[7065]: notice: Operation 'reboot' targeting qnetd by ha2 for pacemaker-controld.7069@ha2: OK (complete)
Jun 07 17:36:07 ha2 pacemaker-controld[7069]: notice: Fence operation 7 for qnetd passed
Jun 07 17:36:07 ha2 pacemaker-controld[7069]: notice: Transition 14 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-95.bz2): Complete
Jun 07 17:36:07 ha2 pacemaker-controld[7069]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Jun 07 17:36:07 ha2 pacemaker-controld[7069]: notice: Peer qnetd was terminated (reboot) by ha2 on behalf of pacemaker-controld.7069@ha2: OK

The only gotcha is this stray error, logged about a minute after everything has already completed:

Jun 07 17:37:05 ha2 pacemaker-fenced[7065]: notice: Peer's 'reboot' action targeting qnetd for client pacemaker-controld.7069 timed out
Jun 07 17:37:05 ha2 pacemaker-fenced[7065]: notice: Couldn't find anyone to fence (reboot) qnetd using any device
Jun 07 17:37:05 ha2 pacemaker-fenced[7065]: error: request_peer_fencing: Triggered fatal assertion at fenced_remote.c:1799 : op->state < st_done

> I would try to go for a) as with a reasonably current
> pacemaker-version (iirc 2.1.0 and above)
> you should be able to make the watchdog-fencing-device visible as with
> other fencing-devices

Yep.

dummy_stonith
watchdog
2 fence devices found

> (just use fence_watchdog as the fence-agent - still implemented inside
> pacemaker
> fence-watchdog-binary actually just provides the meta-data).
> Like this you can limit watchdog-fencing to certain-nodes that do
> actually provide a proper
> hardware-watchdog and you can add it to a topology.
>

Well, as can be seen from the log above, pacemaker still falls back to
watchdog self-fencing even though "watchdog" is not eligible to fence
qnetd. So I am not sure it will work.
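For reference, option a) as I understand it would look roughly like the fragment below. This is an untested sketch in crmsh syntax, not a verified configuration. The device names dummy_stonith and watchdog are the ones listed above. The host list "ha1 ha2" and the per-node levels are assumptions made up for illustration, and the sketch presumes stonith-watchdog-timeout is already set.

```shell
# Assumption: only ha1 and ha2 have a usable hardware watchdog,
# so the watchdog device is limited to them via pcmk_host_list.
crm configure primitive watchdog stonith:fence_watchdog \
    params pcmk_host_list="ha1 ha2"

# Hypothetical topology: level 1 is dummy_stonith, level 2 is watchdog
# for ha1 and ha2; qnetd gets dummy_stonith only.
crm configure fencing_topology \
    ha1: dummy_stonith watchdog \
    ha2: dummy_stonith watchdog \
    qnetd: dummy_stonith
```

Whether this keeps pacemaker from waiting for qnetd to self-fence once dummy_stonith has failed is exactly the part I have not verified.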