On 03.06.2022 11:18, Zoran Bošnjak wrote:
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).
>
> I am using ubuntu 20.04., with default packages from the repository
> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
>
> HW watchdog is present on servers. The first problem was to load/unload the
> watchdog module. For some reason the module is blacklisted on ubuntu,

What makes you think so?

bor@bor-Latitude-E5450:~$ lsb_release -d
Description: Ubuntu 20.04.4 LTS
bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog
bor@bor-Latitude-E5450:~$

> so I've created a service for this purpose.
>

man modules-load.d

> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
>

Without any explicit dependencies, stop will be attempted as soon as possible.

> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> ExecStop=/sbin/rmmod ipmi_watchdog
>

Why on earth do you need to unload a kernel driver when the system reboots?

> [Install]
> WantedBy=multi-user.target
> ---
>
> Is this a proper way to load watchdog module under ubuntu?
>

There is a standard way to load non-autoloaded drivers on *any* systemd-based
distribution, which is modules-load.d.

> Anyway, once the module is loaded, the /dev/watchdog (which is required by
> 'sbd') is present.
> Next, the 'sbd' is installed by
>
> sudo apt install sbd
> (followed by one reboot to get the sbd active)
>
> The configuration of the 'sbd' is default. The sbd reacts to network failure
> as expected (reboots the server). However, when the 'sbd' is active, the
> server won't reboot normally any more. For example from the command line
> "sudo reboot", it gets stuck at the end of the reboot sequence. There is a
> message on the console:
>
> ... reboot progress
> [ OK ] Finished Reboot.
> [ OK ] Reached target Reboot.
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> ... it gets stuck at this point
>
> After some long timeout, it looks like the watchdog timer expires and server
> boots, but the failure indication remains on the front panel of the server.
> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
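
For reference, a minimal sketch of the modules-load.d approach mentioned
earlier in this reply. The file name ipmi_watchdog.conf is an assumption for
illustration; systemd-modules-load reads *.conf files from that directory,
one module name per line, with lines starting with # treated as comments.

--- file: /etc/modules-load.d/ipmi_watchdog.conf
# Assumed file name, chosen for illustration only.
# Load the IPMI watchdog driver at boot.
ipmi_watchdog
---

With something like this in place, the custom watchdog.service (including its
ExecStop=/sbin/rmmod line) should no longer be needed, so nothing tries to
unload the driver during shutdown. After a reboot, "lsmod | grep ipmi_watchdog"
should show whether the module was loaded.
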
>
> My question is: How do I configure the system, to have the 'sbd' function
> present, but still be able to reboot the system normally.
>

As a first step, do not unload the watchdog driver on shutdown.

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/