On 03.06.2022 11:18, Zoran Bošnjak wrote:
> Hi all,
> I would appreciate some advice about sbd fencing (without shared storage).
> 
> I am using Ubuntu 20.04, with the default packages from the repository
> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> 
> A hardware watchdog is present on the servers. The first problem was to load/unload the
> watchdog module. For some reason the module is blacklisted on Ubuntu,

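What makes you think so? modprobe -c, which dumps the effective modprobe
configuration including blacklist entries, shows nothing for it here: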

bor@bor-Latitude-E5450:~$ lsb_release -d
Description:    Ubuntu 20.04.4 LTS
bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog
bor@bor-Latitude-E5450:~$

> so I've created a service for this purpose.
>

See man modules-load.d.


> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
> 

Without any explicit ordering dependencies, systemd will attempt to stop this
unit (and so run rmmod) as early as possible during shutdown, possibly before
sbd has stopped.

> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> ExecStop=/sbin/rmmod ipmi_watchdog
> 

Why on earth do you need to unload the kernel driver when the system reboots?

> [Install]
> WantedBy=multi-user.target
> ---
> 
> Is this a proper way to load watchdog module under ubuntu?
> 

There is a standard way to load non-autoloaded drivers on *any* systemd-based
distribution: modules-load.d.
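For example, a minimal sketch of the modules-load.d approach: a drop-in file
listing ipmi_watchdog, which systemd-modules-load.service reads at every boot.
The function name write_modules_load_conf and the file name ipmi_watchdog.conf
are illustrative choices, not anything sbd or Ubuntu requires; any *.conf name
in that directory works.

```shell
# Sketch: write a modules-load.d(5) drop-in so that systemd-modules-load
# loads ipmi_watchdog at boot. write_modules_load_conf is a hypothetical
# helper; the target directory defaults to /etc/modules-load.d (needs root).
write_modules_load_conf() {
    conf_dir=${1:-/etc/modules-load.d}
    conf_file="$conf_dir/ipmi_watchdog.conf"
    mkdir -p "$conf_dir" || return 1
    # One module name per line; lines starting with '#' are ignored.
    printf '%s\n' \
        '# IPMI hardware watchdog, required by sbd' \
        'ipmi_watchdog' > "$conf_file" || return 1
    printf 'wrote %s\n' "$conf_file"
}
```

Run as root with no argument, it writes /etc/modules-load.d/ipmi_watchdog.conf;
the module is then loaded on the next boot (or immediately with
systemctl restart systemd-modules-load.service), and the custom
watchdog.service with its rmmod is no longer needed.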

> Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> 'sbd') is present.
> Next, the 'sbd' is installed by
> 
> sudo apt install sbd
> (followed by one reboot to get the sbd active)
> 
> The 'sbd' configuration is the default. The sbd reacts to network failure
> as expected (reboots the server). However, when the 'sbd' is active, the 
> server won't reboot normally any more. For example from the command line 
> "sudo reboot", it gets stuck at the end of the reboot sequence. There is a 
> message on the console:
> 
> ... reboot progress
> [ OK ] Finished Reboot.
> [ OK ] Reached target Reboot.
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> ... it gets stuck at this point
> 
> After some long timeout, it looks like the watchdog timer expires and the server
> boots, but the failure indication remains on the front panel of the server.
> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
> 
> My question is: how do I configure the system to have the 'sbd' function
> present, but still be able to reboot the system normally?
> 

As a first step, do not unload the watchdog driver on shutdown.
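If the custom unit is kept rather than replaced by a modules-load.d entry, this
first step amounts to deleting its ExecStop= line. A minimal sketch, assuming
the unit file quoted above; drop_exec_stop is a hypothetical helper name and
relies on GNU sed's -i (in-place) option, which Ubuntu ships.

```shell
# Sketch: remove the ExecStop= line (the rmmod call) from a unit file so
# the watchdog driver is no longer unloaded at shutdown.
# drop_exec_stop is a hypothetical helper; sed -i is GNU sed's in-place edit.
drop_exec_stop() {
    unit_file=$1
    [ -f "$unit_file" ] || { echo "no such file: $unit_file" >&2; return 1; }
    # Delete only lines beginning with ExecStop=; all other lines are kept.
    sed -i '/^ExecStop=/d' "$unit_file"
}
```

On the server the equivalent is
sudo sed -i '/^ExecStop=/d' /etc/systemd/system/watchdog.service
followed by sudo systemctl daemon-reload so that systemd rereads the unit. This
only takes the rmmod out of the shutdown path; whether it alone cures the stuck
reboot still has to be verified on the hardware.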
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/