Hi,

On 10.12.21 15:22, Stefan Radman wrote:
> What is the reason for hardcoding the watchdog timeout into 
> pve-ha-manager/watchdog-mux.c?

Note that this is the multiplexer; the actual timeout for its clients is 60s.

The MUX opens the actual watchdog. It's a really small C program with a very
small footprint and static resource usage, so it won't ever fail to update the
watchdog in any situation where the system isn't totally lost.
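For illustration, a minimal sketch of what "opening the actual watchdog" involves
on Linux, using the standard watchdog ioctls (WDIOC_SETTIMEOUT is the same call
quoted from watchdog-mux.c). The helper names and the /dev/watchdog path are
assumptions, this is not the actual watchdog-mux.c code. Don't point it at a real
device casually: opening the device arms the watchdog.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper (not from watchdog-mux.c): open a watchdog device,
 * e.g. "/dev/watchdog", and program its timeout.
 * Returns the open fd, or -1 on failure. */
int open_hw_watchdog(const char *path, int timeout_s)
{
    assert(path != NULL && timeout_s > 0);

    /* On most drivers, opening the device arms the watchdog. */
    int fd = open(path, O_WRONLY);
    if (fd == -1) {
        perror("open watchdog");
        return -1;
    }

    /* The driver may round the value and writes the effective
     * timeout back into timeout_s. */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s) == -1) {
        perror("WDIOC_SETTIMEOUT");
        /* "Magic close": writing 'V' before close() asks drivers that
         * support it (and were not built with nowayout) to disarm. */
        ssize_t n = write(fd, "V", 1);
        (void)n;
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: feed the watchdog. The caller must do this more
 * often than the programmed timeout, otherwise the machine is reset. */
int ping_hw_watchdog(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}
```

Keeping this part tiny and allocation-free is what makes it hard for the MUX
itself to be the component that fails.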

The MUX then checks the actual clients; if any of them did not ping in the last
60s, the MUX will stop updating the actual watchdog, causing a reset around 0s
to 10s later.
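The client check can be sketched as a single decision function, called once per
iteration of the MUX loop; only when it returns true would the MUX feed the real
watchdog. The struct, the function name and the strict "> 60" boundary are
assumptions for illustration, not the real watchdog-mux.c data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Client deadline from the mail: clients must ping within 60 s. */
#define CLIENT_TIMEOUT_S 60

/* Hypothetical client record: when did this client last ping the MUX? */
struct mux_client {
    time_t last_ping;
};

/* Returns true while every client has pinged within CLIENT_TIMEOUT_S,
 * i.e. while the MUX should keep feeding the hardware watchdog.
 * Once it returns false, the MUX stops feeding and the hardware
 * watchdog resets the node roughly 0 to 10 s later. */
bool should_feed_watchdog(const struct mux_client *clients, size_t n,
                          time_t now)
{
    assert(clients != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (now - clients[i].last_ping > CLIENT_TIMEOUT_S)
            return false; /* one stale client is enough to self fence */
    }
    return true;
}
```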

So the in-practice timeout for the watchdog service the MUX provides is 60 to
70 seconds, not ten.

> 
> https://git.proxmox.com/?p=pve-ha-manager.git;a=blob;f=src/watchdog-mux.c#l33
>  int watchdog_timeout = 10;
>
> https://git.proxmox.com/?p=pve-ha-manager.git;a=blob;f=src/watchdog-mux.c#l157
>      if (ioctl(watchdog_fd, WDIOC_SETTIMEOUT, &watchdog_timeout) == -1) {
> 
> I am trying to use a more conservative 5 minute timeout for the IPMI watchdog 
> but it gets changed to 10 seconds when the watchdog-mux.service starts.

That's not a reasonable timeout for Proxmox VE's HA self-fencing, as pmxcfs
locks have a timeout of 2 minutes. If you go above that, all consistency
guarantees from the self-fencing are void and an HA service can be recovered on
another node while the original one still accesses some of its resources. In
other words: there be dragons.
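The arithmetic behind this can be written down directly: the node is only
guaranteed to be dead 60s (client deadline) plus the hardware timeout after a
client's last ping, and that has to stay below the 120s pmxcfs lock timeout.
The function names are hypothetical, and the model deliberately ignores the MUX
polling interval and clock skew, so treat it as a rough sanity check only.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the mail: clients must ping the MUX within 60 s,
 * pmxcfs locks expire after 120 s. */
#define CLIENT_TIMEOUT_S 60
#define PMXCFS_LOCK_TIMEOUT_S 120

/* Latest point (seconds after a client's last ping) at which the node
 * has reset: the MUX gives up after CLIENT_TIMEOUT_S, then the hardware
 * watchdog fires at most hw_timeout_s later. */
int worst_case_reset_s(int hw_timeout_s)
{
    assert(hw_timeout_s > 0);
    return CLIENT_TIMEOUT_S + hw_timeout_s;
}

/* Self-fencing is only sound if the node is dead before its lock
 * expires and another node may recover the service. */
bool fencing_is_safe(int hw_timeout_s)
{
    return worst_case_reset_s(hw_timeout_s) < PMXCFS_LOCK_TIMEOUT_S;
}
```

With the hardcoded 10s this gives 70s, well inside the lock timeout; with the
5 minute IPMI timeout from your mail it gives 360s, long after the lock could
have been taken over by another node.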

PS: Personally, I'd only rely on a HW watchdog if I'm really sure it runs
stable. Most of the time their firmware is just a mess and they have so many
bugs that the kernel's softdog, which itself is a quite small and simple kernel
module, works more reliably. YMMV, but I never saw a situation where the softdog
didn't do its job, while we did get some reports of failing HW watchdogs. Not
/that/ many, but most users go for the default setup, so this may be biased.

hope that helps,
Thomas


_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
