I always used this one for triggering kdump when using sbd:
https://www.suse.com/support/kb/doc/?id=000019873

On Fri, Feb 25, 2022 at 21:34, Reid Wahl <nw...@redhat.com> wrote:

On Fri, Feb 25, 2022 at 3:47 AM Andrei Borzenkov <arvidj...@gmail.com> wrote:
>
> On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl <nw...@redhat.com> wrote:
> >
> > On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl <nw...@redhat.com> wrote:
> > > > ...
> > > >
> > > > So what happens most likely is that the watchdog terminates the kdump.
> > > > In that case all the mess with fence_kdump won't help, right?
> > >
> > > You can configure extra_modules in your /etc/kdump.conf file to
> > > include the watchdog module, and then restart kdump.service. For
> > > example:
> > >
> > > # grep ^extra_modules /etc/kdump.conf
> > > extra_modules i6300esb
> > >
> > > If you're not sure of the name of your watchdog module, wdctl can help
> > > you find it. sbd needs to be stopped first, because it keeps the
> > > watchdog device timer busy.
> > >
> > > # pcs cluster stop --all
> > > # wdctl | grep Identity
> > > Identity: i6300ESB timer [version 0]
> > > # lsmod | grep -i i6300ESB
> > > i6300esb 13566 0
> > >
> > >
> > > If you're also using fence_sbd (poison-pill fencing via block device),
> > > then you should be able to protect yourself from that during a dump by
> > > configuring fencing levels so that fence_kdump is level 1 and
> > > fence_sbd is level 2.
> >
> > RHKB, for anyone interested:
> > - sbd watchdog timeout causes node to reboot during crash kernel
> > execution (https://access.redhat.com/solutions/3552201)
>
> What is not clear from this KB (and quotes from it above) - what
> instance updates watchdog? Quoting (emphasis mine)
>
> --><--
> With the module loaded, the timer *CAN* be updated so that it does not
> expire and force a reboot in the middle of vmcore generation.
> --><--
>
> Sure it can, but what program exactly updates the watchdog during
> kdump execution? I am pretty sure that sbd does not run at this point.

That's a valid question. I found this approach to work back in 2018
after a fair amount of frustration, and didn't question it too deeply
at the time. The answer seems to be that the kernel does it.
- https://stackoverflow.com/a/2020717
- https://stackoverflow.com/a/42589110

--
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
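
For anyone who wants to script the extra_modules check quoted above, here is a minimal sketch. It assumes the kdump.conf layout shown in the thread (an extra_modules line followed by space-separated module names). The function name has_extra_module is hypothetical and is not part of kdump, sbd, or pcs.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a kdump tool):
# succeed if MODULE appears as a whole word on an extra_modules line
# of a kdump.conf-style file.
has_extra_module() {
    conf=$1
    module=$2
    # Scan every field after the extra_modules keyword for an exact match;
    # exit status 0 means the module is listed, 1 means it is not.
    awk -v m="$module" '
        $1 == "extra_modules" {
            for (i = 2; i <= NF; i++)
                if ($i == m) found = 1
        }
        END { exit !found }
    ' "$conf"
}
```

Something like `has_extra_module /etc/kdump.conf i6300esb && echo listed` would then report whether the module from the example is configured. Editing the file still requires restarting kdump.service, as described in the quoted text.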
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/