On 10/1/2024 5:02 PM, mike tancsa wrote:
On 10/1/2024 4:03 PM, mike tancsa wrote:
On 10/1/2024 2:07 AM, Stephane Rochoy wrote:
mike tancsa <[email protected]> writes:
On 9/30/2024 3:18 AM, Stephane Rochoy wrote:
mike tancsa <[email protected]> writes:
Do you know offhand how to set the system to just reboot? The ddb man
page seems to imply I need options DDB as well, which is not in GENERIC,
in order to set script actions.
I would try the following:
ddb script kdb.enter.default=reset
If I build a custom kernel then that will work. But with GENERIC (I am
tracking the project via freebsd-update), it fails:
# ddb script kdb.enter.default=reset
ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory
With a custom kernel, adding
options DDB
it works perfectly.
Is there any way to get this to work without compiling DDB into a
custom kernel?
I don't understand what's happening here. AFAIK, the code
corresponding to the soft watchdog being triggered is the
following:
static void
wd_timeout_cb(void *arg)
{
const char *type = arg;
#ifdef DDB
if ((wd_pretimeout_act & WD_SOFT_DDB)) {
char kdb_why[80];
snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout",
type);
kdb_backtrace();
kdb_enter(KDB_WHY_WATCHDOG, kdb_why);
}
#endif
if ((wd_pretimeout_act & WD_SOFT_LOG))
log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type);
if ((wd_pretimeout_act & WD_SOFT_PRINTF))
printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type);
if ((wd_pretimeout_act & WD_SOFT_PANIC))
panic("watchdog %s-timeout, WD_SOFT_PANIC set", type);
}
So without DDB, it should call panic. But in your case, it
called kdb_backtrace, so my initial hypothesis was wrong. What I
missed is that panic itself can call kdb_backtrace if asked
to do so:
#ifdef KDB
if ((newpanic || trace_all_panics) && trace_on_panic)
kdb_backtrace();
if (debugger_on_panic)
kdb_enter(KDB_WHY_PANIC, "panic");
else if (!newpanic && debugger_on_recursive_panic)
kdb_enter(KDB_WHY_PANIC, "re-panic");
#endif
/*thread_lock(td); */
td->td_flags |= TDF_INPANIC;
/* thread_unlock(td); */
if (!sync_on_panic)
bootopt |= RB_NOSYNC;
if (poweroff_on_panic)
bootopt |= RB_POWEROFF;
if (powercycle_on_panic)
bootopt |= RB_POWERCYCLE;
kern_reboot(bootopt);
So it definitely should reboot, but as it doesn't, maybe playing with
kern.powercycle_on_panic would help?
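To make the effect of those knobs concrete, here is a minimal user-space sketch of how the quoted panic path assembles the flags it hands to kern_reboot(). The helper name panic_bootopt is hypothetical, the flag values are placeholders rather than the ones in sys/reboot.h, and starting from an empty flag word is an assumption.

```c
/*
 * Placeholder reboot flags for illustration only; the real values are
 * defined in sys/reboot.h and are not reproduced here.
 */
#define RB_NOSYNC     0x1
#define RB_POWEROFF   0x2
#define RB_POWERCYCLE 0x4

/*
 * Hypothetical helper: builds the flag word that the quoted panic path
 * passes to kern_reboot(), taking the three tunables as plain ints.
 */
int
panic_bootopt(int sync_on_panic, int poweroff_on_panic, int powercycle_on_panic)
{
	int bootopt = 0;	/* assumption: start from an empty flag word */

	if (!sync_on_panic)
		bootopt |= RB_NOSYNC;
	if (poweroff_on_panic)
		bootopt |= RB_POWEROFF;
	if (powercycle_on_panic)
		bootopt |= RB_POWERCYCLE;
	return (bootopt);
}
```

Note that the knob only changes how the reboot is requested, so it can only make a difference if panic is actually reached.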
Thank you for your continued help on this. Still no luck with the
GENERIC kernel:
0{p9999}# sysctl -w kern.powercycle_on_panic=1
kern.powercycle_on_panic: 0 -> 1
0{p9999}# ps -auxwww | grep dog
root 4752 0.0 0.2 12820 12916 - S<s 15:38 0:00.01 watchdogd --softtimeout-action panic -t 10
root 4792 0.0 0.0 12808 2644 u0 S+ 15:39 0:00.00 grep dog
0{p9999}# kill -9 4752
0{p9999}# KDB: stack backtrace:
#0 0xffffffff80b7fefd at kdb_backtrace+0x5d
#1 0xffffffff80abec93 at hardclock+0x103
#2 0xffffffff80abfe8b at handleevents+0xab
#3 0xffffffff80ac0b7c at timercb+0x24c
#4 0xffffffff810d0ebb at lapic_handle_timer+0xab
#5 0xffffffff80fd8a71 at Xtimerint+0xb1
#6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5
#7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46
#8 0xffffffff80fc49ad at cpu_idle+0x9d
#9 0xffffffff80b67bb6 at sched_idletd+0x576
#10 0xffffffff80aecf7f at fork_exit+0x7f
#11 0xffffffff80fd7dae at fork_trampoline+0xe
0{p9999}#
Where would be the best place in the driver to hack in something
like this?
sysctl -w debug.kdb.panic_str="Watchdog Panic"
which actually does panic the box
One other data point. If I start
watchdogd --softtimeout-action panic --softtimeout -t 10
and then kill -9 it,
it eventually prints out
watchdog soft-timeout, WD_SOFT_LOG
to dmesg. But after that, I cannot start a new watchdogd with just
watchdogd --softtimeout-action panic -t 10
I get
watchdogd: setting WDIOC_SETSOFT 1: Invalid argument
watchdogd: patting the dog: Invalid argument
I made these 2 changes to the driver
--- watchdog.c 2024-10-01 20:37:28.667869000 -0400
+++ /tmp/watchdog.c 2024-10-01 20:36:59.764330000 -0400
@@ -61,7 +61,8 @@
static struct callout wd_softtimeo_handle;
static int wd_softtimer; /* true = use softtimer instead of hardware
watchdog */
-static int wd_softtimeout_act = WD_SOFT_LOG; /* action for the software timeout */
+// static int wd_softtimeout_act = WD_SOFT_LOG; /* action for the software timeout */
+static int wd_softtimeout_act = WD_SOFT_PANIC; /* action for the software timeout */
static struct cdev *wd_dev;
static volatile u_int wd_last_u; /* last timeout value set by
kern_do_pat */
@@ -241,6 +242,7 @@
wd_timeout_cb(void *arg)
{
const char *type = arg;
+ panic("mdt watchdog %s-timeout, WD_SOFT_PANIC set", type);
#ifdef DDB
if ((wd_pretimeout_act & WD_SOFT_DDB)) {
and it works now
KDB: stack backtrace:
#0 0xffffffff80b8943d at kdb_backtrace+0x5d
#1 0xffffffff80b3bfd1 at vpanic+0x131
#2 0xffffffff80b3be93 at panic+0x43
#3 0xffffffff8098b585 at wd_timeout_cb+0x15
#4 0xffffffff80b59fcc at softclock_call_cc+0x12c
#5 0xffffffff80b5b815 at softclock_thread+0xe5
#6 0xffffffff80af61df at fork_exit+0x7f
#7 0xffffffff80ff76ce at fork_trampoline+0xe
Uptime: 1m13s
It seems the soft timeout action value is never overridden for some reason.
This kind of feels like a bug, worth a PR?
---Mike
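
One possible reading of the above: the wd_timeout_cb quoted earlier in the thread tests wd_pretimeout_act for every action, while the diff shows a separate wd_softtimeout_act variable. The following is a minimal user-space sketch of why setting only the soft-timeout mask would then never reach panic. The flag values are assumed for illustration, both function names are hypothetical, and the idea that --softtimeout-action sets only wd_softtimeout_act is an assumption, not something confirmed here.

```c
/* Assumed flag values for illustration; check sys/watchdog.h for the real ones. */
#define WD_SOFT_PANIC  0x01
#define WD_SOFT_DDB    0x02
#define WD_SOFT_LOG    0x04
#define WD_SOFT_PRINTF 0x08

static int wd_pretimeout_act;			/* action mask for the pre-timeout */
static int wd_softtimeout_act = WD_SOFT_LOG;	/* action mask for the soft timeout */

/* Mirrors the quoted callback: only wd_pretimeout_act is consulted. */
int
would_panic_as_quoted(void)
{
	return ((wd_pretimeout_act & WD_SOFT_PANIC) != 0);
}

/* Hypothetical variant: the caller passes the mask of the timer that fired. */
int
would_panic_with_mask(int act)
{
	return ((act & WD_SOFT_PANIC) != 0);
}
```

With wd_softtimeout_act set to WD_SOFT_PANIC and wd_pretimeout_act left at zero, the first function reports no panic and the second reports one, which would match the behaviour seen with GENERIC.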