Re: [Intel-gfx] [PATCH 2/2] RFC: soft/hardlookup: taint kernel
On Thu, May 9, 2019 at 11:24 AM Sergey Senozhatsky wrote: > > On (05/02/19 21:42), Daniel Vetter wrote: > [..] > > @@ -469,6 +469,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct > > hrtimer *hrtimer) > > add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK); > > if (softlockup_panic) > > panic("softlockup: hung tasks"); > > + else > > + add_taint(TAINT_WARN, LOCKDEP_STILL_OK); > > __this_cpu_write(soft_watchdog_warn, true); > > Soft lockup sets TAINT_SOFTLOCKUP bit. Would it be enough for your CI? I'm blind :-/ Yes this is totally useful. > [..] > > @@ -154,6 +154,8 @@ static void watchdog_overflow_callback(struct > > perf_event *event, > > > > if (hardlockup_panic) > > nmi_panic(regs, "Hard LOCKUP"); > > + else > > + add_taint(TAINT_WARN, LOCKDEP_STILL_OK); > > Maybe you can mirror what soft lockup does. Add a HARDLOCKUP taint bit We'd also want a taint for hung tasks (separate patch, same idea), not sure it's a good idea to use a new taint bit for all of them? Atm we don't check for all taint bits (some of them are set because of things our testcases do, like module reload or setting unsafe kernel options meant for testing only, so picking one of the bits we check already was least resistance. > +++ b/include/linux/kernel.h > @@ -571,7 +571,8 @@ extern enum system_states { > #define TAINT_LIVEPATCH15 > #define TAINT_AUX 16 > #define TAINT_RANDSTRUCT 17 > -#define TAINT_FLAGS_COUNT 18 > +#define TAINT_HARDLOCKUP 18 > +#define TAINT_FLAGS_COUNT 19 > > and then set TAINT_HARDLOCKUP in watchdog_overflow_callback(). > > Just a small idea, I'll leave this to more experienced people. The hung_tasks taint wasn't all that positively received, I feels like this will stay a hack private to our CI. Except if someone else pipes up who wants this, then I'm happy to polish. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] RFC: soft/hardlookup: taint kernel
On Thu, 2019-05-02 at 21:42 +0200, Daniel Vetter wrote: > There's the soft/hardlookup_panic sysctls, but that's a bit an > extreme > measure. As a fallback taint at least the machine. > > Our CI uses this to decide when a reboot is necessary, plus to figure > out whether the kernel is still happy. > > Signed-off-by: Daniel Vetter > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Peter Zijlstra > Cc: Valdis Kletnieks > Cc: Laurence Oberman > Cc: Vincent Whitchurch > Cc: Don Zickus > Cc: Andrew Morton > Cc: Sergey Senozhatsky > Cc: Sinan Kaya > Cc: Daniel Vetter > --- > kernel/watchdog.c | 2 ++ > kernel/watchdog_hld.c | 2 ++ > 2 files changed, 4 insertions(+) > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 6a5787233113..de7a60503517 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -469,6 +469,8 @@ static enum hrtimer_restart > watchdog_timer_fn(struct hrtimer *hrtimer) > add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK); > if (softlockup_panic) > panic("softlockup: hung tasks"); > + else > + add_taint(TAINT_WARN, LOCKDEP_STILL_OK); > __this_cpu_write(soft_watchdog_warn, true); > } else > __this_cpu_write(soft_watchdog_warn, false); > diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c > index 247bf0b1582c..cce46cf75d76 100644 > --- a/kernel/watchdog_hld.c > +++ b/kernel/watchdog_hld.c > @@ -154,6 +154,8 @@ static void watchdog_overflow_callback(struct > perf_event *event, > > if (hardlockup_panic) > nmi_panic(regs, "Hard LOCKUP"); > + else > + add_taint(TAINT_WARN, LOCKDEP_STILL_OK); > > __this_cpu_write(hard_watchdog_warn, true); > return; This looks OK to me, could be useful to know we would have triggered had the flags been set. Reviewed-by: Laurence Oberman ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 2/2] RFC: soft/hardlookup: taint kernel
There's the soft/hardlookup_panic sysctls, but that's a bit an extreme measure. As a fallback taint at least the machine. Our CI uses this to decide when a reboot is necessary, plus to figure out whether the kernel is still happy. Signed-off-by: Daniel Vetter Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Valdis Kletnieks Cc: Laurence Oberman Cc: Vincent Whitchurch Cc: Don Zickus Cc: Andrew Morton Cc: Sergey Senozhatsky Cc: Sinan Kaya Cc: Daniel Vetter --- kernel/watchdog.c | 2 ++ kernel/watchdog_hld.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 6a5787233113..de7a60503517 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -469,6 +469,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK); if (softlockup_panic) panic("softlockup: hung tasks"); + else + add_taint(TAINT_WARN, LOCKDEP_STILL_OK); __this_cpu_write(soft_watchdog_warn, true); } else __this_cpu_write(soft_watchdog_warn, false); diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c index 247bf0b1582c..cce46cf75d76 100644 --- a/kernel/watchdog_hld.c +++ b/kernel/watchdog_hld.c @@ -154,6 +154,8 @@ static void watchdog_overflow_callback(struct perf_event *event, if (hardlockup_panic) nmi_panic(regs, "Hard LOCKUP"); + else + add_taint(TAINT_WARN, LOCKDEP_STILL_OK); __this_cpu_write(hard_watchdog_warn, true); return; -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx