Re: [Intel-gfx] [PATCH 2/2] RFC: soft/hardlookup: taint kernel

2019-05-09 Thread Daniel Vetter
On Thu, May 9, 2019 at 11:24 AM Sergey Senozhatsky
 wrote:
>
> On (05/02/19 21:42), Daniel Vetter wrote:
> [..]
> > @@ -469,6 +469,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
> > hrtimer *hrtimer)
> >   add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> >   if (softlockup_panic)
> >   panic("softlockup: hung tasks");
> > + else
> > + add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
> >   __this_cpu_write(soft_watchdog_warn, true);
>
> Soft lockup sets TAINT_SOFTLOCKUP bit. Would it be enough for your CI?

I'm blind :-/ Yes this is totally useful.

> [..]
> > @@ -154,6 +154,8 @@ static void watchdog_overflow_callback(struct 
> > perf_event *event,
> >
> >   if (hardlockup_panic)
> >   nmi_panic(regs, "Hard LOCKUP");
> > + else
> > + add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
>
> Maybe you can mirror what soft lockup does. Add a HARDLOCKUP taint bit

We'd also want a taint for hung tasks (separate patch, same idea), not
sure it's a good idea to use a new taint bit for all of them? Atm we
don't check for all taint bits (some of them are set because of things
our testcases do, like module reload or setting unsafe kernel options
meant for testing only, so picking one of the bits we check already
was least resistance.

> +++ b/include/linux/kernel.h
> @@ -571,7 +571,8 @@ extern enum system_states {
>  #define TAINT_LIVEPATCH15
>  #define TAINT_AUX  16
>  #define TAINT_RANDSTRUCT   17
> -#define TAINT_FLAGS_COUNT  18
> +#define TAINT_HARDLOCKUP   18
> +#define TAINT_FLAGS_COUNT  19
>
> and then set TAINT_HARDLOCKUP in watchdog_overflow_callback().
>
> Just a small idea, I'll leave this to more experienced people.

The hung_tasks taint wasn't all that positively received, I feels like
this will stay a hack private to our CI. Except if someone else pipes
up who wants this, then I'm happy to polish.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 2/2] RFC: soft/hardlookup: taint kernel

2019-05-02 Thread Laurence Oberman
On Thu, 2019-05-02 at 21:42 +0200, Daniel Vetter wrote:
> There's the soft/hardlookup_panic sysctls, but that's a bit an
> extreme
> measure. As a fallback taint at least the machine.
> 
> Our CI uses this to decide when a reboot is necessary, plus to figure
> out whether the kernel is still happy.
> 
> Signed-off-by: Daniel Vetter 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Valdis Kletnieks 
> Cc: Laurence Oberman 
> Cc: Vincent Whitchurch 
> Cc: Don Zickus 
> Cc: Andrew Morton 
> Cc: Sergey Senozhatsky 
> Cc: Sinan Kaya 
> Cc: Daniel Vetter 
> ---
>  kernel/watchdog.c | 2 ++
>  kernel/watchdog_hld.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 6a5787233113..de7a60503517 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -469,6 +469,8 @@ static enum hrtimer_restart
> watchdog_timer_fn(struct hrtimer *hrtimer)
>   add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>   if (softlockup_panic)
>   panic("softlockup: hung tasks");
> + else
> + add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
>   __this_cpu_write(soft_watchdog_warn, true);
>   } else
>   __this_cpu_write(soft_watchdog_warn, false);
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 247bf0b1582c..cce46cf75d76 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -154,6 +154,8 @@ static void watchdog_overflow_callback(struct
> perf_event *event,
>  
>   if (hardlockup_panic)
>   nmi_panic(regs, "Hard LOCKUP");
> + else
> + add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
>  
>   __this_cpu_write(hard_watchdog_warn, true);
>   return;

This looks OK to me, could be useful to know we would have triggered
had the flags been set.

Reviewed-by: Laurence Oberman 

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 2/2] RFC: soft/hardlookup: taint kernel

2019-05-02 Thread Daniel Vetter
There's the soft/hardlookup_panic sysctls, but that's a bit an extreme
measure. As a fallback taint at least the machine.

Our CI uses this to decide when a reboot is necessary, plus to figure
out whether the kernel is still happy.

Signed-off-by: Daniel Vetter 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valdis Kletnieks 
Cc: Laurence Oberman 
Cc: Vincent Whitchurch 
Cc: Don Zickus 
Cc: Andrew Morton 
Cc: Sergey Senozhatsky 
Cc: Sinan Kaya 
Cc: Daniel Vetter 
---
 kernel/watchdog.c | 2 ++
 kernel/watchdog_hld.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6a5787233113..de7a60503517 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -469,6 +469,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
if (softlockup_panic)
panic("softlockup: hung tasks");
+   else
+   add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
__this_cpu_write(soft_watchdog_warn, true);
} else
__this_cpu_write(soft_watchdog_warn, false);
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..cce46cf75d76 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -154,6 +154,8 @@ static void watchdog_overflow_callback(struct perf_event 
*event,
 
if (hardlockup_panic)
nmi_panic(regs, "Hard LOCKUP");
+   else
+   add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 
__this_cpu_write(hard_watchdog_warn, true);
return;
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx