On Tue, 21 Jul 2015 15:07:57 -0700 Spencer Baugh <sba...@catern.com> wrote:

> From: Joern Engel <jo...@logfs.org>
> 
> We have observed cases where the soft lockup detector triggered, but no
> kernel bug existed.  Instead we had a buggy realtime thread that
> monopolized a cpu.  So let's kill the responsible party and not panic
> the entire system.
> 
> ...
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -428,7 +428,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>               }
>  
>               add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> -             if (softlockup_panic)
> +             if (rt_prio(current->prio)) {
> +                     pr_emerg("killing realtime thread\n");
> +                     send_sig(SIGILL, current, 0);

Why choose SIGILL?

> +             } else if (softlockup_panic)
>                       panic("softlockup: hung tasks");
>               __this_cpu_write(soft_watchdog_warn, true);

But what about a non-buggy realtime thread which happens to
occasionally spend 15 seconds doing stuff?

Old behaviour: kernel blurts a softlockup message, everything keeps running.

New behaviour: thread gets killed, plane crashes.


Possibly a better approach would be to only kill the thread if
softlockup_panic was set, because the system is going down anyway.
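
Roughly, that policy could be sketched as below.  This is an untested
userspace illustration, not a patch: the enum and softlockup_decide() are
hypothetical names invented here, and only the softlockup_panic flag and
the realtime check come from the code under discussion.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not taken from the patch. */
enum softlockup_action {
	SOFTLOCKUP_WARN_ONLY,	/* old behaviour: print the message, keep running */
	SOFTLOCKUP_KILL_TASK,	/* signal the offending realtime thread */
	SOFTLOCKUP_PANIC,	/* panic("softlockup: hung tasks") */
};

/*
 * Decide what to do once a soft lockup has been detected.
 * The thread is only killed when the system would otherwise have
 * panicked, so a slow-but-correct realtime thread on a system without
 * softlockup_panic still just gets the warning.
 */
static enum softlockup_action
softlockup_decide(bool task_is_realtime, bool softlockup_panic)
{
	if (!softlockup_panic)
		return SOFTLOCKUP_WARN_ONLY;

	return task_is_realtime ? SOFTLOCKUP_KILL_TASK : SOFTLOCKUP_PANIC;
}
```

With softlockup_panic clear this degenerates to the old behaviour for
every task, realtime or not.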

Also, perhaps some users would prefer that the kernel simply suppress
the softlockup warning in this situation, rather than killing stuff!




Really, what you're trying to implement here is a watchdog for runaway
realtime threads.  And that sounds a worthy project but it's a rather
separate thing from the softlockup detector.  A realtime thread
watchdog feature might have things such as

- timeout duration separately configurable from softlockup

- enabled independently from softlockup: people might want one and
  not the other.

- configurable signal, perhaps?
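
To make the three knobs above concrete, here is a minimal userspace
sketch.  Everything in it is an assumption made up for illustration:
struct rt_watchdog_config, its field names and rt_watchdog_check() do not
exist in the kernel, and a real implementation would presumably expose
these as sysctls next to the existing watchdog ones.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical configuration, independent of the softlockup settings. */
struct rt_watchdog_config {
	bool enabled;			/* on/off, separate from softlockup */
	unsigned int thresh_secs;	/* own timeout, not the softlockup one */
	int signal;			/* signal to deliver, e.g. SIGKILL */
};

/*
 * Return the signal to send to a task that has held the CPU for
 * stuck_secs seconds, or 0 if the task should be left alone.
 */
static int rt_watchdog_check(const struct rt_watchdog_config *cfg,
			     bool task_is_realtime, unsigned int stuck_secs)
{
	if (!cfg->enabled || !task_is_realtime)
		return 0;
	if (stuck_secs < cfg->thresh_secs)
		return 0;

	return cfg->signal;
}
```

A user who wants the realtime watchdog but not softlockup warnings (or
the reverse) would then only touch one set of settings.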

Now, the *implementation* of the realtime thread watchdog may well
share code with the softlockup detector.  But from a
conceptual/configuration/documentation point of view, it's a separate
thing, no?
