On Tue 2016-12-06 11:17 -0500, Don Zickus wrote:
> On an overloaded system, it is possible that a change in the watchdog
> threshold can be delayed long enough to trigger a false positive.
>
> This can easily be achieved by having a cpu spinning indefinitely on a task,
> while another cpu updates watchdog threshold.
>
> What happens is while trying to park the watchdog threads, the hrtimers on the
> other cpus trigger and reprogram themselves with the new slower watchdog
> threshold. Meanwhile, the nmi watchdog is still programmed with the old
> faster threshold.
>
> Because the one cpu is blocked, it prevents the thread parking on the other
> cpus from completing, which is needed to shutdown the nmi watchdog and
> reprogram it correctly. As a result, a false positive from the nmi watchdog
> is reported.
>
> Fix this by setting a park_in_progress flag to block all lockups
> until the parking is complete.
>
> Fix provided by Ulrich Obergfell.
>
> Cc: Ulrich Obergfell <uober...@redhat.com>
> Signed-off-by: Don Zickus <dzic...@redhat.com>
> ---
>  include/linux/nmi.h   | 1 +
>  kernel/watchdog.c     | 9 +++++++++
>  kernel/watchdog_hld.c | 3 +++
>  3 files changed, 13 insertions(+)

Looks fine to me.

Reviewed-by: Aaron Tomlin <atom...@redhat.com>

-- 
Aaron Tomlin