On Mon, Mar 30, 2015 at 04:02:06PM -0400, Don Zickus wrote:
> On Mon, Mar 30, 2015 at 03:32:55PM -0400, Chris Metcalf wrote:
> > On 03/30/2015 03:12 PM, Don Zickus wrote:
> > >On Mon, Mar 30, 2015 at 02:51:05PM -0400, cmetc...@ezchip.com wrote:
> > >>From: Chris Metcalf <cmetc...@ezchip.com>
> > >>
> > >>Running watchdog can be a helpful debugging feature on regular
> > >>cores, but it's incompatible with nohz_full, since it forces
> > >>regular scheduling events.  Accordingly, just exit out immediately
> > >>from any nohz_full core.
> > >>
> > >>An alternate approach would be to add a flags field or function to
> > >>smp_hotplug_thread to control on which cores the percpu threads
> > >>are created, but it wasn't clear that much mechanism was useful.
> > >Hi Chris,
> > >
> > >It seems like the correct solution would be to hook into the idle_loop
> > >somehow.  If the cpu is idle, then it seems unlikely that a lockup could
> > >occur.
> > 
> > With nohz_full, though, the cpu might be running userspace code
> > with the intention of keeping kernel ticks disabled.  Even returning
> > to kernel mode to try to figure out if we "should" be running the
> > watchdog on a given core will induce exactly the kind of interrupts
> > that nohz_full is designed to prevent.
> > 
> > My assumption is generally that nohz_full cores don't spend a lot of
> > time in the kernel anyway, as they are optimized for user space.
> > 
> > I guess you could imagine doing something per-cpu on the nohz_full
> > cores where we effectively call watchdog_disable() whenever a
> > nohz_full core enters userspace, and watchdog_enable() whenever it
> > enters the kernel.  We could add some per-cpu state in the watchdog
> > code to track whether that core was currently enabled or disabled
> > to avoid double-enabling or double-disabling.  I would think
> > context_tracking_user_exit()/_enter() would be the place to do this.
> > 
> > This feels like a lot of overhead, potentially.  Thoughts?
> 
> A few months ago I might have thought that a reasonable approach.  But
> recently we have added code to make the watchdog an all or nothing approach
> across the system.  This might make it difficult to do what you are
> suggesting.
> 
> I do not know enough about the nohz code to know what the right approach is
> here.  Perhaps Frederic can enlighten me?

Well, cancelling/rearming a timer on every userspace round trip sounds like
way too much overhead to me :-)
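
To make the overhead concern concrete, here is a minimal userspace sketch
(plain C, not kernel code) of the per-cpu enable/disable tracking proposed
in the quoted text above. Every name in it (sketch_watchdog_enable(),
sketch_watchdog_disable(), wd_enabled, wd_timer_ops, SKETCH_NR_CPUS) is a
hypothetical stand-in, and the counter only models "one timer cancel or
rearm"; it is an assumption for illustration, not how the real watchdog
code works.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 8	/* hypothetical core count, illustration only */

/* Hypothetical per-cpu state: is the watchdog currently armed on this core? */
static bool wd_enabled[SKETCH_NR_CPUS];
/* Hypothetical counter: each increment models one timer cancel or rearm. */
static unsigned long wd_timer_ops[SKETCH_NR_CPUS];

/* Stand-in for watchdog_enable(): arm the timer unless it is already armed. */
static void sketch_watchdog_enable(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (wd_enabled[cpu])
		return;			/* avoid double-enabling */
	wd_enabled[cpu] = true;
	wd_timer_ops[cpu]++;		/* models rearming the timer */
}

/* Stand-in for watchdog_disable(): cancel the timer unless already off. */
static void sketch_watchdog_disable(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (!wd_enabled[cpu])
		return;			/* avoid double-disabling */
	wd_enabled[cpu] = false;
	wd_timer_ops[cpu]++;		/* models cancelling the timer */
}

/* One userspace round trip on a core: enter userspace, then come back. */
static void sketch_user_round_trip(int cpu)
{
	sketch_watchdog_disable(cpu);	/* would hang off the user-enter hook */
	sketch_watchdog_enable(cpu);	/* would hang off the user-exit hook */
}
```

Under these assumptions the per-cpu flag does prevent double-enabling and
double-disabling, but every call to sketch_user_round_trip() still costs
two timer operations, which is the per-round-trip cost objected to above.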

But Ingo's suggestion to disable it properly (only on nohz_full cores) looks
good. And we should be able to re-enable it everywhere with
"sysctl -w kernel.watchdog=1", and you need to warn about this at boot.
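
A minimal userspace sketch (plain C, not kernel code) of that policy:
skip the watchdog on nohz_full cores by default, print a notice at boot,
and let a sysctl write turn it back on everywhere. All names here
(nohz_full_cpu, watchdog_cpu, sketch_watchdog_boot_init(),
sketch_watchdog_sysctl(), sketch_watchdog_runs_on(), SKETCH_NR_CPUS) and
the message text are hypothetical; only the "sysctl -w kernel.watchdog=1"
command comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_NR_CPUS 8	/* hypothetical core count, illustration only */

/* Hypothetical stand-in for the set of cores listed in nohz_full=. */
static bool nohz_full_cpu[SKETCH_NR_CPUS];
/* Hypothetical per-core flag: would the watchdog run on this core? */
static bool watchdog_cpu[SKETCH_NR_CPUS];

/*
 * Boot-time default: run the watchdog everywhere except on nohz_full
 * cores, and warn once so the reduced coverage is not a surprise.
 * Returns the number of cores that were skipped.
 */
static int sketch_watchdog_boot_init(void)
{
	int cpu, skipped = 0;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		watchdog_cpu[cpu] = !nohz_full_cpu[cpu];
		if (nohz_full_cpu[cpu])
			skipped++;
	}
	if (skipped)
		printf("watchdog: disabled on %d nohz_full core(s); "
		       "\"sysctl -w kernel.watchdog=1\" re-enables it everywhere\n",
		       skipped);
	return skipped;
}

/* Models a write to kernel.watchdog: all-or-nothing across the system. */
static void sketch_watchdog_sysctl(bool enable)
{
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		watchdog_cpu[cpu] = enable;
}

/* Query helper with a bounds check on the core number. */
static bool sketch_watchdog_runs_on(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return watchdog_cpu[cpu];
}
```

In this sketch the nohz_full exclusion is only a boot-time default, so an
explicit sysctl write by the administrator overrides it on every core.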