On Mon, Mar 30, 2015 at 04:02:06PM -0400, Don Zickus wrote:
> On Mon, Mar 30, 2015 at 03:32:55PM -0400, Chris Metcalf wrote:
> > On 03/30/2015 03:12 PM, Don Zickus wrote:
> > >On Mon, Mar 30, 2015 at 02:51:05PM -0400, cmetc...@ezchip.com wrote:
> > >>From: Chris Metcalf <cmetc...@ezchip.com>
> > >>
> > >>Running watchdog can be a helpful debugging feature on regular
> > >>cores, but it's incompatible with nohz_full, since it forces
> > >>regular scheduling events. Accordingly, just exit out immediately
> > >>from any nohz_full core.
> > >>
> > >>An alternate approach would be to add a flags field or function to
> > >>smp_hotplug_thread to control on which cores the percpu threads
> > >>are created, but it wasn't clear that much mechanism was useful.
> > >Hi Chris,
> > >
> > >It seems like the correct solution would be to hook into the idle_loop
> > >somehow. If the cpu is idle, then it seems unlikely that a lockup could
> > >occur.
> >
> > With nohz_full, though, the cpu might be running userspace code
> > with the intention of keeping kernel ticks disabled. Even returning
> > to kernel mode to try to figure out if we "should" be running the
> > watchdog on a given core will induce exactly the kind of interrupts
> > that nohz_full is designed to prevent.
> >
> > My assumption is generally that nohz_full cores don't spend a lot of
> > time in the kernel anyway, as they are optimized for user space.
> >
> > I guess you could imagine doing something per-cpu on the nohz_full
> > cores where we effectively call watchdog_disable() whenever a
> > nohz_full core enters userspace, and watchdog_enable() whenever it
> > enters the kernel. We could add some per-cpu state in the watchdog
> > code to track whether that core was currently enabled or disabled
> > to avoid double-enabling or double-disabling. I would think
> > context_tracking_user_exit()/_enter() would be the place to do this.
> >
> > This feels like a lot of overhead, potentially. Thoughts?
>
> A few months ago I might have thought that a reasonable approach. But
> recently we have added code to make the watchdog an all or nothing approach
> across the system. This might make it difficult to do what you are
> suggesting.
>
> I do not know enough about the nohz code to know what the right approach is
> here. Perhaps Federic can enlighten me?

Well, cancelling and rearming a timer on every userspace round trip sounds
like way too much overhead to me :-)

But Ingo's suggestion to disable it properly (only on nohz_full cores) looks
good. We should still be able to re-enable it everywhere with
"sysctl -w kernel.watchdog=1", and you need to warn about this on boot.
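
To make the intended behaviour concrete, here is a rough user-space model of it, not kernel code. Everything in it is hypothetical and only for illustration: nohz_full_mask stands in for the kernel's nohz_full cpumask, and watchdog_boot_init() and watchdog_sysctl_write() are made-up names, not existing watchdog functions. The boot message is just one reading of "warn about this on boot".

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

/* Hypothetical stand-in for the kernel's nohz_full cpumask. */
static bool nohz_full_mask[NR_CPUS];

/* Hypothetical per-CPU flag: is the watchdog running on this core? */
static bool watchdog_running[NR_CPUS];

/*
 * Boot-time default: run the watchdog everywhere except on nohz_full
 * cores, and warn once so the admin knows why those cores are skipped.
 * Returns the number of cores that were skipped.
 */
static int watchdog_boot_init(void)
{
	int cpu, skipped = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		watchdog_running[cpu] = !nohz_full_mask[cpu];
		if (nohz_full_mask[cpu])
			skipped++;
	}

	if (skipped)
		printf("watchdog: disabled on %d nohz_full core(s); "
		       "re-enable with: sysctl -w kernel.watchdog=1\n",
		       skipped);

	return skipped;
}

/*
 * Models an explicit "sysctl -w kernel.watchdog=<0|1>": the admin's
 * choice overrides the boot default on every core, nohz_full or not.
 */
static void watchdog_sysctl_write(bool enable)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		watchdog_running[cpu] = enable;
}

/* Count the cores that currently have the watchdog running. */
static int watchdog_count_running(void)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (watchdog_running[cpu])
			n++;

	return n;
}
```

The point of the split is that the nohz_full exclusion is only a default applied once at boot, so nothing has to be cancelled or rearmed on the user/kernel boundary, and an explicit sysctl write still wins.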