On Mon, Jul 21, 2014 at 08:59:22AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 21, 2014 at 12:12:48AM +0200, Frederic Weisbecker wrote:
> > On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > > On Sat, Jul 19, 2014 at 08:01:24PM +0200, Frederic Weisbecker wrote:
> > > > On Sat, Jul 19, 2014 at 09:53:50AM -0700, Paul E. McKenney wrote:
> > > > > If a non-nohz_full= CPU is non-idle, it will have a scheduling-clock
> > > > > interrupt, and therefore doesn't need the timekeeping CPU to keep
> > > > > its scheduling-clock interrupt going.  This commit therefore ignores
> > > > > the idle state of non-nohz_full CPUs when determining whether or not
> > > > > the timekeeping CPU can safely turn off its scheduling-clock
> > > > > interrupt.
> > > > >
> > > > > Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> > > >
> > > > Unfortunately that's not how things work. Running a CPU tick doesn't
> > > > necessarily imply running the timekeeping duty.
> > > >
> > > > Only the timekeeper can update the timekeeping. There is an exception
> > > > though: the timekeeping is also updated by dynticks idle CPUs when
> > > > they wake up in an interrupt from idle.
> > > >
> > > > Here is in practice why it doesn't work:
> > > >
> > > > So let's say CPU 0 is timekeeper, CPU 1 a non-nohz-full CPU and all
> > > > others are full-nohz. CPU 0 is sleeping. CPU 1 wakes up from idle, so
> > > > it has an up-to-date timekeeping, but then if it continues to execute
> > > > further without waking up CPU 0, it risks stale timestamps.
> > > >
> > > > This can be changed by allowing timekeeping duty from all
> > > > non-nohz_full CPUs, that's the initial direction I took, but it
> > > > involved a lot of complications and scalability issues.
> > >
> > > So we really have to have -all- the CPUs be idle to turn off the
> > > timekeeper.  This won't make the battery-powered embedded guys happy...
> >
> > I can imagine all sorts of solutions to solve this. None of them look
> > simple though. And I'm really convinced this isn't worth it until some
> > user comes up and reports to me that 1) he seriously uses full dynticks
> > and 2) he needs non-full-nohz CPUs other than CPU 0.
> >
> > If 1 and 2 ever happen, I'll gladly work on this.
>
> Does the thought of special-casing the situation where CONFIG_NO_HZ_FULL=y,
> CONFIG_NO_HZ_FULL_SYSIDLE=y, and there are no nohz_full= CPUs make sense?

Yes. Distros seem to want to make full dynticks available for users, but they
also want the off case (when nohz_full= isn't passed) to keep overhead as low
as possible. So CONFIG_NO_HZ_FULL_SYSIDLE=y should probably do the same, as
it's expected to be a default choice as well.

>
> > > Other thoughts on this?  We really should not be setting
> > > CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.
> >
> > Well it's better to save energy when all CPUs are idle than never.
>
> Fair point!
>
>                                                         Thanx, Paul
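
To make the CPU 0 / CPU 1 scenario quoted above concrete, here is a toy
userspace model of the two stop conditions being compared. It is only a
sketch and not kernel code: struct toy_cpu and both toy_can_stop_*() helpers
are hypothetical names invented for illustration, and the model assumes the
only inputs that matter are each CPU's idle state and nohz_full= membership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel symbols. */
struct toy_cpu {
	bool nohz_full;	/* CPU was listed in nohz_full= */
	bool idle;	/* CPU is currently idle */
};

/*
 * Condition the thread settles on: the timekeeper may stop its tick
 * only when every other CPU is idle, nohz_full or not.
 */
static bool toy_can_stop_all_idle(const struct toy_cpu *cpus, size_t n,
				  size_t timekeeper)
{
	assert(timekeeper < n);
	for (size_t i = 0; i < n; i++) {
		if (i != timekeeper && !cpus[i].idle)
			return false;
	}
	return true;
}

/*
 * Condition from the proposed commit: busy non-nohz_full CPUs are
 * ignored, on the assumption that their own tick keeps time current.
 * The thread explains why that assumption does not hold.
 */
static bool toy_can_stop_ignoring_non_nohz_full(const struct toy_cpu *cpus,
						size_t n, size_t timekeeper)
{
	assert(timekeeper < n);
	for (size_t i = 0; i < n; i++) {
		if (i == timekeeper || !cpus[i].nohz_full)
			continue;
		if (!cpus[i].idle)
			return false;
	}
	return true;
}
```

With CPU 0 as timekeeper, CPU 1 non-nohz_full and busy, and the remaining
CPUs nohz_full and idle, the first helper keeps the timekeeper ticking while
the second lets it stop. That second outcome is the case where CPU 1 keeps
running with nobody updating timekeeping.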