Hi, On 6 March 2014 09:41, Len Brown <l...@kernel.org> wrote: > > On Mon, Feb 24, 2014 at 1:29 AM, Tuukka Tikkanen > <tuukka.tikka...@linaro.org> wrote: > > Sometimes (fairly often) when the cpuidle menu governor is making a decision > > about idle state to enter the next timer for the cpu appears to expire in > > the past. The menu governor expects the expiry to always be in the future > > and in fact stores the time delta in an unsigned variable. However, when > > the expiry is in the past, the value returned by tick_nohz_get_sleep_length > > can be negative. This patch prevents using negative values, instead making > > the governor return immediately similar to having latency requirement set > > to 0. > > > > Note: As with latency == 0, the return value is 0 with no check to see if > > the state 0 has been disabled or not. > > > > Signed-off-by: Tuukka Tikkanen <tuukka.tikka...@linaro.org> > > --- > > drivers/cpuidle/governors/menu.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/cpuidle/governors/menu.c > > b/drivers/cpuidle/governors/menu.c > > index 71b5232..c414468 100644 > > --- a/drivers/cpuidle/governors/menu.c > > +++ b/drivers/cpuidle/governors/menu.c > > @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, > > struct cpuidle_device *dev) > > if (unlikely(latency_req == 0)) > > return 0; > > > > - /* determine the expected residency time, round up */ > > + /* > > + * Determine the expected residency time. If the time is negative, > > + * a timer interrupt has probably just expired after disabling > > + * interrupts. Return as quickly as possible in the most shallow > > + * state possible. tv_nsec is always positive, so only check the > > + * seconds. > > + */ > > t = ktime_to_timespec(tick_nohz_get_sleep_length()); > > + if (t.tv_sec < 0) > > + return 0; > > data->next_timer_us = > > t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC; > > > > Are there special conditions that are necessary to provoke a negative > return value? > I've traced this code on several systems, and never seen a negative > return value. >
By changing the if statement to if (WARN_ONCE(t.tv_sec < 0, "Next timer in the past. tv_sec: %ld, tv_usec: %lu", t.tv_sec, t.tv_usec)) return 0; I eventually got [ 3092.355144] WARNING: CPU: 4 PID: 0 at drivers/cpuidle/governors/menu.c:315 menu_select+0x410/0x420() [ 3092.355145] Next timer in the past. tv_sec: -1, tv_usec: 999999645 ... [ 3092.355169] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.14.0-rc3+ #3 so the problem is still lurking there. However, as you can see from the timestamp, either something has changed recently and made it much much less frequent or I did not use the use case / hardware that triggers it more often. It has been quite a long time since I discovered and fixed this in my private trees, so I have forgotten the best way to reproduce this, sorry. In any case, existence is still confirmed. This was on a i7-3770K with 64 bit kernel, running make clean && make -j8 && make clean && make for kernel source tree. If I happen to remember the use case + hardware combination that triggers this rapidly, I'll let you know. That said, I agree dropping this patch and fixing the called function is a better idea. Tuukka > > However... > I do see values up to 300.2 seconds, and those large values seem to decay > at the rate of real-time so that after 5 minutes they are small, and then > jump back up to 300 seconds. > > Some folks at Oracle debugged it down to use of NEXT_TIMER_MAX_DELTA > when there is _no_ timer currently pending on that CPU. It seems this is > easier > to observe, the more CPUs a system has -- though I've been able to reproduce > it on a system as small as a single-package 8-cpu systems. > > One proposed way to address this is to cap large values at 1 second. > However, that will not recognize that for the period when the large > value decays to under 1 second, all of those are fiction. > > Also, if we could identify the case where there is no future timer, > it seems that re-using dev->last_residency would probably be > a more useful guess than pretending we'll have a timer expire in 1 second. > > thanks, > Len Brown, Intel Open Source Technology Cente -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/