The way loadavg is tracked during nohz only takes into account the load at the time nohz is entered. This is particularly noticeable if nohz is entered while the CPU is non-idle, and the CPU then goes idle and stays that way for a long time: its stale non-idle contribution keeps inflating loadavg. We've had reports of a loadavg near 150 on a mostly idle system.
Calling calc_load_nohz_start() regardless of whether the tick is already stopped addresses the issue when going idle. Tracking load changes when not going idle (e.g. multiple SCHED_FIFO tasks coming and going) is not addressed by this patch.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
 kernel/time/tick-sched.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 955851748dc3..f177d8168400 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -763,6 +763,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 		ts->do_timer_last = 0;
 	}
 
+	/* Even if the tick was already stopped, load may have changed */
+	calc_load_nohz_start();
+
 	/* Skip reprogram of event if its not changed */
 	if (ts->tick_stopped && (expires == ts->next_tick)) {
 		/* Sanity check: make sure clockevent is actually programmed */
@@ -783,7 +786,6 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 	 * the scheduler tick in nohz_restart_sched_tick.
 	 */
 	if (!ts->tick_stopped) {
-		calc_load_nohz_start();
 		quiet_vmstat();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
-- 
1.8.3.1