loadavg tracking during nohz only accounts for the load at the time nohz
is entered.  This is particularly noticeable if nohz is entered while
non-idle, and the CPU then goes idle and stays that way for a long time:
the stale non-idle contribution keeps being counted.  We've had reports of
a loadavg near 150 on a mostly idle system.

Calling calc_load_nohz_start() regardless of whether the tick is already
stopped addresses the issue when going idle.  Tracking load changes when
not going idle (e.g. multiple SCHED_FIFO tasks coming and going) is not
addressed by this patch.

Signed-off-by: Scott Wood <sw...@redhat.com>
---
 kernel/time/tick-sched.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
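
For illustration only (not part of the patch): a minimal user-space sketch
of the accounting problem described above.  All toy_* names are
hypothetical stand-ins, not kernel symbols, and the model assumes that the
fold simply pushes the change in per-CPU active count into a global count.
It is a simplification of the idea, not the kernel's actual implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU state relevant to load folding. */
struct toy_rq {
	long nr_running;     /* tasks currently runnable on this CPU */
	long folded_active;  /* value last folded into the global count */
	bool tick_stopped;   /* has the tick already been stopped? */
};

/* Hypothetical stand-in for the global active-task count behind loadavg. */
static long toy_global_active;

/* Fold the change since the last fold into the global count. */
static void toy_fold(struct toy_rq *rq)
{
	long delta = rq->nr_running - rq->folded_active;

	rq->folded_active = rq->nr_running;
	toy_global_active += delta;
}

/*
 * Toy model of the stop-tick path.  With fold_always == false the fold
 * happens only when the tick is first stopped (behaviour before the
 * patch); with fold_always == true it happens on every call (behaviour
 * after the patch).
 */
static void toy_stop_tick(struct toy_rq *rq, bool fold_always)
{
	if (fold_always || !rq->tick_stopped)
		toy_fold(rq);
	rq->tick_stopped = true;
}

/* Enter nohz with one running task, then go idle with the tick stopped. */
static long toy_scenario(bool fold_always)
{
	struct toy_rq rq = { .nr_running = 1 };

	toy_global_active = 0;
	toy_stop_tick(&rq, fold_always);  /* tick stopped while non-idle */
	rq.nr_running = 0;                /* task goes away, CPU is idle */
	toy_stop_tick(&rq, fold_always);  /* idle entry, tick already stopped */
	return toy_global_active;
}
```

In this model toy_scenario(false) leaves the global count at 1 even though
the CPU is idle, while toy_scenario(true) brings it back to 0.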

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 955851748dc3..f177d8168400 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -763,6 +763,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
                ts->do_timer_last = 0;
        }
 
+       /* Even if the tick was already stopped, load may have changed */
+       calc_load_nohz_start();
+
        /* Skip reprogram of event if its not changed */
        if (ts->tick_stopped && (expires == ts->next_tick)) {
                /* Sanity check: make sure clockevent is actually programmed */
@@ -783,7 +786,6 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
         * the scheduler tick in nohz_restart_sched_tick.
         */
        if (!ts->tick_stopped) {
-               calc_load_nohz_start();
                quiet_vmstat();
 
                ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
-- 
1.8.3.1
