Currently, when a new timer is added to the timer wheel of a nohz_active CPU, the target CPU is always woken up.
In fact, if the newly added timer expires after base->next_timer, there is
no need to wake up the target CPU, because the new timer does not change
its sleep time. A lazy wakeup is better in that scenario.

I put together a test scenario. On my 32-core system, a driver on CPU 15
continuously enqueues timers with random expiry to CPUs 8/9/10/11 and then
checks the idle_calls difference after 10 seconds. The data below shows
that the lazy wakeup does reduce the number of wakeups considerably.

           w/o lazy   w/ lazy
CPU 8:        135        88
CPU 9:        238        43
CPU 10:       157        83
CPU 11:       172        70

Signed-off-by: Yunhong Jiang <yunhong.ji...@linux.intel.com>
---
 kernel/time/timer.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index d3f5e92f722a..a039d9e6b55a 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -414,6 +414,8 @@ __internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 
 static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 {
+	bool kick_nohz = false;
+
 	/* Advance base->jiffies, if the base is empty */
 	if (!base->all_timers++)
 		base->timer_jiffies = jiffies;
@@ -424,9 +426,17 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 	 */
 	if (!(timer->flags & TIMER_DEFERRABLE)) {
 		if (!base->active_timers++ ||
-		    time_before(timer->expires, base->next_timer))
+		    time_before(timer->expires, base->next_timer)) {
 			base->next_timer = timer->expires;
-	}
+			/*
+			 * CPU in dynticks need reevaluate the timer wheel
+			 * if newer timer added with next_timer updated.
+			 */
+			if (base->nohz_active)
+				kick_nohz = true;
+		}
+	} else if (base->nohz_active && tick_nohz_full_cpu(base->cpu))
+		kick_nohz = true;
 
 	/*
 	 * Check whether the other CPU is in dynticks mode and needs
@@ -441,11 +451,8 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 	 * require special care against races with idle_cpu(), lets deal
 	 * with that later.
 	 */
-	if (base->nohz_active) {
-		if (!(timer->flags & TIMER_DEFERRABLE) ||
-		    tick_nohz_full_cpu(base->cpu))
-			wake_up_nohz_cpu(base->cpu);
-	}
+	if (kick_nohz)
+		wake_up_nohz_cpu(base->cpu);
 }
 
 #ifdef CONFIG_TIMER_STATS
-- 
1.8.3.1
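
For illustration only, and not part of the patch: the kick decision made
by the patched internal_add_timer() can be modelled in userspace roughly
as below. Everything prefixed with model_ or MODEL_ is a hypothetical
stand-in invented for this sketch. The nohz_full field is assumed to
replace the tick_nohz_full_cpu(base->cpu) call, and model_time_before()
is assumed to behave like the kernel's wrap-safe time_before(). The
function returns whether wake_up_nohz_cpu() would be called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for TIMER_DEFERRABLE. */
#define MODEL_TIMER_DEFERRABLE 0x1u

/* Hypothetical, trimmed-down stand-in for struct tvec_base. */
struct model_base {
	unsigned long next_timer;	/* earliest non-deferrable expiry */
	unsigned long active_timers;	/* number of non-deferrable timers */
	bool nohz_active;
	bool nohz_full;			/* assumed: tick_nohz_full_cpu(cpu) */
};

/*
 * Wrap-safe "a is before b" comparison, modelled on time_before().
 * Relies on the usual two's-complement unsigned-to-signed conversion.
 */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Account for a new timer and report whether the target CPU would
 * need to be kicked under the lazy wakeup scheme.
 */
static bool model_add_timer(struct model_base *base,
			    unsigned long expires, unsigned int flags)
{
	bool kick_nohz = false;

	assert(base != NULL);

	if (!(flags & MODEL_TIMER_DEFERRABLE)) {
		if (!base->active_timers++ ||
		    model_time_before(expires, base->next_timer)) {
			/* New earliest expiry: the sleep time changes. */
			base->next_timer = expires;
			if (base->nohz_active)
				kick_nohz = true;
		}
	} else if (base->nohz_active && base->nohz_full) {
		/* Deferrable timers only kick full dynticks CPUs. */
		kick_nohz = true;
	}

	return kick_nohz;
}
```

In this model, a non-deferrable timer that expires after next_timer on a
base that already has active timers returns false, which corresponds to
the wakeups saved in the table above.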