Currently, when a new timer is added to the timer wheel of a nohz_active
CPU, the target CPU is always woken up.

In fact, if the newly added timer expires after base->next_timer, we
don't need to wake up the target CPU, since the new timer does not change
its sleep time. A lazy wakeup is better in that scenario.

I cooked up a test scenario. On my 32-core system, a driver on CPU 15
continuously enqueues timers to CPUs 8/9/10/11 with random expiry times
and then checks the idle_calls difference after 10 seconds. The data
below shows that lazy wakeup does reduce the number of wakeups a lot.

                w/o lazy        w/ lazy
CPU 8:          135             88
CPU 9:          238             43
CPU 10:         157             83
CPU 11:         172             70

Signed-off-by: Yunhong Jiang <yunhong.ji...@linux.intel.com>
---
 kernel/time/timer.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)
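
For readers who want to experiment outside the kernel, the wakeup
decision made by the patched internal_add_timer() can be modeled in user
space. The following is only a sketch under stated assumptions:
struct base_model, time_before_model() and add_timer_model() are
hypothetical stand-ins and not kernel code, and the nohz_full field
merely stands in for the result of tick_nohz_full_cpu(base->cpu).

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the tvec_base fields used here. */
struct base_model {
	unsigned long next_timer;	/* earliest non-deferrable expiry */
	unsigned int active_timers;	/* number of non-deferrable timers */
	bool nohz_active;		/* models base->nohz_active */
	bool nohz_full;			/* models tick_nohz_full_cpu(base->cpu) */
};

/* Wraparound-safe "a is before b", in the spirit of time_before(). */
static bool time_before_model(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Account for a new timer and return true if the target CPU would have
 * to be kicked. A non-deferrable timer only causes a kick when it
 * becomes the new next_timer; a deferrable timer only causes a kick on
 * a nohz_full CPU.
 */
static bool add_timer_model(struct base_model *base, unsigned long expires,
			    bool deferrable)
{
	bool kick_nohz = false;

	if (!deferrable) {
		if (!base->active_timers++ ||
		    time_before_model(expires, base->next_timer)) {
			base->next_timer = expires;
			if (base->nohz_active)
				kick_nohz = true;
		}
	} else if (base->nohz_active && base->nohz_full) {
		kick_nohz = true;
	}

	return kick_nohz;
}
```

In this model, with nohz_active set and an empty base, adding
non-deferrable timers that expire at 100, 200 and 50 yields a kick, no
kick, and a kick respectively. Only the first and third change
next_timer.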

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index d3f5e92f722a..a039d9e6b55a 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -414,6 +414,8 @@ __internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 
 static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 {
+       bool kick_nohz = false;
+
        /* Advance base->jiffies, if the base is empty */
        if (!base->all_timers++)
                base->timer_jiffies = jiffies;
@@ -424,9 +426,17 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
         */
        if (!(timer->flags & TIMER_DEFERRABLE)) {
                if (!base->active_timers++ ||
-                   time_before(timer->expires, base->next_timer))
+                   time_before(timer->expires, base->next_timer)) {
                        base->next_timer = timer->expires;
-       }
+                       /*
+                        * A CPU in dynticks mode needs to reevaluate the
+                        * timer wheel if the new timer updated next_timer.
+                        */
+                       if (base->nohz_active)
+                               kick_nohz = true;
+               }
+       } else if (base->nohz_active && tick_nohz_full_cpu(base->cpu))
+               kick_nohz = true;
 
        /*
         * Check whether the other CPU is in dynticks mode and needs
@@ -441,11 +451,8 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
         * require special care against races with idle_cpu(), lets deal
         * with that later.
         */
-       if (base->nohz_active) {
-               if (!(timer->flags & TIMER_DEFERRABLE) ||
-                   tick_nohz_full_cpu(base->cpu))
-                       wake_up_nohz_cpu(base->cpu);
-       }
+       if (kick_nohz)
+               wake_up_nohz_cpu(base->cpu);
 }
 
 #ifdef CONFIG_TIMER_STATS
-- 
1.8.3.1
