On Fri, 31 May 2019, Anna-Maria Gleixner wrote:

[...]
> I will think about the problem and your solution a little bit more and
> give you feedback hopefully on monday.

I'm sorry for the delay. But now I'm able to give you a detailed feedback:

The general problem is, that your solution is customized to a single
use-case: preventing softirq raise but only if there is _no_ timer
pending. To reach this goal without using locks, overhead is added to the
formerly optimized add/mod path of a timer. With your code the timer
softirq is raised even when there is a pending timer which does not has to
be expired right now. But there have been requests in the past for this use
case already.

I discussed with Thomas several approaches during the last week how to
solve the unconditional softirq timer raise in a more general way without
loosing the fast add/mod path of a timer. The approach which seems to be
the best has a dependency on a timer code change from a push to pull model
which is still under developement (see v2 patchset:
http://lkml.kernel.org/r/20170418111102.490432...@linutronix.de). The
patchset v2 has several problems but we are working on a solution for those
problems right now. When the timer pull model is in place the approach to
solve the unconditional timer softirq raise could look like the following:

---8<---
The next_expiry value of timer_base struct is used to store the next expiry
value even if timer_base is not idle. Therefore it is udpated after adding
or modifying a timer and also at the end of timer softirq. In case timer
softirq does not has to be raised, the timer_base->clk is incremented to
prevent stale clocks. Checking whether timer softirq has to be raised
cannot be done lockless.

This code is not compile tested nor boot tested.

---
 kernel/time/timer.c |   60 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 24 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -552,37 +552,32 @@ static void
 static void
 trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 {
-       if (!is_timers_nohz_active())
-               return;
-
-       /*
-        * TODO: This wants some optimizing similar to the code below, but we
-        * will do that when we switch from push to pull for deferrable timers.
-        */
-       if (timer->flags & TIMER_DEFERRABLE) {
-               if (tick_nohz_full_cpu(base->cpu))
-                       wake_up_nohz_cpu(base->cpu);
-               return;
+       if (is_timers_nohz_active()) {
+               /*
+                * TODO: This wants some optimizing similar to the code
+                * below, but we will do that when we switch from push to
+                * pull for deferrable timers.
+                */
+               if (timer->flags & TIMER_DEFERRABLE) {
+                       if (tick_nohz_full_cpu(base->cpu))
+                               wake_up_nohz_cpu(base->cpu);
+                       return;
+               }
        }
 
-       /*
-        * We might have to IPI the remote CPU if the base is idle and the
-        * timer is not deferrable. If the other CPU is on the way to idle
-        * then it can't set base->is_idle as we hold the base lock:
-        */
-       if (!base->is_idle)
-               return;
-
        /* Check whether this is the new first expiring timer: */
        if (time_after_eq(timer->expires, base->next_expiry))
                return;
+       /* Update next expiry time */
+       base->next_expiry = timer->expires;
 
        /*
-        * Set the next expiry time and kick the CPU so it can reevaluate the
-        * wheel:
+        * We might have to IPI the remote CPU if the base is idle and the
+        * timer is not deferrable. If the other CPU is on the way to idle
+        * then it can't set base->is_idle as we hold the base lock:
         */
-       base->next_expiry = timer->expires;
-       wake_up_nohz_cpu(base->cpu);
+       if (is_timers_nohz_active() && base->is_idle)
+               wake_up_nohz_cpu(base->cpu);
 }
 
 static void
@@ -1684,6 +1679,7 @@ static inline void __run_timers(struct t
                while (levels--)
                        expire_timers(base, heads + levels);
        }
+       base->next_expiry = __next_timer_interrupt(base);
        base->running_timer = NULL;
        raw_spin_unlock_irq(&base->lock);
 }
@@ -1716,8 +1712,24 @@ void run_local_timers(void)
                base++;
                if (time_before(jiffies, base->clk))
                        return;
+               base--;
+       }
+
+       /*
+        * check for next expiry
+        *
+        * deferrable base is igonred here - it is only usable when
+        * switching from push to pull model for deferrable timers
+        */
+       raw_spin_lock_irq(&base->lock);
+       if (base->clk == base->next_expiry) {
+               raw_spin_unlock_irq(&base->lock);
+               raise_softirq(TIMER_SOFTIRQ);
+       } else {
+               base->clk++;
+               raw_spin_unlock_irq(&base->lock);
+               return;
        }
-       raise_softirq(TIMER_SOFTIRQ);
 }
 
 /*
---8<---


Thanks,

        Anna-Maria

Reply via email to