Currently, clockevents_program_min_delta() sets a clockevent device's ->next_event to the point in time where the minimum delta would actually expire:
  delta = dev->min_delta_ns;
  dev->next_event = ktime_add_ns(ktime_get(), delta);

For reference, this has been the case since the initial introduction of
clockevents_program_min_delta() with commit d1748302f70b ("clockevents:
Make minimum delay adjustments configurable").

clockevents_program_min_delta() is called from
clockevents_program_event() only. More specifically, it is called if
the latter's force argument is set and, neglecting the case of device
programming failure for the moment, if the requested expiry is in the
past.

On the contrary, if the expiry requested from
clockevents_program_event() is in the future, but less than
->min_delta_ns ahead of now, then
- ->next_event gets set to that expiry verbatim,
- but the clockevent device gets silently programmed to fire after
  ->min_delta_ns only.

Thus, in the extreme cases of expires == ktime_get() and
expires == ktime_get() + 1, the respective values of ->next_event would
differ by ->min_delta_ns, while the clockevent device would actually
get programmed to fire at (almost) the same time (with force being set,
of course).

While this discontinuity of ->next_event at expires == ktime_get() is
not a problem by itself, the mere use of ->min_delta_ns in the event
programming path hinders upcoming changes making the clockevent core
NTP correction aware: both ->mult and ->min_delta_ns would need to be
updated as well as consumed atomically, and we'd rather avoid any
locking here.

Thus, let clockevents_program_event() unconditionally set ->next_event
to the expiry time actually requested by its caller, i.e. don't set
->next_event from clockevents_program_min_delta().

A few notes on why this change is safe with respect to the current
consumers of ->next_event:

1. Note that a clockevents_program_event() with a requested expiry in
   the past and force being set basically means: "fire ASAP". Now,
   consider such a programmed event getting handed to
   clockevents_program_event() once again, i.e. that a

     clockevents_program_event(dev, dev->next_event, false)

   as in __clockevents_update_freq() is done. With this change applied,
   clockevents_program_event() would now properly detect the expiry
   being in the past and, due to the force argument being unset,
   wouldn't actually do anything. Before this change, OTOH, there would
   be the (very unlikely) possibility that the requested event is still
   somewhere in the future and clockevents_program_event() would
   silently delay the event's expiration by another ->min_delta_ns.

2. The periodic tick handlers on oneshot-only devices use ->next_event
   to calculate the follow-up expiry time. tick_handle_periodic() spins
   on reprogramming the clockevent device until some expiry in the
   future has been reached:

     ktime_t next = dev->next_event;
     ...
     for (;;) {
             next = ktime_add(next, tick_period);
             if (!clockevents_program_event(dev, next, false))
                     return;
             ...
     }

   Thus, tick_handle_periodic() isn't affected by this change.

   For tick_handle_periodic_broadcast(), the situation has been
   different since commit 2951d5c031a3 ("tick: broadcast: Prevent
   livelock from event handler") though: a loop similar to the one from
   tick_handle_periodic() above got replaced by a single

     ktime_t next = ktime_add(dev->next_event, tick_period);

     clockevents_program_event(dev, next, true);

   In the case that dev->next_event + tick_period happens to be less
   than ktime_get() + ->min_delta_ns, without this change applied,
   ->next_event would get recovered to some point in the future after a
   single tick_handle_periodic_broadcast() event.
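   For a rough feeling of how the recovery behaves with this patch
   applied instead, consider the following standalone model (plain C
   with made-up values for tick_period and ->min_delta_ns, not kernel
   code):

     /*
      * Model of repeated tick_handle_periodic_broadcast() invocations
      * on a device whose ->next_event has fallen behind, assuming the
      * patched behaviour: ->next_event always becomes the requested
      * expiry, while a forced event whose expiry lies in the past
      * fires after ->min_delta_ns only.
      */
     #include <stdio.h>
     #include <stdint.h>

     #define TICK_PERIOD_NS 4000000LL  /* 4 ms, i.e. HZ=250 */
     #define MIN_DELTA_NS    100000LL  /* made-up ->min_delta_ns */

     int main(void)
     {
             int64_t now = 0;                  /* stands in for ktime_get() */
             int64_t next_event = -40000000LL; /* ->next_event, 40 ms late */
             int events = 0;

             /* fire as long as the next requested expiry is not in the future */
             while (next_event + TICK_PERIOD_NS <= now) {
                     next_event += TICK_PERIOD_NS; /* requested expiry */
                     now += MIN_DELTA_NS;  /* forced event fires after min delta */
                     events++;
             }

             printf("->next_event caught up after %d events\n", events);
             return 0;
     }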
   As the model suggests, with this patch applied it could potentially
   take a number of tick_handle_periodic_broadcast() events, each
   separated by ->min_delta_ns only, until ->next_event is able to
   catch up with the current ktime_get(). However, if this turns out to
   become a problem, the reprogramming loop in
   tick_handle_periodic_broadcast() can probably be restored easily.

3. In kernel/time/tick-broadcast.c, the broadcast receiving clockevent
   devices' ->next_event is read multiple times in order to determine
   who's next or who must be pinged. These uses all continue to work.
   Moreover, clockevent devices getting programmed to something less
   than ktime_get() + ->min_delta_ns might not be the best candidates
   for a transition into C3 anyway.

4. Finally, a "sleep length" is calculated at the very end of
   tick_nohz_stop_sched_tick() as follows:

     ts->sleep_length = ktime_sub(dev->next_event, now);

   AFAICS, this can already happen to be negative without this change
   applied: in NOHZ_MODE_HIGHRES mode there can be some overdue
   hrtimers whose removal is blocked because
   tick_nohz_stop_sched_tick() gets called with interrupts disabled.
   Unfortunately, the only user, the menu cpuidle governor, can't cope
   with negative sleep lengths as it casts the return value of the
   tick_nohz_get_sleep_length() getter to an unsigned int. This change
   can very well make things worse here. A followup patch will force
   this ->sleep_length to >= 0.

Signed-off-by: Nicolai Stange <nicsta...@gmail.com>
---
 kernel/time/clockevents.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 8fddb67..f0a80fc 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -251,7 +251,6 @@ static int clockevents_program_min_delta(struct clock_event_device *dev)
 
 	for (i = 0;;) {
 		delta = dev->min_delta_ns;
-		dev->next_event = ktime_add_ns(ktime_get(), delta);
 
 		if (clockevent_state_shutdown(dev))
 			return 0;
@@ -288,7 +287,6 @@ static int clockevents_program_min_delta(struct clock_event_device *dev)
 	int64_t delta;
 
 	delta = dev->min_delta_ns;
-	dev->next_event = ktime_add_ns(ktime_get(), delta);
 
 	if (clockevent_state_shutdown(dev))
 		return 0;
--
2.9.2