Hi, timer experts,

I have a question on the above function, and would appreciate your thoughts.

In debugging a kernel hrtimer stalling issue on RHEL7.x kernel, we
found a case when this function was called from hrtimer_interrupt(),
value of expires was before ktime_get(), not sure how that got there,
could it be ktime was running behind?. Anyways, that leaves a negative
delta, and since force was not set, clockevents_program_event()
skipped programming the unerlying device at (2), but next_event was
set to expires at (1).

Other part of kernel may make decisions based on next_event, e.g. in
3.10 kernel, tick_nohz_stop_sched_tick() compares next_event with
next_tick, and will skip programming the device if they match. And
that caused the hrtimer to stall on the CPU.

I have not gone through all the places that reference next_event to
know if it can be an issue else where or in the latest kernel, but
this side effect does not look right. So I want to raise the question
here:

Should clock_event_device->next_event be kept consistent with what's
programmed in the underlying device?

Thanks
-Jason

kernel/time/clockevents.c:
int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
                              bool force)
{
        unsigned long long clc;
        int64_t delta;
        int rc;

        if (unlikely(expires < 0)) {
                WARN_ON_ONCE(1);
                return -ETIME;
        }

        dev->next_event = expires;    <------ (1)

        if (clockevent_state_shutdown(dev))
                return 0;

        /* We must be in ONESHOT state here */
        WARN_ONCE(!clockevent_state_oneshot(dev), "Current state: %d\n",
                  clockevent_get_state(dev));

        /* Shortcut for clockevent devices that can deal with ktime. */
        if (dev->features & CLOCK_EVT_FEAT_KTIME)
                return dev->set_next_ktime(expires, dev);

        delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
        if (delta <= 0)
                return force ? clockevents_program_min_delta(dev) :
-ETIME; <--- (2)

        delta = min(delta, (int64_t) dev->max_delta_ns);
        delta = max(delta, (int64_t) dev->min_delta_ns);

        clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
        rc = dev->set_next_event((unsigned long) clc, dev);

        return (rc && force) ? clockevents_program_min_delta(dev) : rc;
}

Reply via email to