On Mon, Apr 25, 2016 at 03:48:58PM +0200, Martin Schwidefsky wrote:
> On Fri, 22 Apr 2016 11:40:11 +0100
> James Hogan <[email protected]> wrote:
>
> > Under virtualisation it is possible to get unexpected latency during a
> > clockevent device's set_next_event() callback which can make it return
> > -ETIME even for a delta based on min_delta_ns.
>
> Do you have an example for this behavior?

The place where I've observed it is arch/mips/kernel/cevt-r4k.c, which
returns -ETIME when the delay is too short for it to be able to set it
and read back the timer.

I've also recently (Friday afternoon) seen a report of it apparently
happening with the MIPS GIC clockevent driver too
(drivers/clocksource/mips-gic-timer.c) which has similar logic, probably
copied from cevt-r4k, and this patch appeared to help (I still need to
confirm that one). That wasn't with virtualisation, but was on a
multithreaded core being stress tested, a case when it's also hard to
find a guaranteed min delta.

> I would call that a BUG in the implementation of the clockevent
> device, no?

Several drivers seem to do that. I'm open to alternatives. Do you think
the driver should retry itself when it detects this race may have been
hit?

> > The clockevents_program_min_delta() implementation for
> > CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n doesn't handle retries when this
> > happens, nor does clockevents_program_event() or its callers when force
> > is true (for example hrtimer_reprogram()). This can result in hangs
> > until the clock event device does a full period.
>
> Is that because some clockevent devices can not program the minimum delta
> in some corner cases?

Yes. I think it actually ended up causing an arithmetic overflow
somewhere in ktime_get() (I'd have to dig through my notes to find
specifics) which resulted in __iter_div_u64_rem() being given an
excessively large dividend, which effectively hung the CPU.

Thanks
James

> > It isn't appropriate to use MIN_ADJUST in this case as occasional
> > hypervisor induced high latency will cause min_delta_ns to quickly
> > increase to the maximum.
>
> I agree, the whole minimum delta adjustment is quite broken on a virtualized
> system. On s390 we have seen the rise of the min_delta_ns to the maximum
> value due to a busy hypervisor.
>
> > Instead, borrow the retry pattern from the MIN_ADJUST case, but without
> > making adjustments. We retry up to 10 times before giving up.
>
> That will add a few unnecessary instruction for architectures that have a
> sane set_next_event function, namely those that always returns 0. Should
> not be too bad though.
>
> --
> blue skies,
> Martin.
>
> "Reality continues to ruin my life." - Calvin.
>
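
For anyone following the thread, here is a rough userspace sketch of the
retry pattern discussed above: reprogram the same minimum delta up to 10
times, without growing it, and give up with -ETIME only if every attempt
fails. It is not the patch itself. struct fake_clockevent,
fake_set_next_event() and program_min_delta_retry() are hypothetical
names invented for illustration, and the device model (a counter that
jumps forward when the caller is delayed) is an assumption loosely based
on the read-back check described for cevt-r4k.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a clock event device (not the kernel struct). */
struct fake_clockevent {
	uint64_t min_delta_ticks; /* shortest delta the caller asks for */
	uint64_t counter;         /* free-running counter, in ticks */
	uint64_t compare;         /* programmed expiry, in ticks */
	unsigned int slow_calls;  /* upcoming calls that suffer extra latency */
	uint64_t latency_ticks;   /* ticks lost during each slow call */
	unsigned int attempts;    /* set_next_event calls made so far */
};

/*
 * Assumed device behaviour: program the compare value, then read the
 * counter back. If the counter has already reached the compare value,
 * the event was missed, so report -ETIME.
 */
static int fake_set_next_event(struct fake_clockevent *dev, uint64_t delta)
{
	dev->attempts++;
	dev->compare = dev->counter + delta;

	/* Programming takes one tick; a delayed caller loses many more. */
	dev->counter += 1;
	if (dev->slow_calls > 0) {
		dev->slow_calls--;
		dev->counter += dev->latency_ticks;
	}

	return dev->counter >= dev->compare ? -ETIME : 0;
}

#define MIN_DELTA_RETRIES 10 /* retry count taken from the patch description */

/*
 * Retry with the same minimum delta each time. Unlike the MIN_ADJUST
 * case, min_delta_ticks is never increased, so one latency spike does
 * not permanently inflate the minimum.
 */
static int program_min_delta_retry(struct fake_clockevent *dev)
{
	int i;

	for (i = 0; i < MIN_DELTA_RETRIES; i++) {
		if (fake_set_next_event(dev, dev->min_delta_ticks) == 0)
			return 0;
	}

	return -ETIME;
}
```

With min_delta_ticks = 5, latency_ticks = 100 and three slow calls, the
first three attempts miss and the fourth succeeds with the minimum delta
unchanged. If every attempt is slow, the function still returns -ETIME
after the tenth try, so the caller needs to handle that case.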

