Hi Guys, I have been chasing spurious clockevent interrupts (In HIGHRES mode) on my x86 IvyBridge server and stuck on some blockers (mostly due to lack of knowledge) and need some help..
What does spurious mean ? I take it as the case when a core is unnecessarily interrupted by the clockevent device while we didn't wanted it to. I tried to catch them with the help of few new trace points in hrtimer_interrupt() when it didn't serviced any pending hrtimers.. The causes I could find: - Clockevent device's counter overflow: When we need to program a clockevt device for times larger than it can handle, we program it for max time and then reprogram again on expiration. Because of hardware counter limitations we have to do it this way and this isn't really a spurious interrupt as we programmed it that way. - Hardware timer firing before its expiration: Yes, its hard to digest but from my tests it looks like this does happen. Will circulate another mail once I am sure about it. - Difference between time returned by hrtimer_update_base() and clockevt device's notion of time: Wanted to discuss this here.. An example of what I am witnessing in my tests: clockevt device programmed to fire after 199 ms and it fires exactly after that time (Verified with ftrace time-diff and tsc diff [tsc: timestamp counter instructions for x86]). But when hrtimer_interrupt() tries to find the hrtimer for which the event occurred, it finds the value returned by hrtimer_update_base() being lesser than the softexpires of the timer for which the event was programmed. And the diff is approximately 1 us most of the times I see the spurious interrupt. <idle>-0 [010] d... 68.961171: clk_evt_program: delta:199933679 expires=69071000000 /* programmed for 199 ms */ <idle>-0 [010] d... 68.961171: lapic_next_deadline: cur_tsc:365799488705427 new_tsc:365799907536051 delta:199443 usec /* sets new_tsc */ <idle>-0 [010] d.h. 69.161183: local_timer_entry: vector=239 <idle>-0 [010] d.h. 69.161185: local_apic_timer_interrupt: cur_tsc:365799907547595 delta:199448 usec /* Interrupt arrived afterexactly 199 ms */ <idle>-0 [010] d.h. 69.161187: hrtimer_interrupt:function=tick_sched_timer expires=69071000000 softexpires=69071000000 basenow:69070997234 /* But Softexpires > basenow ?? */ <idle>-0 [010] d.h. 69.161188: clk_evt_program: delta:1000 expires=69071000000 /* new expiry value (1 us) */ <idle>-0 [010] d.h. 69.161189: lapic_next_deadline: cur_tsc:365799907556179 new_tsc:365799907558259 delta:0 usec /* sets new tsc value */ <idle>-0 [010] d.h. 69.161189: local_timer_exit: vector=239 Tested on: Ivybrdge V2, 12 core X86 server, though it occurs on ARM as well (but not that frequent). Regards, Santosh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/