On 10/17/2014 02:27 PM, John Stultz wrote: > On Fri, Oct 17, 2014 at 11:23 AM, Prarit Bhargava <pra...@redhat.com> wrote: >> >> >> On 10/17/2014 02:17 PM, John Stultz wrote: >>> On Fri, Oct 17, 2014 at 6:57 AM, Prarit Bhargava <pra...@redhat.com> wrote: >>>> A bug report came in against an older kernel which output "backward time" >>>> messages and the report noted that the upstream kernel worked. After some >>>> investigation it turned out that one of the sockets was bad on the system >>>> and the "backward time" messages were caused by a real, but intermittent, >>>> hardware failure. >>>> >>>> Commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108 ("clocksource: Move >>>> cycle_last validation to core code") modifies the x86 clocksource such that >>>> if a negative delta between two reads of time is calculated the >>>> clocksource_delta() code will return 0. There is no warning when this >>>> occurs and there really should be one in order to catch not only hardware >>>> issues like the issue above, but potential coding issues as the code is >>>> modified. This patch introduces a WARN() which will also dump a stack >>>> trace to the console so the exact code path can be evaluated. >>>> >>>> I tested this by booting on the broken hardware and left the system idle >>>> until a negative clocksource_delta() event occurred. >>>> >>>> Cc: John Stultz <john.stu...@linaro.org> >>>> Cc: Thomas Gleixner <t...@linutronix.de> >>>> Signed-off-by: Prarit Bhargava <pra...@redhat.com> >>>> --- >>>> kernel/time/timekeeping_internal.h | 7 ++++++- >>>> 1 file changed, 6 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/kernel/time/timekeeping_internal.h >>>> b/kernel/time/timekeeping_internal.h >>>> index 4ea005a..abe6bc8 100644 >>>> --- a/kernel/time/timekeeping_internal.h >>>> +++ b/kernel/time/timekeeping_internal.h >>>> @@ -17,7 +17,12 @@ static inline cycle_t clocksource_delta(cycle_t now, >>>> cycle_t last, cycle_t mask) >>>> { >>>> cycle_t ret = (now - last) & mask; >>>> >>>> - return (s64) ret > 0 ? ret : 0; >>>> + if ((s64)ret > 0) >>>> + return ret; >>>> + >>>> + WARN(1, "Clocksource calculated negative delta, %lld. last = >>>> %llu, now = %llu, mask = %llx\n", >>>> + (s64)ret, last, now, mask); >>>> + return 0; >>> >>> >>> I realize you followed up that this wasn't finished, but just as some >>> feedback, there's a number of types of hardware where there may be a >>> very slight skew between cpu TSC, and this will briefly trigger right >>> after each timekeeping update if a system is reading the clock >>> frequently (think of the case where the update happens on the cpu >>> thats just a little bit ahead, while a timestamping loop is running on >>> a cpu that is a little bit behind). >> >> Ah, interesting. Okay ... drop this patch then. Thanks for the info John. > > If you're wanting something that aids with debugging, maybe some sort > calmly stated warn-once in the dmesg might be ok, that or some other > flag exported via a debugging interface.
IMO with the clock code I'd prefer it to be 100% accurate. There is nothing more annoying than going through the rigors of debugging and discovering that it is a hardware issue of some sort... This can be dropped IMO. It's really not that important. P. > > thanks > -john > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/