John Stultz writes: > On Tue, Mar 15, 2016 at 11:50 AM, Gratian Crisan <[email protected]> > wrote: >> The clocksource watchdog can falsely trigger and disable the main >> clocksource when the watchdog wraps around. >> >> The reason is that an interrupt storm and/or high priority (FIFO/RR) tasks >> can preempt the timer softirq long enough for the watchdog to wrap around >> if it has a limited number of bits available by comparison to the main >> clocksource. One observed example is on a Intel Baytrail platform where TSC >> is the main clocksource, HPET is disabled due to a hardware bug and acpi_pm >> gets selected as the watchdog clocksource. >> >> Calculate the maximum number of nanoseconds the watchdog clocksource can >> represent without overflow and do not disqualify the main clocksource if >> the delta since the last time we have checked exceeds the measurement >> capabilities of the watchdog clocksource. > > Sorry for not getting back to you sooner on this. You managed to send > these both out while I was at a conference and on vacation, and so > they were deep in the mail backlog. :)
No worries, I'm actually "on the road" this week too (ELC). I appreciate the reply. > So I'm sympathetic to this issue, because I remember seeing similar > problems w/ runaway SCHED_FIFO tasks w/ PREEMPT_RT. Yeah, a runaway rt thread can easily do it. That's just bad design. In our case it was a bit more subtle bc. it was a combination of high priority interrupts and rt threads that would occasionally stack up to delay the timer softirq long enough to cause the watchdog wrap. > However, its really difficult to create a solution without opening new > cases where bad clocksources will be mis-identified as good (which > your solution seems to suffer as well, measuring the time past with a > known bad clocksource can easily result in large deltas, which will be > ignored if the watchdog has a short interval). Fair point. Ultimately you have to trust one of the clocksources. I guess I was naive in thinking that the main clocksource can't drift more than what the watchdog clocksource can measure within the WATCHDOG_INTERVAL. I'm glad I don't have to deal with hardware that lobotomized. Would a simple solution that exposes the config option for the clocksource wathchdog[1] (and defaults it to on) be an acceptable alternative? It will work for us because we test the stability of the main clocksource - part of the hardware bring-up. > A previous effort on this was made here, and there's a resulting > thread that didn't come to resolution: > https://lkml.org/lkml/2015/8/17/542 Sorry I've missed it. > Way back I had tried to come up with an approach where if the time > delta was large, it was divided by the watchdog interval, and then we > just compared the remainder with the current watchdog delta to see if > they matched (closely enough). Unfortunately this didn't work out for > me then, but perhaps it deserves a second try? I've entertained that idea too but I think I was trying to optimize things too early and do everything with the mult/shift math. That first attempt failed but I do need to try harder because it would be a better general solution. > thanks > -john Thanks, -Gratian [1] >From e942ddaba439cd6711e9eed44ceae34167b864f8 Mon Sep 17 00:00:00 2001 From: Gratian Crisan <[email protected]> Date: Wed, 6 Apr 2016 21:20:15 -0700 Subject: [PATCH] time: Make the clocksource watchdog user configurable The clocksource watchdog is used to detect instabilities in the current clocksource. This is a beneficial feature on new/unknown hardware however it can create problems by falsely triggering when the watchdog wraps. The reason is that an interrupt storm and/or high priority (FIFO/RR) tasks can preempt the timer softirq long enough for the watchdog to wrap if it has a limited number of bits available by comparison with the main clocksource. One observed example is on a Intel Baytrail platform where TSC is the main clocksource, HPET is disabled due to a hardware bug and acpi_pm gets selected as the watchdog clocksource. Provide the option to disable the clocksource watchdog for hardware where the clocksource stability has been validated. Signed-off-by: Gratian Crisan <[email protected]> --- arch/x86/Kconfig | 2 +- kernel/time/Kconfig | 12 +++++++++++- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2dc18605..6da5d9e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -54,7 +54,7 @@ config X86 select CLKEVT_I8253 select CLKSRC_I8253 if X86_32 select CLOCKSOURCE_VALIDATE_LAST_CYCLE - select CLOCKSOURCE_WATCHDOG + select HAVE_CLOCKSOURCE_WATCHDOG select CLONE_BACKWARDS if X86_32 select COMPAT_OLD_SIGACTION if IA32_EMULATION select DCACHE_WORD_ACCESS diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index 4008d9f..6707f1d 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -5,7 +5,7 @@ # Options selectable by arch Kconfig # Watchdog function for clocksources to detect instabilities -config CLOCKSOURCE_WATCHDOG +config HAVE_CLOCKSOURCE_WATCHDOG bool # Architecture has extra clocksource data @@ -193,5 +193,15 @@ config HIGH_RES_TIMERS hardware is not capable then this option only increases the size of the kernel image. +config CLOCKSOURCE_WATCHDOG + bool "Clocksource watchdog" + depends on HAVE_CLOCKSOURCE_WATCHDOG + default y + help + This option enables the watchdog function for clocksources. It is + used to detect instabilities in the currently selected clocksource. + + Say Y if you are unsure. + endmenu endif -- 1.9.1

