Hi Thomas,

On Wed, Mar 03, 2021 at 04:50:31PM +0100, Thomas Gleixner wrote:
> On Tue, Mar 02 2021 at 20:06, Feng Tang wrote:
> > On Tue, Mar 02, 2021 at 10:16:37AM +0100, Peter Zijlstra wrote:
> >> On Tue, Mar 02, 2021 at 10:54:24AM +0800, Feng Tang wrote:
> >> > clocksource watchdog runs every 500ms, which creates some OS noise.
> >> > As the clocksource wreckage (especially for those that has per-cpu
> >> > reading hook) usually happens shortly after CPU is brought up or
> >> > after system resumes from sleep state, so add a time limit for
> >> > clocksource watchdog to only run for a period of time, and make
> >> > sure it run at least twice for each CPU.
> >> >
> >> > Regarding performance data, there is no improvement data with the
> >> > micro-benchmarks we have like hackbench/netperf/fio/will-it-scale
> >> > etc. But it obviously reduces periodic timer interrupts, and may
> >> > help in following cases:
> >> > * When some CPUs are isolated to only run scientific or high
> >> >   performance computing tasks on a NOHZ_FULL kernel, where there
> >> >   is almost no interrupts, this could make it more quiet
> >> > * On a cluster which runs a lot of systems in parallel with
> >> >   barriers there are always enough systems which run the watchdog
> >> >   and make everyone else wait
> >> >
> >> > Signed-off-by: Feng Tang <feng.t...@intel.com>
> >>
> >> Urgh.. so this hopes and prays that the TSC wrackage happens in the
> >> first 10 minutes after boot.
>
> which is wishful thinking....
>
> > Yes, the 10 minutes part is only based on our past experience and we
> > can make it longer. But if there was real case that the wrackage happened
> > long after CPU is brought up like days, then this patch won't help
> > much.
>
> It really depends on the BIOS wreckage. On one of my machine it takes up
> to a day depending on the workload.
>
> Anything pre TSC_ADJUST wants the watchdog on. With TSC ADJUST available
> we can probably avoid it.
>
> There is a caveat though. If the machine never goes idle then TSC adjust
> is not able to detect a potential wreckage. OTOH, most of the broken
> BIOSes tweak TSC only by a few cycles and that is usually detectable
> during boot. So we might be clever about it and schedule a check every
> hour when during the first 10 minutes a modification of TSC adjust is
> seen on any CPU.

I've thought about implementing this (sorry for the delay), and would
like to clarify something to make sure I understand it correctly.

Is this hourly check only for x86's tsc_adjust being overridden by the
BIOS, and not for the general clocksource watchdog? The current
clocksources have different wrap times: the acpi_pm timer wraps around
about every 4 seconds, and hpet wraps about every 300 seconds, so for
the watchdog itself we can only either keep doing the check or cancel it.

If so, we can start a timer that fires 10 minutes later to check it, and
extend the timer to 1 hour if tsc_adjust has not been overridden.

I've checked one open-source BIOS project, EDK2
(https://github.com/tianocore/edk2). I did some grepping and couldn't
find any place that writes to the tsc_adjust MSR, which gives us more
confidence that fewer and fewer BIOSes will wrongly write to the
tsc_adjust MSR :)

Thanks,
Feng
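
P.S. As a sanity check on the wrap times quoted above, here is a rough
userspace sketch of the arithmetic. It assumes a 24-bit ACPI PM timer at
3.579545 MHz and an HPET counter treated as 32 bits wide at the common
14.31818 MHz rate; the real HPET period is read from the hardware, so
that frequency is only a typical value. wrap_seconds() is a hypothetical
helper for illustration, not a kernel function.

```c
#include <stdint.h>

/*
 * Seconds until a free-running counter that is `bits` wide and ticks at
 * `hz` wraps around. Hypothetical helper, not kernel code.
 * `bits` must be below 64 so that the shift is well defined.
 */
static double wrap_seconds(unsigned int bits, double hz)
{
	return (double)(UINT64_C(1) << bits) / hz;
}
```

With those assumed parameters, wrap_seconds(24, 3579545.0) comes out near
4.7 s and wrap_seconds(32, 14318180.0) near 300 s, both well below an
hour, which is why the watchdog interval cannot simply be stretched for
those watchdog clocksources.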