On Sat, Apr 10, 2021 at 10:41:21AM +0200, Thomas Gleixner wrote: > On Fri, Apr 02 2021 at 15:49, paulmck wrote: > > This commit therefore re-reads the watchdog clock on either side of > > 'This commit' is not any better than 'This patch' and this sentence > makes no sense. I might be missing something, but how exactly does "the > commit" re-read the watchdog clock? > > git grep 'This patch' Documentation/process/
I will rework this. > > the read from the clock under test. If the watchdog clock shows an > > +retry: > > local_irq_disable(); > > - csnow = cs->read(cs); > > - clocksource_watchdog_inject_delay(); > > wdnow = watchdog->read(watchdog); > > + clocksource_watchdog_inject_delay(); > > + csnow = cs->read(cs); > > + wdagain = watchdog->read(watchdog); > > local_irq_enable(); > > + delta = clocksource_delta(wdagain, wdnow, watchdog->mask); > > + wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, > > watchdog->shift); > > That variable naming is confusing as hell. This is about the delta and > not about the second readout of the watchdog. How about wdagain_delta? > > + if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) { > > How exactly is this going negative especially with clocksources which > have a limited bitwidth? See clocksource_delta(). I thought that I had actually seen this happen, though it is of course quite possible that it was due to a bug in an early version of my changes. What I will do is to remove the less-than comparison and test with a WARN_ON(). If that doesn't trigger, I will drop the WARN_ON(). If it does trigger, I will figure out why. > > + wderr_nsec = wdagain_nsec; > > + if (nretries++ < max_read_retries) > > + goto retry; > > + } > > + if (nretries) > > + pr_warn("timekeeping watchdog on CPU%d: %s read-back > > delay of %lldns, attempt %d\n", > > + smp_processor_id(), watchdog->name, wderr_nsec, > > nretries); > > Lacks curly braces around the pr_warn() simply because it's not a single > line. Breaks my parser :) OK, will fix. ;-) > But if this ever happens to exceed max_read_retries, then what's the > point of continuing at all? The data is known to be crap already. If there are four delays in four consecutive attempts to read out the clocks -- with interrupts disabled -- then it is quite possible that the delay is actually caused by the attempt to read the clock. In which case, marking the clock bad due to skew is a reasonable choice. On the other hand, if the four consecutive delays are caused by something like an NMI storm, then as you say, you have worse problems. Thanx, Paul > > /* Clocksource initialized ? */ > > if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) || > > Thanks, > > tglx