On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> >> +    /*
> >> +     * If in use, the HPET hardlockup detector relies on tsc_khz.
> >> +     * Reconfigure it to make use of the refined tsc_khz.
> >> +     */
> >> +    lockup_detector_reconfigure();
> >
> > I don't know if the API is conceptually good.
> >
> > You change something that the lockup detector is currently using,
> > *while* the detector is running asynchronously, and then reconfigure
> > it. What happens in the window? If this code is only used for small
> > adjustments maybe it does not really matter but in principle it's
> > a bad API to export.
> >
> > lockup_detector_reconfigure as an internal API is okay because it
> > reconfigures things while the watchdog is stopped [actually that
> > looks untrue for soft dog which uses watchdog_thresh in
> > is_softlockup(), but that should be fixed].
> >
> > You're the arch so you're allowed to stop the watchdog and configure
> > it, e.g., hardlockup_detector_perf_stop() is called in arch/.
> >
> > So you want to disable HPET watchdog if it was enabled, then update
> > wherever you're using tsc_khz, then re-enable.
>
> The real question is whether making this refined tsc_khz value
> immediately effective matters at all. IMO, it does not because up to
> that point the watchdog was happily using the coarse calibrated value
> and the whole use TSC to assess whether the HPET fired mechanism is just
> a guestimate anyway. So what's the point of trying to guess 'more
> correct'.
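For reference, the check being discussed can be modeled roughly as below. This is a simplified userspace sketch, not the code from the patch: the struct, the function names, and the 1/64 tolerance are all assumptions made for illustration. It only shows how both the expected TSC value and the error window scale with tsc_khz.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the TSC check; names are illustrative only. */
struct tsc_window {
    uint64_t expected;   /* TSC value expected when the HPET channel expires */
    uint64_t tolerance;  /* accepted deviation on either side, in TSC cycles */
};

/*
 * tsc_khz is TSC cycles per millisecond, so one second corresponds to
 * tsc_khz * 1000 cycles.
 */
static struct tsc_window tsc_window_compute(uint64_t tsc_start,
                                            uint64_t tsc_khz,
                                            uint64_t period_sec)
{
    uint64_t delta = period_sec * tsc_khz * 1000ULL;
    struct tsc_window w = {
        .expected = tsc_start + delta,
        /* Assumed tolerance: 1/64 of the period (about 1.6%). */
        .tolerance = delta >> 6,
    };

    return w;
}

/* True if tsc_now lies within the tolerance around the expected value. */
static bool tsc_in_window(const struct tsc_window *w, uint64_t tsc_now)
{
    uint64_t diff = tsc_now > w->expected ? tsc_now - w->expected
                                          : w->expected - tsc_now;

    return diff <= w->tolerance;
}
```

With tsc_start = 0, tsc_khz = 2000000 and a 10 second period, this model expects the TSC to read 20000000000 with a tolerance of 312500000 cycles. Computing the window with a different tsc_khz moves both numbers proportionally. Whether the gap between the coarse and the refined tsc_khz is large enough to push a reading outside the window depends on the real tolerance and calibration error, which this model does not capture.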

In some of my test systems I observed that the TSC value does not fall
within the expected error window the first time the HPET channel expires.

I inferred that the error window computed using the coarser tsc_khz was
wrong, and that recalculating it with the refined tsc_khz would correct
it. However, restarting the timer has the side effect of kicking the
timer and therefore pushing the first HPET NMI further into the future.

Perhaps kicking the HPET channel, not recomputing the error window,
corrected (masked?) the problem.

I will investigate further and rework or drop this patch as needed.

Thanks and BR,
Ricardo