On Sat, 2026-06-06 at 12:34 +0200, Thomas Gleixner wrote:
> On Fri, May 29 2026 at 07:43, Sean Christopherson wrote:
> 
> > Now that all paravirt code that explicitly specifies the TSC frequency
> > also sets X86_FEATURE_TSC_KNOWN_FREQ, replace all of the one-off code
> > and simply set X86_FEATURE_TSC_KNOWN_FREQ if the TSC frequency is known.
> > 
> > Do NOT force set TSC_KNOWN_FREQ if the "known" TSC frequency was provided
> > by the user.  Per commit bd35c77e32e4 ("x86/tsc: Add tsc_early_khz command
> > line parameter"), one of the goals of the param is to allow the refined
> > calibration work "to do meaningful error checking".
> > 
> > Note, preferring the user-provided TSC frequency over the frequency from
> > the hypervisor or trusted firmware, while simultaneously not treating the
> > user-provided frequency as gospel, is obviously incongruous.  Sweep the
> > problem under the rug for now to avoid opening a big can of worms that
> > likely doesn't have a great answer.
> 
> There is a good answer I think.
> 
> tsc_early_khz exists to cater for the overclocking crowd. On their
> modded systems the firmware-supplied TSC frequency (CPUID/MSR) no
> longer matches reality. So they work around that by supplying a close
> enough tsc_early_khz and then they let the refined calibration work
> figure it out.
> 
> Arguably that's only relevant for bare metal systems and what's worse is
> that in virtual environments the refined calibration work can fail,
> which renders the TSC unstable.
> 
> So I'd rather say we change this logic to:
> 
>    if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
>       tsc_khz = x86_init.....();
>       force(X86_FEATURE_TSC_KNOWN_FREQ);
>    } else if (tsc_early_khz) {
>       ....
>    } else {
>       ...
>    }
> 
> Along with:
> 
>    if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
>       if (tsc_early_khz)
>          pr_warn("Ignoring non-sensical tsc_early_khz command line argument\n");
> 
> or something daft like that.
> 
> The kernel has for various reasons always tried to cater for the needs
> of users who are plagued by bonkers firmware, but we have to stop
> prioritizing ancient and modded out-of-spec hardware, or treating it
> as equal.
> 
> TBH, I consider that whole KVM clock nonsense to fall into the modded
> out of spec hardware realm. Do a reality check:
> 
>    How many production systems out there still run VMs on CPUs with a
>    broken TSC and no VM TSC scaling?
> 
> I'm not saying that we should not support the few remaining systems
> anymore, but our tendency to pretend that we can keep all of this
> nonsense working and at the same time make progress is just a fallacy.

I don't know that we can take the KVM (and Xen) clock away from guests,
but the *horrid* part of it is the way it attempts to cope with the
possibility that the *host* timekeeping might flip away from TSC-based
mode at any point in time. By the end of my outstanding cleanup series,
that is the *only* thing the gtod_notifier remains for.

If we can trust the hardware *and* the host kernel, then KVM could
theoretically hardwire the kvmclock into 'master clock mode' where it
basically just advertises the TSC→kvmclock relationship *once* to all
CPUs and it never changes.

All the nonsense about updating it every time we enter a CPU could just
go away completely.
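
As a rough illustration of why a one-time advertisement would suffice, here
is a user-space sketch of the guest-side conversion. It is not the actual
KVM or pvclock code: the struct and function names are hypothetical, and the
fields only loosely follow the pvclock ABI. Once the parameters are fixed,
reading the clock is a pure function of the TSC value.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the clock parameters a host would
 * publish once. Field names loosely follow the pvclock ABI; this is not
 * the kernel structure.
 */
struct fixed_clock_params {
    uint64_t tsc_timestamp;     /* TSC value at the reference point */
    uint64_t system_time;       /* clock value in ns at that same point */
    uint32_t tsc_to_system_mul; /* fixed-point fraction, value / 2^32 */
    int8_t   tsc_shift;         /* shift applied to the TSC delta first */
};

/* Compute (delta * mul) >> 32 without needing a 128-bit type. */
static uint64_t mul_u64_u32_shr32(uint64_t delta, uint32_t mul)
{
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffULL;

    return hi * mul + ((lo * mul) >> 32);
}

/* Convert a raw TSC reading to nanoseconds using the fixed parameters. */
static uint64_t fixed_clock_read(const struct fixed_clock_params *p,
                                 uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;

    if (p->tsc_shift < 0)
        delta >>= -p->tsc_shift;
    else
        delta <<= p->tsc_shift;

    return p->system_time + mul_u64_u32_shr32(delta, p->tsc_to_system_mul);
}
```

For example, assuming a 2 GHz TSC, tsc_to_system_mul = 0x80000000 with
tsc_shift = 0 yields one nanosecond per two ticks, and nothing in the
parameters would ever need to be rewritten on vCPU entry.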
