This is v6 of the series to clean up the KVM clock, rebased onto the
tip timers/ptp material (the timers-ptp-2026-06-13 merge, which includes
Thomas's ktime snapshot series and the read_snapshot patches).
The KVM clock has historically suffered from three problems:
1. Imprecision: get_kvmclock_ns() computed the clock from the *host*
TSC without applying guest TSC scaling, causing systematic drift from
the values the guest computes from its own TSC.
2. Unnecessary discontinuities: gratuitous KVM_REQ_MASTERCLOCK_UPDATE
requests caused the master clock reference point to be re-snapshotted,
yanking the guest's clock due to arithmetic precision differences.
3. No precise migration API: the existing KVM_[GS]ET_CLOCK only allows
setting the clock at a given UTC reference time, which is necessarily
imprecise. There was no way to preserve the exact arithmetic
relationship between guest TSC and KVM clock across live migration.
This series addresses all three, and adds new APIs for precise clock
migration and TSC frequency reporting. As a bonus, it also rips out the
whole pvclock_gtod_data hack, which was shadowing the kernel's
timekeeping, and uses ktime snapshots as $DEITY (well, Thomas) intended.
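For readers less familiar with the arithmetic behind problem 1, here is a
minimal userspace sketch of how a guest derives the KVM clock from its own
TSC. The struct is a trimmed, hypothetical stand-in for
struct pvclock_vcpu_time_info (no version or flags handling), and the
helper names are illustrative only, not code from this series.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct pvclock_vcpu_time_info. */
struct pvclock_sketch {
	uint64_t tsc_timestamp;     /* guest TSC at the reference point */
	uint64_t system_time;       /* KVM clock in ns at the reference point */
	uint32_t tsc_to_system_mul; /* 0.32 fixed-point multiplier */
	int8_t   tsc_shift;         /* shift applied to the TSC delta first */
};

/*
 * Scale a TSC delta to nanoseconds: apply the shift, then multiply by
 * mul / 2^32. The 64x32 multiply is split in two halves so that no
 * 128-bit type is needed.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	uint64_t hi, lo;

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	hi = delta >> 32;
	lo = delta & 0xffffffffu;
	return hi * mul + ((lo * mul) >> 32);
}

/* KVM clock value, in ns, that a guest computes for a given guest TSC. */
static uint64_t kvmclock_ns(const struct pvclock_sketch *pv, uint64_t guest_tsc)
{
	return pv->system_time +
	       scale_delta(guest_tsc - pv->tsc_timestamp,
			   pv->tsc_to_system_mul, pv->tsc_shift);
}
```

The multiplier and shift describe the *guest* TSC rate. Feeding a host TSC
delta taken at a different rate through the same parameters yields a
different nanosecond value, which is the drift described in problem 1.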
v5: https://lore.kernel.org/all/[email protected]/
Changes since v5:
 - Rebased onto the tip timers-ptp-2026-06-13 merge.
 - Series shape: two new patches ("KVM: selftests: Use UAPI pvclock-abi.h
   in xen_shinfo_test" and "KVM: x86: Activate master clock from
   kvm_arch_init_vm()"); "Replace pvclock_gtod_data vclock_mode with
   boolean" is replaced by "Cache host vclock_mode for masterclock
   eligibility checks".
 - Sean Christopherson's review:
   * KVM_VCPU_TSC_SCALE: return -ENXIO (not -EINVAL) from the get/set
     device-attribute handlers when !has_tsc_control, and do so in the
     patch that introduces the attribute.
   * Clear SECONDARY_EXEC_TSC_SCALING in setup_vmcs_config() rather than
     vmx_hardware_setup(), so the per-CPU configs recomputed by
     vmx_check_processor_compat() stay consistent with the golden
     vmcs_config.
 - kernel test robot (0-day), i386 W=1 warnings:
   * get_kvmclock(): move hv_clock into the use_master_clock block, drop
     the now-unnecessary get_cpu()/put_cpu() pinning (use_master_clock
     implies a stable synchronised TSC clocksource), and replace the
     goto/'fallback:' label with a 'continue'.
   * pvclock_gtod_notify(): move 'tk' inside CONFIG_X86_64.
 - Correctness fixes from review:
   * KVM_SET_CLOCK_GUEST: bound the shift in hvclock_to_hz(), tighten
     tsc_shift validation to [-31, 31], and reject guest_tsc below
     pvclock.tsc_timestamp.
   * kvm_guest_time_update(): read kvmclock_offset inside the pvclock
     seqcount loop to avoid a torn read.
   * kvm_snapshot_has_tsc(): honour snap->valid and zero-init the
     snapshot, avoiding use of uninitialised stack.
   * kvm_synchronize_tsc(): advance the matched reference point to "now"
     to preserve the 1-second TSC matching window.
   * kvm_track_tsc_matching(): request a masterclock update when
     all_vcpus_matched_tsc changes, so PVCLOCK_TSC_STABLE_BIT is
     broadcast to the other vCPUs.
   * kvm_arch_enable_virtualization_cpu(): adjust cur_tsc_offset together
     with cur_tsc_write under tsc_write_lock on the backwards-TSC / host
     S4 resume path.
   * kvm_set_tsc_khz(): sample the guest TSC before changing the ratio,
     preserving continuity across the frequency change.
   * Keep the real vclock_mode (int) rather than collapsing it to a
     bool, so kvm_check_tsc_unstable() still special-cases HVCLOCK.
   * Activate the master clock, and establish the initial TSC generation
     and kvmclock epoch, from kvm_arch_init_vm() instead of a synchronous
     kvm_update_masterclock() at each vCPU creation (avoids O(N^2)).
 - selftests: use kvm_vm_free() instead of kvm_vm_release(); add the
   missing Makefile entry for xen_migration_test; guard on
   KVM_GET_CLOCK_GUEST / KVM_VCPU_TSC_OFFSET availability; use the
   KVM_VCPU_TSC_SCALE enum instead of a literal; overflow-safe
   arithmetic and looser tolerances.
 - Documentation: fix the KVM_VCPU_TSC_OFFSET / KVM_VCPU_TSC_SCALE ReST
   heading underlines (Randy Dunlap).
 - UAPI: asm/kvm.h now includes <asm/pvclock-abi.h> so the
   KVM_[GS]ET_CLOCK_GUEST ioctls are self-contained.
 - Collected Dongli Zhang's Tested-by (kexec/LUO testing of the KVM clock
   accuracy, pvclock-abi UAPI move, KVM_[GS]ET_CLOCK_GUEST and
   redundant-masterclock-update patches).
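The KVM_SET_CLOCK_GUEST input checks listed under the correctness fixes
above can be pictured roughly as follows. This is a standalone sketch
under stated assumptions, not the code in the patch: the struct, both
function names and the zero-multiplier check are hypothetical; only the
[-31, 31] range for tsc_shift, the bounded shift, and the rejection of a
guest_tsc below tsc_timestamp come from the changelog.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the pvclock fields userspace passes in. */
struct hvclock_sketch {
	uint64_t tsc_timestamp;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
};

/* Return true if the parameters are safe to shift and scale with. */
static bool hvclock_params_ok(const struct hvclock_sketch *hv,
			      uint64_t guest_tsc)
{
	/* A shift outside [-31, 31] could not be applied safely. */
	if (hv->tsc_shift < -31 || hv->tsc_shift > 31)
		return false;
	/* Assumption: a zero multiplier is rejected too (division below). */
	if (hv->tsc_to_system_mul == 0)
		return false;
	/* The TSC delta is unsigned, so it must not go negative. */
	if (guest_tsc < hv->tsc_timestamp)
		return false;
	return true;
}

/*
 * Recover the TSC frequency in Hz implied by (mul, shift), where
 * ns = ((delta << shift) * mul) >> 32. Returns 0 if the inputs are out
 * of range or the result does not fit in 64 bits.
 */
static uint64_t hvclock_to_hz_sketch(uint32_t mul, int8_t shift)
{
	uint64_t hz;

	if (mul == 0 || shift < -31 || shift > 31)
		return 0;

	/* 10^9 * 2^32 is about 4.3e18, which fits in 64 bits. */
	hz = (UINT64_C(1000000000) << 32) / mul;

	if (shift >= 0)
		return hz >> shift;

	/* Negative shift means a left shift here; refuse on overflow. */
	if (hz > (UINT64_MAX >> -shift))
		return 0;
	return hz << -shift;
}
```

For example, mul = 0x80000000 with shift = 0 corresponds to a 2 GHz TSC,
and the same multiplier with shift = 1 corresponds to 1 GHz.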
David Woodhouse (33):
KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
KVM: selftests: Use UAPI pvclock-abi.h in xen_shinfo_test
KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
KVM: x86: Activate master clock immediately on vCPU creation
KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC
migration
KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
KVM: x86: Fold __get_kvmclock() into get_kvmclock()
KVM: x86: Restructure get_kvmclock()
KVM: x86: Fix KVM clock precision in get_kvmclock() with TSC scaling
KVM: x86: Use get_kvmclock() in kvm_get_wall_clock_epoch()
KVM: x86: Fix compute_guest_tsc() to handle negative time deltas
KVM: x86: Restructure kvm_guest_time_update() for TSC upscaling
KVM: x86: Simplify and comment kvm_get_time_scale()
KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset()
KVM: x86: Improve synchronization in kvm_synchronize_tsc()
KVM: x86: Kill last_tsc_{nsec,write,offset} fields
KVM: x86: Replace nr_vcpus_matched_tsc count with all_vcpus_matched_tsc
bool
KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
KVM: selftests: Add master clock offset test
KVM: x86: Factor out kvm_use_master_clock()
KVM: x86: Avoid gratuitous global clock updates
KVM: x86/xen: Prevent runstate times from becoming negative
KVM: x86: Avoid redundant masterclock updates from multiple vCPUs
KVM: x86: Remove runtime Xen TSC frequency CPUID update
KVM: selftests: Add Xen/generic CPUID timing leaf test
KVM: x86: Re-synchronize TSC after KVM_SET_TSC_KHZ
KVM: selftests: Add Xen runstate migration test
KVM: x86: Use ktime_get_snapshot_id() for master clock
KVM: x86: Compute kvmclock base without pvclock_gtod_data
KVM: x86: Cache host vclock_mode for masterclock eligibility checks
KVM: x86: Remove pvclock_gtod_data and private timekeeping code
KVM: x86: Activate master clock from kvm_arch_init_vm()
Jack Allister (3):
UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
KVM: selftests: Add KVM/PV clock selftest to prove timer correction
Documentation/virt/kvm/api.rst | 37 +
Documentation/virt/kvm/devices/vcpu.rst | 120 ++-
MAINTAINERS | 4 +-
arch/x86/include/asm/kvm_host.h | 16 +-
arch/x86/include/uapi/asm/kvm.h | 7 +
arch/x86/include/{ => uapi}/asm/pvclock-abi.h | 27 +-
arch/x86/kvm/cpuid.c | 16 -
arch/x86/kvm/svm/svm.c | 3 +-
arch/x86/kvm/vmx/vmx.c | 10 +
arch/x86/kvm/x86.c | 1104 ++++++++++++--------
arch/x86/kvm/xen.c | 30 +-
arch/x86/kvm/xen.h | 13 -
include/uapi/linux/kvm.h | 3 +
scripts/xen-hypercalls.sh | 2 +-
tools/testing/selftests/kvm/Makefile.kvm | 5 +
.../selftests/kvm/x86/masterclock_offset_test.c | 180 ++++
.../selftests/kvm/x86/pvclock_migration_test.c | 383 +++++++
tools/testing/selftests/kvm/x86/pvclock_test.c | 443 ++++++++
.../selftests/kvm/x86/xen_cpuid_timing_test.c | 230 ++++
.../testing/selftests/kvm/x86/xen_migration_test.c | 194 ++++
tools/testing/selftests/kvm/x86/xen_shinfo_test.c | 17 +-
21 files changed, 2318 insertions(+), 526 deletions(-)
base-commit: 2d6d57f889f3