On 6/8/26 7:47 AM, David Woodhouse wrote:
> diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
> index 5e3805820010..167aa4140d30 100644
> --- a/Documentation/virt/kvm/devices/vcpu.rst
> +++ b/Documentation/virt/kvm/devices/vcpu.rst
> @@ -243,7 +243,10 @@ Returns:
> Specifies the guest's TSC offset relative to the host's TSC. The guest's
> TSC is then derived by the following equation:
>
> - guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
> + guest_tsc = ((host_tsc * tsc_scale_ratio) >> tsc_scale_bits) + KVM_VCPU_TSC_OFFSET
> +
> +The values of tsc_scale_ratio and tsc_scale_bits can be obtained using
> +the KVM_VCPU_TSC_SCALE attribute.
>
> This attribute is useful to adjust the guest's TSC on live migration,
> so that the TSC counts the time during which the VM was paused. The
> @@ -251,44 +254,100 @@ following describes a possible algorithm to use for this purpose.
>
> From the source VMM process:
>
> -1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
> +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (host_tsc_src),
> kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
> - (host_src).
> + (time_src) at a given moment (Tsrc).
> +
> +2. For each vCPU[i]:
> +
> + a. Read the KVM_VCPU_TSC_OFFSET attribute to record the guest TSC offset
> + (ofs_src[i]).
>
> -2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
> - guest TSC offset (ofs_src[i]).
> + b. Read the KVM_VCPU_TSC_SCALE attribute to record the guest TSC scaling
> + ratio (ratio_src[i], frac_bits_src[i]).
>
> -3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
> - guest's TSC (freq).
> + c. Use host_tsc_src and the scaling/offset factors to calculate this
> + vCPU's TSC at time Tsrc:
> +
> + tsc_src[i] = ((host_tsc_src * ratio_src[i]) >> frac_bits_src[i]) + ofs_src[i]
> +
> +3. Invoke the KVM_GET_CLOCK_GUEST ioctl on the boot vCPU to return the KVM
> + clock as a function of the guest TSC (pvti_src). (This ioctl may not
> + succeed if the host and guest TSCs are not consistent and well-behaved.)
>
> From the destination VMM process:
>
> -4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
> - kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
> - fields. Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
> - structure.
> +4. Before creating the vCPUs, invoke the KVM_SET_TSC_KHZ ioctl on the VM, to
> + set the scaled frequency of the guest's TSC (freq).
> +
> +5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (host_tsc_dst) and
> + host CLOCK_REALTIME nanoseconds (time_dst) at a given moment (Tdst).
> +
> +6. Calculate the number of nanoseconds elapsed between Tsrc and Tdst:
> +
> + ΔT = time_dst - time_src
> +
> +7. As each vCPU[i] is created:
> +
> + a. Read the KVM_VCPU_TSC_SCALE attribute to record the guest TSC scaling
> + ratio (ratio_dst[i], frac_bits_dst[i]).
> +
> + b. Calculate the intended guest TSC value at time Tdst:
> +
> + tsc_dst[i] = tsc_src[i] + (ΔT * freq[i])
>
> - KVM will advance the VM's kvmclock to account for elapsed time since
> - recording the clock values. Note that this will cause problems in
> - the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
> - between the source and destination, and a reasonably short time passes
> - between the source pausing the VMs and the destination executing
> - steps 4-7.
> + c. Use host_tsc_dst and the scaling factors to calculate this vCPU's
> + raw scaled TSC at time Tdst without offsetting:
> +
> + raw_dst[i] = ((host_tsc_dst * ratio_dst[i]) >> frac_bits_dst[i])
> +
> + d. Calculate ofs_dst[i] = tsc_dst[i] - raw_dst[i] and set the resulting
> + offset using the KVM_VCPU_TSC_OFFSET attribute.
> +
> +8. If pvti_src was provided, invoke the KVM_SET_CLOCK_GUEST ioctl on the boot
> + vCPU to restore the KVM clock as a precise function of the guest TSC.
> +
> +9. If KVM_SET_CLOCK_GUEST was not available or failed (e.g. because the
> + master clock is not active), fall back to the KVM_SET_CLOCK ioctl,
> + providing the source nanoseconds from kvmclock (guest_src) and
> + CLOCK_REALTIME (time_src) in their respective fields. Ensure that the
> + KVM_CLOCK_REALTIME flag is set in the provided structure.
> +
> + KVM will restore the VM's kvmclock, accounting for elapsed time since
> + the clock values were recorded. Note that this will cause problems in
> + the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized between
> + the source and destination, and a reasonably short time passes between
> + the source pausing the VMs and the destination resuming them.
> + Due to the KVM_[SG]ET_CLOCK API using CLOCK_REALTIME instead of
> + CLOCK_TAI, leap seconds during the migration may also introduce errors.
> +
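For anyone following the arithmetic in steps 2c and 7b-7d, here is a rough
sketch in Python. It is only my reading of the text, not code from the patch.
Assumptions: TSC values and offsets wrap at 64 bits, ΔT is in nanoseconds, and
freq is the value in kHz as passed to KVM_SET_TSC_KHZ, so the "ΔT * freq" term
in step 7b needs a division by 1,000,000 to come out in TSC ticks. The function
names and the 48 fractional bits used in the usage note are illustrative only.

```python
MASK64 = (1 << 64) - 1  # assumption: TSC values and offsets wrap at 64 bits


def scale_tsc(host_tsc, ratio, frac_bits):
    """Raw scaled TSC: (host_tsc * ratio) >> frac_bits, truncated to 64 bits."""
    return ((host_tsc * ratio) >> frac_bits) & MASK64


def guest_tsc(host_tsc, ratio, frac_bits, offset):
    """Guest TSC as in step 2c: scaled host TSC plus the per-vCPU offset."""
    return (scale_tsc(host_tsc, ratio, frac_bits) + offset) & MASK64


def dest_offset(tsc_src, delta_ns, freq_khz, host_tsc_dst, ratio_dst, frac_bits_dst):
    """Offset to write on the destination (steps 7b to 7d).

    Assumes delta_ns is in nanoseconds and freq_khz is in kHz, so the
    elapsed guest ticks are delta_ns * freq_khz / 1,000,000.
    """
    elapsed_ticks = delta_ns * freq_khz // 1_000_000
    tsc_dst = (tsc_src + elapsed_ticks) & MASK64                  # step 7b
    raw_dst = scale_tsc(host_tsc_dst, ratio_dst, frac_bits_dst)   # step 7c
    return (tsc_dst - raw_dst) & MASK64                           # step 7d
```

With a 1:1 ratio (1 << 48 with 48 fractional bits), a guest TSC of 5,000,000 on
the source, one second of downtime at 2 GHz, and a destination host TSC of
3,000,000,000, the resulting offset is "negative" (it wraps as an unsigned
64-bit value), and feeding it back through guest_tsc() yields 2,005,000,000.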
> +4.2 ATTRIBUTE: KVM_VCPU_TSC_SCALE
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Documentation/virt/kvm/devices/vcpu.rst:327: ERROR: Inconsistent title style:
skip from level 2 to 4.
4.2 ATTRIBUTE: KVM_VCPU_TSC_SCALE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Established title styles: =/= = - [docutils]
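The "^" underline is not one of the title styles already used in this file, so
docutils treats it as a new 4th-level style appearing directly under a
2nd-level heading. Change this underline to use "-" characters
("----------------------------------", at whatever width is needed) and also
add the same underline to the 4.1 heading.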
> +
> +:Parameters: struct kvm_vcpu_tsc_scale
> +
> +Returns:
> +
> + ======= ======================================
> + -EFAULT Error reading the provided parameter
> + address.
> + -ENXIO Attribute not supported (no TSC scaling)
> + -EINVAL Invalid request to write the attribute
> + ======= ======================================
--
~Randy