Hi David,

Thank you very much for the quick reply!

On 9/22/25 9:58 AM, David Woodhouse wrote:
> On Mon, 2025-09-22 at 09:37 -0700, Dongli Zhang wrote:
>> Hi,
>>
>> Would you mind helping confirm if kvm-clock/guest_tsc should stop counting
>> elapsed time during downtime blackout?
>>
>> 1. guest_clock=T1, realtime=R1.
>> 2. (qemu) stop
>> 3. Wait for several seconds.
>> 4. (qemu) cont
>> 5. guest_clock=T2, realtime=R2.
>>
>> Should (T1 == T2), or (R2 - R1 == T2 - T1)?
>
> Neither.
>
> Realtime is something completely different and runs at a different rate
> to the monotonic clock. In fact its rate compared to the monotonic
> clock (and the TSC) is *variable* as NTP guides it.
>
> In your example of stopping and continuing on the *same* host, the
> guest TSC *offset* from the host's TSC should remain the same.
>
> And the *precise* mathematical relationship that KVM advertises to the
> guest as "how to turn a TSC value into nanoseconds since boot" should
> also remain precisely the same.

Does that mean:

Regarding the "stop/cont" scenario, both the kvm-clock and guest_tsc
values should remain the same, i.e.,

1. When "stop", kvm-clock=K1, guest_tsc=T1.
2. Suppose many hours pass.
3. When "cont", the guest VM should see kvm-clock==K1 and guest_tsc==T1,
   by refreshing both PVTI and tsc_offset at KVM.

As demonstrated in my test, currently guest_tsc doesn't stop counting
during the blackout because of the lack of "MSR_IA32_TSC put" at
kvmclock_vm_state_change(). Per my understanding, it is a bug and we may
need to fix it.

BTW, kvmclock_vm_state_change() already utilizes KVM_SET_CLOCK to
re-configure kvm-clock before continuing the guest VM.

>
> KVM already lets you restore the TSC correctly. To restore KVM clock
> correctly, you want something like KVM_SET_CLOCK_GUEST from
> https://lore.kernel.org/all/[email protected]/
>
> For cross machine migration, you *do* need to use a realtime clock
> reference as that's the best you have (make sure you use TAI not UTC
> and don't get affected by leap seconds or smearing). Use that to
> restore the *TSC* as well as you can to make it appear to have kept
> running consistently. And then KVM_SET_CLOCK_GUEST just as you would on
> the same host.

Indeed, QEMU live migration also relies on kvmclock_vm_state_change() to
temporarily stop/cont the source/target VM.

Do you mean we expect something different for live migration, i.e.,

1. Live migrate a source VM to a file.
2. Copy the file to another server.
3. Wait for 1 hour.
4. Migrate from the file to the target VM.

Although it is equivalent to a one-hour downtime, we do need to count the
missing hour, correct?

That means we have different expectations for stop/cont and live
migration:

- Live migration: any downtime should be counted with the help of
  realtime.
- stop/cont (savevm/loadvm): the value of kvm-clock/rdtsc should remain
  the same.

>
> And use vmclock to advertise the wallclock time to the guest as
> precisely as possible, even the cycle after a live migration.
>

Thank you very much for the suggestion on KVM_SET_CLOCK_GUEST and vmclock!

Dongli Zhang
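
P.S. To make the behaviours discussed above concrete, here is a minimal
sketch of the TSC-offset arithmetic. It is not QEMU or KVM code: the
function names are hypothetical, the numbers are made up, and it assumes
no TSC scaling, so the guest TSC is simply the host TSC plus a 64-bit
offset.

```python
MASK64 = (1 << 64) - 1  # TSC values and offsets wrap at 64 bits


def guest_tsc(host_tsc, tsc_offset):
    """Guest-visible TSC, assuming no scaling: host TSC + offset (mod 2**64)."""
    return (host_tsc + tsc_offset) & MASK64


def offset_to_freeze(guest_tsc_at_stop, host_tsc_at_cont):
    """Offset that makes the guest resume at the TSC value it had at 'stop'."""
    return (guest_tsc_at_stop - host_tsc_at_cont) & MASK64


def offset_to_advance(guest_tsc_at_stop, downtime_ns, tsc_hz, host_tsc_at_cont):
    """Offset that makes the guest TSC appear to have kept running for
    downtime_ns, where downtime_ns comes from an external (e.g. TAI) clock."""
    elapsed_cycles = downtime_ns * tsc_hz // 1_000_000_000
    target = (guest_tsc_at_stop + elapsed_cycles) & MASK64
    return (target - host_tsc_at_cont) & MASK64


# Illustrative numbers only: a 2 GHz TSC and a 5 second blackout.
tsc_hz = 2_000_000_000
offset = (-400_000) & MASK64
host_at_stop = 1_000_000
t1 = guest_tsc(host_at_stop, offset)          # guest TSC at "stop"
host_at_cont = host_at_stop + 5 * tsc_hz      # same host, 5 s later

# Same host, offset left unchanged: guest TSC includes the blackout.
t2_same_offset = guest_tsc(host_at_cont, offset)

# Same host, offset rewritten at "cont": guest TSC resumes at T1.
t2_frozen = guest_tsc(host_at_cont, offset_to_freeze(t1, host_at_cont))

# Different host (unrelated host TSC), downtime taken from a realtime reference.
other_host_tsc = 123
t2_migrated = guest_tsc(
    other_host_tsc,
    offset_to_advance(t1, 5_000_000_000, tsc_hz, other_host_tsc),
)
```

With an unchanged offset the guest TSC keeps counting across the
blackout; getting T1 == T2 requires rewriting the offset at "cont"; and
on a different host the offset has to be recomputed from the measured
downtime because the two host TSCs are unrelated.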
