Hi,

Would you mind helping to confirm whether kvm-clock/guest_tsc should stop counting elapsed time during a downtime blackout?
1. guest_clock=T1, realtime=R1.
2. (qemu) stop
3. Wait for several seconds.
4. (qemu) cont
5. guest_clock=T2, realtime=R2.

Should (T1 == T2), or (R2 - R1 == T2 - T1)?

For instance, suppose the guest clocksource is 'tsc'. It keeps incrementing during the QEMU downtime blackout.

[root@vm ~]# while true; do date; sleep 1; done
Tue Sep 9 15:28:37 PDT 2025
Tue Sep 9 15:28:38 PDT 2025
Tue Sep 9 15:28:39 PDT 2025
Tue Sep 9 15:28:40 PDT 2025
Tue Sep 9 15:28:41 PDT 2025
Tue Sep 9 15:28:42 PDT 2025
Tue Sep 9 15:28:43 PDT 2025
===> (qemu) stop, wait for 14 seconds.
---> 14 seconds!
Tue Sep 9 15:28:57 PDT 2025
===> (qemu) cont
Tue Sep 9 15:28:58 PDT 2025
Tue Sep 9 15:28:59 PDT 2025
Tue Sep 9 15:29:00 PDT 2025
Tue Sep 9 15:29:01 PDT 2025

However, 'kvm-clock' stops incrementing during the blackout.

[root@vm ~]# while true; do date; sleep 1; done
Tue Sep 9 15:35:59 PDT 2025
Tue Sep 9 15:36:00 PDT 2025
Tue Sep 9 15:36:01 PDT 2025
Tue Sep 9 15:36:02 PDT 2025
Tue Sep 9 15:36:03 PDT 2025
===> (qemu) stop, wait for many seconds.
---> No gap!
Tue Sep 9 15:36:04 PDT 2025
===> (qemu) cont
Tue Sep 9 15:36:05 PDT 2025
Tue Sep 9 15:36:06 PDT 2025
Tue Sep 9 15:36:07 PDT 2025
Tue Sep 9 15:36:08 PDT 2025
Tue Sep 9 15:36:09 PDT 2025
Tue Sep 9 15:36:10 PDT 2025
Tue Sep 9 15:36:11 PDT 2025
Tue Sep 9 15:36:12 PDT 2025

There are many use cases that can involve a long or short downtime blackout:

- stop/cont
- savevm/loadvm
- live migration, especially from/to a file
- dump-guest-memory
- cpr?

KVM already exposes 'KVM_CLOCK_REALTIME' and 'KVM_VCPU_TSC_OFFSET' to help account for all elapsed time.

https://lore.kernel.org/all/[email protected]/

This is a prototype to demonstrate how QEMU can count elapsed downtime by taking advantage of 'KVM_CLOCK_REALTIME'.
From b97a514ac227645010ce3d1012af3a4943413844 Mon Sep 17 00:00:00 2001
From: Dongli Zhang <[email protected]>
Date: Thu, 18 Sep 2025 14:59:42 -0700
Subject: [PATCH 1/1] target/i386/kvm: take advantage of KVM_CLOCK_REALTIME

The Linux kernel commit c68dc1b577ea ("KVM: x86: Report host tsc and
realtime values in KVM_GET_CLOCK") introduced 'realtime' field and
KVM_CLOCK_REALTIME.

The 'realtime' value is saved through KVM_GET_CLOCK and restored via
KVM_SET_CLOCK. This enables the KVM clock to advance by the amount of
elapsed downtime realtime during operations such as live migration,
stop/cont, and savevm/loadvm.

This patch/feature allows QEMU to take advantage of KVM_CLOCK_REALTIME.

Signed-off-by: Dongli Zhang <[email protected]>
---
 hw/i386/kvm/clock.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index f56382717f..906346ce2f 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -38,6 +38,8 @@ struct KVMClockState {
     /*< public >*/
 
     uint64_t clock;
+    uint64_t realtime;
+    uint32_t flags;
     bool clock_valid;
 
     /* whether the 'clock' value was obtained in the 'paused' state */
@@ -107,7 +109,10 @@ static void kvm_update_clock(KVMClockState *s)
         fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(-ret));
         abort();
     }
+
     s->clock = data.clock;
+    s->flags = data.flags & KVM_CLOCK_REALTIME;
+    s->realtime = data.realtime;
 
     /* If kvm_has_adjust_clock_stable() is false, KVM_GET_CLOCK returns
      * essentially CLOCK_MONOTONIC plus a guest-specific adjustment. This
@@ -186,6 +191,11 @@ static void kvmclock_vm_state_change(void *opaque, bool running,
         s->clock_valid = false;
 
         data.clock = s->clock;
+        if (s->flags & KVM_CLOCK_REALTIME) {
+            data.flags = s->flags;
+            data.realtime = s->realtime;
+        }
+
         ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(-ret));
@@ -259,6 +269,7 @@ static int kvmclock_pre_load(void *opaque)
     KVMClockState *s = opaque;
 
     s->clock_is_reliable = false;
+    s->flags = 0;
 
     return 0;
 }
@@ -290,12 +301,14 @@ static int kvmclock_pre_save(void *opaque)
 
 static const VMStateDescription kvmclock_vmsd = {
     .name = "kvmclock",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
     .pre_load = kvmclock_pre_load,
     .pre_save = kvmclock_pre_save,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT64(clock, KVMClockState),
+        VMSTATE_UINT64(realtime, KVMClockState),
+        VMSTATE_UINT32(flags, KVMClockState),
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription * const []) {
-- 
2.39.3


Taking advantage of 'KVM_VCPU_TSC_OFFSET' could further improve 'guest_tsc'.

Any suggestions on whether kvm-clock/guest_tsc should stop or continue counting during the blackout? Does QEMU have any expectation or requirement here?

Thank you very much!

Dongli Zhang
