On 8/25/2023 11:07 AM, Peter Xu wrote:
> On Fri, Aug 25, 2023 at 09:28:28AM -0400, Steven Sistare wrote:
>> On 8/24/2023 5:09 PM, Steven Sistare wrote:
>>> On 8/17/2023 2:23 PM, Peter Xu wrote:
>>>> On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
>>>>> Migration of a guest in the suspended runstate is broken. The incoming
>>>>> migration code automatically tries to wake the guest, which is wrong;
>>>>> the guest should end migration in the same runstate it started. Further,
>>>>> for a restored snapshot, the automatic wakeup fails. The runstate is
>>>>> RUNNING, but the guest is not. See the commit messages for the details.
>>>>
>>>> Hi Steve,
>>>>
>>>> I drafted two small patches to show what I meant, on top of this series.
>>>> Before applying these two, one needs to revert patch 1 in this series.
>>>>
>>>> After applying them, it should also pass all three new suspend tests. We
>>>> can continue the discussion here based on the patches.
>>>
>>> Your 2 patches look good. I suggest we keep patch 1, and I squash patch 2
>>> into the other patches.
>
> Yes. Feel free to reorganize / modify / ... the changes in whatever way you
> prefer in the final patchset.
>
>>>
>>> There is one more fix needed: on the sending side, if the state is
>>> suspended, then ticks must be disabled so the tick globals are updated
>>> before they are written to vmstate. Otherwise, tick starts at 0 in the
>>> receiver when cpu_enable_ticks is called.
>>>
>>> -------------------------------------------
>>> diff --git a/migration/migration.c b/migration/migration.c
>> [...]
>>> -------------------------------------------
>>
>> This diff is just a rough draft. I need to resume ticks if the migration
>> fails or is cancelled, and I am trying to push the logic into vm_stop,
>> vm_stop_force_state, and vm_start, and/or vm_prepare_start.
>
> Yes, this sounds better than hard-coding things into the migration code,
> thanks.
>
> Maybe at least all the migration related code paths should always use
> vm_stop_force_state() (e.g. save_snapshot)?
>
> In the meantime, AFAIU we should allow runstate_is_running() to return true
> even for suspended, matching current usages of vm_start() / vm_stop(). But
> again, that has the risk of breaking existing users.
>
> I bet you have a better grasp of what it should look like to solve the
> current "migrate suspended VM" problem at the minimum but hopefully still
> in a clean way, so I'll just wait and see.
I found a better way. Rather than disabling ticks, I added a pre_save handler
to capture and save the correct timer state even if the timer is running,
using the logic from cpr_disable_ticks. No changes needed in the migration
code:

------------------------------------
diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
index 117408c..d5af317 100644
--- a/softmmu/cpu-timers.c
+++ b/softmmu/cpu-timers.c
@@ -157,6 +157,36 @@ static bool icount_shift_state_needed(void *opaque)
     return icount_enabled() == 2;
 }
 
+static int cpu_pre_save_ticks(void *opaque)
+{
+    TimersState *t = &timers_state;
+    TimersState *snap = opaque;
+
+    seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+
+    if (t->cpu_ticks_enabled) {
+        snap->cpu_ticks_offset = t->cpu_ticks_offset + cpu_get_host_ticks();
+        snap->cpu_clock_offset = cpu_get_clock_locked();
+    } else {
+        snap->cpu_ticks_offset = t->cpu_ticks_offset;
+        snap->cpu_clock_offset = t->cpu_clock_offset;
+    }
+    seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    return 0;
+}
+
+static int cpu_post_load_ticks(void *opaque, int version_id)
+{
+    TimersState *t = &timers_state;
+    TimersState *snap = opaque;
+
+    seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    t->cpu_ticks_offset = snap->cpu_ticks_offset;
+    t->cpu_clock_offset = snap->cpu_clock_offset;
+    seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    return 0;
+}
+
 /*
  * Subsection for warp timer migration is optional, because may not be created
  */
@@ -221,6 +251,8 @@ static const VMStateDescription vmstate_timers = {
     .name = "timer",
     .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = cpu_pre_save_ticks,
+    .post_load = cpu_post_load_ticks,
     .fields = (VMStateField[]) {
         VMSTATE_INT64(cpu_ticks_offset, TimersState),
         VMSTATE_UNUSED(8),
@@ -269,9 +301,11 @@ TimersState timers_state;
 /* initialize timers state and the cpu throttle for convenience */
 void cpu_timers_init(void)
 {
+    static TimersState timers_snapshot;
+
     seqlock_init(&timers_state.vm_clock_seqlock);
     qemu_spin_init(&timers_state.vm_clock_lock);
-    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+    vmstate_register(NULL, 0, &vmstate_timers, &timers_snapshot);
     cpu_throttle_init();
 }
------------------------------------

- Steve