On 7/26/2023 4:18 PM, Peter Xu wrote:
> On Fri, Jun 30, 2023 at 09:50:41AM -0400, Steven Sistare wrote:
>> On 6/26/2023 2:27 PM, Peter Xu wrote:
>>> On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote:
>>>> On 6/21/2023 4:28 PM, Peter Xu wrote:
>>>>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>>>>>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>>>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>>>>>> Migration of a guest in the suspended state is broken.  The incoming
>>>>>>>> migration code automatically tries to wake the guest, which IMO is
>>>>>>>> wrong -- the guest should end migration in the same state it started.
>>>>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>>>>>>>> bypasses vm_start().  The guest appears to be in the running state, but
>>>>>>>> it is not.
>>>>>>>>
>>>>>>>> To fix, leave the guest in the suspended state, but call
>>>>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>>>>>> later, when the client sends a system_wakeup command.
>>>>>>>>
>>>>>>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
>>>>>>>> ---
>>>>>>>>  migration/migration.c | 11 ++++-------
>>>>>>>>  softmmu/runstate.c    |  1 +
>>>>>>>>  2 files changed, 5 insertions(+), 7 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>>>> index 17b4b47..851fe6d 100644
>>>>>>>> --- a/migration/migration.c
>>>>>>>> +++ b/migration/migration.c
>>>>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>>>>>>>          vm_start();
>>>>>>>>      } else {
>>>>>>>>          runstate_set(global_state_get_runstate());
>>>>>>>> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>>>>>> +            /* Force vm_start to be called later. */
>>>>>>>> +            qemu_system_start_on_wakeup_request();
>>>>>>>> +        }
>>>>>>>
>>>>>>> Is this really needed, along with patch 1?
>>>>>>>
>>>>>>> I have a very limited knowledge on suspension, so I'm prone to making
>>>>>>> mistakes..
>>>>>>>
>>>>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not
>>>>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread
>>>>>>> using qemu_notify_event().  Then IIUC the e.g. vcpu wakeups will be done in
>>>>>>> the main thread later on after qemu_wakeup_requested() returns true.
>>>>>>
>>>>>> Correct, here:
>>>>>>
>>>>>>     if (qemu_wakeup_requested()) {
>>>>>>         pause_all_vcpus();
>>>>>>         qemu_system_wakeup();
>>>>>>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>>>>>>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>>>>>>         resume_all_vcpus();
>>>>>>         qapi_event_send_wakeup();
>>>>>>     }
>>>>>>
>>>>>> However, that is not sufficient, because vm_start() was never called on
>>>>>> the incoming side.  vm_start calls the vm state notifiers for
>>>>>> RUN_STATE_RUNNING, among other things.
>>>>>>
>>>>>> Without my fixes, it "works" because the outgoing migration automatically
>>>>>> wakes a suspended guest, which sets the state to running, which is saved
>>>>>> in global state:
>>>>>>
>>>>>>     void migration_completion(MigrationState *s)
>>>>>>         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>>>>>         global_state_store()
>>>>>>
>>>>>> Then the incoming migration calls vm_start here:
>>>>>>
>>>>>>     migration/migration.c
>>>>>>         if (!global_state_received() ||
>>>>>>             global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>>>>             if (autostart) {
>>>>>>                 vm_start();
>>>>>>
>>>>>> vm_start must be called for correctness.
>>>>>
>>>>> I see.  Though I had a feeling that this is still not the right way to do,
>>>>> at least not as clean.
>>>>>
>>>>> One question is, would above work for postcopy when VM is suspended during
>>>>> the switchover?
>>>>
>>>> Good catch, that is broken.
>>>> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
>>>> and now it works.
>>>>
>>>>     if (global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>>         if (autostart) {
>>>>             vm_start();
>>>>         } else {
>>>>             runstate_set(RUN_STATE_PAUSED);
>>>>         }
>>>>     } else {
>>>>         runstate_set(global_state_get_runstate());
>>>>         if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>>             qemu_system_start_on_wakeup_request();
>>>>         }
>>>>     }
>>>>
>>>>> I think I see your point that vm_start() (mostly vm_prepare_start())
>>>>> contains a bunch of operations that maybe we must have before starting the
>>>>> VM, but then.. should we just make that vm_start() unconditional when
>>>>> loading VM completes?  I just don't see anything won't need it (besides
>>>>> -S), even COLO.
>>>>>
>>>>> So I'm wondering about something like this:
>>>>>
>>>>> ===8<===
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>>>>>
>>>>>      dirty_bitmap_mig_before_vm_start();
>>>>>
>>>>> -    if (!global_state_received() ||
>>>>> -        global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>>> -        if (autostart) {
>>>>> -            vm_start();
>>>>> -        } else {
>>>>> -            runstate_set(RUN_STATE_PAUSED);
>>>>> -        }
>>>>> -    } else if (migration_incoming_colo_enabled()) {
>>>>> +    if (migration_incoming_colo_enabled()) {
>>>>>          migration_incoming_disable_colo();
>>>>> +        /* COLO should always have autostart=1 or we can enforce it here */
>>>>> +    }
>>>>> +
>>>>> +    if (autostart) {
>>>>> +        RunState run_state = global_state_get_runstate();
>>>>>          vm_start();
>>>>
>>>> This will resume the guest for a brief time, against the user's wishes.
>>>> Not OK IMO.
>>>
>>> Ah okay..
>>>
>>> Can we do whatever we need in vm_prepare_start(), then?  I assume these
>>> chunks are needed:
>>>
>>>     /*
>>>      * WHPX accelerator needs to know whether we are going to step
>>>      * any CPUs, before starting the first one.
>>>      */
>>>     if (cpus_accel->synchronize_pre_resume) {
>>>         cpus_accel->synchronize_pre_resume(step_pending);
>>>     }
>>>
>>>     /* We are sending this now, but the CPUs will be resumed shortly later */
>>>     qapi_event_send_resume();
>>>
>>>     cpu_enable_ticks();
>>>
>>> While these may not be needed, but instead only needed if RUN_STATE_RUNNING
>>> below (runstate_set() will be needed regardless):
>>>
>>>     runstate_set(RUN_STATE_RUNNING);
>>>     vm_state_notify(1, RUN_STATE_RUNNING);
>>>
>>> So here's another of my attempt (this time also taking
>>> !global_state_received() into account)...
>>>
>>>     RunState run_state;
>>>
>>>     if (migration_incoming_colo_enabled()) {
>>>         migration_incoming_disable_colo();
>>>     }
>>>
>>>     if (!global_state_received()) {
>>>         run_state = RUN_STATE_RUNNING;
>>>     } else {
>>>         run_state = global_state_get_runstate();
>>>     }
>>>
>>>     if (autostart) {
>>>         /* Part of vm_prepare_start(), may isolate into a helper? */
>>>         if (cpus_accel->synchronize_pre_resume) {
>>>             cpus_accel->synchronize_pre_resume(step_pending);
>>>         }
>>>         qapi_event_send_resume();
>>>         cpu_enable_ticks();
>>>         /* Setup the runstate on src */
>>>         runstate_set(run_state);
>>>         if (run_state == RUN_STATE_RUNNING) {
>>>             vm_state_notify(1, RUN_STATE_RUNNING);
>>>         }
>>>     } else {
>>>         runstate_set(RUN_STATE_PAUSED);
>>>     }
>>>
>>> The whole idea is still do whatever needed here besides resuming the vcpus,
>>> rather than leaving the whole vm_start() into system wakeup.  It then can
>>> avoid touching the system wakeup code which seems cleaner.
>>
>> The problem is that some actions cannot be performed at migration finish time,
>> such as vm_state_notify RUN_STATE_RUNNING.  The wakeup code called later still
>> needs to know that vm_state_notify has not been called, and call it.
>
> Sorry, very late reply..
And I just returned from vacation :)

> Can we just call vm_state_notify() earlier?

We cannot.  The guest is not running yet, and will not be until later.
We cannot call notifiers that perform actions that complete, or react to,
the guest entering a running state.

> I know I'm not familiar with this code at all.. but I'm still not fully
> convinced a new global is needed for this.  IMHO QEMU becomes harder to
> maintain with such globals, while already tons exist.
>
>> I just posted a new series with a cleaner wakeup, but it still uses a global.

See "[PATCH V2 01/10] vl: start on wakeup request" in the new series.
The transitions of the global are trivial and sensible:

    vm_start()
        vm_started = true;

    do_vm_stop()
        vm_started = false;

current_run_state is already a global, confined to runstate.c.  vm_started is
a new qualifier on the state, and hence is also global.  If runstate were a
field in MachineState, then I would add vm_started to MachineState, but
MachineState is not used in runstate.c.

> I think the new version does not apply anymore to master.  Do you have a
> tree somewhere?

I do not, but I will rebase and post V3 in a moment.

- Steve