On Fri, Jun 30, 2023 at 09:50:41AM -0400, Steven Sistare wrote: > On 6/26/2023 2:27 PM, Peter Xu wrote: > > On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote: > >> On 6/21/2023 4:28 PM, Peter Xu wrote: > >>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: > >>>> On 6/20/2023 5:46 PM, Peter Xu wrote: > >>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: > >>>>>> Migration of a guest in the suspended state is broken. The incoming > >>>>>> migration code automatically tries to wake the guest, which IMO is > >>>>>> wrong -- the guest should end migration in the same state it started. > >>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), > >>>>>> which > >>>>>> bypasses vm_start(). The guest appears to be in the running state, but > >>>>>> it is not. > >>>>>> > >>>>>> To fix, leave the guest in the suspended state, but call > >>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed > >>>>>> later, when the client sends a system_wakeup command. > >>>>>> > >>>>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com> > >>>>>> --- > >>>>>> migration/migration.c | 11 ++++------- > >>>>>> softmmu/runstate.c | 1 + > >>>>>> 2 files changed, 5 insertions(+), 7 deletions(-) > >>>>>> > >>>>>> diff --git a/migration/migration.c b/migration/migration.c > >>>>>> index 17b4b47..851fe6d 100644 > >>>>>> --- a/migration/migration.c > >>>>>> +++ b/migration/migration.c > >>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void > >>>>>> *opaque) > >>>>>> vm_start(); > >>>>>> } else { > >>>>>> runstate_set(global_state_get_runstate()); > >>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) { > >>>>>> + /* Force vm_start to be called later. */ > >>>>>> + qemu_system_start_on_wakeup_request(); > >>>>>> + } > >>>>> > >>>>> Is this really needed, along with patch 1? > >>>>> > >>>>> I have a very limited knowledge on suspension, so I'm prone to making > >>>>> mistakes.. > >>>>> > >>>>> But from what I read this, qemu_system_wakeup_request() (existing one, > >>>>> not > >>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread > >>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be > >>>>> done in > >>>>> the main thread later on after qemu_wakeup_requested() returns true. > >>>> > >>>> Correct, here: > >>>> > >>>> if (qemu_wakeup_requested()) { > >>>> pause_all_vcpus(); > >>>> qemu_system_wakeup(); > >>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason); > >>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE; > >>>> resume_all_vcpus(); > >>>> qapi_event_send_wakeup(); > >>>> } > >>>> > >>>> However, that is not sufficient, because vm_start() was never called on > >>>> the incoming > >>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, > >>>> among other things. > >>>> > >>>> > >>>> Without my fixes, it "works" because the outgoing migration > >>>> automatically wakes a suspended > >>>> guest, which sets the state to running, which is saved in global state: > >>>> > >>>> void migration_completion(MigrationState *s) > >>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); > >>>> global_state_store() > >>>> > >>>> Then the incoming migration calls vm_start here: > >>>> > >>>> migration/migration.c > >>>> if (!global_state_received() || > >>>> global_state_get_runstate() == RUN_STATE_RUNNING) { > >>>> if (autostart) { > >>>> vm_start(); > >>>> > >>>> vm_start must be called for correctness. > >>> > >>> I see. Though I had a feeling that this is still not the right way to do, > >>> at least not as clean. > >>> > >>> One question is, would above work for postcopy when VM is suspended during > >>> the switchover? > >> > >> Good catch, that is broken. > >> I added qemu_system_start_on_wakeup_request to > >> loadvm_postcopy_handle_run_bh > >> and now it works. > >> > >> if (global_state_get_runstate() == RUN_STATE_RUNNING) { > >> if (autostart) { > >> vm_start(); > >> } else { > >> runstate_set(RUN_STATE_PAUSED); > >> } > >> } else { > >> runstate_set(global_state_get_runstate()); > >> if (runstate_check(RUN_STATE_SUSPENDED)) { > >> qemu_system_start_on_wakeup_request(); > >> } > >> } > >> > >>> I think I see your point that vm_start() (mostly vm_prepare_start()) > >>> contains a bunch of operations that maybe we must have before starting the > >>> VM, but then.. should we just make that vm_start() unconditional when > >>> loading VM completes? I just don't see anything won't need it (besides > >>> -S), even COLO. > >>> > >>> So I'm wondering about something like this: > >>> > >>> ===8<=== > >>> --- a/migration/migration.c > >>> +++ b/migration/migration.c > >>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void > >>> *opaque) > >>> > >>> dirty_bitmap_mig_before_vm_start(); > >>> > >>> - if (!global_state_received() || > >>> - global_state_get_runstate() == RUN_STATE_RUNNING) { > >>> - if (autostart) { > >>> - vm_start(); > >>> - } else { > >>> - runstate_set(RUN_STATE_PAUSED); > >>> - } > >>> - } else if (migration_incoming_colo_enabled()) { > >>> + if (migration_incoming_colo_enabled()) { > >>> migration_incoming_disable_colo(); > >>> + /* COLO should always have autostart=1 or we can enforce it here > >>> */ > >>> + } > >>> + > >>> + if (autostart) { > >>> + RunState run_state = global_state_get_runstate(); > >>> vm_start(); > >> > >> This will resume the guest for a brief time, against the user's wishes. > >> Not OK IMO. > > > > Ah okay.. > > > > Can we do whatever we need in vm_prepare_start(), then? I assume these > > chunks are needed: > > > > /* > > * WHPX accelerator needs to know whether we are going to step > > * any CPUs, before starting the first one. > > */ > > if (cpus_accel->synchronize_pre_resume) { > > cpus_accel->synchronize_pre_resume(step_pending); > > } > > > > /* We are sending this now, but the CPUs will be resumed shortly later > > */ > > qapi_event_send_resume(); > > > > cpu_enable_ticks(); > > > > While these may not be needed, but instead only needed if RUN_STATE_RUNNING > > below (runstate_set() will be needed regardless): > > > > runstate_set(RUN_STATE_RUNNING); > > vm_state_notify(1, RUN_STATE_RUNNING); > > > > So here's another of my attempt (this time also taking > > !global_state_received() into account)... > > > > RunState run_state; > > > > if (migration_incoming_colo_enabled()) { > > migration_incoming_disable_colo(); > > } > > > > if (!global_state_received()) { > > run_state = RUN_STATE_RUNNING; > > } else { > > run_state = global_state_get_runstate(); > > } > > > > if (autostart) { > > /* Part of vm_prepare_start(), may isolate into a helper? */ > > if (cpus_accel->synchronize_pre_resume) { > > cpus_accel->synchronize_pre_resume(step_pending); > > } > > qapi_event_send_resume(); > > cpu_enable_ticks(); > > /* Setup the runstate on src */ > > runstate_set(run_state); > > if (run_state == RUN_STATE_RUNNING) { > > vm_state_notify(1, RUN_STATE_RUNNING); > > } > > } else { > > runstate_set(RUN_STATE_PAUSED); > > } > > > > The whole idea is still do whatever needed here besides resuming the vcpus, > > rather than leaving the whole vm_start() into system wakeup. It then can > > avoid touching the system wakeup code which seems cleaner. > > The problem is that some actions cannot be performed at migration finish time, > such as vm_state_notify RUN_STATE_RUNNING. The wakeup code called later > still > needs to know that vm_state_notify has not been called, and call it.
Sorry, very late reply.. Can we just call vm_state_notify() earlier? I know I'm not familiar with this code at all.. but I'm still not fully convinced a new global is needed for this. IMHO QEMU becomes harder to maintain with such globals, while already tons exist. > > I just posted a new series with a cleaner wakeup, but it still uses a global. I think the new version does not apply anymore to master. Do you have a tree somewhere? Thanks, -- Peter Xu