On Fri, Aug 25, 2023 at 09:28:28AM -0400, Steven Sistare wrote: > On 8/24/2023 5:09 PM, Steven Sistare wrote: > > On 8/17/2023 2:23 PM, Peter Xu wrote: > >> On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote: > >>> Migration of a guest in the suspended runstate is broken. The incoming > >>> migration code automatically tries to wake the guest, which is wrong; > >>> the guest should end migration in the same runstate it started. Further, > >>> for a restored snapshot, the automatic wakeup fails. The runstate is > >>> RUNNING, but the guest is not. See the commit messages for the details. > >> > >> Hi Steve, > >> > >> I drafted two small patches to show what I meant, on top of this series. > >> Before applying these two, one needs to revert patch 1 in this series. > >> > >> After applied, it should also pass all three new suspend tests. We can > >> continue the discussion here based on the patches. > > > > Your 2 patches look good. I suggest we keep patch 1, and I squash patch 2 > > into the other patches.
Yes. Feel free to reorganize / modify /.. the changes in whatever way you prefer in the final patchset. > > > > There is one more fix needed: on the sending side, if the state is > > suspended, > > then ticks must be disabled so the tick globals are updated before they are > > written to vmstate. Otherwise, tick starts at 0 in the receiver when > > cpu_enable_ticks is called. > > > > ------------------------------------------- > > diff --git a/migration/migration.c b/migration/migration.c > [...] > > ------------------------------------------- > > This diff is just a rough draft. I need to resume ticks if the migration > fails or is cancelled, and I am trying to push the logic into vm_stop, > vm_stop_force_state, and vm_start, and/or vm_prepare_start. Yes this sounds better than hard code things into migration codes, thanks. Maybe at least all the migration related code paths should always use vm_stop_force_state() (e.g. save_snapshot)? At the meantime, AFAIU we should allow runstate_is_running() to return true even for suspended, matching current usages of vm_start() / vm_stop(). But again that can have risk of breaking existing users. I bet you may have a better grasp of what it should look like to solve the current "migrate suspended VM" problem at the minimum but hopefully still in a clean way, so I assume I'll just wait and see. Thanks, -- Peter Xu