Re: [PATCH V3 00/10] fix migration of suspended runstate

2023-08-25 Thread Steven Sistare
On 8/25/2023 11:07 AM, Peter Xu wrote:
> On Fri, Aug 25, 2023 at 09:28:28AM -0400, Steven Sistare wrote:
>> On 8/24/2023 5:09 PM, Steven Sistare wrote:
>>> On 8/17/2023 2:23 PM, Peter Xu wrote:
 On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> for a restored snapshot, the automatic wakeup fails.  The runstate is
> RUNNING, but the guest is not.  See the commit messages for the details.

 Hi Steve,

 I drafted two small patches to show what I meant, on top of this series.
 Before applying these two, one needs to revert patch 1 in this series.

 After applied, it should also pass all three new suspend tests.  We can
 continue the discussion here based on the patches.
>>>
>>> Your 2 patches look good.  I suggest we keep patch 1, and I squash patch 2
>>> into the other patches.
> 
> Yes.  Feel free to reorganize / modify /.. the changes in whatever way you
> prefer in the final patchset.
> 
>>>
>>> There is one more fix needed: on the sending side, if the state is 
>>> suspended,
>>> then ticks must be disabled so the tick globals are updated before they are
>>> written to vmstate.  Otherwise, tick starts at 0 in the receiver when
>>> cpu_enable_ticks is called.
>>>
>>> ---
>>> diff --git a/migration/migration.c b/migration/migration.c
>> [...]
>>> ---
>>
>> This diff is just a rough draft.  I need to resume ticks if the migration
>> fails or is cancelled, and I am trying to push the logic into vm_stop,
>> vm_stop_force_state, and vm_start, and/or vm_prepare_start.
> 
> Yes this sounds better than hard code things into migration codes, thanks.
> 
> Maybe at least all the migration related code paths should always use
> vm_stop_force_state() (e.g. save_snapshot)?
> 
> At the meantime, AFAIU we should allow runstate_is_running() to return true
> even for suspended, matching current usages of vm_start() / vm_stop().  But
> again that can have risk of breaking existing users.
> 
> I bet you may have a better grasp of what it should look like to solve the
> current "migrate suspended VM" problem at the minimum but hopefully still
> in a clean way, so I assume I'll just wait and see.

I found a better way.
Rather than disabling ticks, I added a pre_save handler to capture and save
the correct timer state even if the timer is running, using the logic from
cpr_disable_ticks. No changes needed in the migration code:


diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
index 117408c..d5af317 100644
--- a/softmmu/cpu-timers.c
+++ b/softmmu/cpu-timers.c
@@ -157,6 +157,36 @@ static bool icount_shift_state_needed(void *opaque)
 return icount_enabled() == 2;
 }

+static int cpu_pre_save_ticks(void *opaque)
+{
+TimersState *t = _state;
+TimersState *snap = opaque;
+
+seqlock_write_lock(>vm_clock_seqlock, >vm_clock_lock);
+
+if (t->cpu_ticks_enabled) {
+snap->cpu_ticks_offset = t->cpu_ticks_offset + cpu_get_host_ticks();
+snap->cpu_clock_offset = cpu_get_clock_locked();
+} else {
+snap->cpu_ticks_offset = t->cpu_ticks_offset;
+snap->cpu_clock_offset = t->cpu_clock_offset;
+}
+seqlock_write_unlock(>vm_clock_seqlock, >vm_clock_lock);
+return 0;
+}
+
+static int cpu_post_load_ticks(void *opaque, int version_id)
+{
+TimersState *t = _state;
+TimersState *snap = opaque;
+
+seqlock_write_lock(>vm_clock_seqlock, >vm_clock_lock);
+t->cpu_ticks_offset = snap->cpu_ticks_offset;
+t->cpu_clock_offset = snap->cpu_clock_offset;
+seqlock_write_unlock(>vm_clock_seqlock, >vm_clock_lock);
+return 0;
+}
+
 /*
  * Subsection for warp timer migration is optional, because may not be created
  */
@@ -221,6 +251,8 @@ static const VMStateDescription vmstate_timers = {
 .name = "timer",
 .version_id = 2,
 .minimum_version_id = 1,
+.pre_save = cpu_pre_save_ticks,
+.post_load = cpu_post_load_ticks,
 .fields = (VMStateField[]) {
 VMSTATE_INT64(cpu_ticks_offset, TimersState),
 VMSTATE_UNUSED(8),
@@ -269,9 +301,11 @@ TimersState timers_state;
 /* initialize timers state and the cpu throttle for convenience */
 void cpu_timers_init(void)
 {
+static TimersState timers_snapshot;
+
 seqlock_init(_state.vm_clock_seqlock);
 qemu_spin_init(_state.vm_clock_lock);
-vmstate_register(NULL, 0, _timers, _state);
+vmstate_register(NULL, 0, _timers, _snapshot);

 cpu_throttle_init();
 }


- Steve



Re: [PATCH V3 00/10] fix migration of suspended runstate

2023-08-25 Thread Peter Xu
On Fri, Aug 25, 2023 at 09:28:28AM -0400, Steven Sistare wrote:
> On 8/24/2023 5:09 PM, Steven Sistare wrote:
> > On 8/17/2023 2:23 PM, Peter Xu wrote:
> >> On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
> >>> Migration of a guest in the suspended runstate is broken.  The incoming
> >>> migration code automatically tries to wake the guest, which is wrong;
> >>> the guest should end migration in the same runstate it started.  Further,
> >>> for a restored snapshot, the automatic wakeup fails.  The runstate is
> >>> RUNNING, but the guest is not.  See the commit messages for the details.
> >>
> >> Hi Steve,
> >>
> >> I drafted two small patches to show what I meant, on top of this series.
> >> Before applying these two, one needs to revert patch 1 in this series.
> >>
> >> After applied, it should also pass all three new suspend tests.  We can
> >> continue the discussion here based on the patches.
> > 
> > Your 2 patches look good.  I suggest we keep patch 1, and I squash patch 2
> > into the other patches.

Yes.  Feel free to reorganize / modify /.. the changes in whatever way you
prefer in the final patchset.

> > 
> > There is one more fix needed: on the sending side, if the state is 
> > suspended,
> > then ticks must be disabled so the tick globals are updated before they are
> > written to vmstate.  Otherwise, tick starts at 0 in the receiver when
> > cpu_enable_ticks is called.
> > 
> > ---
> > diff --git a/migration/migration.c b/migration/migration.c
> [...]
> > ---
> 
> This diff is just a rough draft.  I need to resume ticks if the migration
> fails or is cancelled, and I am trying to push the logic into vm_stop,
> vm_stop_force_state, and vm_start, and/or vm_prepare_start.

Yes this sounds better than hard code things into migration codes, thanks.

Maybe at least all the migration related code paths should always use
vm_stop_force_state() (e.g. save_snapshot)?

At the meantime, AFAIU we should allow runstate_is_running() to return true
even for suspended, matching current usages of vm_start() / vm_stop().  But
again that can have risk of breaking existing users.

I bet you may have a better grasp of what it should look like to solve the
current "migrate suspended VM" problem at the minimum but hopefully still
in a clean way, so I assume I'll just wait and see.

Thanks,

-- 
Peter Xu




Re: [PATCH V3 00/10] fix migration of suspended runstate

2023-08-25 Thread Steven Sistare
On 8/24/2023 5:09 PM, Steven Sistare wrote:
> On 8/17/2023 2:23 PM, Peter Xu wrote:
>> On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
>>> Migration of a guest in the suspended runstate is broken.  The incoming
>>> migration code automatically tries to wake the guest, which is wrong;
>>> the guest should end migration in the same runstate it started.  Further,
>>> for a restored snapshot, the automatic wakeup fails.  The runstate is
>>> RUNNING, but the guest is not.  See the commit messages for the details.
>>
>> Hi Steve,
>>
>> I drafted two small patches to show what I meant, on top of this series.
>> Before applying these two, one needs to revert patch 1 in this series.
>>
>> After applied, it should also pass all three new suspend tests.  We can
>> continue the discussion here based on the patches.
> 
> Your 2 patches look good.  I suggest we keep patch 1, and I squash patch 2
> into the other patches.
> 
> There is one more fix needed: on the sending side, if the state is suspended,
> then ticks must be disabled so the tick globals are updated before they are
> written to vmstate.  Otherwise, tick starts at 0 in the receiver when
> cpu_enable_ticks is called.
> 
> ---
> diff --git a/migration/migration.c b/migration/migration.c
[...]
> ---

This diff is just a rough draft.  I need to resume ticks if the migration
fails or is cancelled, and I am trying to push the logic into vm_stop,
vm_stop_force_state, and vm_start, and/or vm_prepare_start.

- Steve



Re: [PATCH V3 00/10] fix migration of suspended runstate

2023-08-24 Thread Steven Sistare
On 8/17/2023 2:23 PM, Peter Xu wrote:
> On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
>> Migration of a guest in the suspended runstate is broken.  The incoming
>> migration code automatically tries to wake the guest, which is wrong;
>> the guest should end migration in the same runstate it started.  Further,
>> for a restored snapshot, the automatic wakeup fails.  The runstate is
>> RUNNING, but the guest is not.  See the commit messages for the details.
> 
> Hi Steve,
> 
> I drafted two small patches to show what I meant, on top of this series.
> Before applying these two, one needs to revert patch 1 in this series.
> 
> After applied, it should also pass all three new suspend tests.  We can
> continue the discussion here based on the patches.

Your 2 patches look good.  I suggest we keep patch 1, and I squash patch 2
into the other patches.

There is one more fix needed: on the sending side, if the state is suspended,
then ticks must be disabled so the tick globals are updated before they are
written to vmstate.  Otherwise, tick starts at 0 in the receiver when
cpu_enable_ticks is called.

---
diff --git a/migration/migration.c b/migration/migration.c
index b004475..89d56a8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -63,6 +63,7 @@
 #include "sysemu/cpus.h"
 #include "yank_functions.h"
 #include "sysemu/qtest.h"
+#include "sysemu/cpu-timers.h"
 #include "options.h"
 #include "sysemu/dirtylimit.h"

@@ -2125,6 +2126,9 @@ static int postcopy_start(MigrationState *ms, Error **errp
 trace_postcopy_start_set_run();

 global_state_store();
+if (runstate_check(RUN_STATE_SUSPENDED)) {
+cpu_disable_ticks();
+}
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 if (ret < 0) {
 goto fail;
@@ -2333,6 +2337,9 @@ static void migration_completion(MigrationState *s)
 s->vm_old_state = runstate_get();
 global_state_store();

+if (runstate_check(RUN_STATE_SUSPENDED)) {
+cpu_disable_ticks();
+}
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 trace_migration_completion_vm_stop(ret);
 if (ret >= 0) {
diff --git a/migration/savevm.c b/migration/savevm.c
index 7b9c477..eff6976 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -58,6 +58,7 @@
 #include "qemu/cutils.h"
 #include "io/channel-buffer.h"
 #include "io/channel-file.h"
+#include "sysemu/cpu-timers.h"
 #include "sysemu/replay.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
@@ -2969,6 +2970,9 @@ bool save_snapshot(const char *name, bool overwrite, const
 saved_vm_running = runstate_is_running();

 global_state_store();
+if (runstate_check(RUN_STATE_SUSPENDED)) {
+cpu_disable_ticks();
+}
 vm_stop(RUN_STATE_SAVE_VM);

 bdrv_drain_all_begin();
@@ -3037,6 +3041,8 @@ bool save_snapshot(const char *name, bool overwrite, const

 if (saved_vm_running) {
 vm_start();
+} else if (runstate_check(RUN_STATE_SUSPENDED)) {
+cpu_enable_ticks();
 }
 return ret == 0;
 }

---

- Steve



Re: [PATCH V3 00/10] fix migration of suspended runstate

2023-08-17 Thread Peter Xu
On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> for a restored snapshot, the automatic wakeup fails.  The runstate is
> RUNNING, but the guest is not.  See the commit messages for the details.

Hi Steve,

I drafted two small patches to show what I meant, on top of this series.
Before applying these two, one needs to revert patch 1 in this series.

After applied, it should also pass all three new suspend tests.  We can
continue the discussion here based on the patches.

Thanks,

===
>From 2e495b08be4c56d5d8a47ba1657bae6e316c6254 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Thu, 17 Aug 2023 12:32:00 -0400
Subject: [PATCH 1/2] cpus: Allow vm_prepare_start() to take vm state as input

It was by default always setting the state as RUNNING, but logically
SUSPENDED (acpi s1) should also fall into "vm running" case, where it's
only the vcpus that are stopped running (while everything else is).

Adding such a state parameter to be prepared when we want to prepare start
when not allowing vcpus to start yet (RUN_STATE_SUSPENDED).

Note: I found that not all vm state notifiers are ready for SUSPENDED when
having running=true set.  Here let's always pass in RUNNING irrelevant of
the state passed into vm_prepare_start(), and leave that for later to
figure out.  So far there should have (hopefully) no impact functional
wise.

For this specific patch, no functional change at all should be intended,
because all callers are still passing over RUNNING.

Signed-off-by: Peter Xu 
---
 include/sysemu/runstate.h |  3 ++-
 gdbstub/softmmu.c |  2 +-
 softmmu/cpus.c| 12 +---
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 7beb29c2e2..7d889ab7c7 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -39,8 +39,9 @@ void vm_start(void);
  * vm_prepare_start: Prepare for starting/resuming the VM
  *
  * @step_pending: whether any of the CPUs is about to be single-stepped by gdb
+ * @state: the vm state to setup
  */
-int vm_prepare_start(bool step_pending);
+int vm_prepare_start(bool step_pending, RunState state);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index f509b7285d..a43e8328c0 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -565,7 +565,7 @@ int gdb_continue_partial(char *newstates)
 }
 }
 
-if (vm_prepare_start(step_requested)) {
+if (vm_prepare_start(step_requested, RUN_STATE_RUNNING)) {
 return 0;
 }
 
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index fed20ffb5d..000fac79b7 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -681,7 +681,7 @@ int vm_stop(RunState state)
  * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
  * running or in case of an error condition), 0 otherwise.
  */
-int vm_prepare_start(bool step_pending)
+int vm_prepare_start(bool step_pending, RunState state)
 {
 RunState requested;
 
@@ -713,14 +713,20 @@ int vm_prepare_start(bool step_pending)
 qapi_event_send_resume();
 
 cpu_enable_ticks();
-runstate_set(RUN_STATE_RUNNING);
+runstate_set(state);
+/*
+ * FIXME: ignore "state" being passed in for now, notify always with
+ * RUNNING. Because some of the vm state change handlers may not expect
+ * other states (e.g. SUSPENDED) passed in with running=true.  This can
+ * be modified after proper investigation over all vm state notifiers.
+ */
 vm_state_notify(1, RUN_STATE_RUNNING);
 return 0;
 }
 
 void vm_start(void)
 {
-if (!vm_prepare_start(false)) {
+if (!vm_prepare_start(false, RUN_STATE_RUNNING)) {
 resume_all_vcpus();
 }
 }
-- 
2.41.0

===

>From 4a0936eafd03952d58ab380271559c4a2049b96e Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Thu, 17 Aug 2023 12:44:29 -0400
Subject: [PATCH 2/2] fixup

Signed-off-by: Peter Xu 
---
 migration/migration.c| 9 +
 tests/qtest/migration-test.c | 9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e6b8024b03..b004475af6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -497,7 +497,7 @@ static void process_incoming_migration_bh(void *opaque)
 migration_incoming_disable_colo();
 vm_start();
 } else {
-runstate_set(global_state_get_runstate());
+vm_prepare_start(false, global_state_get_runstate());
 }
 /*
  * This must happen after any state changes since as soon as an external
@@ -1143,15 +1143,16 @@ void migrate_set_state(int *state, int old_state, int 
new_state)
 
 void 

[PATCH V3 00/10] fix migration of suspended runstate

2023-08-14 Thread Steve Sistare
Migration of a guest in the suspended runstate is broken.  The incoming
migration code automatically tries to wake the guest, which is wrong;
the guest should end migration in the same runstate it started.  Further,
for a restored snapshot, the automatic wakeup fails.  The runstate is
RUNNING, but the guest is not.  See the commit messages for the details.

Changes in V2:
  * simplify "start on wakeup request"
  * fix postcopy, snapshot, and background migration
  * refactor fixes for each type of migration
  * explicitly handled suspended events and runstate in tests
  * add test for postcopy and background migration

Changes in V3:
  * rebase to tip
  * fix hang in new function migrate_wait_for_dirty_mem

Steve Sistare (10):
  vl: start on wakeup request
  migration: preserve suspended runstate
  migration: add runstate function
  migration: preserve suspended for snapshot
  migration: preserve suspended for bg_migration
  tests/qtest: migration events
  tests/qtest: option to suspend during migration
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend
  tests/qtest: background migration with suspend

 include/sysemu/runstate.h|   1 +
 migration/migration.c|  28 --
 migration/migration.h|   1 +
 migration/savevm.c   |   9 +-
 softmmu/cpus.c   |  12 +++
 softmmu/runstate.c   |   5 +-
 tests/migration/i386/Makefile|   5 +-
 tests/migration/i386/a-b-bootblock.S |  51 +-
 tests/migration/i386/a-b-bootblock.h |  22 +++--
 tests/qtest/migration-helpers.c  |  27 ++
 tests/qtest/migration-helpers.h  |   9 +-
 tests/qtest/migration-test.c | 177 +++
 12 files changed, 258 insertions(+), 89 deletions(-)

-- 
1.8.3.1