On 12/4/2023 12:24 PM, Peter Xu wrote:
> On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
>>>> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int 
>>>> version_id)
>>>>          return -EINVAL;
>>>>      }
>>>>      s->state = r;
>>>> +    vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
>>>
>>> IIUC current vm_was_suspended (based on my read of your patch) was not the
>>> same as a boolean representing "whether VM is suspended", but only a
>>> temporary field to remember that for a VM stop request.  To be explicit, I
>>> didn't see this flag set in qemu_system_suspend() in your previous patch.
>>>
>>> If so, we can already do:
>>>
>>>   vm_set_suspended(s->vm_was_suspended);
>>>
>>> Irrelevant of RUN_STATE_SUSPENDED?
>>
>> We need both terms of the expression.
>>
>> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = 
>> false.
>> We call global_state_store prior to vm_stop_force_state, so the incoming
>> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
> 
> Right.
> 
>> However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
>> calling vm_start, we need to restore the suspended state.  Thus in 
>> global_state_post_load, we must set vm_was_suspended = true.
> 
> With above, shouldn't global_state_get_runstate() (on dest) fetch SUSPENDED
> already?  Then I think it should call vm_start(SUSPENDED) if to start.

The V6 code does not pass a state to vm_start, and knowledge of vm_was_suspended
is confined to the global_state and cpus functions.  IMO this is a more modular
and robust solution, as multiple sites may call vm_start(), and the right thing
happens.  Look at patch 6.  The changes are minimal because vm_start "just 
works".

> Maybe you're talking about the special case where autostart==false?  We
> used to have this (existing process_incoming_migration_bh()):
> 
>     if (!global_state_received() ||
>         global_state_get_runstate() == RUN_STATE_RUNNING) {
>         if (autostart) {
>             vm_start();
>         } else {
>             runstate_set(RUN_STATE_PAUSED);
>         }
>     }
> 
> If so maybe I get you, because in the "else" path we do seem to lose the
> SUSPENDED state again, but in that case IMHO we should logically set
> vm_was_suspended only when we "lose" it - we didn't lose it during
> migration, but only until we decided to switch to PAUSED (due to
> autostart==false). IOW, change above to something like:
> 
>     state = global_state_get_runstate();
>     if (!global_state_received() || runstate_is_alive(state)) {
>         if (autostart) {
>             vm_start(state);
>         } else {
>             if (runstate_is_suspended(state)) {
>                 /* Remember suspended state before setting system to STOPed */
>                 vm_was_suspended = true;
>             }
>             runstate_set(RUN_STATE_PAUSED);
>         }
>     }

This is similar to V5 which tested suspended and fiddled with runstate at
multiple call sites in migration and snapshot.  I believe V6 is cleaner.

> It may or may not have a functional difference even if current patch,
> though.  However maybe clearer to follow vm_was_suspended's strict
> definition.
> 
>>
>> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
>> then vm_was_suspended = true.  Migration from that state sets
>> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and 
>> ends with runstate_set(RUN_STATE_PAUSED).
>>
>> I will add a comment here in the code.
>>  
>>>>      return 0;
>>>>  }
>>>> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
>>>>      .fields = (VMStateField[]) {
>>>>          VMSTATE_UINT32(size, GlobalState),
>>>>          VMSTATE_BUFFER(runstate, GlobalState),
>>>> +        VMSTATE_BOOL(vm_was_suspended, GlobalState),
>>>>          VMSTATE_END_OF_LIST()
>>>>      },
>>>>  };
>>>
>>> I think this will break migration between old/new, unfortunately.  And
>>> since the global state exist mostly for every VM, all VM setup should be
>>> affected, and over all archs.
>>
>> Thanks, I keep forgetting that my binary tricks are no good here.  However,
>> I have one other trick up my sleeve, which is to store vm_was_running in
>> global_state.runstate[strlen(runstate) + 2].  It is forwards and backwards
>> compatible, since that byte is always 0 in older qemu.  It can be implemented
>> with a few lines of code change confined to global_state.c, versus many 
>> lines 
>> spread across files to do it the conventional way using a compat property and
>> a subsection.  Sound OK?  
> 
> Tricky!  But sounds okay to me.  I think you're inventing some of your own
> way of being compatible, not relying on machine type as a benefit.  If go
> this route please document clearly on the layout and also what it looked
> like in old binaries.
> 
> I think maybe it'll be good to keep using strings, so in the new binaries
> we allow >1 strings, then we define properly on those strings (index 0:
> runstate, existed since start; index 2: suspended, perhaps using "1"/"0" to
> express, while 0x00 means old binary, etc.).
> 
> I hope this trick will need less code than the subsection solution,
> otherwise I'd still consider going with that, which is the "common
> solution".
> 
> Let's also see whether Juan/Fabiano/others has any opinions.

The disadvantage of using strings '0' and '1' is the additional check for
the backwards compatible value 0x00.  No big deal, and I'll do that if you 
prefer, 
but it seems unnecessary. I had already written this for binary 0/1. Not yet 
tested, 
and still needs comments:

-----------
diff --git a/migration/global_state.c b/migration/global_state.c
index 4e2a9d8..8a59554 100644
--- a/migration/global_state.c
+++ b/migration/global_state.c
@@ -32,9 +32,10 @@ static GlobalState global_state;
 static void global_state_do_store(RunState state)
 {
     const char *state_str = RunState_str(state);
-    assert(strlen(state_str) < sizeof(global_state.runstate));
+    assert(strlen(state_str) < sizeof(global_state.runstate) - 2);
     strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
               state_str, '\0');
+    global_state.runstate[strlen(state_str) + 1] = vm_get_suspended();
 }

 void global_state_store(void)
@@ -68,6 +69,12 @@ static bool global_state_needed(void *opaque)
         return true;
     }

+    /* If the suspended state must be remembered, it is needed */
+
+    if (vm_get_suspended()) {
+        return true;
+    }
+
     /* If state is running or paused, it is not needed */

     if (strcmp(runstate, "running") == 0 ||
@@ -109,6 +116,8 @@ static int global_state_post_load(void *opaque, int version_
         return -EINVAL;
     }
     s->state = r;
+    bool vm_was_suspended = runstate[strlen(runstate) + 1];
+    vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);

     return 0;
 }
------------

- Steve

Reply via email to