On Tue, Nov 21, 2023 at 04:21:18PM -0500, Steven Sistare wrote:
> On 11/20/2023 4:44 PM, Peter Xu wrote:
> > On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
> >> If we drop force, then all calls to vm_stop will completely stop the
> >> suspended state, eg an hmp "stop" command. This causes two problems.
> >> First, that is a change in user-visible behavior for something that
> >> currently works,
> > 
> > IMHO it depends on what should be the correct behavior.  IOW, when VM is in
> > SUSPENDED state and then the user sends "stop" QMP command, what should we
> > expect?
> > 
> > My understanding is we should expect to fully stop the VM, including the
> > ticks, for example.  Keeping the ticks running even after QMP "stop"
> > doesn't sound right, isn't it?
> 
> I agree, but others may not, and this decision would require approval from 
> maintainers in other areas, including upper layers such as libvirt that are
> aware of the suspended state.  It is a user-visible change, and may require 
> a new libvirt release.

$ ./scripts/get_maintainer.pl -f system/cpus.c
Richard Henderson <richard.hender...@linaro.org> (maintainer:Overall TCG CPUs)
Paolo Bonzini <pbonz...@redhat.com> (reviewer:Overall TCG CPUs)
qemu-devel@nongnu.org (open list:All patches CC here)

I'm also copying Richard, while Dan/Paolo is already in the loop, so we
should have the "quorum" already.  Let's see whether we can already get
some comments from the maintainers..

> 
> >> vs the migration code where we are fixing brokenness.
> > 
> > This is not a migration-only bug if above holds, IMO.
> > 
> >> Second, it does not quite work, because the state becomes
> >> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
> >> will try to set the running state.  I could fix that by introducing a new
> >> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
> >> in existing behavior.  (I even implemented that while developing, then I
> >> realized it was not needed to fix the migration bugs.)
> > 
> > Good point.
> > 
> > Now with above comments, what's your thoughts on whether we should change
> > the user behavior?  My answer is still a yes.
> > 
> > Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible
> > behavior, while something like QMP "stop" is not guest visible.  Maybe we
> > should remember it separately?
> >
> > It means qemu_system_suspend() could remember that in a separate field (as
> > part of guest state), then when wakeup we should conditionally go back
> > with/without vcpus running depending on the new "suspended" state.
>
> Regardless of how we remember it, the status command must still expose
> the suspended state to the user.  The user must be able to see that a
> guest is suspended, and decide when to issue a wakeup command.

Hmm, right, we may want to keep having the SUSPENDED state in RunState,
even another separate "vm_suspended" boolean might still be required.

> 
> If we change the stop command to completely stop a suspended vm, then we must
> allow the user to query whether a vm is suspended-running or 
> suspended-stopped,
> because the command they must issue to resume is different: wakeup for the
> former, and cont for the latter.

If it's stopped, the user must need a "cont" anyway.  And then if after
"cont" the user still sees it's suspended, then would "system_wakeup" work
here if necessary, after that "cont"?

Let's consider the current QEMU with below sequence of operations:

  1) vm running
  2) guest triggers ACPI suspend -> vm suspended
  3) admin triggers "stop" cmd -> vm suspended (ignored..)
  4) admin triggers "cont" cmd -> vm suspended (ignored.. too)

AFAICT both 2) and 3) are unwanted behavior, and after noticing 3) I feel
stronger that this is not a migration issue alone.

It also means after step 1)-3) if we got a wakeup elsewhere, the VM can
actually be running!  That's definitely unexpected after admin sends "stop"
already.  Isn't that another real bug?

I'm slightly confused on why you said above that libvirt will need a new
release. Could you elaborate?  Especially on what scenario we need to
maintain compatibility that still makes sense.

> 
> This change is visible to libvirt.  Adding it will delay this entire patch
> series, and is not necessary for fixing the migration bug.  There is no
> downside to drawing the line here for this series, and possibly changing stop
> semantics in the future.

This series will need to wait for rc releases anyway until 8.2 all out:

  https://wiki.qemu.org/Planning/8.2

I think we still have time to even catch the earliest train right after 8.2
released, if we can reach a consensus soon in whatever form.

Having a partial solution merged for migration is probably doable, but that
will make the code even more complicated and harder to maintain.  So before
doing so, I'd at least like to understand better on what use case you were
describing that will start to fall apart.

Thanks,

-- 
Peter Xu


Reply via email to