Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-23 Thread Michal Privoznik
On 20.04.2012 15:36, Luiz Capitulino wrote:
 On Fri, 20 Apr 2012 14:07:16 +0200
 Michal Privoznik mpriv...@redhat.com wrote:
 
 But, I think if we tell users we'll *only* send response on error,
 we should do our part to *not* send the responses, rather than relying
 on them having implemented the reset mechanism to throw them away after
 guest wake-up. What we could do is allow a command to set a flag, after
 it reaps it's child (in the case of suspend this would be after
 wake-up, for shutdown it'd basically be a no-op, but worth adding
 for readability sake), to have qemu-ga not send a response. We'd
 implement it similarly to how we did ga_set_response_delimited().

 Fine with me. Stating that do not wait for an OK response, because none
 will be sent sounds clearer than an OK response may, or may not be
 emitted. Or it may be emitted when the VM resumes.


 Just to make this clear: this report-only-error behavior concerns only
 guest-suspend-* and guest-shutdown commands, right? Because otherwise,
 if we enable such behavior for all commands (e.g. fsfreeze) I think we
 are entering the world of pain.
 
 Exactly, this would only concern the suspend and shutdown commands.
 
 From user POV there is a huge difference between those 2 sets of
 commands (suspend/shutdown on one side, the others on the other side):
 - the first emits an event on qemu monitor, so libvirt can catch that
 and confirm suspend/shutdown has succeeded
 
 Oops, this is a different subject but there's a problem here. Events are just
 hints, they shouldn't be your definitive source of information.
 
 For shutdown and suspend-disk, I think that the best indication that
 the command has succeeded is that the VM will successfully exit. We could
 also have a special exit status code for suspend-to-disk, because the
 command could run in parallel with the user powering off the VM and libvirt
 wouldn't know that (and would think the VM is suspended, while it's really
 powered-off).
 
 For suspend-ram and suspend-hybrid, we're missing a 'suspended' RunState.
 The event serve as a good hint and you can use it, but if it's lost for some
 reason (eg. libvirt crashes before it's received) then libvirt can learn the
 VM state by issuing query-status.
 
 Now, going back to the original subject. I have to admit that I'm not sure
 what's the best way to go here.
 
 I'll try to recapitulate (for myself and for those that may be confused) I'll 
 be
 verbose a bit.
 
 We have two qemu-ga commands that are special: guest-shutdown and the 
 guest-suspend
 family. They are special because they shut down the VM or suspend its 
 execution
 (meaning that the world of qemu-ga is gone or gets completely frozen).
 
 Today, shutdown is an asynchronous operation: qemu-ga gets the shutdown 
 process
 started and returns to accept new commands. For qemu-ga clients, this implies:
 
  1. errors in the shutdown operation are not reported back
  2. qemu-ga doesn't block
 
 For qemu-ga this implies having a way to automatically reap terminated 
 children
 processes.
 
 The guest-suspend commands do the same when executing the suspend operation,
 but before they do that they need to query the VM for suspend support and
 this is done by executing pm-is-supported (if available). This fact shouldn't
 be visible to qemu-ga clients, but it has two internal implications:
 
  1. The operation is half synchronous and half asynchronous
  2. In order to bypass the automatic process reaper in qemu-ga when executing
 pm-is-supported, we have to play tricks that makes the suspend code
 more complex than it should be
 
 We have two options:
 
  1. Keep the current behavior (explained above, shutdown is async, suspend
 is half sync half async). For libvirt this means nothing changes, for
 qemu-ga this means more complex code
 
  2. Change everything to be synchronous (this series). This essentially means:
 
  A. errors are going to reported back
  B. qemu-ga will block
  C. we avoid all the dirty tricks, and qemu-ga code becomes simpler
  D. In theory, this should be a compatible change due to the end of world
 nature of the commands involved
 
 A third possible option would be to have asynchronous support. But I'm not
 sure whether this would fit well and how complex this would be (specially
 because of fork()).
 

Okay, thanks for recap.
One thing that I am sure will not play nicely is old libvirt with new
qemu-ga. Here's the flow:

1. User issues virDomainPMSuspend*()
2. Libvirt chews this and calls guest-suspend-* holding up return from
API until an qemu-ga answer is read
3. Guest gets suspended
... Eventually ...
4. Guest gets resumed and qemu-ga returns {'return':{}}
5. Libvirt reads response and returns from API

So, I think if we are going to change these commands, we need to spawn a
new ones so user can distinguish if he's talking to fixed or broken
qemu-ga. Or, we can introduce query-* commands family like we already
have for the 

Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-23 Thread Michael Roth
On Mon, Apr 23, 2012 at 02:20:53PM +0200, Michal Privoznik wrote:
 On 20.04.2012 15:36, Luiz Capitulino wrote:
  On Fri, 20 Apr 2012 14:07:16 +0200
  Michal Privoznik mpriv...@redhat.com wrote:
  
  But, I think if we tell users we'll *only* send response on error,
  we should do our part to *not* send the responses, rather than relying
  on them having implemented the reset mechanism to throw them away after
  guest wake-up. What we could do is allow a command to set a flag, after
  it reaps it's child (in the case of suspend this would be after
  wake-up, for shutdown it'd basically be a no-op, but worth adding
  for readability sake), to have qemu-ga not send a response. We'd
  implement it similarly to how we did ga_set_response_delimited().
 
  Fine with me. Stating that do not wait for an OK response, because none
  will be sent sounds clearer than an OK response may, or may not be
  emitted. Or it may be emitted when the VM resumes.
 
 
  Just to make this clear: this report-only-error behavior concerns only
  guest-suspend-* and guest-shutdown commands, right? Because otherwise,
  if we enable such behavior for all commands (e.g. fsfreeze) I think we
  are entering the world of pain.
  
  Exactly, this would only concern the suspend and shutdown commands.
  
  From user POV there is a huge difference between those 2 sets of
  commands (suspend/shutdown on one side, the others on the other side):
  - the first emits an event on qemu monitor, so libvirt can catch that
  and confirm suspend/shutdown has succeeded
  
  Oops, this is a different subject but there's a problem here. Events are 
  just
  hints, they shouldn't be your definitive source of information.
  
  For shutdown and suspend-disk, I think that the best indication that
  the command has succeeded is that the VM will successfully exit. We could
  also have a special exit status code for suspend-to-disk, because the
  command could run in parallel with the user powering off the VM and libvirt
  wouldn't know that (and would think the VM is suspended, while it's really
  powered-off).
  
  For suspend-ram and suspend-hybrid, we're missing a 'suspended' RunState.
  The event serve as a good hint and you can use it, but if it's lost for some
  reason (eg. libvirt crashes before it's received) then libvirt can learn the
  VM state by issuing query-status.
  
  Now, going back to the original subject. I have to admit that I'm not sure
  what's the best way to go here.
  
  I'll try to recapitulate (for myself and for those that may be confused) 
  I'll be
  verbose a bit.
  
  We have two qemu-ga commands that are special: guest-shutdown and the 
  guest-suspend
  family. They are special because they shut down the VM or suspend its 
  execution
  (meaning that the world of qemu-ga is gone or gets completely frozen).
  
  Today, shutdown is an asynchronous operation: qemu-ga gets the shutdown 
  process
  started and returns to accept new commands. For qemu-ga clients, this 
  implies:
  
   1. errors in the shutdown operation are not reported back
   2. qemu-ga doesn't block
  
  For qemu-ga this implies having a way to automatically reap terminated 
  children
  processes.
  
  The guest-suspend commands do the same when executing the suspend operation,
  but before they do that they need to query the VM for suspend support and
  this is done by executing pm-is-supported (if available). This fact 
  shouldn't
  be visible to qemu-ga clients, but it has two internal implications:
  
   1. The operation is half synchronous and half asynchronous
   2. In order to bypass the automatic process reaper in qemu-ga when 
  executing
  pm-is-supported, we have to play tricks that makes the suspend code
  more complex than it should be
  
  We have two options:
  
   1. Keep the current behavior (explained above, shutdown is async, suspend
  is half sync half async). For libvirt this means nothing changes, for
  qemu-ga this means more complex code
  
   2. Change everything to be synchronous (this series). This essentially 
  means:
  
   A. errors are going to reported back
   B. qemu-ga will block
   C. we avoid all the dirty tricks, and qemu-ga code becomes simpler
   D. In theory, this should be a compatible change due to the end of 
  world
  nature of the commands involved
  
  A third possible option would be to have asynchronous support. But I'm not
  sure whether this would fit well and how complex this would be (specially
  because of fork()).
  
 
 Okay, thanks for recap.
 One thing that I am sure will not play nicely is old libvirt with new
 qemu-ga. Here's the flow:
 
 1. User issues virDomainPMSuspend*()
 2. Libvirt chews this and calls guest-suspend-* holding up return from
 API until an qemu-ga answer is read
 3. Guest gets suspended
 ... Eventually ...
 4. Guest gets resumed and qemu-ga returns {'return':{}}
 5. Libvirt reads response and returns from API
 
 So, I think if we are going to 

Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-23 Thread Luiz Capitulino
On Mon, 23 Apr 2012 09:14:18 -0500
Michael Roth mdr...@linux.vnet.ibm.com wrote:

 So, currently, libvirt waits indefinitely (till guest wake-up) for
 the guest-suspend-* response? I think that's broken, since
 guest-suspend-* is documented such that a response is not garaunteed,

That's true, but there's a problem on our side too.

We say that a response may not be sent, but we also don't document what clients
can do to check if the command has been successfully executed. This issue also
exists with shutdown, although it's pretty obvious that if qemu exits with a
status code of zero then it has powered down (or the user did that her/himself,
which doesn't matter). But this has to be documented to be official.

For suspend-ram and suspend-hybrid, the recommended way to know the command
has completed is to wait for the SUSPEND event and/or poll for the RunState
change (will submit a patch for this).

Now, I don't know what to do for suspend-disk. I was hoping that having a
different exit status for it was possible, but Gleb says this has problems
too as guests may emulate S4 by saving ram contents and then powering
down. Maybe we could use this anyway and let clients know that a regular
exit status is also possible.

However, there's another issue. This is out of qemu-ga's realm, but it's
interesting to find an unified solution: libvirt also wants to know when
the user suspends to S4 from user-space. Having a different status code
would be perfect for this...



Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-20 Thread Paolo Bonzini
Il 19/04/2012 21:34, Luiz Capitulino ha scritto:
 If this turns out to be a problem for libvirt, I'd say it's a libvirt bug.

FWIW, I agree completely.

Paolo



Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-20 Thread Michal Privoznik
On 19.04.2012 21:34, Luiz Capitulino wrote:
 On Thu, 19 Apr 2012 13:57:48 -0500
 Michael Roth mdr...@linux.vnet.ibm.com wrote:
 
 On Thu, Apr 19, 2012 at 02:36:53PM -0300, Luiz Capitulino wrote:
 Michael,

 I'm going to revive this topic one more time. I was working on v2 of my 
 fixes
 to the suspend race bugs and really thought that the real problem is that 
 the
 code shouldn't be that complex.

 Basically, this series drops the automatic reaper and adds waitpid() calls 
 related logic to the suspend and shutdown functions.

 My objective with this RFC is to understand why this can't be done.

 Hi Luiz,

 CC'ing Michael as some of these suggestions might affect his work on
 libvirt integration.
 
 I forgot to CC him and Eric.
 
 Thanks for taking the time to implement what this would look like
 this. I think this is a sensible approach in general, the main issue
 for me is the specific commands currently impacted by such a change:
 suspend and shutdown.

 While we do document that these commands may not return a response,
 the new behavior actually introduces a pathological assurance that
 they *won't* return a response:
 shutdown/pm-suspend/echo mem /sys/power/state never return, or don't
 return till the guest wakes up.
 
 Yes, but that can be viewed as an implementation detail because this is
 also possible with the current code (ie. if the suspend process manages
 to suspend the guest before qemu-ga is able to emit a response).
 
 So *every* suspend/shutdown command will result in a timeout or
 indefinite hang to the user. Honestly I just find the behavior
 completely unexpected/unintuitive and would prefer to reduce it to
 being a corner case. As the code stands currently it's actually
 extremely rare that a response won't be returned before the commands
 are executed.
 
 I don't think we can assume this is a corner case. It all depends on how
 the two processes are scheduled. On an idle VM it's more likely that qemu-ga
 will emit the response first because the suspend process does more I/O, but
 that really depends on the scheduler/system load. This is also true for the
 shutdown command.
 
 As we have discussed with Anthony on irc months ago, the client can't expect
 a response from this command and should reset the connection before sending
 new commands when the VM resumes.
 
 If this turns out to be a problem for libvirt, I'd say it's a libvirt bug.
 
 So that's why we're jumping through hoops atm. It's more a
 philosophical reason than a technical one.

 But if we were to do things as you proposed and document things as
 this command will *only* show a response on error, I guess maybe I
 can live with that... in the case of shutdown older clients will
 start seeing timeouts where they previously expected a nothing
 response. The nothing response never entailed success or failure so
 this shouldn't break anything that wasn't already broken however...
 
 Yes, and again, depending on the VM load it's possible that the VM vanishes
 before qemu-ga emits its response.
 
 But, I think if we tell users we'll *only* send response on error,
 we should do our part to *not* send the responses, rather than relying
 on them having implemented the reset mechanism to throw them away after
 guest wake-up. What we could do is allow a command to set a flag, after
 it reaps it's child (in the case of suspend this would be after
 wake-up, for shutdown it'd basically be a no-op, but worth adding
 for readability sake), to have qemu-ga not send a response. We'd
 implement it similarly to how we did ga_set_response_delimited().
 
 Fine with me. Stating that do not wait for an OK response, because none
 will be sent sounds clearer than an OK response may, or may not be
 emitted. Or it may be emitted when the VM resumes.


Just to make this clear: this report-only-error behavior concerns only
guest-suspend-* and guest-shutdown commands, right? Because otherwise,
if we enable such behavior for all commands (e.g. fsfreeze) I think we
are entering the world of pain.

From user POV there is a huge difference between those 2 sets of
commands (suspend/shutdown on one side, the others on the other side):
- the first emits an event on qemu monitor, so libvirt can catch that
and confirm suspend/shutdown has succeeded
- while the latter emits no event, it is impossible to determine command
successfulness. Simply, because it is hard to tell if command is in
execution phase (e.g. kernel is syncing disks) or command has already
finished (and FS is frozen already).

Therefore, I can live with making commands from suspend/shutdown family
only-error-reporting as long as I can check guest has been really
suspended/shut off.

Michal



Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-20 Thread Paolo Bonzini
Il 20/04/2012 14:07, Michal Privoznik ha scritto:
 Just to make this clear: this report-only-error behavior concerns only
 guest-suspend-* and guest-shutdown commands, right? Because otherwise,
 if we enable such behavior for all commands (e.g. fsfreeze) I think we
 are entering the world of pain.

Yes, of course.

I hadn't thought of the QEMU event, but that makes not reporting success
of suspend/shutdown totally acceptable.  Especially considering that the
guest can _never_ be sure that suspend will succeed, while QEMU can
report it accurately.

Paolo



Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-20 Thread Luiz Capitulino
On Fri, 20 Apr 2012 14:07:16 +0200
Michal Privoznik mpriv...@redhat.com wrote:

  But, I think if we tell users we'll *only* send response on error,
  we should do our part to *not* send the responses, rather than relying
  on them having implemented the reset mechanism to throw them away after
  guest wake-up. What we could do is allow a command to set a flag, after
  it reaps it's child (in the case of suspend this would be after
  wake-up, for shutdown it'd basically be a no-op, but worth adding
  for readability sake), to have qemu-ga not send a response. We'd
  implement it similarly to how we did ga_set_response_delimited().
  
  Fine with me. Stating that do not wait for an OK response, because none
  will be sent sounds clearer than an OK response may, or may not be
  emitted. Or it may be emitted when the VM resumes.
 
 
 Just to make this clear: this report-only-error behavior concerns only
 guest-suspend-* and guest-shutdown commands, right? Because otherwise,
 if we enable such behavior for all commands (e.g. fsfreeze) I think we
 are entering the world of pain.

Exactly, this would only concern the suspend and shutdown commands.

 From user POV there is a huge difference between those 2 sets of
 commands (suspend/shutdown on one side, the others on the other side):
 - the first emits an event on qemu monitor, so libvirt can catch that
 and confirm suspend/shutdown has succeeded

Oops, this is a different subject but there's a problem here. Events are just
hints, they shouldn't be your definitive source of information.

For shutdown and suspend-disk, I think that the best indication that
the command has succeeded is that the VM will successfully exit. We could
also have a special exit status code for suspend-to-disk, because the
command could run in parallel with the user powering off the VM and libvirt
wouldn't know that (and would think the VM is suspended, while it's really
powered-off).

For suspend-ram and suspend-hybrid, we're missing a 'suspended' RunState.
The event serve as a good hint and you can use it, but if it's lost for some
reason (eg. libvirt crashes before it's received) then libvirt can learn the
VM state by issuing query-status.

Now, going back to the original subject. I have to admit that I'm not sure
what's the best way to go here.

I'll try to recapitulate (for myself and for those that may be confused) I'll be
verbose a bit.

We have two qemu-ga commands that are special: guest-shutdown and the 
guest-suspend
family. They are special because they shut down the VM or suspend its execution
(meaning that the world of qemu-ga is gone or gets completely frozen).

Today, shutdown is an asynchronous operation: qemu-ga gets the shutdown process
started and returns to accept new commands. For qemu-ga clients, this implies:

 1. errors in the shutdown operation are not reported back
 2. qemu-ga doesn't block

For qemu-ga this implies having a way to automatically reap terminated children
processes.

The guest-suspend commands do the same when executing the suspend operation,
but before they do that they need to query the VM for suspend support and
this is done by executing pm-is-supported (if available). This fact shouldn't
be visible to qemu-ga clients, but it has two internal implications:

 1. The operation is half synchronous and half asynchronous
 2. In order to bypass the automatic process reaper in qemu-ga when executing
pm-is-supported, we have to play tricks that makes the suspend code
more complex than it should be

We have two options:

 1. Keep the current behavior (explained above, shutdown is async, suspend
is half sync half async). For libvirt this means nothing changes, for
qemu-ga this means more complex code

 2. Change everything to be synchronous (this series). This essentially means:

 A. errors are going to reported back
 B. qemu-ga will block
 C. we avoid all the dirty tricks, and qemu-ga code becomes simpler
 D. In theory, this should be a compatible change due to the end of world
nature of the commands involved

A third possible option would be to have asynchronous support. But I'm not
sure whether this would fit well and how complex this would be (specially
because of fork()).




Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-20 Thread Michael Roth
On Fri, Apr 20, 2012 at 10:36:38AM -0300, Luiz Capitulino wrote:
 On Fri, 20 Apr 2012 14:07:16 +0200
 Michal Privoznik mpriv...@redhat.com wrote:
 
   But, I think if we tell users we'll *only* send response on error,
   we should do our part to *not* send the responses, rather than relying
   on them having implemented the reset mechanism to throw them away after
   guest wake-up. What we could do is allow a command to set a flag, after
   it reaps it's child (in the case of suspend this would be after
   wake-up, for shutdown it'd basically be a no-op, but worth adding
   for readability sake), to have qemu-ga not send a response. We'd
   implement it similarly to how we did ga_set_response_delimited().
   
   Fine with me. Stating that do not wait for an OK response, because none
   will be sent sounds clearer than an OK response may, or may not be
   emitted. Or it may be emitted when the VM resumes.
  
  
  Just to make this clear: this report-only-error behavior concerns only
  guest-suspend-* and guest-shutdown commands, right? Because otherwise,
  if we enable such behavior for all commands (e.g. fsfreeze) I think we
  are entering the world of pain.
 
 Exactly, this would only concern the suspend and shutdown commands.
 
  From user POV there is a huge difference between those 2 sets of
  commands (suspend/shutdown on one side, the others on the other side):
  - the first emits an event on qemu monitor, so libvirt can catch that
  and confirm suspend/shutdown has succeeded
 
 Oops, this is a different subject but there's a problem here. Events are just
 hints, they shouldn't be your definitive source of information.
 
 For shutdown and suspend-disk, I think that the best indication that
 the command has succeeded is that the VM will successfully exit. We could
 also have a special exit status code for suspend-to-disk, because the
 command could run in parallel with the user powering off the VM and libvirt
 wouldn't know that (and would think the VM is suspended, while it's really
 powered-off).

Do we have a way of differentiating a guest shutting down as a result of
shutdown vs. suspend-to-disk? AFAIK they both have the same ending state. If
so I'm not sure we can say anything other than the machine may or may not
resume when you start it up, depending on whether or not the suspend-to-disk
completed before the machine shut down.

So for both shutdown and suspend-to-disk, i think the best we can do it
treat qemu exit as an indicator of success (from the client perspective,
at least).

And if we can do that, shutdown and suspend-to-disk are pretty
straighforward: we either get an error response, or a timeout. Errors
are easy to report, timeouts: you just keep issuing the reset sequence
and attempting to re-establish a connection. If the machine is hung this
can continue indefinitely, which is fine if your reset sequence also
has a timeout and allows periodic execution of libvirt to it can do
whatever it normally does if a guest is hung.

 
 For suspend-ram and suspend-hybrid, we're missing a 'suspended' RunState.
 The event serve as a good hint and you can use it, but if it's lost for some
 reason (eg. libvirt crashes before it's received) then libvirt can learn the
 VM state by issuing query-status.

Is it pretty trivial to plumb QEVENT_SUSPEND/QEVENT_WAKEUP to a
RunState? If so, checking query-status should be similar to checking for
qemu process exit. If you don't see a transition after a certain
threshold, you transition back into qemu-ga reset sequence loop, and
treat the machine the same way libvirt would treat any hung guest.

The one downside to query-status vs. an event is that there's a race
window where the suspend succeeds but gets woken up before we see it in
a suspended state, and event would work around this, but there may as
you said be instances where we miss it.

 
 Now, going back to the original subject. I have to admit that I'm not sure
 what's the best way to go here.
 
 I'll try to recapitulate (for myself and for those that may be confused) I'll 
 be
 verbose a bit.
 
 We have two qemu-ga commands that are special: guest-shutdown and the 
 guest-suspend
 family. They are special because they shut down the VM or suspend its 
 execution
 (meaning that the world of qemu-ga is gone or gets completely frozen).
 
 Today, shutdown is an asynchronous operation: qemu-ga gets the shutdown 
 process
 started and returns to accept new commands. For qemu-ga clients, this implies:
 
  1. errors in the shutdown operation are not reported back
  2. qemu-ga doesn't block
 
 For qemu-ga this implies having a way to automatically reap terminated 
 children
 processes.
 
 The guest-suspend commands do the same when executing the suspend operation,
 but before they do that they need to query the VM for suspend support and
 this is done by executing pm-is-supported (if available). This fact shouldn't
 be visible to qemu-ga clients, but it has two internal implications:
 
  1. The operation is 

Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-20 Thread Michael Roth
On Fri, Apr 20, 2012 at 12:26:10PM -0500, Michael Roth wrote:
 On Fri, Apr 20, 2012 at 10:36:38AM -0300, Luiz Capitulino wrote:
  On Fri, 20 Apr 2012 14:07:16 +0200
  Michal Privoznik mpriv...@redhat.com wrote:
  
But, I think if we tell users we'll *only* send response on error,
we should do our part to *not* send the responses, rather than relying
on them having implemented the reset mechanism to throw them away after
guest wake-up. What we could do is allow a command to set a flag, after
it reaps it's child (in the case of suspend this would be after
wake-up, for shutdown it'd basically be a no-op, but worth adding
for readability sake), to have qemu-ga not send a response. We'd
implement it similarly to how we did ga_set_response_delimited().

Fine with me. Stating that do not wait for an OK response, because none
will be sent sounds clearer than an OK response may, or may not be
emitted. Or it may be emitted when the VM resumes.
   
   
   Just to make this clear: this report-only-error behavior concerns only
   guest-suspend-* and guest-shutdown commands, right? Because otherwise,
   if we enable such behavior for all commands (e.g. fsfreeze) I think we
   are entering the world of pain.
  
  Exactly, this would only concern the suspend and shutdown commands.
  
   From user POV there is a huge difference between those 2 sets of
   commands (suspend/shutdown on one side, the others on the other side):
   - the first emits an event on qemu monitor, so libvirt can catch that
   and confirm suspend/shutdown has succeeded
  
  Oops, this is a different subject but there's a problem here. Events are 
  just
  hints, they shouldn't be your definitive source of information.
  
  For shutdown and suspend-disk, I think that the best indication that
  the command has succeeded is that the VM will successfully exit. We could
  also have a special exit status code for suspend-to-disk, because the
  command could run in parallel with the user powering off the VM and libvirt
  wouldn't know that (and would think the VM is suspended, while it's really
  powered-off).
 
 Do we have a way of differentiating a guest shutting down as a result of
 shutdown vs. suspend-to-disk? AFAIK they both have the same ending state. If
 so I'm not sure we can say anything other than the machine may or may not
 resume when you start it up, depending on whether or not the suspend-to-disk
 completed before the machine shut down.
 
 So for both shutdown and suspend-to-disk, i think the best we can do it
 treat qemu exit as an indicator of success (from the client perspective,
 at least).
 
 And if we can do that, shutdown and suspend-to-disk are pretty
 straighforward: we either get an error response, or a timeout. Errors

for the remaining, non-success cases, I mean. process exit being the
single success case.

 are easy to report, timeouts: you just keep issuing the reset sequence
 and attempting to re-establish a connection. If the machine is hung this
 can continue indefinitely, which is fine if your reset sequence also
 has a timeout and allows periodic execution of libvirt to it can do
 whatever it normally does if a guest is hung.
 

Althrough there are also halt and reboot, which have a different success
case... reboot I think will just always been either an error, or a
timeout, and since timeout induces the reset sequence loop we should
eventually re-establish connection with qemu-ga and be fine. So that's
good.

Halt I think just falls into the hung guest category...

  
  For suspend-ram and suspend-hybrid, we're missing a 'suspended' RunState.
  The event serve as a good hint and you can use it, but if it's lost for some
  reason (eg. libvirt crashes before it's received) then libvirt can learn the
  VM state by issuing query-status.
 
 Is it pretty trivial to plumb QEVENT_SUSPEND/QEVENT_WAKEUP to a
 RunState? If so, checking query-status should be similar to checking for
 qemu process exit. If you don't see a transition after a certain
 threshold, you transition back into qemu-ga reset sequence loop, and
 treat the machine the same way libvirt would treat any hung guest.
 
 The one downside to query-status vs. an event is that there's a race
 window where the suspend succeeds but gets woken up before we see it in
 a suspended state, and event would work around this, but there may as
 you said be instances where we miss it.
 
  
  Now, going back to the original subject. I have to admit that I'm not sure
  what's the best way to go here.
  
  I'll try to recapitulate (for myself and for those that may be confused) 
  I'll be
  verbose a bit.
  
  We have two qemu-ga commands that are special: guest-shutdown and the 
  guest-suspend
  family. They are special because they shut down the VM or suspend its 
  execution
  (meaning that the world of qemu-ga is gone or gets completely frozen).
  
  Today, shutdown is an asynchronous operation: qemu-ga gets the shutdown 
  process
  

Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-20 Thread Paolo Bonzini
Il 20/04/2012 19:26, Michael Roth ha scritto:
 Do we have a way of differentiating a guest shutting down as a result of
 shutdown vs. suspend-to-disk? AFAIK they both have the same ending state. If
 so I'm not sure we can say anything other than the machine may or may not
 resume when you start it up, depending on whether or not the suspend-to-disk
 completed before the machine shut down.

I'm not sure but I think you can differentiate S4 and S5 in the DSDT.

Paolo



Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-20 Thread Gleb Natapov
On Fri, Apr 20, 2012 at 09:39:59PM +0200, Paolo Bonzini wrote:
 Il 20/04/2012 19:26, Michael Roth ha scritto:
  Do we have a way of differentiating a guest shutting down as a result of
  shutdown vs. suspend-to-disk? AFAIK they both have the same ending state. If
  so I'm not sure we can say anything other than the machine may or may not
  resume when you start it up, depending on whether or not the suspend-to-disk
  completed before the machine shut down.
 
 I'm not sure but I think you can differentiate S4 and S5 in the DSDT.
 
You can (not done now) but I wouldn't rely on it. Some OSes do suspend
to disk, even when bios does not report S4 support, by doing powerdown. At
least Linux and WinXP do that.

--
Gleb.



[Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-19 Thread Luiz Capitulino
Michael,

I'm going to revive this topic one more time. I was working on v2 of my fixes
to the suspend race bugs and really thought that the real problem is that the
code shouldn't be that complex.

Basically, this series drops the automatic reaper and adds waitpid() calls 
related logic to the suspend and shutdown functions.

My objective with this RFC is to understand why this can't be done.

Thanks.

 qemu-ga.c|   17 +--
 qga/commands-posix.c |  136 ++
 2 files changed, 60 insertions(+), 93 deletions(-)



Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-19 Thread Michael Roth
On Thu, Apr 19, 2012 at 02:36:53PM -0300, Luiz Capitulino wrote:
 Michael,
 
 I'm going to revive this topic one more time. I was working on v2 of my fixes
 to the suspend race bugs and really thought that the real problem is that the
 code shouldn't be that complex.
 
 Basically, this series drops the automatic reaper and adds waitpid() calls 
 related logic to the suspend and shutdown functions.
 
 My objective with this RFC is to understand why this can't be done.

Hi Luiz,

CC'ing Michael as some of these suggestions might affect his work on
libvirt integration.

Thanks for taking the time to implement what this would look like
this. I think this is a sensible approach in general, the main issue
for me is the specific commands currently impacted by such a change:
suspend and shutdown.

While we do document that these commands may not return a response,
the new behavior actually introduces a pathological assurance that
they *won't* return a response:
shutdown/pm-suspend/echo mem /sys/power/state never return, or don't
return till the guest wakes up.

So *every* suspend/shutdown command will result in a timeout or
indefinite hang to the user. Honestly I just find the behavior
completely unexpected/unintuitive and would prefer to reduce it to
being a corner case. As the code stands currently it's actually
extremely rare that a response won't be returned before the commands
are executed.

So that's why we're jumping through hoops atm. It's more a
philosophical reason than a technical one.

But if we were to do things as you proposed and document things as
this command will *only* show a response on error, I guess maybe I
can live with that... in the case of shutdown older clients will
start seeing timeouts where they previously expected a nothing
response. The nothing response never entailed success or failure so
this shouldn't break anything that wasn't already broken however...

But, I think if we tell users we'll *only* send response on error,
we should do our part to *not* send the responses, rather than relying
on them having implemented the reset mechanism to throw them away after
guest wake-up. What we could do is allow a command to set a flag, after
it reaps it's child (in the case of suspend this would be after
wake-up, for shutdown it'd basically be a no-op, but worth adding
for readability sake), to have qemu-ga not send a response. We'd
implement it similarly to how we did ga_set_response_delimited().

Thoughts?

 
 Thanks.
 
  qemu-ga.c|   17 +--
  qga/commands-posix.c |  136 
 ++
  2 files changed, 60 insertions(+), 93 deletions(-)
 



Re: [Qemu-devel] [RFC 0/2]: qemu-ga: drop automatic reaper

2012-04-19 Thread Luiz Capitulino
On Thu, 19 Apr 2012 13:57:48 -0500
Michael Roth mdr...@linux.vnet.ibm.com wrote:

 On Thu, Apr 19, 2012 at 02:36:53PM -0300, Luiz Capitulino wrote:
  Michael,
  
  I'm going to revive this topic one more time. I was working on v2 of my 
  fixes
  to the suspend race bugs and really thought that the real problem is that 
  the
  code shouldn't be that complex.
  
  Basically, this series drops the automatic reaper and adds waitpid() calls 
  related logic to the suspend and shutdown functions.
  
  My objective with this RFC is to understand why this can't be done.
 
 Hi Luiz,
 
 CC'ing Michael as some of these suggestions might affect his work on
 libvirt integration.

I forgot to CC him and Eric.

 Thanks for taking the time to implement what this would look like
 this. I think this is a sensible approach in general, the main issue
 for me is the specific commands currently impacted by such a change:
 suspend and shutdown.
 
 While we do document that these commands may not return a response,
 the new behavior actually introduces a pathological assurance that
 they *won't* return a response:
 shutdown/pm-suspend/echo mem /sys/power/state never return, or don't
 return till the guest wakes up.

Yes, but that can be viewed as an implementation detail because this is
also possible with the current code (ie. if the suspend process manages
to suspend the guest before qemu-ga is able to emit a response).

 So *every* suspend/shutdown command will result in a timeout or
 indefinite hang to the user. Honestly I just find the behavior
 completely unexpected/unintuitive and would prefer to reduce it to
 being a corner case. As the code stands currently it's actually
 extremely rare that a response won't be returned before the commands
 are executed.

I don't think we can assume this is a corner case. It all depends on how
the two processes are scheduled. On an idle VM it's more likely that qemu-ga
will emit the response first because the suspend process does more I/O, but
that really depends on the scheduler/system load. This is also true for the
shutdown command.

As we have discussed with Anthony on irc months ago, the client can't expect
a response from this command and should reset the connection before sending
new commands when the VM resumes.

If this turns out to be a problem for libvirt, I'd say it's a libvirt bug.

 So that's why we're jumping through hoops atm. It's more a
 philosophical reason than a technical one.
 
 But if we were to do things as you proposed and document things as
 this command will *only* show a response on error, I guess maybe I
 can live with that... in the case of shutdown older clients will
 start seeing timeouts where they previously expected a nothing
 response. The nothing response never entailed success or failure so
 this shouldn't break anything that wasn't already broken however...

Yes, and again, depending on the VM load it's possible that the VM vanishes
before qemu-ga emits its response.

 But, I think if we tell users we'll *only* send response on error,
 we should do our part to *not* send the responses, rather than relying
 on them having implemented the reset mechanism to throw them away after
 guest wake-up. What we could do is allow a command to set a flag, after
 it reaps it's child (in the case of suspend this would be after
 wake-up, for shutdown it'd basically be a no-op, but worth adding
 for readability sake), to have qemu-ga not send a response. We'd
 implement it similarly to how we did ga_set_response_delimited().

Fine with me. Stating that do not wait for an OK response, because none
will be sent sounds clearer than an OK response may, or may not be
emitted. Or it may be emitted when the VM resumes.