[Qemu-devel] [RFC 0/7] Optional toplevel sections

2014-10-15 Thread Juan Quintela
Hi

By popular demand, and after far too long, here is this series.  This
is an RFC to find out what people think about how to use them, the
interface proposed, whatever.

* simplify optional subsections by moving the "needed" function to the
  vmstate description.  I think that this simplification makes sense
  by itself; it is independent of the rest of the patches.

* runstate: As an example of an optional section, I decided to use
  the current runstate.  Right now, we have a problem when:
  - we start the destination without -S
  - we run migration, and it causes an I/O error on the source, but
    migration finishes
  - the destination resumes the guest anyway, when it is possible
    that we could get disk corruption (the I/O error was there for a
    reason)
  Luiz: Can you see any obvious improvement in how we use runstates?
  Laine: Could you tell me whether you (libvirt) like this or would
  want something a bit different?

* I send that section independently of the state for new machine types.

* For old machine types I use this as one example of an optional section.
  We only send it when the state is different from "running" or "paused".

  So, the only case where we fail is if we migrate to an old qemu and
  there is an error.

* In the runstate subsection "postload" we can send any event for
  anything that libvirt wants when migration finishes.
  Laine, can you tell us what libvirt would prefer for this?
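
As a rough illustration of the first point, here is a minimal sketch of
what moving the "needed" function into the description could look like.
The types and names (SketchVMState, sketch_section_needed, the serial
timeout field) are hypothetical stand-ins, not the real definitions
from include/migration/vmstate.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vmstate description. */
typedef struct SketchVMState {
    const char *name;
    /* Optional predicate; NULL means the section is always sent. */
    bool (*needed)(void *opaque);
} SketchVMState;

/* Decide whether a (sub)section goes into the outgoing stream. */
static bool sketch_section_needed(const SketchVMState *vmsd, void *opaque)
{
    assert(vmsd != NULL);
    return vmsd->needed == NULL || vmsd->needed(opaque);
}

/* Hypothetical device state with one rarely used field. */
typedef struct SketchSerial {
    int timeout_pending;
} SketchSerial;

/* The predicate lives with the description instead of in a side table. */
static bool sketch_timeout_needed(void *opaque)
{
    const SketchSerial *s = opaque;
    return s->timeout_pending != 0;
}

static const SketchVMState sketch_vmstate_timeout = {
    .name = "sketch-serial/timeout",
    .needed = sketch_timeout_needed,
};
```

With the predicate stored next to the name, the code that walks the
sections can apply one rule: a NULL "needed" means always send, and
anything else is asked at save time.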

Kevin: You asked for optional sections in the past for the block
   layer, would this proposal be enough for you?

Please review, comment.

Thanks, Juan.

PS: Yes, for a proper submission, patches 6 and 7 are in the wrong order.

Juan Quintela (7):
  migration: Create optional sections
  runstate: Add runstate store
  runstate: create runstate_index function
  runstate: migration allows more transitions now
  migration: create now section to store global state
  global_state: Make section optional
  vmstate: Create optional sections

 cpus.c                        |  11 ++--
 docs/migration.txt            |  11 ++--
 exec.c                        |  11 ++--
 hw/acpi/ich9.c                |  10 ++--
 hw/acpi/piix4.c               |  10 ++--
 hw/block/fdc.c                |  37 +
 hw/char/serial.c              |  41 ++-
 hw/display/qxl.c              |  11 ++--
 hw/display/vga.c              |  11 ++--
 hw/i386/pc_piix.c             |   2 +
 hw/ide/core.c                 |  32 +---
 hw/ide/pci.c                  |  16 +++---
 hw/input/pckbd.c              |  22
 hw/input/ps2.c                |  11 ++--
 hw/isa/lpc_ich9.c             |  10 ++--
 hw/net/e1000.c                |  11 ++--
 hw/net/rtl8139.c              |  11 ++--
 hw/net/vmxnet3.c              |  12 ++---
 hw/pci-host/piix.c            |  10 ++--
 hw/scsi/scsi-bus.c            |  11 ++--
 hw/timer/hpet.c               |  11 ++--
 hw/timer/mc146818rtc.c        |  23 -
 hw/usb/hcd-ohci.c             |  11 ++--
 hw/usb/redirect.c             |  34 ++--
 hw/virtio/virtio.c            |  10 ++--
 include/migration/migration.h |   4 ++
 include/migration/vmstate.h   |  10 ++--
 include/sysemu/sysemu.h       |   2 +
 migration.c                   | 117 --
 savevm.c                      |  14 +++--
 target-arm/machine.c          |  26 --
 target-i386/machine.c         |  71 ++---
 target-ppc/machine.c          |  62 +-
 vl.c                          |  26 ++
 vmstate.c                     |  27 +++---
 35 files changed, 393 insertions(+), 356 deletions(-)

-- 
2.1.0




Re: [Qemu-devel] [RFC 0/7] Optional toplevel sections

2014-10-15 Thread Eric Blake
On 10/15/2014 01:55 AM, Juan Quintela wrote:
> Hi
> 
> By popular demand, and after far too long, here is this series.  This
> is an RFC to find out what people think about how to use them, the
> interface proposed, whatever.
>
> * simplify optional subsections by moving the "needed" function to the
>   vmstate description.  I think that this simplification makes sense
>   by itself; it is independent of the rest of the patches.
>
> * runstate: As an example of an optional section, I decided to use
>   the current runstate.  Right now, we have a problem when:
>   - we start the destination without -S
>   - we run migration, and it causes an I/O error on the source, but
>     migration finishes
>   - the destination resumes the guest anyway, when it is possible
>     that we could get disk corruption (the I/O error was there for a
>     reason)
>   Luiz: Can you see any obvious improvement in how we use runstates?
>   Laine: Could you tell me whether you (libvirt) like this or would
>   want something a bit different?

Right now, libvirt always uses -S, then just calls 'cont' on the
destination to resume the CPUs (if the migration was live and the source
was in the running state).  But if we start passing other states, 'cont'
might not be the right thing to do.  For example, if the guest is at S3
on the source, how do we transfer from in migration to S3 at the
destination?  Do we need a new monitor command that says to put the
guest into the same state that migration said it should be in (and the
command fails if migration was from an older source that did not send
the subsection)?

How can libvirt introspect that the destination qemu is new enough to
understand the subsections, and/or that the source qemu is new enough to
send the subsections?

> 
> * I send that section independently of the state for new machine types.
>
> * For old machine types I use this as one example of an optional section.
>   We only send it when the state is different from "running" or "paused".
>
>   So, the only case where we fail is if we migrate to an old qemu and
>   there is an error.

Seems reasonable.

> 
> * In the runstate subsection "postload" we can send any event for
>   anything that libvirt wants when migration finishes.
>   Laine, can you tell us what libvirt would prefer for this?

So you're thinking that an event on the destination emitted stating that
'incoming migration is complete, and requests the following state' is
sufficient for libvirt to know how to put the domain into that state?

> 
> Kevin: You asked for optional sections in the past for the block
>    layer, would this proposal be enough for you?
> 
> Please review, comment.
> 
> Thanks, Juan.
> 
> PS: Yes, for a proper submission, patches 6 and 7 are in the wrong order.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [RFC 0/7] Optional toplevel sections

2014-10-15 Thread Juan Quintela
Eric Blake  wrote:
> On 10/15/2014 01:55 AM, Juan Quintela wrote:
>> Hi
>> 
>> By popular demand, and after far too long, here is this series.  This
>> is an RFC to find out what people think about how to use them, the
>> interface proposed, whatever.
>>
>> * simplify optional subsections by moving the "needed" function to the
>>   vmstate description.  I think that this simplification makes sense
>>   by itself; it is independent of the rest of the patches.
>>
>> * runstate: As an example of an optional section, I decided to use
>>   the current runstate.  Right now, we have a problem when:
>>   - we start the destination without -S
>>   - we run migration, and it causes an I/O error on the source, but
>>     migration finishes
>>   - the destination resumes the guest anyway, when it is possible
>>     that we could get disk corruption (the I/O error was there for a
>>     reason)
>>   Luiz: Can you see any obvious improvement in how we use runstates?
>>   Laine: Could you tell me whether you (libvirt) like this or would
>>   want something a bit different?
>
> Right now, libvirt always uses -S, then just calls 'cont' on the
> destination to resume the CPUs (if the migration was live and the source
> was in the running state).  But if we start passing other states, 'cont'
> might not be the right thing to do.  For example, if the guest is at S3
> on the source, how do we transfer from in migration to S3 at the
> destination?

It should be transparent (I haven't tested, this series was to ask for
comments before investing more time on it).

> Do we need a new monitor command that says to put the
> guest into the same state that migration said it should be in (and the
> command fails if migration was from an older source that did not send
> the subsection)?

I think that you don't need the command.
Target is started "paused" (-S) or "running" (nothing).


source old: target old: no changes
source old: target new: the same as now, no changes
source new: target new: we set the right state.  And if it is not
    "running" we don't run on the destination, independent of what
    is happening.
source new: target old: if the source is in state "running" or
    "paused", no change.  If the source is in an error state, we
    send the section and migration gets aborted (the target doesn't
    understand it).

source new: target new, running with an old machine type:
    if the state is "running" or "paused", nothing is sent.
    if the state is "error", the target is set to "error".

So, I think that we get all the cases possible right, no?
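
The cases above can be written down as one small decision function.
This is only a sketch under stated assumptions: the names (CmRunState,
cm_section_sent, cm_outcome) are hypothetical, CM_IO_ERROR stands in
for any error state, and the section is assumed to be sent
unconditionally for new machine types, as described in the cover
letter:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of run states for illustration. */
typedef enum { CM_RUNNING, CM_PAUSED, CM_IO_ERROR } CmRunState;

typedef enum {
    CM_NO_CHANGE,          /* behaves exactly as before the series */
    CM_STATE_RESTORED,     /* destination is set to the source state */
    CM_MIGRATION_ABORTED   /* destination does not understand the section */
} CmOutcome;

/* Does the source put the runstate section into the stream? */
static bool cm_section_sent(bool source_new, bool new_machine_type,
                            CmRunState state)
{
    if (!source_new) {
        return false;               /* an old source never sends it */
    }
    if (new_machine_type) {
        return true;                /* assumed: always sent */
    }
    /* old machine type: only sent for states other than running/paused */
    return state != CM_RUNNING && state != CM_PAUSED;
}

/* What happens on the destination for a given combination? */
static CmOutcome cm_outcome(bool source_new, bool target_new,
                            bool new_machine_type, CmRunState state)
{
    assert(state == CM_RUNNING || state == CM_PAUSED || state == CM_IO_ERROR);
    if (!cm_section_sent(source_new, new_machine_type, state)) {
        return CM_NO_CHANGE;
    }
    return target_new ? CM_STATE_RESTORED : CM_MIGRATION_ABORTED;
}
```

The only combination that fails is a new source in an error state
migrating to an old target, which matches the list above.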


> How can libvirt introspect that the destination qemu is new enough to
> understand the subsections, and/or that the source qemu is new enough to
> send the subsections?

Would a new qemu_option value work for you?

>> 
>> * I send that section independently of the state for new machine types.
>>
>> * For old machine types I use this as one example of an optional section.
>>   We only send it when the state is different from "running" or "paused".
>>
>>   So, the only case where we fail is if we migrate to an old qemu and
>>   there is an error.
>
> Seems reasonable.
>
>> 
>> * In the runstate subsection "postload" we can send any event for
>>   anything that libvirt wants when migration finishes.
>>   Laine, can you tell us what libvirt would prefer for this?
>
> So you're thinking that an event on the destination emitted stating that
> 'incoming migration is complete, and requests the following state' is
> sufficient for libvirt to know how to put the domain into that state?

It is up to libvirt what to do.

My idea here is that, if you don't use libvirt, you just start without
-S.

If you use libvirt, and you *don't* need to do anything special to run
after migration, you shouldn't use -S.  And I would emit an event saying
"migration was finished".

But what I want to know is _what_ events are you interested in?

Later, Juan.



Re: [Qemu-devel] [RFC 0/7] Optional toplevel sections

2014-10-15 Thread Eric Blake
On 10/15/2014 09:59 AM, Juan Quintela wrote:

>> Do we need a new monitor command that says to put the
>> guest into the same state that migration said it should be in (and the
>> command fails if migration was from an older source that did not send
>> the subsection)?
> 
> I think that you don't need the command.
> Target is started "paused" (-S) or "running" (nothing).

Libvirt will _always_ pass -S.  This is because there is a need for
additional handshaking from destination back to source to let the source
know that the destination is ready to take over operation.

> 
> 
> source old: target old: no changes
> source old: target new: the same as now, no changes
> source new: target new: we set the right state.  And if it is not
>     "running" we don't run on the destination, independent of what
>     is happening.
> source new: target old: if the source is in state "running" or
>     "paused", no change.  If the source is in an error state, we
>     send the section and migration gets aborted (the target doesn't
>     understand it).
>
> source new: target new, running with an old machine type:
>     if the state is "running" or "paused", nothing is sent.
>     if the state is "error", the target is set to "error".
> 
> So, I think that we get all the cases possible right, no?

Only if the existing 'cont' is changed to do something other than put
the destination into 'running', which doesn't sound good; or if we add
some new way for the destination to resume to the state passed in migration.

> 
> 
>> How can libvirt introspect that the destination qemu is new enough to
>> understand the subsections, and/or that the source qemu is new enough to
>> send the subsections?
> 
> Would a new qemu_option value work for you?

Or even the existence of a new QMP command in parallel to 'cont' that
has semantics of 'please restore this guest to the state it had on
incoming migration'.

> If you use libvirt, and you *don't* need to do anything special to run
> after migration, you shouldn't use -S.  And I would emit an event saying
> "migration was finished".
> 

No, libvirt will ALWAYS use -S.  So what we need is the hooks for using
-S and still relying on the migration stream rather than the current
status quo of a blind 'cont' (or nothing, if libvirt knows the source
was also in the paused state).  In fact, it is more confusing than that:
libvirt has a live migration mode that will auto-fallback to paused
migration if live migration wasn't converging fast enough.  That is, on
the source, we intentionally pause the source to quit waiting for
convergence, but on the destination we then 'cont' to wake the guest
back up.  In _that_ scenario, the migration stream will contain data
that the guest is paused, but we WANT the destination to be running.
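
To make the semantics being asked for concrete, here is a minimal
sketch of a destination that is always started with -S and then offers
two ways to wake up: the existing blind 'cont', and a command that
restores whatever state arrived in the migration stream.  Everything
here is hypothetical (RsDest, rs_cont, rs_restore_migrated_state); no
such QMP command exists in this series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of run states for illustration. */
typedef enum { RS_INMIGRATE, RS_RUNNING, RS_PAUSED, RS_IO_ERROR } RsRunState;

/* Hypothetical model of the destination after incoming migration. */
typedef struct RsDest {
    RsRunState current;         /* stays RS_INMIGRATE until told otherwise */
    bool have_migrated_state;   /* true if the stream carried the section */
    RsRunState migrated_state;  /* only meaningful if the flag is true */
} RsDest;

/* Existing behaviour: 'cont' always puts the guest into running. */
static void rs_cont(RsDest *d)
{
    assert(d != NULL);
    d->current = RS_RUNNING;
}

/*
 * Hypothetical new command: restore the state sent by the source.
 * Returns false, leaving the guest untouched, if the source was too
 * old to send the section.
 */
static bool rs_restore_migrated_state(RsDest *d)
{
    assert(d != NULL);
    if (!d->have_migrated_state) {
        return false;
    }
    d->current = d->migrated_state;
    return true;
}
```

In the auto-fallback case the stream says "paused", and the management
layer would simply keep calling rs_cont() to override it.  The new
command would only be used when it wants the source's state honoured.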

> But what I want to know is _what_ events are you interested in?

Really, an event that the destination is ready to be woken up, and some
indication of the destination having received state in the migration
stream (so that it will wake up to the correct state).

> 
> Later, Juan.
> 
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [RFC 0/7] Optional toplevel sections

2014-10-17 Thread Paolo Bonzini
Il 15/10/2014 17:59, Juan Quintela ha scritto:
> My idea here is that, if you don't use libvirt, you just start without
> -S.

If you don't use libvirt or any other QEMU management layer, you're not
going to do migration except for debugging purposes.  There's just too
much state going on to be able to do it reliably.

> If you use libvirt, and you *don't* need to do anything special to run
> after migration, you shouldn't use -S.

Is this a real requirement, or just "it sounds nicer that way"?  How
much time really passes between the end of migration and the issuing of
the "cont" command?

And the $1,000,000 question: are you _absolutely_ sure that an
automatic restart is entirely robust against a failure of the connection
between the two libvirtd instances?  Could you end up with the VM
running on two hosts?  Using -S gets QEMU completely out of the
equation, which is nice.

By the way, some of the states (I can think of io-error, guest-panicked,
watchdog) can be detected on the destination and restored.  Migrating a
machine with io-error state is definitely something that you want to do
no matter what versions of QEMU you have.  It may be the only way to
recover from a network partition like this:

   DISK
  /\
 |  \
 X   |
 |   |
SRC --- DEST

(not impossible: e.g. the SRC->DISK is fibre channel, but the SRC->DEST
link is Ethernet.  Or you have a replicated disk setup, some daemon
fails in SRC's replica but not DEST's).

> And I would emit an event saying
> "migration was finished".

The event should be emitted nevertheless. :)

Paolo



Re: [Qemu-devel] [RFC 0/7] Optional toplevel sections

2014-10-20 Thread Kevin Wolf
Am 15.10.2014 um 09:55 hat Juan Quintela geschrieben:
> Hi
> 
> By popular demand, and after far too long, here is this series.  This
> is an RFC to find out what people think about how to use them, the
> interface proposed, whatever.
> [...]
> 
> Kevin: You asked for optional sections in the past for the block
>    layer, would this proposal be enough for you?

I know I've asked on more than one occasion, and of course I don't
remember all the details any more. Anyway, I remember two cases offhand:

* qcow2 with patches like Delayed COW keeps internal block layer state
  in memory that might need to be migrated. This series looks fine for
  this case in principle, we'd just need to find a way to distinguish
  the affected BlockDriverStates. We can probably take a node-name if it
  exists (with Jeff's auto-naming patches not a problem, because then it
  would always exist)

  How do devices solve this? Do they use something like a qdev path to
  identify to which device a given section belongs?

* When a VM is stopped after an I/O error, we need to migrate the
  information about pending requests (bdrv_drain_all doesn't complete
  the failed requests). Currently we do this in device code, but it
  would be very nice to make this common block layer functionality.

  The problem here is that bdrv_aio_readv/writev get an opaque pointer
  back to the device, which of course becomes meaningless during
  migration.

  So this one is tricky even if we have optional top-level sections.
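
For the first case, the idea of keying a section on the node-name can
be sketched like this.  BlNode and the bl_* functions are hypothetical
stand-ins, not the real BlockDriverState API, and the single integer
field stands in for whatever in-memory state would need to travel:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block node with migratable state. */
typedef struct BlNode {
    const char *node_name;  /* assumed to always exist (auto-naming) */
    int pending_cow;        /* stand-in for internal block layer state */
} BlNode;

/* Find the node an incoming section belongs to, by node-name. */
static BlNode *bl_find_node(BlNode *nodes, size_t count, const char *name)
{
    assert(nodes != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].node_name && strcmp(nodes[i].node_name, name) == 0) {
            return &nodes[i];
        }
    }
    return NULL;
}

/*
 * Apply an incoming optional section to the named node.
 * Returns 0 on success, -1 if the destination has no such node.
 */
static int bl_load_section(BlNode *nodes, size_t count,
                           const char *name, int pending_cow)
{
    BlNode *node = bl_find_node(nodes, count, name);
    if (node == NULL) {
        return -1;
    }
    node->pending_cow = pending_cow;
    return 0;
}
```

This does nothing for the second case: an opaque pointer back to the
device cannot be looked up by name on the destination.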

Kevin



Re: [Qemu-devel] [RFC 0/7] Optional toplevel sections

2014-10-20 Thread Dr. David Alan Gilbert
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> Il 15/10/2014 17:59, Juan Quintela ha scritto:
> > My idea here is that, if you don't use libvirt, you just start without
> > -S.
> 
> If you don't use libvirt or any other QEMU management layer, you're not
> going to do migration except for debugging purposes.  There's just too
> much state going on to be able to do it reliably.

I'm not sure that's entirely true - while I agree that most users will
use libvirt, migration with shared disk is pretty easy; the only thing
that you need to do is bring up the tap on the destination, and I'm
not sure libvirt gets the timing ideal for it.

Dave


> 
> > If you use libvirt, and you *don't* need to do anything special to run
> > after migration, you shouldn't use -S.
> 
> Is this a real requirement, or just "it sounds nicer that way"?  How
> much time really passes between the end of migration and the issuing of
> the "cont" command?
> 
> And the $1,000,000 question: are you _absolutely_ sure that an
> automatic restart is entirely robust against a failure of the connection
> between the two libvirtd instances?  Could you end up with the VM
> running on two hosts?  Using -S gets QEMU completely out of the
> equation, which is nice.
> 
> By the way, some of the states (I can think of io-error, guest-panicked,
> watchdog) can be detected on the destination and restored.  Migrating a
> machine with io-error state is definitely something that you want to do
> no matter what versions of QEMU you have.  It may be the only way to
> recover from a network partition like this:
> 
>    DISK
>   /\
>  |  \
>  X   |
>  |   |
> SRC --- DEST
> 
> (not impossible: e.g. the SRC->DISK is fibre channel, but the SRC->DEST
> link is Ethernet.  Or you have a replicated disk setup, some daemon
> fails in SRC's replica but not DEST's).
> 
> > And I would emit an event saying
> > "migration was finished".
> 
> The event should be emitted nevertheless. :)
> 
> Paolo
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK