[openstack-dev] [nova] live migration in Mitaka

2015-09-18 Thread Murray, Paul (HP Cloud)
Hi All,

There are various efforts going on around live migration at the moment: fixing 
up CI, bug fixes, additions to cover more corner cases, proposals for new 
operations

Generally live migration could do with a little TLC (see: [1]), so I am going 
to suggest we give some of that care in the next cycle.

Please respond to this post if you have an interest in this and what you would 
like to see done. Include anything you are already getting on with so we get a 
clear picture. If there is enough interest I'll put this together as a proposal 
for a work stream. Something along the lines of "robustify live migration".

Paul

[1]: 
https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/live-migration-at-hp-public-cloud


Paul Murray
Nova Technical Lead, HP Cloud
+44 117 316 2527




Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-01 Thread Mathieu Gagné
On 2015-10-01 7:26 AM, Kashyap Chamarthy wrote:
> On Wed, Sep 30, 2015 at 11:25:12AM +, Murray, Paul (HP Cloud) wrote:
>>
>>> Please respond to this post if you have an interest in this and what
>>> you would like to see done.  Include anything you are already
>>> getting on with so we get a clear picture. 
>>
>> Thank you to those who replied to this thread. I have used the
>> contents to start an etherpad page here:
>>
>> https://etherpad.openstack.org/p/mitaka-live-migration
> 
> I added a couple of URLs for upstream libvirt work that allow for
> selective block device migration, and the in-progress generic TLS
> support work by Dan Berrange in upstream QEMU.
> 
>> I have taken the liberty of listing those that responded to the thread
>> and the authors of mentioned patches as interested people.
>  
>> From the responses and looking at the specs up for review it looks
>> like there are about five areas that could be addressed in Mitaka and
>> several others that could come later. The first five are:
>>
>>
>> - migrating instances with a mix of local disks and cinder volumes 
> 
> IIUC, this is possible with the selective block device migration work
> merged in upstream libvirt:
> 
> https://www.redhat.com/archives/libvir-list/2015-May/msg00955.html
> 

Can someone explain to me what is the actual "disk name" I have to pass
in to libvirt? I couldn't find any documentation about how to use this
feature.

-- 
Mathieu



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-01 Thread Koniszewski, Pawel
> -Original Message-
> From: Mathieu Gagné [mailto:mga...@internap.com]
> Sent: Thursday, October 1, 2015 7:24 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] live migration in Mitaka
> 
> On 2015-10-01 7:26 AM, Kashyap Chamarthy wrote:
> > On Wed, Sep 30, 2015 at 11:25:12AM +, Murray, Paul (HP Cloud) wrote:
> >>
> >>> Please respond to this post if you have an interest in this and what
> >>> you would like to see done.  Include anything you are already
> >>> getting on with so we get a clear picture.
> >>
> >> Thank you to those who replied to this thread. I have used the
> >> contents to start an etherpad page here:
> >>
> >> https://etherpad.openstack.org/p/mitaka-live-migration
> >
> > I added a couple of URLs for upstream libvirt work that allow for
> > selective block device migration, and the in-progress generic TLS
> > support work by Dan Berrange in upstream QEMU.
> >
> >> I have taken the liberty of listing those that responded to the
> >> thread and the authors of mentioned patches as interested people.
> >
> >> From the responses and looking at the specs up for review it looks
> >> like there are about five areas that could be addressed in Mitaka and
> >> several others that could come later. The first five are:
> >>
> >>
> >> - migrating instances with a mix of local disks and cinder volumes
> >
> > IIUC, this is possible with the selective block device migration work
> > merged in upstream libvirt:
> >
> > https://www.redhat.com/archives/libvir-list/2015-May/msg00955.html
> >
> 
> Can someone explain to me what is the actual "disk name" I have to pass in
> to libvirt? I couldn't find any documentation about how to use this
feature.

You have to pass device names from /dev/, e.g., if a VM has an ephemeral disk
attached at /dev/vdb you need to pass in 'vdb'. The format expected by
migrate_disks is a comma-separated list of device names, e.g. "vda,vdb,...".
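
For illustration, a minimal libvirt-python sketch of a selective block
migration call (the URIs, instance name and disk list are placeholders, the
migrate_disks parameter needs libvirt >= 1.2.17, and Nova's real call path
differs):

    import libvirt

    src = libvirt.open('qemu+ssh://source-host/system')
    dom = src.lookupByName('instance-00000001')

    flags = (libvirt.VIR_MIGRATE_LIVE |
             libvirt.VIR_MIGRATE_PEER2PEER |
             libvirt.VIR_MIGRATE_NON_SHARED_INC)  # incremental block migration

    params = {
        # Only these target device names are block-migrated; anything not
        # listed (e.g. attached cinder volumes) is assumed to be shared.
        libvirt.VIR_MIGRATE_PARAM_MIGRATE_DISKS: ['vda', 'vdb'],
    }

    dom.migrateToURI3('qemu+ssh://dest-host/system', params, flags)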

Kind Regards,
Pawel Koniszewski




Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-02 Thread Rosa, Andrea (HP Cloud Services)
Hi all

 
> Not all of these are covered by specs yet and all the existing specs need
> reviews. Please look at the etherpad and see if there is anything you think is
> missing.

What about adding a way to migrate files which are not migrated at the moment,
like console.log?
I think that could also be used to migrate the unrescue.xml file, and then
we could enable migration for instances in rescue state.
If we can't configure libvirt/QEMU to migrate those files, the only idea I have
is using the RPC mechanism - any other ideas?
I'd like to have some opinions before proposing a new spec.

Thanks
--
Andrea Rosa



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-02 Thread Kashyap Chamarthy
On Fri, Oct 02, 2015 at 06:20:31AM +, Koniszewski, Pawel wrote:
> > -Original Message-
> > From: Mathieu Gagné [mailto:mga...@internap.com]
> > Sent: Thursday, October 1, 2015 7:24 PM
> > To: OpenStack Development Mailing List (not for usage questions)

[. . .]

> > >> I have taken the liberty of listing those that responded to the
> > >> thread and the authors of mentioned patches as interested people.
> > >
> > >> From the responses and looking at the specs up for review it looks
> > >> like there are about five areas that could be addressed in Mitaka and
> > >> several others that could come later. The first five are:
> > >>
> > >>
> > >> - migrating instances with a mix of local disks and cinder volumes
> > >
> > > IIUC, this is possible with the selective block device migration work
> > > merged in upstream libvirt:
> > >
> > > https://www.redhat.com/archives/libvir-list/2015-May/msg00955.html
> > >
> > 
> > Can someone explain to me what is the actual "disk name" I have to pass in
> > to libvirt? I couldn't find any documentation about how to use this
> feature.
> 
> You have to pass device names from /dev/, e.g., if a VM has an ephemeral disk
> attached at /dev/vdb you need to pass in 'vdb'. The format expected by
> migrate_disks is a comma-separated list of device names, e.g. "vda,vdb,...".

Yeah, you can enumerate the current block devices for an instance by
doing:

$ virsh domblklist instance-0001

[If you're curious, the 'v' in the 'vda/vdb' stands for 'virtio' disks.
For non-virtio disks, you'd see a device name like 'hda'.]
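
[Illustrative output only - the device names and file paths below are made up:

    $ virsh domblklist instance-0001
    Target     Source
    ------------------------------------------------
    vda        /var/lib/nova/instances/<uuid>/disk
    vdb        /var/lib/nova/instances/<uuid>/disk.local
]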

-- 
/kashyap



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-06 Thread Paul Carlton

https://review.openstack.org/#/c/85048/ was raised to address the
migration of instances that are not running, but people did not warm to
the idea of bringing a stopped/suspended instance to a paused state to
migrate it.  Is there any work in progress to get libvirt enhanced to
perform the migration of non-active virtual machines?

--
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard




Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-06 Thread Daniel P. Berrange
On Tue, Oct 06, 2015 at 02:54:21PM +0100, Paul Carlton wrote:
> https://review.openstack.org/#/c/85048/ was raised to address the
> migration of instances that are not running but people did not warm to
> the idea of bringing a stopped/suspended instance to a paused state to
> migrate it.  Is there any work in progress to get libvirt enhanced to
> perform the migration of non active virtual machines?

Libvirt can "migrate" the configuration of an inactive VM, but does
not plan todo anything related to storage migration. OpenStack could
already solve this itself by using libvirt storage pool APIs to
copy storage volumes across, but the storage pool worked in Nova
is stalled

https://review.openstack.org/#/q/status:abandoned+project:openstack/nova+branch:master+topic:bp/use-libvirt-storage-pools,n,z

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-06 Thread Chris Friesen

On 10/06/2015 08:11 AM, Daniel P. Berrange wrote:

On Tue, Oct 06, 2015 at 02:54:21PM +0100, Paul Carlton wrote:

https://review.openstack.org/#/c/85048/ was raised to address the
migration of instances that are not running but people did not warm to
the idea of bringing a stopped/suspended instance to a paused state to
migrate it.  Is there any work in progress to get libvirt enhanced to
perform the migration of non active virtual machines?


Libvirt can "migrate" the configuration of an inactive VM, but does
not plan todo anything related to storage migration. OpenStack could
already solve this itself by using libvirt storage pool APIs to
copy storage volumes across, but the storage pool worked in Nova
is stalled

https://review.openstack.org/#/q/status:abandoned+project:openstack/nova+branch:master+topic:bp/use-libvirt-storage-pools,n,z


What is the libvirt API to migrate a paused/suspended VM?  Currently nova uses 
dom.managedSave(), so it doesn't know what file libvirt used to save the state. 
 Can libvirt migrate that file transparently?


I had thought we might switch to virDomainSave() and then use the cold migration 
framework, but that requires passwordless ssh.  If there's a way to get libvirt 
to handle it internally via the storage pool API then that would be better.


Chris



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-06 Thread Paul Carlton



On 06/10/15 17:30, Chris Friesen wrote:

On 10/06/2015 08:11 AM, Daniel P. Berrange wrote:

On Tue, Oct 06, 2015 at 02:54:21PM +0100, Paul Carlton wrote:

https://review.openstack.org/#/c/85048/ was raised to address the
migration of instances that are not running but people did not warm to
the idea of bringing a stopped/suspended instance to a paused state to
migrate it.  Is there any work in progress to get libvirt enhanced to
perform the migration of non active virtual machines?


Libvirt can "migrate" the configuration of an inactive VM, but does
not plan todo anything related to storage migration. OpenStack could
already solve this itself by using libvirt storage pool APIs to
copy storage volumes across, but the storage pool worked in Nova
is stalled

https://review.openstack.org/#/q/status:abandoned+project:openstack/nova+branch:master+topic:bp/use-libvirt-storage-pools,n,z 



What is the libvirt API to migrate a paused/suspended VM? Currently 
nova uses dom.managedSave(), so it doesn't know what file libvirt used 
to save the state.  Can libvirt migrate that file transparently?


I had thought we might switch to virDomainSave() and then use the cold 
migration framework, but that requires passwordless ssh.  If there's a 
way to get libvirt to handle it internally via the storage pool API 
then that would be better.


Chris



So my reading of this is that the issue could be addressed in Mitaka by
implementing
http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/use-libvirt-storage-pools.html
and
https://review.openstack.org/#/c/126979/4/specs/kilo/approved/migrate-libvirt-volumes.rst

Is there any prospect of this being progressed?

--
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard




Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-06 Thread Chris Friesen

On 10/06/2015 11:27 AM, Paul Carlton wrote:



On 06/10/15 17:30, Chris Friesen wrote:

On 10/06/2015 08:11 AM, Daniel P. Berrange wrote:

On Tue, Oct 06, 2015 at 02:54:21PM +0100, Paul Carlton wrote:

https://review.openstack.org/#/c/85048/ was raised to address the
migration of instances that are not running but people did not warm to
the idea of bringing a stopped/suspended instance to a paused state to
migrate it.  Is there any work in progress to get libvirt enhanced to
perform the migration of non active virtual machines?


Libvirt can "migrate" the configuration of an inactive VM, but does
not plan todo anything related to storage migration. OpenStack could
already solve this itself by using libvirt storage pool APIs to
copy storage volumes across, but the storage pool worked in Nova
is stalled

https://review.openstack.org/#/q/status:abandoned+project:openstack/nova+branch:master+topic:bp/use-libvirt-storage-pools,n,z



What is the libvirt API to migrate a paused/suspended VM? Currently nova uses
dom.managedSave(), so it doesn't know what file libvirt used to save the
state.  Can libvirt migrate that file transparently?

I had thought we might switch to virDomainSave() and then use the cold
migration framework, but that requires passwordless ssh.  If there's a way to
get libvirt to handle it internally via the storage pool API then that would
be better.




So my reading of this is the issue could be addressed in Mitaka by
implementing
http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/use-libvirt-storage-pools.html

and
https://review.openstack.org/#/c/126979/4/specs/kilo/approved/migrate-libvirt-volumes.rst


is there any prospect of this being progressed?


Paul, that would avoid the need for cold migrations to use passwordless ssh 
between nodes.  However, I think there may be additional work to handle 
migrating paused/suspended instances--still waiting for Daniel to address that bit.


Chris



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-07 Thread Daniel P. Berrange
On Tue, Oct 06, 2015 at 06:27:12PM +0100, Paul Carlton wrote:
> 
> 
> On 06/10/15 17:30, Chris Friesen wrote:
> >On 10/06/2015 08:11 AM, Daniel P. Berrange wrote:
> >>On Tue, Oct 06, 2015 at 02:54:21PM +0100, Paul Carlton wrote:
> >>>https://review.openstack.org/#/c/85048/ was raised to address the
> >>>migration of instances that are not running but people did not warm to
> >>>the idea of bringing a stopped/suspended instance to a paused state to
> >>>migrate it.  Is there any work in progress to get libvirt enhanced to
> >>>perform the migration of non active virtual machines?
> >>
> >>Libvirt can "migrate" the configuration of an inactive VM, but does
> >>not plan todo anything related to storage migration. OpenStack could
> >>already solve this itself by using libvirt storage pool APIs to
> >>copy storage volumes across, but the storage pool worked in Nova
> >>is stalled
> >>
> >>https://review.openstack.org/#/q/status:abandoned+project:openstack/nova+branch:master+topic:bp/use-libvirt-storage-pools,n,z
> >>
> >
> >What is the libvirt API to migrate a paused/suspended VM? Currently nova
> >uses dom.managedSave(), so it doesn't know what file libvirt used to save
> >the state.  Can libvirt migrate that file transparently?
> >
> >I had thought we might switch to virDomainSave() and then use the cold
> >migration framework, but that requires passwordless ssh.  If there's a way
> >to get libvirt to handle it internally via the storage pool API then that
> >would be better.
> >
> >Chris
> >
> 
> So my reading of this is the issue could be addressed in Mitaka by
> implementing
> http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/use-libvirt-storage-pools.html
> and
> https://review.openstack.org/#/c/126979/4/specs/kilo/approved/migrate-libvirt-volumes.rst
> 
> is there any prospect of this being progressed?

The guy who started that work, Solly Ross, is no longer involved in the
Nova project. The overall idea is still sound, but the patches need more
work to get them into a state suitable for serious review & potential
merge. So it is basically waiting for someone motivated to take over
the existing patches Solly did...

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-07 Thread Daniel P. Berrange
On Tue, Oct 06, 2015 at 11:43:52AM -0600, Chris Friesen wrote:
> On 10/06/2015 11:27 AM, Paul Carlton wrote:
> >
> >
> >On 06/10/15 17:30, Chris Friesen wrote:
> >>On 10/06/2015 08:11 AM, Daniel P. Berrange wrote:
> >>>On Tue, Oct 06, 2015 at 02:54:21PM +0100, Paul Carlton wrote:
> https://review.openstack.org/#/c/85048/ was raised to address the
> migration of instances that are not running but people did not warm to
> the idea of bringing a stopped/suspended instance to a paused state to
> migrate it.  Is there any work in progress to get libvirt enhanced to
> perform the migration of non active virtual machines?
> >>>
> >>>Libvirt can "migrate" the configuration of an inactive VM, but does
> >>>not plan todo anything related to storage migration. OpenStack could
> >>>already solve this itself by using libvirt storage pool APIs to
> >>>copy storage volumes across, but the storage pool worked in Nova
> >>>is stalled
> >>>
> >>>https://review.openstack.org/#/q/status:abandoned+project:openstack/nova+branch:master+topic:bp/use-libvirt-storage-pools,n,z
> >>>
> >>
> >>What is the libvirt API to migrate a paused/suspended VM? Currently nova 
> >>uses
> >>dom.managedSave(), so it doesn't know what file libvirt used to save the
> >>state.  Can libvirt migrate that file transparently?
> >>
> >>I had thought we might switch to virDomainSave() and then use the cold
> >>migration framework, but that requires passwordless ssh.  If there's a way 
> >>to
> >>get libvirt to handle it internally via the storage pool API then that would
> >>be better.
> 
> 
> >So my reading of this is the issue could be addressed in Mitaka by
> >implementing
> >http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/use-libvirt-storage-pools.html
> >
> >and
> >https://review.openstack.org/#/c/126979/4/specs/kilo/approved/migrate-libvirt-volumes.rst
> >
> >
> >is there any prospect of this being progressed?
> 
> Paul, that would avoid the need for cold migrations to use passwordless ssh
> between nodes.  However, I think there may be additional work to handle
> migrating paused/suspended instances--still waiting for Daniel to address
> that bit.

Migrating paused VMs should "just work" - certainly at the libvirt/QEMU
level there's no distinction between a paused & running VM wrt migration.
I know that historically Nova has blocked migration if the VM is paused
and I recall patches to remove that pointless restriction. I can't
remember if they ever merged.

For suspended instances, the scenario is really the same as with completely
offline instances. The only extra step is that you need to migrate the saved
image state file, as well as the disk images. This is trivial once you have
done the code for migrating disk images offline, since it's "just one more file"
to care about.  Officially apps aren't supposed to know where libvirt keeps
the managed save files, but I think it is fine for Nova to peek behind the
scenes to get them. Alternatively I'd be happy to see an API added to libvirt
to allow the managed save files to be uploaded & downloaded via a libvirt
virStreamPtr object, in the same way we provide APIs to upload & download
disk volumes. This would avoid the need to know explicitly about the file
location for the managed save image.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-07 Thread Paul Carlton

I'd be happy to take this on in Mitaka

On 07/10/15 10:14, Daniel P. Berrange wrote:

On Tue, Oct 06, 2015 at 11:43:52AM -0600, Chris Friesen wrote:

On 10/06/2015 11:27 AM, Paul Carlton wrote:


On 06/10/15 17:30, Chris Friesen wrote:

On 10/06/2015 08:11 AM, Daniel P. Berrange wrote:

On Tue, Oct 06, 2015 at 02:54:21PM +0100, Paul Carlton wrote:

https://review.openstack.org/#/c/85048/ was raised to address the
migration of instances that are not running but people did not warm to
the idea of bringing a stopped/suspended instance to a paused state to
migrate it.  Is there any work in progress to get libvirt enhanced to
perform the migration of non active virtual machines?

Libvirt can "migrate" the configuration of an inactive VM, but does
not plan todo anything related to storage migration. OpenStack could
already solve this itself by using libvirt storage pool APIs to
copy storage volumes across, but the storage pool worked in Nova
is stalled

https://review.openstack.org/#/q/status:abandoned+project:openstack/nova+branch:master+topic:bp/use-libvirt-storage-pools,n,z


What is the libvirt API to migrate a paused/suspended VM? Currently nova uses
dom.managedSave(), so it doesn't know what file libvirt used to save the
state.  Can libvirt migrate that file transparently?

I had thought we might switch to virDomainSave() and then use the cold
migration framework, but that requires passwordless ssh.  If there's a way to
get libvirt to handle it internally via the storage pool API then that would
be better.



So my reading of this is the issue could be addressed in Mitaka by
implementing
http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/use-libvirt-storage-pools.html

and
https://review.openstack.org/#/c/126979/4/specs/kilo/approved/migrate-libvirt-volumes.rst


is there any prospect of this being progressed?

Paul, that would avoid the need for cold migrations to use passwordless ssh
between nodes.  However, I think there may be additional work to handle
migrating paused/suspended instances--still waiting for Daniel to address
that bit.

Migrating paused VMs should "just work" - certainly at the libvirt/QEMU
level there's no distinction between a paused & running VM wrt migration.
I know that historically Nova has blocked migration if the VM is paused
and I recall patches to remove that pointless restriction. I can't
remember if they ever merged.

For suspended instances, the scenario is really the same as with completely
offline instances. The only extra step is that you need to migrate the saved
image state file, as well as the disk images. This is trivial once you have
done the code for migrating disk images offline, since its "just one more file"
to care about.  Officially apps aren't supposed to know where libvirt keeps
the managed save files, but I think it is fine for Nova to peek behind the
scenes to get them. Alternatively I'd be happy to see an API added to libvirt
to allow the managed save files to be uploaded & downloaded via a libvirt
virStreamPtr object, in the same way we provide APIs to  upload & download
disk volumes. This would avoid the need to know explicitly about the file
location for the managed save image.

Regards,
Daniel


--
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard






Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-07 Thread Daniel P. Berrange
On Wed, Oct 07, 2015 at 10:26:05AM +0100, Paul Carlton wrote:
> I'd be happy to take this on in Mitaka

Ok, first step would be to re-propose the old Kilo spec against Mitaka and
we should be able to fast-approve it.

> >>>So my reading of this is the issue could be addressed in Mitaka by
> >>>implementing
> >>>http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/use-libvirt-storage-pools.html
> >>>
> >>>and
> >>>https://review.openstack.org/#/c/126979/4/specs/kilo/approved/migrate-libvirt-volumes.rst

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-07 Thread Chris Friesen

On 10/07/2015 03:14 AM, Daniel P. Berrange wrote:


For suspended instances, the scenario is really the same as with completely
offline instances. The only extra step is that you need to migrate the saved
image state file, as well as the disk images. This is trivial once you have
done the code for migrating disk images offline, since its "just one more file"
to care about.  Officially apps aren't supposed to know where libvirt keeps
the managed save files, but I think it is fine for Nova to peek behind the
scenes to get them. Alternatively I'd be happy to see an API added to libvirt
to allow the managed save files to be uploaded & downloaded via a libvirt
virStreamPtr object, in the same way we provide APIs to  upload & download
disk volumes. This would avoid the need to know explicitly about the file
location for the managed save image.


Assuming we were using libvirt with the storage pools API could we currently 
(with existing libvirt) migrate domains that have been suspended with 
virDomainSave()?  Or is the only current option to have nova move the file over 
using passwordless access?


I'm assuming we want to work towards using storage pools to get away from the 
need for passwordless access between hypervisors, so having libvirt support 
would be useful.


Chris




Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-08 Thread Daniel P. Berrange
On Wed, Oct 07, 2015 at 03:54:29PM -0600, Chris Friesen wrote:
> On 10/07/2015 03:14 AM, Daniel P. Berrange wrote:
> 
> >For suspended instances, the scenario is really the same as with completely
> >offline instances. The only extra step is that you need to migrate the saved
> >image state file, as well as the disk images. This is trivial once you have
> >done the code for migrating disk images offline, since its "just one more 
> >file"
> >to care about.  Officially apps aren't supposed to know where libvirt keeps
> >the managed save files, but I think it is fine for Nova to peek behind the
> >scenes to get them. Alternatively I'd be happy to see an API added to libvirt
> >to allow the managed save files to be uploaded & downloaded via a libvirt
> >virStreamPtr object, in the same way we provide APIs to  upload & download
> >disk volumes. This would avoid the need to know explicitly about the file
> >location for the managed save image.
> 
> Assuming we were using libvirt with the storage pools API could we currently
> (with existing libvirt) migrate domains that have been suspended with
> virDomainSave()?  Or is the only current option to have nova move the file
> over using passwordless access?

If you used virDomainSave() instead of virDomainManagedSave() then you control
the file location, so you could create a directory-based storage pool and
save the state into that directory, at which point you can use the storage
pool APIs to upload/download that data.
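
As a rough libvirt-python sketch of that idea (the pool name, paths and
instance name here are invented, and error handling is omitted):

    import os
    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')

    pool = conn.storagePoolLookupByName('savestates')  # a 'dir' type pool
    pool_path = '/var/lib/libvirt/savestates'           # the pool's target dir

    # virDomainSave() to a file we chose, inside the pool's directory
    state_file = os.path.join(pool_path, 'instance-00000001.save')
    dom.save(state_file)

    # Make the new file visible as a volume, then stream it out with the
    # storage pool APIs; the peer host would do the matching vol.upload().
    pool.refresh(0)
    vol = pool.storageVolLookupByName('instance-00000001.save')

    stream = conn.newStream(0)
    vol.download(stream, 0, 0, 0)
    with open('/tmp/instance-00000001.save', 'wb') as f:
        stream.recvAll(lambda st, data, fobj: fobj.write(data), f)
        stream.finish()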


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-08 Thread Paul Carlton



On 08/10/15 09:57, Daniel P. Berrange wrote:

On Wed, Oct 07, 2015 at 03:54:29PM -0600, Chris Friesen wrote:

On 10/07/2015 03:14 AM, Daniel P. Berrange wrote:


For suspended instances, the scenario is really the same as with completely
offline instances. The only extra step is that you need to migrate the saved
image state file, as well as the disk images. This is trivial once you have
done the code for migrating disk images offline, since its "just one more file"
to care about.  Officially apps aren't supposed to know where libvirt keeps
the managed save files, but I think it is fine for Nova to peek behind the
scenes to get them. Alternatively I'd be happy to see an API added to libvirt
to allow the managed save files to be uploaded & downloaded via a libvirt
virStreamPtr object, in the same way we provide APIs to  upload & download
disk volumes. This would avoid the need to know explicitly about the file
location for the managed save image.

Assuming we were using libvirt with the storage pools API could we currently
(with existing libvirt) migrate domains that have been suspended with
virDomainSave()?  Or is the only current option to have nova move the file
over using passwordless access?

If you used virDomainSave() instead of virDomainManagedSave() then you control
the file location, so you could create a directory based storage pool and
save the state into that directory, at which point you can use the storag
pool APIs to upload/download that data.


Regards,
Daniel

I will update https://review.openstack.org/#/c/232053,
which covers the use of libvirt cold migration for non-active instances, to
cover the use of virDomainSave() and thus allow migration of suspended
instances.

--
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard






Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-18 Thread Daniel P. Berrange
On Fri, Sep 18, 2015 at 11:53:05AM +, Murray, Paul (HP Cloud) wrote:
> Hi All,
> 
> There are various efforts going on around live migration at the moment:
> fixing up CI, bug fixes, additions to cover more corner cases, proposals
> for new operations
> 
> Generally live migration could do with a little TLC (see: [1]), so I am
> going to suggest we give some of that care in the next cycle.
> 
> Please respond to this post if you have an interest in this and what you
> would like to see done. Include anything you are already getting on with
> so we get a clear picture. If there is enough interest I'll put this
> together as a proposal for a work stream. Something along the lines of
> "robustify live migration".

We merged some robustness improvements for migration during Liberty.
Specifically, with KVM we now track the progress of data transfer
and if it is not making forward progress during a set window of
time, we will abort the migration. This ensures you don't get a
migration that never ends. We also now have code which dynamically
increases the max permitted downtime during switchover, to try and
make it more likely to succeed. We could do with getting feedback
on how well the various tunable settings work in practice for real
world deployments, to see if we need to change any defaults.

There was a proposal to nova to allow the 'pause' operation to be
invoked while migration was happening. This would turn a live
migration into a coma-migration, thereby ensuring it succeeds.
I can't remember if this merged or not, as I can't find the review
offhand, but it's important to have this ASAP IMHO, as when
evacuating VMs from a host admins need a knob to use to force
successful evacuation, even at the cost of pausing the guest
temporarily.

In libvirt upstream we now have the ability to filter what disks are
migrated during block migration. We need to leverage that new feature
to fix the long standing problems of block migration when non-local
images are attached - eg cinder volumes. We definitely want this
in Mitaka.

We should look at what we need to do to isolate the migration data
network from the main management network. Currently we live
migrate over whatever network is associated with the compute host's
primary hostname / IP address. This is not necessarily the fastest
NIC on the host. We ought to be able to record an alternative
hostname / IP address against each compute host to indicate the
desired migration interface.
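
(As a stop-gap today, some deployments point live_migration_uri at a
migration-specific hostname so the traffic lands on a dedicated NIC. A purely
illustrative nova.conf snippet - the "-migration" naming convention is an
assumption, not anything Nova mandates:

    [libvirt]
    # %s is replaced with the destination host's name; DNS for
    # "<host>-migration" is assumed to resolve to the migration NIC.
    live_migration_uri = qemu+tcp://%s-migration/system
)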

Libvirt/KVM have the ability to turn on compression for migration
which again improves the chances of convergence & thus success.
We would look at leveraging that.

QEMU has a crude "auto-converge" flag you can turn on, which limits
guest CPU execution time, in an attempt to slow down the data dirtying
rate to again improve the chance of successful convergence.
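
(At the libvirt level both of these are plain migration flags; a hedged
sketch, since availability depends on the libvirt/QEMU versions in use:

    import libvirt

    flags = (libvirt.VIR_MIGRATE_LIVE |
             libvirt.VIR_MIGRATE_PEER2PEER |
             libvirt.VIR_MIGRATE_COMPRESSED |     # compress migration data
             libvirt.VIR_MIGRATE_AUTO_CONVERGE)   # throttle guest CPU to converge
    # then e.g. dom.migrateToURI3(dest_uri, {}, flags)
)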

I'm working on enhancements to QEMU itself to support TLS encryption
for migration. This will enable openstack to have a secure migration
datastream, without having to tunnel via libvirtd. This is useful
as tunneling via libvirtd doesn't work with block migration. It will
also be much faster than tunnelling. This might be merged in QEMU
before the Mitaka cycle ends, but more likely it is N-cycle material.

There is also work on post-copy migration in QEMU. Normally with
live migration, the guest doesn't start executing on the target
host until migration has transferred all data. There are many
workloads where that doesn't work, as the guest is dirtying data
too quickly. With post-copy you can start running the guest on the
target at any time, and when it faults on a missing page that will
be pulled from the source host. This is slightly more fragile as
you risk losing the guest entirely if the source host dies before
migration finally completes. It does guarantee that migration will
succeed no matter what workload is in the guest. This is probably
N cycle material.

Testing. Testing. Testing.

Lots more I can't think of right now

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-18 Thread John Garbutt
On 18 September 2015 at 16:23, Daniel P. Berrange  wrote:
> On Fri, Sep 18, 2015 at 11:53:05AM +, Murray, Paul (HP Cloud) wrote:
>> Hi All,
>>
>> There are various efforts going on around live migration at the moment:
>> fixing up CI, bug fixes, additions to cover more corner cases, proposals
>> for new operations
>>
>> Generally live migration could do with a little TLC (see: [1]), so I am
>> going to suggest we give some of that care in the next cycle.
>>
>> Please respond to this post if you have an interest in this and what you
>> would like to see done. Include anything you are already getting on with
>> so we get a clear picture. If there is enough interest I'll put this
>> together as a proposal for a work stream. Something along the lines of
>> "robustify live migration".
>
>
>
> Testing. Testing. Testing.

+1 for Testing

The "CI for reliable live-migration" thread was covering some of the
details on the multi-host CI options.

Thanks,
johnthetubaguy



Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-18 Thread Timofei Durakov
Hi,
some work items:
http://lists.openstack.org/pipermail/openstack-dev/2015-September/073965.html
- CI coverage for live migration
https://blueprints.launchpad.net/nova/+spec/split-different-live-migration-types
- compute + drivers code cleanup

On Fri, Sep 18, 2015 at 6:47 PM, John Garbutt  wrote:

> On 18 September 2015 at 16:23, Daniel P. Berrange 
> wrote:
> > On Fri, Sep 18, 2015 at 11:53:05AM +, Murray, Paul (HP Cloud) wrote:
> >> Hi All,
> >>
> >> There are various efforts going on around live migration at the moment:
> >> fixing up CI, bug fixes, additions to cover more corner cases, proposals
> >> for new operations
> >>
> >> Generally live migration could do with a little TLC (see: [1]), so I am
> >> going to suggest we give some of that care in the next cycle.
> >>
> >> Please respond to this post if you have an interest in this and what you
> >> would like to see done. Include anything you are already getting on with
> >> so we get a clear picture. If there is enough interest I'll put this
> >> together as a proposal for a work stream. Something along the lines of
> >> "robustify live migration".
> >
> >
> >
> > Testing. Testing. Testing.
>
> +1 for Testing
>
> The "CI for reliable live-migration" thread was covering some of the
> details on the multi-host CI options.
>
> Thanks,
> johnthetubaugy
>


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-21 Thread Daniel P. Berrange
On Fri, Sep 18, 2015 at 05:47:31PM +, Carlton, Paul (Cloud Services) wrote:
> However the most significant impediment we encountered was customer
> complaints about performance of instances during migration.  We did a little
> bit of work to identify the cause of this and concluded that the main issues
> was disk i/o contention.  I wonder if this is something you or others have
> encountered?  I'd be interested in any idea for managing the rate of the
> migration processing to prevent it from adversely impacting the customer
> application performance.  I appreciate that if we throttle the migration
> processing it will take longer and may not be able to keep up with the rate
> of disk/memory change in the instance.

I would not expect live migration to have an impact on disk I/O, unless
your storage is network-based and using the same network as the migration
data. While migration is taking place you'll see a small impact on the
guest compute performance, due to page table dirty bitmap tracking, but
that shouldn't appear directly as a disk I/O problem. There is no throttling
of guest I/O at all during migration.

> Could you point me at somewhere I can get details of the tuneable setting
> relating to cutover down time please?  I'm assuming that at these are
> libvirt/qemu settings?  I'd like to play with them in our test environment
> to see if we can simulate busy instances and determine what works.  I'd also
> be happy to do some work to expose these in nova so the cloud operator can
> tweak if necessary?

It is already exposed as 'live_migration_downtime' along with
live_migration_downtime_steps, and live_migration_downtime_delay.
Again, it shouldn't have any impact on guest performance while
live migration is taking place. It only comes into effect when
checking whether the guest is ready to switch to the new host.
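
For reference, these are nova.conf options in the [libvirt] section; the
values below are only illustrative defaults as I understand them, not
recommendations:

    [libvirt]
    # maximum permitted pause (ms) of the guest at switchover
    live_migration_downtime = 500
    # number of incremental steps used to reach that maximum
    live_migration_downtime_steps = 10
    # wait (seconds per GiB of guest RAM) between those steps
    live_migration_downtime_delay = 75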

> I understand that you have added some functionality to the nova compute
> manager to collect data on migration progress and emit this to the log file.
> I'd like to propose that we extend this to emit notification message
> containing progress information so a cloud operator's orchestration can
> consume these events and use them to monitor progress of individual
> migrations.  This information could be used to generate alerts or tickets so
> that support staff can intervene.  The smarts in qemu to help it make
> progress are very welcome and necessary but in my experience the cloud
> operator needs to be able to manage these and if it is necessary to slow
> down or even pause a customer's instance to complete the migration the cloud
> operator may need to gain customer consent before proceeding.

We already update the Nova instance object's 'progress' value with the
info on the migration progress. IIRC, this is visible via 'nova show <instance>'
or something like that.
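
(Illustrative only - the exact field layout depends on the client version:

    $ nova show <instance> | grep progress
    | progress | 42 |
)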

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-21 Thread Koniszewski, Pawel
> -Original Message-
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: Friday, September 18, 2015 5:24 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] live migration in Mitaka
>
> On Fri, Sep 18, 2015 at 11:53:05AM +, Murray, Paul (HP Cloud) wrote:
> > Hi All,
> >
> > There are various efforts going on around live migration at the moment:
> > fixing up CI, bug fixes, additions to cover more corner cases,
> > proposals for new operations
> >
> > Generally live migration could do with a little TLC (see: [1]), so I
> > am going to suggest we give some of that care in the next cycle.
> >
> > Please respond to this post if you have an interest in this and what
> > you would like to see done. Include anything you are already getting
> > on with so we get a clear picture. If there is enough interest I'll
> > put this together as a proposal for a work stream. Something along the
> > lines of "robustify live migration".
>
> We merged some robustness improvements for migration during Liberty.
> Specifically, with KVM we now track the progress of data transfer and if it
> is not making forward progress during a set window of time, we will abort
> the migration. This ensures you don't get a migration that never ends. We
> also now have code which dynamically increases the max permitted downtime
> during switchover, to try and make it more likely to succeed. We could do
> with getting feedback on how well the various tunable settings work in
> practice for real world deployments, to see if we need to change any
> defaults.
>
> There was a proposal to nova to allow the 'pause' operation to be invoked
> while migration was happening. This would turn a live migration into a
> coma-migration, thereby ensuring it succeeds.
> I can't remember if this merged or not, as I can't find the review offhand,
> but it's important to have this ASAP IMHO, as when evacuating VMs from a
> host admins need a knob to use to force successful evacuation, even at the
> cost of pausing the guest temporarily.

There are two different proposals - cancel an on-going live migration and
pause the VM during live migration. Both are very important. Right now there
is no way to interact with an on-going live migration through Nova.

Specification for 'cancel on-going live migration' is up for review [1].
'Pause VM during live migration' (it might be something like
force-live-migration) depends on this change, so I'm holding off on that
specification until the 'cancel' spec is merged. I'll try to prepare it before
the summit so both specs can be discussed in Tokyo.

> In libvirt upstream we now have the ability to filter what disks are
> migrated during block migration. We need to leverage that new feature to fix
> the long standing problems of block migration when non-local images are
> attached - eg cinder volumes. We definitely want this in Mitaka.
>
> We should look at what we need to do to isolate the migration data network
> from the main management network. Currently we live migrate over
> whatever network is associated with the compute host's primary hostname /
> IP address. This is not necessarily the fastest NIC on the host. We ought to
> be able to record an alternative hostname / IP address against each compute
> host to indicate the desired migration interface.
>
> Libvirt/KVM have the ability to turn on compression for migration which
> again improves the chances of convergence & thus success.
> We would look at leveraging that.

It is merged in QEMU (version 2.4), however, it isn't merged in Libvirt yet [2]
(patches 1-9 from ShaoHe Feng). The simplest solution shouldn't require any
work in Nova, it's just another live migration flag. To extend this we will
probably need to add an API call to nova, e.g., to change the compression
ratio or the number of compression threads.

However, this work is for the O cycle (or even later) IMHO. The latest QEMU in
use is 2.3 (in Ubuntu 15.10). Adoption of QEMU 2.4 and a Libvirt with
compression support will take some time, so we don't need to focus on it
right now.

> QEMU has a crude "auto-converge" flag you can turn on, which limits guest
> CPU execution time, in an attempt to slow down data dirtying rate to again
> improve chance of successful convergance.
>
> I'm working on enhancements to QEMU itself to support TLS encryption for
> migration. This will enable openstack to have secure migration datastream,
> without having to tunnel via libvirtd. This is useful as tunneling via
> libvirtd
> doesn't work with block migration. It will also be much faster than
> tunnelling.
>

Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-21 Thread Paul Carlton

Daniel

Thanks.

We will need to do some work to recreate the instance performance and
disk I/O issues and investigate further.


My original message did not go out to the mailing list due to a
subscription issue, so I am including it here:



I'm just starting work on Nova upstream, having been focused on live
migration orchestration in our large Public Cloud environment.  We were
trying to use live migration to do rolling reboots of compute nodes in
order to apply software patches that required node or virtual machine
restarts.  For this sort of activity to work on a large scale
the orchestration needs to be highly automated and integrate with the
operations monitoring and issue tracking systems.  It also needs the
mechanism used to move instances to be highly robust.


However, the most significant impediment we encountered was customer
complaints about the performance of instances during migration.  We did a
little bit of work to identify the cause of this and concluded that the
main issue was disk I/O contention.  I wonder if this is something you
or others have encountered?  I'd be interested in any ideas for managing
the rate of the migration processing to prevent it from adversely
impacting the customer application performance.  I appreciate that if we
throttle the migration processing it will take longer and may not be
able to keep up with the rate of disk/memory change in the instance.


Could you point me at somewhere I can get details of the tunable
settings relating to cutover downtime please?  I'm assuming that
these are libvirt/qemu settings?  I'd like to play with them in our test
environment to see if we can simulate busy instances and determine what
works.  I'd also be happy to do some work to expose these in nova so the
cloud operator can tweak them if necessary?


I understand that you have added some functionality to the nova compute
manager to collect data on migration progress and emit this to the log file.

I'd like to propose that we extend this to emit notification message
containing progress information so a cloud operator's orchestration can
consume these events and use them to monitor progress of individual
migrations.  This information could be used to generate alerts or 
tickets so that support staff can intervene.  The smarts in qemu to help 
it make progress are very welcome and necessary but in my experience the 
cloud operator needs to be able to manage these and if it is necessary 
to slow down or even pause a customer's instance to complete the 
migration the cloudoperator may need to gain customer consent before 
proceeding.


I am also considering submitting a proposal to build on the current spec
for monitoring and cancelling migrations to make the migration status
information available to users (based on a policy setting) and include an
estimated time to completion in the response.  I appreciate 
that this would only be an 'estimate' but it may give the user some idea 
of how long they will need to wait until they can perform operations on 
their instance that are not currently permitted during migration.  To 
cater for the scenario where a customer urgently needs to perform an 
inhibited operation (like attach or detach a volume) then I would 
propose that we allow for a user to cancel the migration of their own 
instances.  This would be enabled for authorized users based on granting 
them a specific role.


More thoughts Monday!




-Original Message-
From: Daniel P. Berrange [mailto:berra...@redhat.com]
Sent: 21 September 2015 09:56
To: Carlton, Paul (Cloud Services)
Cc: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] live migration in Mitaka

On Fri, Sep 18, 2015 at 05:47:31PM +, Carlton, Paul (Cloud Services) 
wrote:

However the most significant impediment we encountered was customer
complaints about performance of instances during migration.  We did a
little bit of work to identify the cause of this and concluded that
the main issues was disk i/o contention.  I wonder if this is
something you or others have encountered?  I'd be interested in any
idea for managing the rate of the migration processing to prevent it
from adversely impacting the customer application performance.  I
appreciate that if we throttle the migration processing it will take
longer and may not be able to keep up with the rate of disk/memory change in
the instance.


I would not expect live migration to have an impact on disk I/O, unless 
your storage is network based and using the same network as the 
migration data. While migration is taking place you'll see a small 
impact on the guest compute performance, due to page table dirty bitmap 
tracking, but that shouldn't appear directly as disk I/O problem. There 
is no throttling of guest I/O at all during migration.



Could you point me at somewhere I can get details of the tuneable

Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-22 Thread Rosa, Andrea (HP Cloud Services)
Hi all,

> Please respond to this post if you have an interest in this and what you 
> would like to see done. Include anything you are already getting on with so 
> we get a clear picture. 

I have put up a new spec about "allow more instance actions during the live 
migration" [0].
Please note that this is a follow-up of a spec proposed for Liberty [1].  I put 
up a new patch as I know that the author of the original spec is not going to 
work on it anymore.

Regards
--
Andrea Rosa

[0] https://review.openstack.org/226199
[1] https://review.openstack.org/179346

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-22 Thread Chris Friesen

On 09/21/2015 02:56 AM, Daniel P. Berrange wrote:

On Fri, Sep 18, 2015 at 05:47:31PM +, Carlton, Paul (Cloud Services) wrote:

However the most significant impediment we encountered was customer
complaints about performance of instances during migration.  We did a little
bit of work to identify the cause of this and concluded that the main issues
was disk i/o contention.  I wonder if this is something you or others have
encountered?  I'd be interested in any idea for managing the rate of the
migration processing to prevent it from adversely impacting the customer
application performance.  I appreciate that if we throttle the migration
processing it will take longer and may not be able to keep up with the rate
of disk/memory change in the instance.


I would not expect live migration to have an impact on disk I/O, unless
your storage is network based and using the same network as the migration
data. While migration is taking place you'll see a small impact on the
guest compute performance, due to page table dirty bitmap tracking, but
that shouldn't appear directly as disk I/O problem. There is no throttling
of guest I/O at all during migration.


Technically if you're doing a lot of disk I/O couldn't you end up with a case 
where you're thrashing the page cache enough to interfere with migration?  So 
it's actually memory change that is the problem, but it might not be memory that 
the application is modifying directly but rather memory allocated by the kernel.



Could you point me at somewhere I can get details of the tuneable setting
relating to cutover down time please?  I'm assuming that at these are
libvirt/qemu settings?  I'd like to play with them in our test environment
to see if we can simulate busy instances and determine what works.  I'd also
be happy to do some work to expose these in nova so the cloud operator can
tweak if necessary?


It is already exposed as 'live_migration_downtime' along with
live_migration_downtime_steps, and live_migration_downtime_delay.
Again, it shouldn't have any impact on guest performance while
live migration is taking place. It only comes into effect when
checking whether the guest is ready to switch to the new host.


Has anyone given thought to exposing some of these new parameters to the 
end-user?  I could see a scenario where an image might want to specify the 
acceptable downtime over migration.  (On the other hand that might be tricky 
from the operator perspective.)


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-22 Thread Daniel P. Berrange
On Tue, Sep 22, 2015 at 09:05:11AM -0600, Chris Friesen wrote:
> On 09/21/2015 02:56 AM, Daniel P. Berrange wrote:
> >On Fri, Sep 18, 2015 at 05:47:31PM +, Carlton, Paul (Cloud Services) 
> >wrote:
> >>However the most significant impediment we encountered was customer
> >>complaints about performance of instances during migration.  We did a little
> >>bit of work to identify the cause of this and concluded that the main issues
> >>was disk i/o contention.  I wonder if this is something you or others have
> >>encountered?  I'd be interested in any idea for managing the rate of the
> >>migration processing to prevent it from adversely impacting the customer
> >>application performance.  I appreciate that if we throttle the migration
> >>processing it will take longer and may not be able to keep up with the rate
> >>of disk/memory change in the instance.
> >
> >I would not expect live migration to have an impact on disk I/O, unless
> >your storage is network based and using the same network as the migration
> >data. While migration is taking place you'll see a small impact on the
> >guest compute performance, due to page table dirty bitmap tracking, but
> >that shouldn't appear directly as disk I/O problem. There is no throttling
> >of guest I/O at all during migration.
> 
> Technically if you're doing a lot of disk I/O couldn't you end up with a
> case where you're thrashing the page cache enough to interfere with
> migration?  So it's actually memory change that is the problem, but it might
> not be memory that the application is modifying directly but rather memory
> allocated by the kernel.
> 
> >>Could you point me at somewhere I can get details of the tuneable setting
> >>relating to cutover down time please?  I'm assuming that at these are
> >>libvirt/qemu settings?  I'd like to play with them in our test environment
> >>to see if we can simulate busy instances and determine what works.  I'd also
> >>be happy to do some work to expose these in nova so the cloud operator can
> >>tweak if necessary?
> >
> >It is already exposed as 'live_migration_downtime' along with
> >live_migration_downtime_steps, and live_migration_downtime_delay.
> >Again, it shouldn't have any impact on guest performance while
> >live migration is taking place. It only comes into effect when
> >checking whether the guest is ready to switch to the new host.
> 
> Has anyone given thought to exposing some of these new parameters to the
> end-user?  I could see a scenario where an image might want to specify the
> acceptable downtime over migration.  (On the other hand that might be tricky
> from the operator perspective.)

I'm of the opinion that we should really try to avoid exposing *any*
migration tunables to the tenant user. All the tunables are pretty
hypervisor specific and low level and not very friendly to expose
to tenants. Instead our focus should be on ensuring that it will
always "just work" from the tenants POV.

When QEMU gets 'post copy' migration working, we'll want to adopt
that asap, as that will give us the means to guarantee that migration
will always complete with very little need for tuning.

At most I could see the users being able to give some high level
indication as to whether their images tolerate some level of
latency, so Nova can decide what migration characteristic is
acceptable.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-22 Thread Chris Friesen
Apologies for the indirect quote; some of the earlier posts got deleted before I 
noticed the thread.


On 09/21/2015 03:43 AM, Koniszewski, Pawel wrote:

-Original Message-
From: Daniel P. Berrange [mailto:berra...@redhat.com]



There was a proposal to nova to allow the 'pause' operation to be invoked
while migration was happening. This would turn a live migration into a
coma-migration, thereby ensuring it succeeds. I can't remember if this
merged or not, as I can't find the review offhand, but it's important to
have this ASAP IMHO, as when evacuating VMs from a host admins need a knob
to use to force successful evacuation, even at the cost of pausing the
guest temporarily.


It's not strictly "live" migration, but for the same reason of pushing VMs off a 
host for maintenance it would be nice to have some way of migrating suspended 
instances.  (As brought up in 
http://lists.openstack.org/pipermail/openstack-dev/2015-September/075042.html)



In libvirt upstream we now have the ability to filter what disks are
migrated during block migration. We need to leverage that new feature to
fix the long standing problems of block migration when non-local images are
attached - eg cinder volumes. We definitely want this in Mitaka.


Agreed, this would be a very useful addition.


We should look at what we need to do to isolate the migration data network
from the main management network. Currently we live migrate over whatever
network is associated with the compute hosts primary Hostname / IP address.
This is not necessarily the fastest NIC on the host. We ought to be able
to record an alternative hostname / IP address against each compute host to
indicate the desired migration interface.


Yes, this would be good to have upstream.  We've added this sort of thing 
locally (though with a hardcoded naming scheme) to allow migration over 10G 
links with management over 1G links.
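
For what it's worth, the closest existing knob I'm aware of is the
live_migration_uri template in the [libvirt] section of nova.conf; some
deployments point it at a name that resolves to the faster NIC, e.g.
(the naming scheme here is purely illustrative):

    [libvirt]
    # %s is replaced with the destination compute host.  If
    # <host>-mig resolves to the 10G interface, migration traffic
    # stays off the 1G management network.
    live_migration_uri = qemu+tcp://%s-mig/system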



There is also work on post-copy migration in QEMU. Normally with live
migration, the guest doesn't start executing on the target host until
migration has transferred all data. There are many workloads where that
doesn't work, as the guest is dirtying data too quickly, With post-copy you
can start running the guest on the target at any time, and when it faults
on a missing page that will be pulled from the source host. This is
slightly more fragile as you risk losing the guest entirely if the source
host dies before migration finally completes. It does guarantee that
migration will succeed no matter what workload is in the guest. This is
probably N cycle material.


It seems to me that the ideal solution would be to start doing pre-copy 
migration, then if that doesn't converge within the specified downtime value 
have the option to just cut over to the destination and do a post-copy 
migration of the remaining data.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-22 Thread Daniel P. Berrange
On Tue, Sep 22, 2015 at 09:29:46AM -0600, Chris Friesen wrote:
> >>There is also work on post-copy migration in QEMU. Normally with live
> >>migration, the guest doesn't start executing on the target host until
> >>migration has transferred all data. There are many workloads where that
> >>doesn't work, as the guest is dirtying data too quickly, With post-copy you
> >>can start running the guest on the target at any time, and when it faults
> >>on a missing page that will be pulled from the source host. This is
> >>slightly more fragile as you risk loosing the guest entirely if the source
> >>host dies before migration finally completes. It does guarantee that
> >>migration will succeed no matter what workload is in the guest. This is
> >>probably N cycle material.
> 
> It seems to me that the ideal solution would be to start doing pre-copy
> migration, then if that doesn't converge with the specified downtime value
> then maybe have the option to just cut over to the destination and do a
> post-copy migration of the remaining data.

Yes, that is precisely what the QEMU developers working on this
feature suggest we should do. The lazy page faulting on the target
host has a performance hit on the guest, so you definitely need
to give a little time for pre-copy to start off with, and then
switch to post-copy once some benchmark is reached, or if progress
info shows the transfer is not making progress.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-23 Thread Paul Carlton


On 22/09/15 16:20, Daniel P. Berrange wrote:

On Tue, Sep 22, 2015 at 09:05:11AM -0600, Chris Friesen wrote:

On 09/21/2015 02:56 AM, Daniel P. Berrange wrote:

On Fri, Sep 18, 2015 at 05:47:31PM +, Carlton, Paul (Cloud Services) wrote:

However the most significant impediment we encountered was customer
complaints about performance of instances during migration.  We did a little
bit of work to identify the cause of this and concluded that the main issues
was disk i/o contention.  I wonder if this is something you or others have
encountered?  I'd be interested in any idea for managing the rate of the
migration processing to prevent it from adversely impacting the customer
application performance.  I appreciate that if we throttle the migration
processing it will take longer and may not be able to keep up with the rate
of disk/memory change in the instance.

I would not expect live migration to have an impact on disk I/O, unless
your storage is network based and using the same network as the migration
data. While migration is taking place you'll see a small impact on the
guest compute performance, due to page table dirty bitmap tracking, but
that shouldn't appear directly as disk I/O problem. There is no throttling
of guest I/O at all during migration.

Technically if you're doing a lot of disk I/O couldn't you end up with a
case where you're thrashing the page cache enough to interfere with
migration?  So it's actually memory change that is the problem, but it might
not be memory that the application is modifying directly but rather memory
allocated by the kernel.


Could you point me at somewhere I can get details of the tuneable setting
relating to cutover down time please?  I'm assuming that at these are
libvirt/qemu settings?  I'd like to play with them in our test environment
to see if we can simulate busy instances and determine what works.  I'd also
be happy to do some work to expose these in nova so the cloud operator can
tweak if necessary?

It is already exposed as 'live_migration_downtime' along with
live_migration_downtime_steps, and live_migration_downtime_delay.
Again, it shouldn't have any impact on guest performance while
live migration is taking place. It only comes into effect when
checking whether the guest is ready to switch to the new host.

Has anyone given thought to exposing some of these new parameters to the
end-user?  I could see a scenario where an image might want to specify the
acceptable downtime over migration.  (On the other hand that might be tricky
from the operator perspective.)

I'm of the opinion that we should really try to avoid exposing *any*
migration tunables to the tenant user. All the tunables are pretty
hypervisor specific and low level and not very friendly to expose
to tenants. Instead our focus should be on ensuring that it will
always "just work" from the tenants POV.

When QEMU gets 'post copy' migration working, we'll want to adopt
that asap, as that will give us the means to guarantee that migration
will always complete with very little need for tuning.

At most I could see the users being able to given some high level
indication as to whether their images tolerate some level of
latency, so Nova can decide what migration characteristic is
acceptable.

Regards,
Daniel

Actually I was not envisaging the controls on migration tuning being
made available to the user.  I was thinking we should provide the cloud
administrator with the facility to increase the live_migration_downtime
setting to improve the chance of a migration being able to complete.
I would expect that this would be used in consultation with the instance
owner.  It seems to me it might be a viable alternative to pausing the
instance to allow the migration to complete.
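
For reference, these settings live in the [libvirt] section of nova.conf.
A sketch with what I believe are the current defaults, which an operator
could raise in consultation with the instance owner:

    [libvirt]
    # Maximum downtime, in milliseconds, the guest may be paused for
    # at the final cutover.
    live_migration_downtime = 500
    # Number of incremental steps used to work up to that maximum.
    live_migration_downtime_steps = 10
    # Wait between steps, in seconds, scaled per GiB of guest RAM.
    live_migration_downtime_delay = 75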

--
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard
BUK03:T242
Longdown Avenue
Stoke Gifford
Bristol BS34 8QZ

Mobile:+44 (0)7768 994283
Email:mailto:paul.carlt...@hpe.com
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN 
Registered No: 690597 England.
The contents of this message and any attachments to it are confidential and may be 
legally privileged. If you have received this message in error, you should delete it from 
your system immediately and advise the sender. To any recipient of this message within 
HP, unless otherwise stated you should consider this message and attachments as "HP 
CONFIDENTIAL".




smime.p7s
Description: S/MIME Cryptographic Signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-23 Thread Paul Carlton



On 22/09/15 16:44, Daniel P. Berrange wrote:

On Tue, Sep 22, 2015 at 09:29:46AM -0600, Chris Friesen wrote:

There is also work on post-copy migration in QEMU. Normally with live
migration, the guest doesn't start executing on the target host until
migration has transferred all data. There are many workloads where that
doesn't work, as the guest is dirtying data too quickly, With post-copy you
can start running the guest on the target at any time, and when it faults
on a missing page that will be pulled from the source host. This is
slightly more fragile as you risk loosing the guest entirely if the source
host dies before migration finally completes. It does guarantee that
migration will succeed no matter what workload is in the guest. This is
probably N cycle material.

It seems to me that the ideal solution would be to start doing pre-copy
migration, then if that doesn't converge with the specified downtime value
then maybe have the option to just cut over to the destination and do a
post-copy migration of the remaining data.

Yes, that is precisely what the QEMU developers working on this
featue suggest we should do. The lazy page faulting on the target
host has a performance hit on the guest, so you definitely need
to give a little time for pre-copy to start off with, and then
switch to post-copy once some benchmark is reached, or if progress
info shows the transfer is not making progress.

Regards,
Daniel

I'd be a bit concerned about automatically switching to the post-copy
mode.  As Daniel commented previously, if something goes wrong on the
source node the customer's instance could be lost.  Many cloud operators
will want to control the use of this mode.  As per my previous message,
this could be something that is set on or off by default, with a PUT
operation on os-migration to update the setting for a specific
migration.

--
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard
BUK03:T242
Longdown Avenue
Stoke Gifford
Bristol BS34 8QZ

Mobile:+44 (0)7768 994283
Email:mailto:paul.carlt...@hpe.com
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN 
Registered No: 690597 England.
The contents of this message and any attachments to it are confidential and may be 
legally privileged. If you have received this message in error, you should delete it from 
your system immediately and advise the sender. To any recipient of this message within 
HP, unless otherwise stated you should consider this message and attachments as "HP 
CONFIDENTIAL".




smime.p7s
Description: S/MIME Cryptographic Signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-23 Thread Daniel P. Berrange
On Wed, Sep 23, 2015 at 01:48:17PM +0100, Paul Carlton wrote:
> 
> 
> On 22/09/15 16:44, Daniel P. Berrange wrote:
> >On Tue, Sep 22, 2015 at 09:29:46AM -0600, Chris Friesen wrote:
> There is also work on post-copy migration in QEMU. Normally with live
> migration, the guest doesn't start executing on the target host until
> migration has transferred all data. There are many workloads where that
> doesn't work, as the guest is dirtying data too quickly, With post-copy 
> you
> can start running the guest on the target at any time, and when it faults
> on a missing page that will be pulled from the source host. This is
> slightly more fragile as you risk loosing the guest entirely if the source
> host dies before migration finally completes. It does guarantee that
> migration will succeed no matter what workload is in the guest. This is
> probably N cycle material.
> >>It seems to me that the ideal solution would be to start doing pre-copy
> >>migration, then if that doesn't converge with the specified downtime value
> >>then maybe have the option to just cut over to the destination and do a
> >>post-copy migration of the remaining data.
> >Yes, that is precisely what the QEMU developers working on this
> >featue suggest we should do. The lazy page faulting on the target
> >host has a performance hit on the guest, so you definitely need
> >to give a little time for pre-copy to start off with, and then
> >switch to post-copy once some benchmark is reached, or if progress
> >info shows the transfer is not making progress.
> >
> >Regards,
> >Daniel
> I'd be a bit concerned about automatically switching to the post copy
> mode.  As Daniel commented perviously, if something goes wrong on the
> source node the customer's instance could be lost.  Many cloud operators
> will want to control the use of this mode.  As per my previous message
> this could be something that could be set on or off by default but
> provide a PUT operation on os-migration to update setting on for a
> specific migration

NB, if you are concerned about the source host going down while
migration is still taking place, you will lose the VM even with
pre-copy mode too, since the VM will of course still be running
on the source.

The new failure scenario is essentially about the network
connection between the source & destination hosts - if the network
layer fails while post-copy is running, then you lose the
VM.

In some sense post-copy will reduce the window of failure,
because it should ensure that the VM migration completes
in a faster & finite amount of time. I think this is
probably particularly important for host evacuation so
the admin can guarantee to get all the VMs off a host in
a reasonable amount of time.

As such I don't think you need expose post-copy as a concept in the
API, but I could see a nova.conf value to say whether use of post-copy
was acceptable, so those who want to have stronger resilience against
network failure can turn off post-copy.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-23 Thread Paul Carlton



On 23/09/15 14:11, Daniel P. Berrange wrote:

On Wed, Sep 23, 2015 at 01:48:17PM +0100, Paul Carlton wrote:


On 22/09/15 16:44, Daniel P. Berrange wrote:

On Tue, Sep 22, 2015 at 09:29:46AM -0600, Chris Friesen wrote:

There is also work on post-copy migration in QEMU. Normally with live
migration, the guest doesn't start executing on the target host until
migration has transferred all data. There are many workloads where that
doesn't work, as the guest is dirtying data too quickly, With post-copy you
can start running the guest on the target at any time, and when it faults
on a missing page that will be pulled from the source host. This is
slightly more fragile as you risk loosing the guest entirely if the source
host dies before migration finally completes. It does guarantee that
migration will succeed no matter what workload is in the guest. This is
probably N cycle material.

It seems to me that the ideal solution would be to start doing pre-copy
migration, then if that doesn't converge with the specified downtime value
then maybe have the option to just cut over to the destination and do a
post-copy migration of the remaining data.

Yes, that is precisely what the QEMU developers working on this
featue suggest we should do. The lazy page faulting on the target
host has a performance hit on the guest, so you definitely need
to give a little time for pre-copy to start off with, and then
switch to post-copy once some benchmark is reached, or if progress
info shows the transfer is not making progress.

Regards,
Daniel

I'd be a bit concerned about automatically switching to the post copy
mode.  As Daniel commented perviously, if something goes wrong on the
source node the customer's instance could be lost.  Many cloud operators
will want to control the use of this mode.  As per my previous message
this could be something that could be set on or off by default but
provide a PUT operation on os-migration to update setting on for a
specific migration

NB, if you are concerned about the source host going down while
migration is still taking place, you will lose the VM even with
pre-copy mode too, since the VM will of course still be running
on the source.

The new failure scenario is essentially about the network
connection between the source & destination hosts - if the network
layer fails while post-copy is running, then you lose the
VM.

In some sense post-copy will reduce the window of failure,
because it should ensure that the VM migration completes
in a faster & finite amount of time. I think this is
probably particularly important for host evacuation so
the admin can guarantee to get all the VMs off a host in
a reasonable amount of time.

As such I don't think you need expose post-copy as a concept in the
API, but I could see a nova.conf value to say whether use of post-copy
was acceptable, so those who want to have stronger resilience against
network failure can turn off post-copy.

Regards,
Daniel


If the source node fails during a pre-copy migration then when that node
is restored the instance is usually OK again.  With the post-copy
approach the risk is that the instance will be corrupted, which many
cloud operators would consider unacceptable.

However, let's start by exposing it as a nova.conf setting and see how
that goes.

--
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard
BUK03:T242
Longdown Avenue
Stoke Gifford
Bristol BS34 8QZ

Mobile:+44 (0)7768 994283
Email:mailto:paul.carlt...@hpe.com
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN 
Registered No: 690597 England.
The contents of this message and any attachments to it are confidential and may be 
legally privileged. If you have received this message in error, you should delete it from 
your system immediately and advise the sender. To any recipient of this message within 
HP, unless otherwise stated you should consider this message and attachments as "HP 
CONFIDENTIAL".




smime.p7s
Description: S/MIME Cryptographic Signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-30 Thread Murray, Paul (HP Cloud)

> Please respond to this post if you have an interest in this and what you 
> would like to see done. 
> Include anything you are already getting on with so we get a clear picture. 

Thank you to those who replied to this thread. I have used the contents to 
start an etherpad page here:

https://etherpad.openstack.org/p/mitaka-live-migration 

I have taken the liberty of listing those that responded to the thread and the 
authors of mentioned patches as interested people.

From the responses and looking at the specs up for review it looks like there 
are about five areas that could be addressed in Mitaka and several others that 
could come later. The first five are:

- migrating instances with a mix of local disks and cinder volumes
- pause instance during migration
- cancel migration
- migrate suspended instances
- improve CI coverage

Not all of these are covered by specs yet and all the existing specs need 
reviews. Please look at the etherpad and see if there is anything you think is 
missing.

Paul



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-30 Thread Koniszewski, Pawel
> -Original Message-
> From: Murray, Paul (HP Cloud) [mailto:pmur...@hpe.com]
> Sent: Wednesday, September 30, 2015 1:25 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] live migration in Mitaka
>
>
> > Please respond to this post if you have an interest in this and what you
> would like to see done.
> > Include anything you are already getting on with so we get a clear 
> > picture.
>
> Thank you to those who replied to this thread. I have used the contents to
> start an etherpad page here:
>
> https://etherpad.openstack.org/p/mitaka-live-migration
>
> I have taken the liberty of listing those that responded to the thread and
> the authors of mentioned patches as interested people.
>
> From the responses and looking at the specs up for review it looks like
> there are about five areas that could be addressed in Mitaka and several
> others that could come later. The first five are:
>
> - migrating instances with a mix of local disks and cinder volumes

Preliminary patch is up for review [1], we need to switch it to libvirt's v3
migrate API.

> - pause instance during migration
> - cancel migration
> - migrate suspended instances

I'm not sure I understand this correctly. When a user calls 'nova suspend' I 
thought that it actually "hibernates" the VM and saves its memory state to disk 
[2][3]. In such a case there is nothing to "live" migrate - shouldn't 
cold-migration/resize solve this problem?

> - improve CI coverage
>
> Not all of these are covered by specs yet and all the existing specs need
> reviews. Please look at the etherpad and see if there is anything you
> think is missing.

Paul, thanks for taking care of this. I've added the missing spec to force live 
migration to finish [4].

Hope we manage to discuss all these items in Tokyo.

[1] https://review.openstack.org/#/c/227278/
[2] https://github.com/openstack/nova/blob/e31d1e11bd42bcfbd7b2c3d732d184a367b75d6f/nova/virt/libvirt/driver.py#L2311
[3] https://github.com/openstack/nova/blob/e31d1e11bd42bcfbd7b2c3d732d184a367b75d6f/nova/virt/libvirt/guest.py#L308
[4] https://review.openstack.org/#/c/229040/

Kind Regards,
Pawel Koniszewski


smime.p7s
Description: S/MIME cryptographic signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-09-30 Thread Chris Friesen

On 09/30/2015 06:03 AM, Koniszewski, Pawel wrote:

From: Murray, Paul (HP Cloud) [mailto:pmur...@hpe.com]



- migrate suspended instances


I'm not sure I understand this correctly. When user calls 'nova suspend' I
thought that it actually "hibernates" VM and saves memory state to disk
[2][3]. In such case there is nothing to "live" migrate - shouldn't
cold-migration/resize solve this problem?


A "suspend" currently uses a libvirt API (dom.managedSave()) that results in the 
use of a libvirt-managed hibernation file. (So nova doesn't know the filename.) 
 I've only looked at it briefly, but it seems like it should be possible to 
switch to virDomainSave(), which would let nova specify the file to save, and 
therefore allow cold migration of the suspended instance.
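
In libvirt-python terms the difference is roughly the following; this is
only a sketch, the domain name and save path are made up, and the nova
plumbing around it is omitted:

    import libvirt

    def suspend_instance(dom, managed=True):
        if managed:
            # Current 'suspend' path: libvirt chooses and manages the save
            # file, so nova never learns its location.
            dom.managedSave(0)
        else:
            # Possible alternative: nova picks the path, so the saved state
            # could travel with the instance directory in a cold migration.
            path = '/var/lib/nova/instances/%s/suspend.sav' % dom.UUIDString()
            dom.save(path)

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')  # name is illustrative
    suspend_instance(dom, managed=False)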


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] live migration in Mitaka

2015-10-01 Thread Kashyap Chamarthy
On Wed, Sep 30, 2015 at 11:25:12AM +, Murray, Paul (HP Cloud) wrote:
> 
> > Please respond to this post if you have an interest in this and what
> > you would like to see done.  Include anything you are already
> > getting on with so we get a clear picture. 
>
> Thank you to those who replied to this thread. I have used the
> contents to start an etherpad page here:
>
> https://etherpad.openstack.org/p/mitaka-live-migration

I added a couple of URLs for the upstream libvirt work that allows for
selective block device migration, and the in-progress generic TLS
support work by Dan Berrange in upstream QEMU.

> I have taken the liberty of listing those that responded to the thread
> and the authors of mentioned patches as interested people.
 
> From the responses and looking at the specs up for review it looks
> like there are about five areas that could be addressed in Mitaka and
> several others that could come later. The first five are:
>
> 
> - migrating instances with a mix of local disks and cinder volumes 

IIUC, this is possible with the selective block device migration work
merged in upstream libvirt:

https://www.redhat.com/archives/libvir-list/2015-May/msg00955.html
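
For reference, a rough sketch of how that parameter is driven from
libvirt-python via migrateToURI3(); the disk names are the target device
names from the domain XML (e.g. 'vda'), and the domain name, destination
URI and flag choice below are illustrative only:

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')  # name is illustrative

    params = {
        # Block-migrate only the local root disk; attached cinder
        # volumes (e.g. vdb) are simply left out of the list.
        libvirt.VIR_MIGRATE_PARAM_MIGRATE_DISKS: ['vda'],
    }
    flags = (libvirt.VIR_MIGRATE_LIVE |
             libvirt.VIR_MIGRATE_PEER2PEER |
             libvirt.VIR_MIGRATE_NON_SHARED_INC)
    dom.migrateToURI3('qemu+tcp://dest-host/system', params, flags)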

> - pause instance during migration
> - cancel migration 
> - migrate suspended instances 
> - improve CI coverage
> 
> Not all of these are covered by specs yet and all the existing specs
> need reviews. Please look at the etherpad and see if there is anything
> you think is missing.
> 

-- 
/kashyap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev