[Openstack] KVM live block migration: stability, future, docs

2012-08-06 Thread Blair Bethwaite
Hi all,

KVM block migration support in OpenStack
(https://blueprints.launchpad.net/nova/+spec/kvm-block-migration)
seems to be somewhat of a secret - there's almost nothing in the
docs/guides (which to the contrary state that live migration is only
possible with shared storage) and only a couple of mentions on list,
yet it's been around since Diablo. Should this be taken to mean it's
considered unstable, or just that no-one interested in documenting it
understands the significance of such a feature to deployment
architects? After all, decent shared storage is an expensive prospect
with a pile of associated design and management overhead!

I'd be happy to contribute some documentation patches (starting with
the admin guide) that cover this. But first I'd like to get some
confirmation that it's here to stay, which will be significant for our
own large deployment. We've tested with Essex on Ubuntu Precise and
seen a bit of weird file-system behaviour, which we currently suspect
might be a consequence of using ext3 in the guest. But also, there
seems to be some associated lag with interactive services (e.g. active
VNC session) in the guest, not yet sure how this compares to the
non-block live migration case.

We'd really appreciate anybody actively using this feature to speak up
and comment on their mileage, especially with respect to ops.

I'm slightly concerned that KVM may drop this going forward
(http://www.spinics.net/lists/kvm/msg72228.html), though that would be
unlikely to affect anybody deploying on Precise.

-- 
Cheers,
~Blairo

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Sébastien Han
Hi!

I think it's a pretty useful feature and a good compromise. As you said, using
a shared fs implies a lot of things and can dramatically decrease your
performance compared to using the local fs. I tested it and I will use it
for my deployment. I'll be happy to discuss this feature more deeply with
you :)

I also feel a little concerned about this statement:

 It don't work so well, it complicates migration code, and we are building
> a replacement that works.


I have to go further with my tests, maybe we could share some ideas, use
case etc...

Cheers!

On Mon, Aug 6, 2012 at 3:08 PM, Blair Bethwaite 
wrote:
> Hi all,
>
> KVM block migration support in OpenStack
> (https://blueprints.launchpad.net/nova/+spec/kvm-block-migration)
> seems to be somewhat of a secret - there's almost nothing in the
> docs/guides (which to the contrary state that live migration is only
> possible with shared storage) and only a couple of mentions on list,
> yet it's been around since Diablo. Should this be taken to mean it's
> considered unstable, or just that no-one interested in documenting it
> understands the significance of such a feature to deployment
> architects? After all, decent shared storage is an expensive prospect
> with a pile of associated design and management overhead!
>
> I'd be happy to contribute some documentation patches (starting with
> the admin guide) that cover this. But first I'd like to get some
> confirmation that it's here to stay, which will be significant for our
> own large deployment. We've tested with Essex on Ubuntu Precise and
> seen a bit of weird file-system behaviour, which we currently suspect
> might be a consequence of using ext3 in the guest. But also, there
> seems to be some associated lag with interactive services (e.g. active
> VNC session) in the guest, not yet sure how this compares to the
> non-block live migration case.
>
> We'd really appreciate anybody actively using this feature to speak up
> and comment on their mileage, especially with respect to ops.
>
> I'm slightly concerned that KVM may drop this going forward
> (http://www.spinics.net/lists/kvm/msg72228.html), though that would be
> unlikely to affect anybody deploying on Precise.
>
> --
> Cheers,
> ~Blairo
>


Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Blair Bethwaite
Hi Sébastien,

Thanks for responding! By the way, I have come across your blog post
regarding this and should reference it for the list:
http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/

On 7 August 2012 17:45, Sébastien Han  wrote:
> I think it's a pretty useful feature, a good compromise. As you said using a
> shared fs implies a lot of things and can dramatically decrease your
> performance rather than using the local fs.

Agreed, scale-out distributed file-systems are hard. Consistent
hashing based systems (like Gluster and Ceph) seem like the answer to
many of the existing issues with systems trying to mix scalability,
performance and POSIX compliance. But the key issue is how one
measures "performance" for these systems... throughput for large
synchronous reads & writes may scale linearly (up to network
saturation), but random IOPS are another thing entirely. As far as I
can tell, random IOPS are the primary metric of concern in the design
of the nova-compute storage, whereas both capacity and throughput
requirements are relatively easy to specify and simply represent hard
limits that must be met to support the various instance flavours you
plan to offer.

It's interesting to note that Red Hat do not recommend using RHS
(Red Hat Storage), their RHEL-based Gluster appliance (they now own
Gluster), for live VM storage.

Additionally, operations issues are much harder to handle with a DFS
(even NFS), e.g., how can I put an upper limit on disk I/O for any
particular instance when its ephemeral disk files are across the
network and potentially striped into opaque objects across multiple
storage bricks...?

> I tested it and I will use it
> for my deployment. I'll be happy to discuss more deeply with you about this
> feature :)

Great. We have tested too. Compared to regular (non-block) live
migrate, we don't see much difference in the guest - both scenarios
involve a minute or two of interruption as the guest is moved (e.g.
VNC and SSH sessions hang temporarily), which I find slightly
surprising - is that your experience too?

> I also feel a little concern about this statement:
>
>>  It don't work so well, it complicates migration code, and we are building
>> a replacement that works.
>
>
> I have to go further with my tests, maybe we could share some ideas, use
> case etc...

I think it may be worth asking about this on the KVM lists, unless
anyone here has further insights...?

I grabbed the KVM 1.0 source from Ubuntu Precise and vanilla KVM 1.1.1
from SourceForge; block migration appears to remain in place despite
those (sparse) comments from the KVM meeting minutes (though I'm
unfamiliar with the source layout and project structure, so could
easily have missed something). In any case, it seems unlikely Precise
would see a forced update to the 1.1.x series.

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Jay Pipes
On 08/07/2012 08:57 AM, Blair Bethwaite wrote:
> Hi Sébastien,
> 
> Thanks for responding! By the way, I have come across your blog post
> regarding this and should reference it for the list:
> http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/
> 
> On 7 August 2012 17:45, Sébastien Han  wrote:
>> I think it's a pretty useful feature, a good compromise. As you said using a
>> shared fs implies a lot of things and can dramatically decrease your
>> performance rather than using the local fs.
> 
> Agreed, scale-out distributed file-systems are hard. Consistent
> hashing based systems (like Gluster and Ceph) seem like the answer to
> many of the existing issues with systems trying to mix scalability,
> performance and POSIX compliance. But the key issue is how one
> measures "performance" for these systems... throughput for large
> synchronous reads & writes may scale linearly (up to network
> saturation), but random IOPS are another thing entirely. As far as I
> can tell, random IOPS are the primary metric of concern in the design
> of the nova-compute storage, whereas both capacity and throughput
> requirements are relatively easy to specify and simply represent hard
> limits that must be met to support the various instance flavours you
> plan to offer.
> 
> It's interesting to note that RedHat do not recommend using RHS
> (RedHat Storage), their RHEL-based Gluster (which they own now)
> appliance, for live VM storage.
> 
> Additionally, operations issues are much harder to handle with a DFS
> (even NFS), e.g., how can I put an upper limit on disk I/O for any
> particular instance when its ephemeral disk files are across the
> network and potentially striped into opaque objects across multiple
> storage bricks...?

We at AT&T are also interested in this area, for the record, and will
likely do testing here in the next 6-12 months. We will of course release
any information and findings to the mailing list, and hopefully we can
collaborate on this important area.

>> I tested it and I will use it
>> for my deployment. I'll be happy to discuss more deeply with you about this
>> feature :)
> 
> Great. We have tested too. Compared to regular (non-block) live
> migrate, we don't see much difference in the guest - both scenarios
> involve a minute or two of interruption as the guest is moved (e.g.
> VNC and SSH sessions hang temporarily), which I find slightly
> surprising - is that your experience too?

Why would you find this surprising? I'm just curious...

>> I also feel a little concern about this statement:
>>
>>>  It don't work so well, it complicates migration code, and we are building
>>> a replacement that works.
>>
>>
>> I have to go further with my tests, maybe we could share some ideas, use
>> case etc...
> 
> I think it may be worth asking about this on the KVM lists, unless
> anyone here has further insights...?
> 
> I grabbed the KVM 1.0 source from Ubuntu Precise and vanilla KVM 1.1.1
> from Sourceforge, block migration appears to remain in place despite
> those (sparse) comments from the KVM meeting minutes (though I am
> naive to the source layout and project structure, so could have easily
> missed something). In any case, it seems unlikely Precise would see a
> forced update to the 1.1.x series.

cc'd Daniel Berrange, who seems to be keyed in on upstream KVM/Qemu
activity. Perhaps Daniel could shed some light.

Best,
-jay



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Blair Bethwaite
Hi Jay,

On 8 August 2012 06:13, Jay Pipes  wrote:
> Why would you find this surprising? I'm just curious...

The live migration algorithm detailed here:
http://www.linux-kvm.org/page/Migration, seems to me to indicate that
only a brief pause should be expected. Indeed, the summary says,
"Almost unnoticeable guest down time".

But the contrary happened: I tested live-migrate (without block migrate)
last night using a guest with 8GB RAM (almost fully committed) and
lost all access/contact with the guest for over 4 minutes - it was
paused for the duration. Not something I'd want to do to a user's
web-server on a regular basis...

> cc'd Daniel Berrange, who seems to be keyed in on upstream KVM/Qemu
> activity. Perhaps Daniel could shed some light.

That would be wonderful. Thanks!

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Jay Pipes
On 08/07/2012 08:23 PM, Blair Bethwaite wrote:
> Hi Jay,
> 
> On 8 August 2012 06:13, Jay Pipes  wrote:
>> Why would you find this surprising? I'm just curious...
> 
> The live migration algorithm detailed here:
> http://www.linux-kvm.org/page/Migration, seems to me to indicate that
> only a brief pause should be expected. Indeed, the summary says,
> "Almost unnoticeable guest down time".
> 
> But to the contrary. I tested live-migrate (without block migrate)
> last night using a guest with 8GB RAM (almost fully committed) and
> lost any access/contact with the guest for over 4 minutes - it was
> paused for the duration. Not something I'd want to do to a user's
> web-server on a regular basis...

Sorry, from your original post, I didn't think you were referring to
live migration, but rather just server migration. You had written
"Compared to regular (non-block) live migrate", but I read that as
"Compared to regular migrate" and thought you were referring to the
server migration behaviour that Nova supports... sorry about that!

Best,
-jay



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Blair Bethwaite
On 8 August 2012 11:33, Jay Pipes  wrote:
> Sorry, from your original post, I didn't think you were referring to
> live migration, but rather just server migration. You had written
> "Compared to regular (non-block) live migrate", but I read that as
> "Compared to regular migrate" and thought you were referring to the
> server migration behaviour that Nova supports... sorry about that!

Jay, is your use of the wording "behaviour that Nova supports" there,
significant? I mean, you're not trying to indicate that Nova does not
support _live_ migration, are you?

Anyway, I found this relevant and stale bug:
https://bugs.launchpad.net/nova/+bug/883845. VIR_MIGRATE_LIVE remains
absent from the default migration flags in
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py.
We only just discovered that it isn't set by default, so we'll test
further, this time with VIR_MIGRATE_LIVE explicitly specified in the
migration flags in nova.conf...
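For anyone hitting the same issue, the change we're testing looks like the
following nova.conf fragment. This is a sketch: the default flag values shown
are from my reading of the Essex libvirt driver, so double-check them against
your own driver.py before copying.

```ini
# nova.conf: append VIR_MIGRATE_LIVE so "nova live-migration" is truly live.
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
# The same applies for block migration, plus the non-shared-storage flag:
block_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_NON_SHARED_INC
```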

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Huang Zhiteng
> But to the contrary. I tested live-migrate (without block migrate)
> last night using a guest with 8GB RAM (almost fully committed) and
> lost any access/contact with the guest for over 4 minutes - it was
> paused for the duration. Not something I'd want to do to a user's
> web-server on a regular basis...

4 minutes of pause (down time)?  That's way too long.  Even if there were
a crazy memory-intensive workload inside the VM being migrated, the
worst case is that KVM has to pause the VM and transmit all 8 GB of
memory (all memory dirty, which is very rare).  If you have a 1GbE link
between the two hosts, that worst-case pause period (down time) is less
than 2 minutes.  My previous experience: the down time for migrating one
idle (almost no memory access) 8GB VM via 1GbE is less than 1 second;
the down time for migrating an 8GB VM whose pages get dirty really
quickly is <60 seconds.  FYI.
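That worst case can be sanity-checked with some quick arithmetic (a rough
sketch of my own, ignoring protocol overhead and compression):

```python
# Rough worst-case down time: pause the VM and push every page across
# the wire in one go. Ignores protocol overhead and compression.
def worst_case_downtime_s(ram_gib, link_gbps):
    bits_to_send = ram_gib * 8 * 2**30        # all memory dirty, in bits
    return bits_to_send / (link_gbps * 10**9)  # seconds on the wire

# 8 GiB of fully dirty RAM over a 1GbE link:
t = worst_case_downtime_s(8, 1)
print(round(t))  # roughly 69 seconds, i.e. "less than 2 minutes"
```

So a 4-minute pause is well outside even the all-memory-dirty bound for this
guest size and link speed.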

-- 
Regards
Huang Zhiteng



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Jay Pipes
On 08/07/2012 09:42 PM, Blair Bethwaite wrote:
> On 8 August 2012 11:33, Jay Pipes  wrote:
>> Sorry, from your original post, I didn't think you were referring to
>> live migration, but rather just server migration. You had written
>> "Compared to regular (non-block) live migrate", but I read that as
>> "Compared to regular migrate" and thought you were referring to the
>> server migration behaviour that Nova supports... sorry about that!
> 
> Jay, is your use of the wording "behaviour that Nova supports" there,
> significant? I mean, you're not trying to indicate that Nova does not
> support _live_ migration, are you?

No, I was referring to the differentiation between server migration in
Nova and live migration in Nova. In other words, the difference between:

$> nova migrate  ...

and

$> nova live-migrate  ...

> Anyway, I found this relevant and stale bug:
> https://bugs.launchpad.net/nova/+bug/883845. VIR_MIGRATE_LIVE remains
> undefined in 
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py.
> We only just discovered the lack of this as a default option, so we'll
> test further, this time with VIR_MIGRATE_LIVE=1 explicitly specified
> in nova.conf...

OK, cheers,
-jay



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-08 Thread Daniel P. Berrange
On Wed, Aug 08, 2012 at 09:50:20AM +0800, Huang Zhiteng wrote:
> > But to the contrary. I tested live-migrate (without block migrate)
> > last night using a guest with 8GB RAM (almost fully committed) and
> > lost any access/contact with the guest for over 4 minutes - it was
> > paused for the duration. Not something I'd want to do to a user's
> > web-server on a regular basis...
> 
> 4 minutes of pause (down time)?  That's way too long.  Even there was
> crazy memory intensive workload inside the VM being migrated, the
> worst case is KVM has to pause VM and transmit all 8 GB memory (all
> memory are dirty, which is very rare).  If you have 1GbE link between
> two host, that worst case pause period (down time) is less than 2
> minutes.  My previous experience is: the down time for migrating one
> idle (almost no memory access) 8GB VM via 1GbE is less than 1 second;
> the down time for migrating a 8 GB VM that page got dirty really
> quickly is <60 seconds.  FYI.

KVM has a tunable setting for the maximum allowable live migration
downtime, which IIRC defaults to something very small like 250ms.

If the migration can't be completed within this downtime limit,
KVM will simply never complete the migration. Given that Nova does
not tune this downtime setting, I don't see how you'd see 4 mins of
downtime unless it was not truly live migration, or there was
something else broken (e.g. the network bridge device had a delay
inserted by the STP protocol which made the VM /appear/ to be
unresponsive on the network even though it was running fine).
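To put that default in perspective, here is a rough calculation (my own
illustrative sketch, assuming the ~250ms cap and a 1GbE link) of how little
data fits into the allowed downtime window:

```python
# Data that fits into the final stop-and-copy window at a given downtime
# cap: if more memory than this is still dirty, KVM keeps iterating
# rather than completing the migration.
def downtime_budget_mib(downtime_ms, link_gbps):
    bits = (downtime_ms / 1000.0) * link_gbps * 10**9
    return bits / 8 / 2**20  # convert bits to MiB

budget = downtime_budget_mib(250, 1)
print(round(budget))  # ~30 MiB: the residual dirty set must shrink below this
```

With only ~30 MiB of headroom, a guest dirtying pages at any real rate can
easily never fit under the cap, which is why "never completes" is the failure
mode rather than "pauses for minutes".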

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-08 Thread Daniel P. Berrange
On Tue, Aug 07, 2012 at 04:13:22PM -0400, Jay Pipes wrote:
> On 08/07/2012 08:57 AM, Blair Bethwaite wrote:
> >> I also feel a little concern about this statement:
> >>
> >>>  It don't work so well, it complicates migration code, and we are building
> >>> a replacement that works.
> >>
> >>
> >> I have to go further with my tests, maybe we could share some ideas, use
> >> case etc...
> > 
> > I think it may be worth asking about this on the KVM lists, unless
> > anyone here has further insights...?
> > 
> > I grabbed the KVM 1.0 source from Ubuntu Precise and vanilla KVM 1.1.1
> > from Sourceforge, block migration appears to remain in place despite
> > those (sparse) comments from the KVM meeting minutes (though I am
> > naive to the source layout and project structure, so could have easily
> > missed something). In any case, it seems unlikely Precise would see a
> > forced update to the 1.1.x series.
> 
> cc'd Daniel Berrange, who seems to be keyed in on upstream KVM/Qemu
> activity. Perhaps Daniel could shed some light.

Block migration is a part of KVM that none of the upstream developers
really like; it is not entirely reliable, and most distros typically do
not want to support it due to its poor design (e.g. it is not supported
in RHEL).

It is quite likely that it will be removed in favour of an alternative
implementation. What that alternative impl will be, and when it will
arrive, I can't say right now. A lot of the work (possibly all) will
probably be pushed up into libvirt, or even the higher level mgmt apps
using libvirt. It could well involve the mgmt app having to set up an
NBD or iSCSI server on the source host, and then launching QEMU on the
destination host configured to stream the data across from the NBD/iSCSI
server in parallel with the migration stream. But this is all just talk
for now; no firm decisions have been made, beyond a general desire to
kill the current block migration code.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-08 Thread Kiall Mac Innes
From memory (a fuzzy memory at that!), Nova will fall back to block
migration if it believes shared storage is unavailable.

This would explain the delay, but someone who's read the code recently can
confirm...

Thanks,
Kiall
On Aug 8, 2012 11:08 AM, "Daniel P. Berrange"  wrote:

> On Wed, Aug 08, 2012 at 09:50:20AM +0800, Huang Zhiteng wrote:
> > > But to the contrary. I tested live-migrate (without block migrate)
> > > last night using a guest with 8GB RAM (almost fully committed) and
> > > lost any access/contact with the guest for over 4 minutes - it was
> > > paused for the duration. Not something I'd want to do to a user's
> > > web-server on a regular basis...
> >
> > 4 minutes of pause (down time)?  That's way too long.  Even there was
> > crazy memory intensive workload inside the VM being migrated, the
> > worst case is KVM has to pause VM and transmit all 8 GB memory (all
> > memory are dirty, which is very rare).  If you have 1GbE link between
> > two host, that worst case pause period (down time) is less than 2
> > minutes.  My previous experience is: the down time for migrating one
> > idle (almost no memory access) 8GB VM via 1GbE is less than 1 second;
> > the down time for migrating a 8 GB VM that page got dirty really
> > quickly is <60 seconds.  FYI.
>
> KVM has a tunable setting for the maximum allowable live migration
> downtime, which IIRC defaults to something very small like 250ms.
>
> If the migration can't be completed within this downtime limit,
> KVM will simply never complete migration. Given that Nova does
> not tune this downtime setting, I don't see how you'd see 4 mins
> downtime unless it was not truely live migration, or there was
> something else broken (eg the network bridge device had a delay
> inserted by the STP protocol which made the VM /appear/ to be
> unreponsive on the network even though it was running fine).
>
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/:|
> |: http://libvirt.org  -o- http://virt-manager.org:|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/:|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc:|
>
>


Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Blair Bethwaite
Hi Daniel,

Thanks for following this up!

On 8 August 2012 19:53, Daniel P. Berrange  wrote:
> not tune this downtime setting, I don't see how you'd see 4 mins
> downtime unless it was not truely live migration, or there was

Yes, quite right. It turns out Nova is not passing libvirt's
VIR_MIGRATE_LIVE flag when it is asked to "live-migrate" a guest, so it
is not proper live migration. That is the default behaviour unless the
flag is added to the migration flags in nova.conf; unfortunately, that
flag isn't currently mentioned in the OpenStack docs either.

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Blair Bethwaite
Daniel,

Thanks for providing this insight, most useful. I'm interpreting this
as: block migration can be used in non-critical applications, mileage
will vary, and thorough testing in the particular environment is
recommended. An alternative implementation will come, but the higher-level
feature (live migration without shared storage) is unlikely to
disappear.

Is that a reasonable appraisal?

On 8 August 2012 19:59, Daniel P. Berrange  wrote:
> Block migration is a part of the KVM that none of the upstream developers
> really like, is not entirely reliable, and most distros typically do not
> want to support it due to its poor design (eg not supported in RHEL).

Would you be able to elaborate on those reliability issues? E.g.,
is there anything we can do to mitigate them?

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Vishvananda Ishaya

On Aug 9, 2012, at 1:03 AM, Blair Bethwaite  wrote:

> Hi Daniel,
> 
> Thanks for following this up!
> 
> On 8 August 2012 19:53, Daniel P. Berrange  wrote:
>> not tune this downtime setting, I don't see how you'd see 4 mins
>> downtime unless it was not truely live migration, or there was
> 
> Yes, quite right. It turns out Nova is not passing/setting libvirt's
> VIR_MIGRATE_LIVE when it is asked to "live-migrate" a guest, so it is
> not proper live-migration. That is the default behaviour unless the
> flag is added to the migrate flags in nova.conf, unfortunately that
> flag isn't currently mentioned in the OpenStack docs either.

Can you file a bug on this to change the default? I don't see any reason why 
this should be off.

Vish




Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Daniel P. Berrange
On Thu, Aug 09, 2012 at 07:10:17AM -0700, Vishvananda Ishaya wrote:
> 
> On Aug 9, 2012, at 1:03 AM, Blair Bethwaite  wrote:
> 
> > Hi Daniel,
> > 
> > Thanks for following this up!
> > 
> > On 8 August 2012 19:53, Daniel P. Berrange  wrote:
> >> not tune this downtime setting, I don't see how you'd see 4 mins
> >> downtime unless it was not truely live migration, or there was
> > 
> > Yes, quite right. It turns out Nova is not passing/setting libvirt's
> > VIR_MIGRATE_LIVE when it is asked to "live-migrate" a guest, so it is
> > not proper live-migration. That is the default behaviour unless the
> > flag is added to the migrate flags in nova.conf, unfortunately that
> > flag isn't currently mentioned in the OpenStack docs either.
> 
> Can you file a bug on this to change the default? I don't see any
> reason why this should be off.

With non-live migration, the migration operation is guaranteed to
complete. With live migration, you can get into a non-convergence
scenario where the guest is dirtying data faster than it can be
migrated. With the way Nova currently works, the live migration
will just run forever with no way to stop it. So if you want to
enable live migration by default, we'll need to do more than
simply set the flag. Nova will need to be able to monitor the
migration, and either cancel it after some time, or tune the
max allowed downtime to let it complete.
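To illustrate the non-convergence scenario, here is a toy model of the
pre-copy loop (entirely my own sketch, not Nova or KVM code): each pass
retransmits the pages dirtied while the previous pass was on the wire, so if
the guest dirties memory faster than the link can carry it, the residual
dirty set never drops below the downtime budget.

```python
def converges(ram_mib, dirty_mib_per_s, link_mib_per_s,
              budget_mib, max_iters=30):
    """Toy pre-copy model: each pass resends whatever was dirtied
    while the previous pass was transmitting."""
    remaining = float(ram_mib)
    for _ in range(max_iters):
        if remaining <= budget_mib:
            return True          # final stop-and-copy fits the downtime cap
        secs = remaining / link_mib_per_s
        remaining = dirty_mib_per_s * secs   # pages dirtied meanwhile
    return False                 # dirty rate >= bandwidth: never converges

# 8 GiB guest, ~100 MiB/s link, ~30 MiB stop-and-copy budget:
print(converges(8192, dirty_mib_per_s=10, link_mib_per_s=100, budget_mib=30))   # True
print(converges(8192, dirty_mib_per_s=120, link_mib_per_s=100, budget_mib=30))  # False
```

In the second case the dirty rate exceeds the link bandwidth, so the residual
set grows every pass; that is the migration Nova would currently let run
forever.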


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Vishvananda Ishaya

On Aug 9, 2012, at 7:13 AM, "Daniel P. Berrange"  wrote:

> 
> With non-live migration, the migration operation is guaranteed to
> complete. With live migration, you can get into a non-convergence
> scenario where the guest is dirtying data faster than it can be
> migrated. With the way Nova currently works the live migration
> will just run forever with no way to stop it. So if you want to
> enable live migration by default, we'll need todo more than
> simply set the flag. Nova will need to be able to monitor the
> migration, and either cancel it after some time, or tune the
> max allowed downtime to let it complete

Ah good to know. So it sounds like we should keep the default as-is
for now and revisit it later.

Vish



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-12 Thread Blair Bethwaite
Hi Vish,

On 10 August 2012 00:27, Vishvananda Ishaya  wrote:
>
> On Aug 9, 2012, at 7:13 AM, "Daniel P. Berrange"  wrote:
>
>>
>> With non-live migration, the migration operation is guaranteed to
>> complete. With live migration, you can get into a non-convergence
>> scenario where the guest is dirtying data faster than it can be
>> migrated. With the way Nova currently works the live migration
>> will just run forever with no way to stop it. So if you want to
>> enable live migration by default, we'll need todo more than
>> simply set the flag. Nova will need to be able to monitor the
>> migration, and either cancel it after some time, or tune the
>> max allowed downtime to let it complete
>
> Ah good to know. So it sounds like we should keep the default as-is
> for now and revisit it later.

I'm not so sure. It seems to me that "nova migrate" should be the
offline/paused migration and "nova live-migration" should be _live_
migration, like it says. Semantic mismatches like this exposed to
operators/users are bad news. As it is, I don't even know what "nova
migrate" is supposed to do...? There's at least a need to improve the
docs on this.

Daniel's point about the non-convergence cases with
[live|block]-migration is certainly good to know. It sounds like in
practice the key settings, such as the allowable live-migration
downtime, should be tuned to the deployment. Nova should probably
default to a conservatively high allowable downtime.

Daniel; any advice about choosing a sensible value for the allowable downtime?

-- 
Cheers,
~Blairo
