Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-12 Thread Blair Bethwaite
Hi Vish,

On 10 August 2012 00:27, Vishvananda Ishaya vishvana...@gmail.com wrote:

 On Aug 9, 2012, at 7:13 AM, Daniel P. Berrange berra...@redhat.com wrote:


 With non-live migration, the migration operation is guaranteed to
 complete. With live migration, you can get into a non-convergence
 scenario where the guest is dirtying data faster than it can be
 migrated. With the way Nova currently works the live migration
 will just run forever with no way to stop it. So if you want to
 enable live migration by default, we'll need to do more than
 simply set the flag. Nova will need to be able to monitor the
 migration, and either cancel it after some time, or tune the
 max allowed downtime to let it complete.

 Ah good to know. So it sounds like we should keep the default as-is
 for now and revisit it later.

I'm not so sure. It seems to me that nova migrate should be the
offline/paused migration and nova live-migration should be _live_
migration, like it says. Semantic mismatches like this exposed to
operators/users are bad news. As it is, I don't even know what nova
migrate is supposed to do...? There's at least a need to improve the
docs on this.

Daniel's point about the non-convergence cases with
[live|block]-migration is certainly good to know. It sounds like in
practice the key settings, such as the allowable live-migration
downtime, should be tuned to the deployment. Nova should probably
default to a conservatively high allowable downtime.

Daniel, any advice about choosing a sensible value for the allowable downtime?

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Blair Bethwaite
Hi Daniel,

Thanks for following this up!

On 8 August 2012 19:53, Daniel P. Berrange berra...@redhat.com wrote:
 not tune this downtime setting, I don't see how you'd see 4 mins
 downtime unless it was not truly live migration, or there was

Yes, quite right. It turns out Nova is not passing/setting libvirt's
VIR_MIGRATE_LIVE when it is asked to live-migrate a guest, so it is
not proper live-migration. That is the default behaviour unless the
flag is added to the migrate flags in nova.conf; unfortunately, that
flag isn't currently mentioned in the OpenStack docs either.
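
For the record, the knob appears to be the libvirt driver's
live_migration_flag option, so a nova.conf line along these lines should
turn it on (a sketch only - check the exact flag list against your Nova
version before relying on it):

  # nova.conf on the libvirt/KVM compute nodes
  live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE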

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Blair Bethwaite
Daniel,

Thanks for providing this insight, most useful. I'm interpreting this
as: block migration can be used in non-critical applications, mileage
will vary, thorough testing in the particular environment is
recommended. An alternative implementation will come, but the higher
level feature (live-migration without shared storage) is unlikely to
disappear.

Is that a reasonable appraisal?

On 8 August 2012 19:59, Daniel P. Berrange berra...@redhat.com wrote:
 Block migration is a part of KVM that none of the upstream developers
 really like, is not entirely reliable, and most distros typically do not
 want to support it due to its poor design (eg not supported in RHEL).

Would you mind elaborating on those reliability issues? E.g.,
is there anything we can do to mitigate them?

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Vishvananda Ishaya

On Aug 9, 2012, at 1:03 AM, Blair Bethwaite blair.bethwa...@gmail.com wrote:

 Hi Daniel,
 
 Thanks for following this up!
 
 On 8 August 2012 19:53, Daniel P. Berrange berra...@redhat.com wrote:
 not tune this downtime setting, I don't see how you'd see 4 mins
 downtime unless it was not truly live migration, or there was
 
 Yes, quite right. It turns out Nova is not passing/setting libvirt's
 VIR_MIGRATE_LIVE when it is asked to live-migrate a guest, so it is
 not proper live-migration. That is the default behaviour unless the
 flag is added to the migrate flags in nova.conf; unfortunately, that
 flag isn't currently mentioned in the OpenStack docs either.

Can you file a bug on this to change the default? I don't see any reason why 
this should be off.

Vish




Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Daniel P. Berrange
On Thu, Aug 09, 2012 at 07:10:17AM -0700, Vishvananda Ishaya wrote:
 
 On Aug 9, 2012, at 1:03 AM, Blair Bethwaite blair.bethwa...@gmail.com wrote:
 
  Hi Daniel,
  
  Thanks for following this up!
  
  On 8 August 2012 19:53, Daniel P. Berrange berra...@redhat.com wrote:
  not tune this downtime setting, I don't see how you'd see 4 mins
  downtime unless it was not truly live migration, or there was
  
  Yes, quite right. It turns out Nova is not passing/setting libvirt's
  VIR_MIGRATE_LIVE when it is asked to live-migrate a guest, so it is
  not proper live-migration. That is the default behaviour unless the
  flag is added to the migrate flags in nova.conf; unfortunately, that
  flag isn't currently mentioned in the OpenStack docs either.
 
 Can you file a bug on this to change the default? I don't see any
 reason why this should be off.

With non-live migration, the migration operation is guaranteed to
complete. With live migration, you can get into a non-convergence
scenario where the guest is dirtying data faster than it can be
migrated. With the way Nova currently works the live migration
will just run forever with no way to stop it. So if you want to
enable live migration by default, we'll need to do more than
simply set the flag. Nova will need to be able to monitor the
migration, and either cancel it after some time, or tune the
max allowed downtime to let it complete.
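
Until Nova grows that logic, the underlying knobs are at least reachable by
hand through libvirt while a migration is in flight - roughly as follows
(the domain name is just an example):

  # check progress (data/memory remaining, throughput, etc.)
  virsh domjobinfo instance-00000042
  # give up on a migration that is never going to converge
  virsh domjobabort instance-00000042
  # or raise the allowed downtime window (in milliseconds) so it can finish
  virsh migrate-setmaxdowntime instance-00000042 2000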


Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-09 Thread Vishvananda Ishaya

On Aug 9, 2012, at 7:13 AM, Daniel P. Berrange berra...@redhat.com wrote:

 
 With non-live migration, the migration operation is guaranteed to
 complete. With live migration, you can get into a non-convergence
 scenario where the guest is dirtying data faster than it can be
 migrated. With the way Nova currently works the live migration
 will just run forever with no way to stop it. So if you want to
 enable live migration by default, we'll need to do more than
 simply set the flag. Nova will need to be able to monitor the
 migration, and either cancel it after some time, or tune the
 max allowed downtime to let it complete.

Ah good to know. So it sounds like we should keep the default as-is
for now and revisit it later.

Vish



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-08 Thread Daniel P. Berrange
On Wed, Aug 08, 2012 at 09:50:20AM +0800, Huang Zhiteng wrote:
  But to the contrary. I tested live-migrate (without block migrate)
  last night using a guest with 8GB RAM (almost fully committed) and
  lost any access/contact with the guest for over 4 minutes - it was
  paused for the duration. Not something I'd want to do to a user's
  web-server on a regular basis...
 
 4 minutes of pause (down time)?  That's way too long.  Even if there was
 a crazy memory-intensive workload inside the VM being migrated, the
 worst case is KVM having to pause the VM and transmit all 8 GB of memory
 (all memory dirty, which is very rare).  If you have a 1GbE link between
 the two hosts, that worst-case pause period (down time) is less than 2
 minutes.  My previous experience is: the down time for migrating one
 idle (almost no memory access) 8GB VM via 1GbE is less than 1 second;
 the down time for migrating an 8 GB VM whose pages get dirtied really
 quickly is 60 seconds.  FYI.

KVM has a tunable setting for the maximum allowable live migration
downtime, which IIRC defaults to something very small like 250ms.

If the migration can't be completed within this downtime limit,
KVM will simply never complete migration. Given that Nova does
not tune this downtime setting, I don't see how you'd see 4 mins
downtime unless it was not truly live migration, or there was
something else broken (eg the network bridge device had a delay
inserted by the STP protocol which made the VM /appear/ to be
unresponsive on the network even though it was running fine).
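
If anyone wants to rule that last theory in or out, the bridge settings are
easy to inspect and (carefully) adjust on the destination host - for example,
assuming the usual nova-network flat bridge name, which may differ in your
deployment:

  brctl showstp br100     # look at the forward delay and port states
  brctl setfd br100 0     # drop the forwarding delay to zero
  brctl stp br100 off     # or turn STP off on the bridge entirely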

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-08 Thread Daniel P. Berrange
On Tue, Aug 07, 2012 at 04:13:22PM -0400, Jay Pipes wrote:
 On 08/07/2012 08:57 AM, Blair Bethwaite wrote:
  I also feel a little concern about this statement:
 
   It don't work so well, it complicates migration code, and we are building
  a replacement that works.
 
 
  I have to go further with my tests, maybe we could share some ideas, use
  case etc...
  
  I think it may be worth asking about this on the KVM lists, unless
  anyone here has further insights...?
  
  I grabbed the KVM 1.0 source from Ubuntu Precise and vanilla KVM 1.1.1
  from Sourceforge, block migration appears to remain in place despite
  those (sparse) comments from the KVM meeting minutes (though I am
  naive to the source layout and project structure, so could have easily
  missed something). In any case, it seems unlikely Precise would see a
  forced update to the 1.1.x series.
 
 cc'd Daniel Berrange, who seems to be keyed in on upstream KVM/Qemu
 activity. Perhaps Daniel could shed some light.

Block migration is a part of KVM that none of the upstream developers
really like, is not entirely reliable, and most distros typically do not
want to support it due to its poor design (eg not supported in RHEL).

It is quite likely that it will be removed in favour of an alternative
implementation. What that alternative impl will be, and when it will
arrive, I can't say right now. A lot of the work (possibly all) will
probably be pushed up into libvirt, or even the higher level mgmt apps
using libvirt. It could well involve the mgmt app having to set up an
NBD or iSCSI server on the source host, and then launching QEMU on the
destination host configured to stream the data across from the NBD/iSCSI
server in parallel with the migration stream. But this is all just talk
for now, no firm decisions have been made, beyond a general desire to
kill the current block migration code.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-08 Thread Kiall Mac Innes
From memory (a fuzzy memory at that!) Nova will fall back to block migration
if it believes shared storage is unavailable.

This would explain the delay, but someone who's read the code recently can
confirm...

Thanks,
Kiall
On Aug 8, 2012 11:08 AM, Daniel P. Berrange berra...@redhat.com wrote:

 On Wed, Aug 08, 2012 at 09:50:20AM +0800, Huang Zhiteng wrote:
   But to the contrary. I tested live-migrate (without block migrate)
   last night using a guest with 8GB RAM (almost fully committed) and
   lost any access/contact with the guest for over 4 minutes - it was
   paused for the duration. Not something I'd want to do to a user's
   web-server on a regular basis...
 
  4 minutes of pause (down time)?  That's way too long.  Even if there was
  a crazy memory-intensive workload inside the VM being migrated, the
  worst case is KVM having to pause the VM and transmit all 8 GB of memory
  (all memory dirty, which is very rare).  If you have a 1GbE link between
  the two hosts, that worst-case pause period (down time) is less than 2
  minutes.  My previous experience is: the down time for migrating one
  idle (almost no memory access) 8GB VM via 1GbE is less than 1 second;
  the down time for migrating an 8 GB VM whose pages get dirtied really
  quickly is 60 seconds.  FYI.

 KVM has a tunable setting for the maximum allowable live migration
 downtime, which IIRC defaults to something very small like 250ms.

 If the migration can't be completed within this downtime limit,
 KVM will simply never complete migration. Given that Nova does
 not tune this downtime setting, I don't see how you'd see 4 mins
 downtime unless it was not truly live migration, or there was
 something else broken (eg the network bridge device had a delay
 inserted by the STP protocol which made the VM /appear/ to be
 unresponsive on the network even though it was running fine).

 Regards,
 Daniel
 --
 |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org -o- http://virt-manager.org :|
 |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Sébastien Han
Hi!

I think it's a pretty useful feature, a good compromise. As you said, using
a shared fs implies a lot of things and can dramatically decrease your
performance compared to using the local fs. I tested it and I will use it
for my deployment. I'll be happy to discuss this feature more deeply with
you :)

I also feel a little concern about this statement:

 It don't work so well, it complicates migration code, and we are building
 a replacement that works.


I have to go further with my tests, maybe we could share some ideas, use
case etc...

Cheers!

On Mon, Aug 6, 2012 at 3:08 PM, Blair Bethwaite blair.bethwa...@gmail.com
wrote:
 Hi all,

 KVM block migration support in OpenStack
 (https://blueprints.launchpad.net/nova/+spec/kvm-block-migration)
 seems to be somewhat of a secret - there's almost nothing in the
 docs/guides (which to the contrary state that live migration is only
 possible with shared storage) and only a couple of mentions on list,
 yet it's been around since Diablo. Should this be taken to mean it's
 considered unstable, or just that no-one interested in documenting it
 understands the significance of such a feature to deployment
 architects? After all, decent shared storage is an expensive prospect
 with a pile of associated design and management overhead!

 I'd be happy to contribute some documentation patches (starting with
 the admin guide) that cover this. But first I'd like to get some
 confirmation that it's here to stay, which will be significant for our
 own large deployment. We've tested with Essex on Ubuntu Precise and
 seen a bit of weird file-system behaviour, which we currently suspect
 might be a consequence of using ext3 in the guest. But also, there
 seems to be some associated lag with interactive services (e.g. active
 VNC session) in the guest, not yet sure how this compares to the
 non-block live migration case.

 We'd really appreciate anybody actively using this feature to speak up
 and comment on their mileage, especially with respect to ops.

 I'm slightly concerned that KVM may drop this going forward
 (http://www.spinics.net/lists/kvm/msg72228.html), though that would be
 unlikely to affect anybody deploying on Precise.

 --
 Cheers,
 ~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Blair Bethwaite
Hi Sébastien,

Thanks for responding! By the way, I have come across your blog post
regarding this and should reference it for the list:
http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/

On 7 August 2012 17:45, Sébastien Han han.sebast...@gmail.com wrote:
 I think it's a pretty useful feature, a good compromise. As you said, using a
 shared fs implies a lot of things and can dramatically decrease your
 performance compared to using the local fs.

Agreed, scale-out distributed file-systems are hard. Consistent
hashing based systems (like Gluster and Ceph) seem like the answer to
many of the existing issues with systems trying to mix scalability,
performance and POSIX compliance. But the key issue is how one
measures performance for these systems... throughput for large
synchronous reads & writes may scale linearly (up to network
saturation), but random IOPS are another thing entirely. As far as I
can tell, random IOPS are the primary metric of concern in the design
of the nova-compute storage, whereas both capacity and throughput
requirements are relatively easy to specify and simply represent hard
limits that must be met to support the various instance flavours you
plan to offer.

It's interesting to note that RedHat do not recommend using RHS
(RedHat Storage), their RHEL-based Gluster (which they own now)
appliance, for live VM storage.

Additionally, operations issues are much harder to handle with a DFS
(even NFS), e.g., how can I put an upper limit on disk I/O for any
particular instance when its ephemeral disk files are across the
network and potentially striped into opaque objects across multiple
storage bricks...?
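
For local disk files that sort of cap is at least expressible via libvirt's
per-disk <iotune> element - a rough sketch with a made-up limit, and assuming
a new enough libvirt/QEMU underneath:

  <disk type='file' device='disk'>
    ...
    <iotune>
      <total_iops_sec>500</total_iops_sec>
    </iotune>
  </disk>

Once the disk is striped into opaque objects on a DFS, it's much less obvious
where an equivalent limit would live.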

 I tested it and I will use it
 for my deployment. I'll be happy to discuss this feature more deeply with
 you :)

Great. We have tested too. Compared to regular (non-block) live
migrate, we don't see much difference in the guest - both scenarios
involve a minute or two of interruption as the guest is moved (e.g.
VNC and SSH sessions hang temporarily), which I find slightly
surprising - is that your experience too?

 I also feel a little concern about this statement:

  It don't work so well, it complicates migration code, and we are building
 a replacement that works.


 I have to go further with my tests, maybe we could share some ideas, use
 case etc...

I think it may be worth asking about this on the KVM lists, unless
anyone here has further insights...?

I grabbed the KVM 1.0 source from Ubuntu Precise and vanilla KVM 1.1.1
from Sourceforge, block migration appears to remain in place despite
those (sparse) comments from the KVM meeting minutes (though I am
naive to the source layout and project structure, so could have easily
missed something). In any case, it seems unlikely Precise would see a
forced update to the 1.1.x series.

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Jay Pipes
On 08/07/2012 08:57 AM, Blair Bethwaite wrote:
 Hi Sébastien,
 
 Thanks for responding! By the way, I have come across your blog post
 regarding this and should reference it for the list:
 http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/
 
 On 7 August 2012 17:45, Sébastien Han han.sebast...@gmail.com wrote:
 I think it's a pretty useful feature, a good compromise. As you said, using a
 shared fs implies a lot of things and can dramatically decrease your
 performance compared to using the local fs.
 
 Agreed, scale-out distributed file-systems are hard. Consistent
 hashing based systems (like Gluster and Ceph) seem like the answer to
 many of the existing issues with systems trying to mix scalability,
 performance and POSIX compliance. But the key issue is how one
 measures performance for these systems... throughput for large
 synchronous reads & writes may scale linearly (up to network
 saturation), but random IOPS are another thing entirely. As far as I
 can tell, random IOPS are the primary metric of concern in the design
 of the nova-compute storage, whereas both capacity and throughput
 requirements are relatively easy to specify and simply represent hard
 limits that must be met to support the various instance flavours you
 plan to offer.
 
 It's interesting to note that RedHat do not recommend using RHS
 (RedHat Storage), their RHEL-based Gluster (which they own now)
 appliance, for live VM storage.
 
 Additionally, operations issues are much harder to handle with a DFS
 (even NFS), e.g., how can I put an upper limit on disk I/O for any
 particular instance when its ephemeral disk files are across the
 network and potentially striped into opaque objects across multiple
 storage bricks...?

We at AT&T are also interested in this area, for the record, and will
likely do testing in this area in the next 6-12 months. We will release
any information and findings to the mailing list of course, and
hopefully we can collaborate on this important area.

 I tested it and I will use it
 for my deployment. I'll be happy to discuss this feature more deeply with
 you :)
 
 Great. We have tested too. Compared to regular (non-block) live
 migrate, we don't see much difference in the guest - both scenarios
 involve a minute or two of interruption as the guest is moved (e.g.
 VNC and SSH sessions hang temporarily), which I find slightly
 surprising - is that your experience too?

Why would you find this surprising? I'm just curious...

 I also feel a little concern about this statement:

  It don't work so well, it complicates migration code, and we are building
 a replacement that works.


 I have to go further with my tests, maybe we could share some ideas, use
 case etc...
 
 I think it may be worth asking about this on the KVM lists, unless
 anyone here has further insights...?
 
 I grabbed the KVM 1.0 source from Ubuntu Precise and vanilla KVM 1.1.1
 from Sourceforge, block migration appears to remain in place despite
 those (sparse) comments from the KVM meeting minutes (though I am
 naive to the source layout and project structure, so could have easily
 missed something). In any case, it seems unlikely Precise would see a
 forced update to the 1.1.x series.

cc'd Daniel Berrange, who seems to be keyed in on upstream KVM/Qemu
activity. Perhaps Daniel could shed some light.

Best,
-jay



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Blair Bethwaite
Hi Jay,

On 8 August 2012 06:13, Jay Pipes jaypi...@gmail.com wrote:
 Why would you find this surprising? I'm just curious...

The live migration algorithm detailed here:
http://www.linux-kvm.org/page/Migration, seems to me to indicate that
only a brief pause should be expected. Indeed, the summary says,
"Almost unnoticeable guest down time."

But to the contrary. I tested live-migrate (without block migrate)
last night using a guest with 8GB RAM (almost fully committed) and
lost any access/contact with the guest for over 4 minutes - it was
paused for the duration. Not something I'd want to do to a user's
web-server on a regular basis...

 cc'd Daniel Berrange, who seems to be keyed in on upstream KVM/Qemu
 activity. Perhaps Daniel could shed some light.

That would be wonderful. Thanks!

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Jay Pipes
On 08/07/2012 08:23 PM, Blair Bethwaite wrote:
 Hi Jay,
 
 On 8 August 2012 06:13, Jay Pipes jaypi...@gmail.com wrote:
 Why would you find this surprising? I'm just curious...
 
 The live migration algorithm detailed here:
 http://www.linux-kvm.org/page/Migration, seems to me to indicate that
 only a brief pause should be expected. Indeed, the summary says,
 "Almost unnoticeable guest down time."
 
 But to the contrary. I tested live-migrate (without block migrate)
 last night using a guest with 8GB RAM (almost fully committed) and
 lost any access/contact with the guest for over 4 minutes - it was
 paused for the duration. Not something I'd want to do to a user's
 web-server on a regular basis...

Sorry, from your original post, I didn't think you were referring to
live migration, but rather just server migration. You had written
"Compared to regular (non-block) live migrate", but I read that as
"Compared to regular migrate" and thought you were referring to the
server migration behaviour that Nova supports... sorry about that!

Best,
-jay



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Blair Bethwaite
On 8 August 2012 11:33, Jay Pipes jaypi...@gmail.com wrote:
 Sorry, from your original post, I didn't think you were referring to
 live migration, but rather just server migration. You had written
 "Compared to regular (non-block) live migrate", but I read that as
 "Compared to regular migrate" and thought you were referring to the
 server migration behaviour that Nova supports... sorry about that!

Jay, is your use of the wording "behaviour that Nova supports" there
significant? I mean, you're not trying to indicate that Nova does not
support _live_ migration, are you?

Anyway, I found this relevant and stale bug:
https://bugs.launchpad.net/nova/+bug/883845. VIR_MIGRATE_LIVE remains
undefined in 
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py.
We only just discovered the lack of this as a default option, so we'll
test further, this time with VIR_MIGRATE_LIVE=1 explicitly specified
in nova.conf...

-- 
Cheers,
~Blairo



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Huang Zhiteng
 But to the contrary. I tested live-migrate (without block migrate)
 last night using a guest with 8GB RAM (almost fully committed) and
 lost any access/contact with the guest for over 4 minutes - it was
 paused for the duration. Not something I'd want to do to a user's
 web-server on a regular basis...

4 minutes of pause (down time)?  That's way too long.  Even if there was
a crazy memory-intensive workload inside the VM being migrated, the
worst case is KVM having to pause the VM and transmit all 8 GB of memory
(all memory dirty, which is very rare).  If you have a 1GbE link between
the two hosts, that worst-case pause period (down time) is less than 2
minutes.  My previous experience is: the down time for migrating one
idle (almost no memory access) 8GB VM via 1GbE is less than 1 second;
the down time for migrating an 8 GB VM whose pages get dirtied really
quickly is 60 seconds.  FYI.

-- 
Regards
Huang Zhiteng



Re: [Openstack] KVM live block migration: stability, future, docs

2012-08-07 Thread Jay Pipes
On 08/07/2012 09:42 PM, Blair Bethwaite wrote:
 On 8 August 2012 11:33, Jay Pipes jaypi...@gmail.com wrote:
 Sorry, from your original post, I didn't think you were referring to
 live migration, but rather just server migration. You had written
 "Compared to regular (non-block) live migrate", but I read that as
 "Compared to regular migrate" and thought you were referring to the
 server migration behaviour that Nova supports... sorry about that!
 
 Jay, is your use of the wording "behaviour that Nova supports" there
 significant? I mean, you're not trying to indicate that Nova does not
 support _live_ migration, are you?

No, I was referring to the differentiation between server migration in
Nova and live migration in Nova. In other words, the difference between:

$ nova migrate SERVER ...

and

$ nova live-migration SERVER ...

 Anyway, I found this relevant and stale bug:
 https://bugs.launchpad.net/nova/+bug/883845. VIR_MIGRATE_LIVE remains
 undefined in 
 https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py.
 We only just discovered the lack of this as a default option, so we'll
 test further, this time with VIR_MIGRATE_LIVE=1 explicitly specified
 in nova.conf...

OK, cheers,
-jay



[Openstack] KVM live block migration: stability, future, docs

2012-08-06 Thread Blair Bethwaite
Hi all,

KVM block migration support in OpenStack
(https://blueprints.launchpad.net/nova/+spec/kvm-block-migration)
seems to be somewhat of a secret - there's almost nothing in the
docs/guides (which to the contrary state that live migration is only
possible with shared storage) and only a couple of mentions on list,
yet it's been around since Diablo. Should this be taken to mean it's
considered unstable, or just that no-one interested in documenting it
understands the significance of such a feature to deployment
architects? After all, decent shared storage is an expensive prospect
with a pile of associated design and management overhead!
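
For anyone who wants to try it, the trigger is just the normal live-migration
call with block migration requested - something like the following, though
the exact flag spelling has varied between novaclient releases, so check
yours:

  $ nova live-migration --block-migrate <instance> <target-host>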

I'd be happy to contribute some documentation patches (starting with
the admin guide) that cover this. But first I'd like to get some
confirmation that it's here to stay, which will be significant for our
own large deployment. We've tested with Essex on Ubuntu Precise and
seen a bit of weird file-system behaviour, which we currently suspect
might be a consequence of using ext3 in the guest. But also, there
seems to be some associated lag with interactive services (e.g. active
VNC session) in the guest, not yet sure how this compares to the
non-block live migration case.

We'd really appreciate anybody actively using this feature to speak up
and comment on their mileage, especially with respect to ops.

I'm slightly concerned that KVM may drop this going forward
(http://www.spinics.net/lists/kvm/msg72228.html), though that would be
unlikely to affect anybody deploying on Precise.

-- 
Cheers,
~Blairo
