Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-06-01 Thread Sean Dague
On 06/01/2016 01:33 PM, Matt Riedemann wrote:

> 
> Sounds like there was a bad check in nova which is fixed here:
> 
> https://review.openstack.org/#/c/323467/
> 
> And a d-g change depends on that here:
> 
> https://review.openstack.org/#/c/320925/
> 
> Is there anything more to do for this? I'm assuming we should backport
> the nova change to the stable branches because the d-g change is going
> to break those multinode jobs on stable, although they are already
> non-voting jobs so it doesn't really matter. But if we knowingly break
> those jobs on stable branches, we should fix them to work or exclude
> them from running on stable branch changes since it'd be a waste of test
> resources.

The intent is to backport them. We can probably land the d-g change
without waiting for the backports, but they are super straightforward,
so they should be easy to land quickly.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-06-01 Thread Matt Riedemann

On 5/31/2016 8:36 AM, Daniel P. Berrange wrote:

> On Tue, May 31, 2016 at 08:19:33AM -0400, Sean Dague wrote:
> > On 05/30/2016 06:25 AM, Kashyap Chamarthy wrote:
> > > On Thu, May 26, 2016 at 10:55:47AM -0400, Sean Dague wrote:
> > > > On 05/26/2016 05:38 AM, Kashyap Chamarthy wrote:
> > > > > On Wed, May 25, 2016 at 05:42:04PM +0200, Kashyap Chamarthy wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > So, in short, the central issue seems to be this: the custom 'gate64'
> > > > > > model is not being translated by libvirt into a model that QEMU can
> > > > > > recognize.
> > > > >
> > > > > An update:
> > > > >
> > > > > Upstream libvirt points out that this turns out to be a regression, and
> > > > > bisected it to commit 1.2.9-31-g445a09b (in libvirt Git) -- "qemu:
> > > > > Don't compare CPU against host for TCG".
> > > > >
> > > > > So, I expect there's going to be a fix pretty soon in upstream libvirt.
> > > >
> > > > Which is good... I wonder how long we'll be waiting for that back in our
> > > > distro packages though.
> > >
> > > Yeah, until the fix lands, our current options seem to be:
> > >
> > >   (a) Revert to a known good version of libvirt
> >
> > Downgrading libvirt so dramatically isn't a thing we'll be able to do.
> >
> > >   (b) Use nested virt -- I doubt it is possible
> > >   in the RAX environment, which is using Xen, last I knew.
> >
> > We turned off nested virt even where it was enabled, because it locks up
> > at a non-trivial rate. So not really an option.
>
> Hmm, if the guest is using 'qemu' and not 'kvm', then there should be
> no dependency between the host CPU and guest CPU whatsoever, i.e. we can
> present an arbitrary CPU to the guest, whether the host CPU has matching
> features or not.
>
> I wonder if there is a bug in Nova where it is trying to do a host/guest
> CPU compatibility check even for 'qemu' guests, when it should only do
> them for 'kvm' guests.
>
> If we can avoid the CPU compatibility check with qemu guests, then the
> fact that there's a libvirt bug here should be irrelevant, and we could
> avoid needing to invent a gate64 CPU model too.
>
> Regards,
> Daniel



Sounds like there was a bad check in nova which is fixed here:

https://review.openstack.org/#/c/323467/
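
The gist of it, as I understand the patch (a rough sketch with
approximate names, not the actual diff):

    # Only do the host/guest CPU comparison for KVM guests; a TCG
    # ('qemu') guest gets a fully emulated CPU, so compatibility with
    # the host CPU is irrelevant to it.
    def cpu_compat_check_needed(virt_type):
        return virt_type == 'kvm'

    assert not cpu_compat_check_needed('qemu')  # multinode gate: skip check
    assert cpu_compat_check_needed('kvm')       # real deployments: check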

And a d-g change depends on that here:

https://review.openstack.org/#/c/320925/

Is there anything more to do for this? I'm assuming we should backport 
the nova change to the stable branches because the d-g change is going 
to break those multinode jobs on stable, although they are already 
non-voting jobs so it doesn't really matter. But if we knowingly break 
those jobs on stable branches, we should fix them to work or exclude 
them from running on stable branch changes since it'd be a waste of test 
resources.


--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-31 Thread Daniel P. Berrange
On Tue, May 31, 2016 at 08:24:03AM -0400, Sean Dague wrote:
> On 05/31/2016 05:39 AM, Daniel P. Berrange wrote:
> > On Tue, May 24, 2016 at 01:59:17PM -0400, Sean Dague wrote:
> >> The team working on live migration testing started with an experimental
> >> job on Ubuntu 16.04 to try to be using the latest and greatest libvirt +
> >> qemu under the assumption that a set of issues we were seeing are
> >> solved. The short answer is, it doesn't look like this is going to work.
> >>
> >> We run tests on a bunch of different clouds. Those clouds expose
> >> different cpu flags to us. These are not standard things that map to
> >> "Haswell". It means live migration in the multinode cases can hit cpus
> >> with different flags. So we found the requirement was to come up with a
> >> least common denominator of cpu flags, which we call gate64, and push
> >> that into the libvirt cpu_map.xml in devstack, and set whenever we are
> >> in a multinode scenario.
> >> (https://github.com/openstack-dev/devstack/blob/master/tools/cpu_map_update.py)
> >>  Not ideal, but with libvirt 1.2.2 it works fine.
> >>
> >> It turns out it works fine because libvirt *actually* seems to take the
> >> data from cpu_map.xml and do a translation to what it believes qemu will
> >> understand. On these systems apparently this turns into "-cpu
> >> Opteron_G1,-pse36"
> >> (http://logs.openstack.org/29/42529/24/check/gate-tempest-dsvm-multinode-full/5f504c5/logs/libvirt/qemu/instance-000b.txt.gz)
> >>
> >> At some point between libvirt 1.2.2 and 1.3.1, this changed. Now libvirt
> >> seems to be passing our cpu_model directly to qemu, and assumes that as
> >> a user you will be responsible for writing all the <feature> stanzas to
> >> add/remove yourself. When libvirt sends 'gate64' to qemu, this explodes,
> >> as qemu has no idea what we are talking about.
> >> http://logs.openstack.org/34/319934/2/experimental/gate-tempest-dsvm-multinode-live-migration/b87d689/logs/screen-n-cpu.txt.gz#_2016-05-24_15_59_12_531
> >>
> >> Unlike libvirt, which has a text file (xml) that configures the cpus
> >> that could exist in the world, qemu builds this in statically at compile
> >> time:
> >> http://git.qemu.org/?p=qemu.git;a=blob;f=target-i386/cpu.c;h=895a386d3b7a94e363ca1bb98821d3251e70c0e0;hb=HEAD#l694
> >>
> >>
> >> So, the existing cpu_map.xml workaround for our testing situation will
> >> no longer work.
> >>
> >> So, we have a number of open questions:
> >>
> >> * Have our cloud providers standardized enough that we might get away
> >> without this custom cpu model? (Have some of them done it and only use
> >> those for multinode?)
> >> * Is there any way to get this feature back in libvirt to do the cpu
> >> computation?
> >> * Would we have to build a whole nova feature around setting libvirt xml
> >>  to be able to test live migration in our clouds?
> >> * Other options?
> >> * Do we give up and go herd goats?
> > 
> > Rather than try to define our own custom CPU models, we can probably
> > just use one of the standard CPU models and then explicitly tell
> > libvirt which flags to turn off in order to get compatibility with
> > our cloud environments.
> > 
> > This is not currently possible with Nova, since our nova.conf option
> > only allows us to specify a bare CPU model. We would have to extend
> > nova.conf to allow us to specify a list of CPU features to add or
> > remove. Libvirt should then correctly pass these changes through
> > to QEMU.
> 
> Yes, that's an option. Given that the libvirt team seemed to acknowledge
> this as a regression, I'd rather not build a user exposed feature for
> all of that just as a workaround for a libvirt regression.

I think the fact that we're hitting this problem in the gate, though, is
a sign that our users will likely hit it in their own deployments if
using virtualized hosts. I think it is more friendly for users to be
able to customize the CPU features via nova.conf than to repeat the
hacks done for devstack by editing the libvirt cpu_map.xml file.

IOW, extending nova.conf to support this officially would be a generally
useful feature for nova, beyond your short-term CI needs.
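
For illustration, something along these lines in nova.conf (cpu_mode and
cpu_model already exist; the extra-flags option, name and syntax
included, is hypothetical here):

    [libvirt]
    cpu_mode = custom
    cpu_model = Opteron_G1
    # hypothetical new option: CPU features to add (+) or remove (-)
    cpu_model_extra_flags = -pse36

Libvirt would then emit the corresponding <feature> stanzas in the guest
XML, with no custom model in cpu_map.xml needed.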

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-31 Thread Daniel P. Berrange
On Tue, May 31, 2016 at 08:19:33AM -0400, Sean Dague wrote:
> On 05/30/2016 06:25 AM, Kashyap Chamarthy wrote:
> > On Thu, May 26, 2016 at 10:55:47AM -0400, Sean Dague wrote:
> >> On 05/26/2016 05:38 AM, Kashyap Chamarthy wrote:
> >>> On Wed, May 25, 2016 at 05:42:04PM +0200, Kashyap Chamarthy wrote:
> >>>
> >>> [...]
> >>>
> >>>> So, in short, the central issue seems to be this: the custom 'gate64'
> >>>> model is not being translated by libvirt into a model that QEMU can
> >>>> recognize.
> >>>
> >>> An update:
> >>>
> >>> Upstream libvirt points out that this turns out to be a regression, and
> >>> bisected it to commit 1.2.9-31-g445a09b (in libvirt Git) -- "qemu:
> >>> Don't compare CPU against host for TCG".
> >>>
> >>> So, I expect there's going to be a fix pretty soon in upstream libvirt.
> >>
> >> Which is good... I wonder how long we'll be waiting for that back in our
> >> distro packages though.
> > 
> > Yeah, until the fix lands, our current options seem to be:
> > 
> >   (a) Revert to a known good version of libvirt
> 
> Downgrading libvirt so dramatically isn't a thing we'll be able to do.
> 
> >   (b) Use nested virt -- I doubt it is possible
> >   in the RAX environment, which is using Xen, last I knew.
> 
> We turned off nested virt even where it was enabled, because it locks up
> at a non trivial rate. So not really an option.

Hmm, if the guest is using 'qemu' and not 'kvm', then there should be
no dependency between the host CPU and guest CPU whatsoever, i.e. we can
present an arbitrary CPU to the guest, whether the host CPU has matching
features or not.

I wonder if there is a bug in Nova where it is trying to do a host/guest
CPU compatibility check even for 'qemu' guests, when it should only do
them for 'kvm' guests.

If we can avoid the CPU compatibility check with qemu guests, then the
fact that there's a libvirt bug here should be irrelevant, and we could
avoid needing to invent a gate64 CPU model too.
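
To illustrate the distinction (hand-written command lines, not what Nova
actually generates):

    # TCG: the CPU is fully emulated, so any model QEMU knows about
    # works, regardless of what the host CPU supports
    qemu-system-x86_64 -machine accel=tcg -cpu Opteron_G1 ...

    # KVM: the guest CPU's flags must actually be backed by the host CPU
    qemu-system-x86_64 -machine accel=kvm -cpu Opteron_G1 ...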


Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-31 Thread Sean Dague
On 05/31/2016 05:39 AM, Daniel P. Berrange wrote:
> On Tue, May 24, 2016 at 01:59:17PM -0400, Sean Dague wrote:
>> The team working on live migration testing started with an experimental
>> job on Ubuntu 16.04 to try to be using the latest and greatest libvirt +
>> qemu under the assumption that a set of issues we were seeing are
>> solved. The short answer is, it doesn't look like this is going to work.
>>
>> We run tests on a bunch of different clouds. Those clouds expose
>> different cpu flags to us. These are not standard things that map to
>> "Haswell". It means live migration in the multinode cases can hit cpus
>> with different flags. So we found the requirement was to come up with a
>> least common denominator of cpu flags, which we call gate64, and push
>> that into the libvirt cpu_map.xml in devstack, and set whenever we are
>> in a multinode scenario.
>> (https://github.com/openstack-dev/devstack/blob/master/tools/cpu_map_update.py)
>>  Not ideal, but with libvirt 1.2.2 it works fine.
>>
>> It turns out it works fine because libvirt *actually* seems to take the
>> data from cpu_map.xml and do a translation to what it believes qemu will
>> understand. On these systems apparently this turns into "-cpu
>> Opteron_G1,-pse36"
>> (http://logs.openstack.org/29/42529/24/check/gate-tempest-dsvm-multinode-full/5f504c5/logs/libvirt/qemu/instance-000b.txt.gz)
>>
>> At some point between libvirt 1.2.2 and 1.3.1, this changed. Now libvirt
>> seems to be passing our cpu_model directly to qemu, and assumes that as
>> a user you will be responsible for writing all the <feature> stanzas to
>> add/remove yourself. When libvirt sends 'gate64' to qemu, this explodes,
>> as qemu has no idea what we are talking about.
>> http://logs.openstack.org/34/319934/2/experimental/gate-tempest-dsvm-multinode-live-migration/b87d689/logs/screen-n-cpu.txt.gz#_2016-05-24_15_59_12_531
>>
>> Unlike libvirt, which has a text file (xml) that configures the cpus
>> that could exist in the world, qemu builds this in statically at compile
>> time:
>> http://git.qemu.org/?p=qemu.git;a=blob;f=target-i386/cpu.c;h=895a386d3b7a94e363ca1bb98821d3251e70c0e0;hb=HEAD#l694
>>
>>
>> So, the existing cpu_map.xml workaround for our testing situation will
>> no longer work.
>>
>> So, we have a number of open questions:
>>
>> * Have our cloud providers standardized enough that we might get away
>> without this custom cpu model? (Have some of them done it and only use
>> those for multinode?)
>> * Is there any way to get this feature back in libvirt to do the cpu
>> computation?
>> * Would we have to build a whole nova feature around setting libvirt xml
>>  to be able to test live migration in our clouds?
>> * Other options?
>> * Do we give up and go herd goats?
> 
> Rather than try to define our own custom CPU models, we can probably
> just use one of the standard CPU models and then explicitly tell
> libvirt which flags to turn off in order to get compatibility with
> our cloud environments.
> 
> This is not currently possible with Nova, since our nova.conf option
> only allows us to specify a bare CPU model. We would have to extend
> nova.conf to allow us to specify a list of CPU features to add or
> remove. Libvirt should then correctly pass these changes through
> to QEMU.

Yes, that's an option. Given that the libvirt team seemed to acknowledge
this as a regression, I'd rather not build a user-exposed feature for
all of that just as a workaround for a libvirt regression.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-31 Thread Sean Dague
On 05/30/2016 06:25 AM, Kashyap Chamarthy wrote:
> On Thu, May 26, 2016 at 10:55:47AM -0400, Sean Dague wrote:
>> On 05/26/2016 05:38 AM, Kashyap Chamarthy wrote:
>>> On Wed, May 25, 2016 at 05:42:04PM +0200, Kashyap Chamarthy wrote:
>>>
>>> [...]
>>>
>>>> So, in short, the central issue seems to be this: the custom 'gate64'
>>>> model is not being translated by libvirt into a model that QEMU can
>>>> recognize.
>>>
>>> An update:
>>>
> >>> Upstream libvirt points out that this turns out to be a regression, and
> >>> bisected it to commit 1.2.9-31-g445a09b (in libvirt Git) -- "qemu:
> >>> Don't compare CPU against host for TCG".
> >>>
> >>> So, I expect there's going to be a fix pretty soon in upstream libvirt.
>>
>> Which is good... I wonder how long we'll be waiting for that back in our
>> distro packages though.
> 
> Yeah, until the fix lands, our current options seem to be:
> 
>   (a) Revert to a known good version of libvirt

Downgrading libvirt so dramatically isn't a thing we'll be able to do.

> >   (b) Use nested virt -- I doubt it is possible
> >   in the RAX environment, which is using Xen, last I knew.

We turned off nested virt even where it was enabled, because it locks up
at a non trivial rate. So not really an option.

>   (c) Or a different CPU model

Right, although it's not super clear what that will be.
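
If we do go hunting for one, virsh cpu-baseline can in principle compute
the common denominator for us. An illustrative run, assuming we collect
the <cpu> element from virsh capabilities on each provider's hosts into
one file (the output shown is made up; the real baseline depends on the
actual host CPUs):

    $ virsh cpu-baseline all-provider-cpus.xml
    <cpu match='exact'>
      <model fallback='allow'>Opteron_G1</model>
      <feature policy='require' name='lahf_lm'/>
    </cpu>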

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-31 Thread Daniel P. Berrange
On Tue, May 24, 2016 at 01:59:17PM -0400, Sean Dague wrote:
> The team working on live migration testing started with an experimental
> job on Ubuntu 16.04 to try to be using the latest and greatest libvirt +
> qemu under the assumption that a set of issues we were seeing are
> solved. The short answer is, it doesn't look like this is going to work.
> 
> We run tests on a bunch of different clouds. Those clouds expose
> different cpu flags to us. These are not standard things that map to
> "Haswell". It means live migration in the multinode cases can hit cpus
> with different flags. So we found the requirement was to come up with a
> least common denominator of cpu flags, which we call gate64, and push
> that into the libvirt cpu_map.xml in devstack, and set whenever we are
> in a multinode scenario.
> (https://github.com/openstack-dev/devstack/blob/master/tools/cpu_map_update.py)
>  Not ideal, but with libvirt 1.2.2 it works fine.
> 
> It turns out it works fine because libvirt *actually* seems to take the
> data from cpu_map.xml and do a translation to what it believes qemu will
> understand. On these systems apparently this turns into "-cpu
> Opteron_G1,-pse36"
> (http://logs.openstack.org/29/42529/24/check/gate-tempest-dsvm-multinode-full/5f504c5/logs/libvirt/qemu/instance-000b.txt.gz)
> 
> At some point between libvirt 1.2.2 and 1.3.1, this changed. Now libvirt
> seems to be passing our cpu_model directly to qemu, and assumes that as
> a user you will be responsible for writing all the <feature> stanzas to
> add/remove yourself. When libvirt sends 'gate64' to qemu, this explodes,
> as qemu has no idea what we are talking about.
> http://logs.openstack.org/34/319934/2/experimental/gate-tempest-dsvm-multinode-live-migration/b87d689/logs/screen-n-cpu.txt.gz#_2016-05-24_15_59_12_531
> 
> Unlike libvirt, which has a text file (xml) that configures the cpus
> that could exist in the world, qemu builds this in statically at compile
> time:
> http://git.qemu.org/?p=qemu.git;a=blob;f=target-i386/cpu.c;h=895a386d3b7a94e363ca1bb98821d3251e70c0e0;hb=HEAD#l694
> 
> 
> So, the existing cpu_map.xml workaround for our testing situation will
> no longer work.
> 
> So, we have a number of open questions:
> 
> * Have our cloud providers standardized enough that we might get away
> without this custom cpu model? (Have some of them done it and only use
> those for multinode?)
> * Is there any way to get this feature back in libvirt to do the cpu
> computation?
> * Would we have to build a whole nova feature around setting libvirt xml
>  to be able to test live migration in our clouds?
> * Other options?
> * Do we give up and go herd goats?

Rather than try to define our own custom CPU models, we can probably
just use one of the standard CPU models and then explicitly tell
libvirt which flags to turn off in order to get compatibility with
our cloud environments.

This is not currently possible with Nova, since our nova.conf option
> only allows us to specify a bare CPU model. We would have to extend
nova.conf to allow us to specify a list of CPU features to add or
remove. Libvirt should then correctly pass these changes through
to QEMU.
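
The guest XML Nova emits would then carry explicit <feature> stanzas
against a standard model. A sketch, reusing the model and flag from the
gate's current workaround:

    <cpu mode='custom' match='exact'>
      <model fallback='forbid'>Opteron_G1</model>
      <!-- drop the one flag the cloud hosts don't all provide -->
      <feature policy='disable' name='pse36'/>
    </cpu>

which libvirt can hand to QEMU as "-cpu Opteron_G1,-pse36" without any
custom entry in cpu_map.xml.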


Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-30 Thread Kashyap Chamarthy
On Thu, May 26, 2016 at 10:55:47AM -0400, Sean Dague wrote:
> On 05/26/2016 05:38 AM, Kashyap Chamarthy wrote:
> > On Wed, May 25, 2016 at 05:42:04PM +0200, Kashyap Chamarthy wrote:
> > 
> > [...]
> > 
> >> So, in short, the central issue seems to be this: the custom 'gate64'
> >> model is not being translated by libvirt into a model that QEMU can
> >> recognize.
> > 
> > An update:
> > 
> > Upstream libvirt points out that this turns out to be a regression, and
> > bisected it to commit 1.2.9-31-g445a09b (in libvirt Git) -- "qemu:
> > Don't compare CPU against host for TCG".
> > 
> > So, I expect there's going to be a fix pretty soon in upstream libvirt.
> 
> Which is good... I wonder how long we'll be waiting for that back in our
> distro packages though.

Yeah, until the fix lands, our current options seem to be:

  (a) Revert to a known good version of libvirt
  
  (b) Use nested virt -- I doubt it is possible
  in the RAX environment, which is using Xen, last I knew.
  
  (c) Or a different CPU model


-- 
/kashyap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-26 Thread Sean Dague
On 05/24/2016 01:59 PM, Sean Dague wrote:

> So, we have a number of open questions:
> 
> * Have our cloud providers standardized enough that we might get away
> without this custom cpu model? (Have some of them done it and only use
> those for multinode?)

This is definitely not true on RAX. Experimenting with not setting the
gate64 cpu model failed in one of the live migration jobs on RAX
because of cpu incompatibility. Here is the cpu comparison between the
master and the subnode: http://paste.openstack.org/show/505672/
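
For anyone wanting to check this on their own nodes, virsh cpu-compare
is the quick test. Illustrative run (the file name is made up; it holds
the other node's <cpu> element from virsh capabilities):

    $ virsh cpu-compare master-cpu.xml
    CPU described in master-cpu.xml is incompatible with host CPU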

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-26 Thread Sean Dague
On 05/26/2016 05:38 AM, Kashyap Chamarthy wrote:
> On Wed, May 25, 2016 at 05:42:04PM +0200, Kashyap Chamarthy wrote:
> 
> [...]
> 
>> So, in short, the central issue seems to be this: the custom 'gate64'
>> model is not being translated by libvirt into a model that QEMU can
>> recognize.
> 
> An update:
> 
> Upstream libvirt points out that this turns out to be a regression, and
> bisected it to commit 1.2.9-31-g445a09b (in libvirt Git) -- "qemu:
> Don't compare CPU against host for TCG".
> 
> So, I expect there's going to be a fix pretty soon in upstream libvirt.

Which is good... I wonder how long we'll be waiting for that back in our
distro packages though.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-26 Thread Kashyap Chamarthy
On Wed, May 25, 2016 at 05:42:04PM +0200, Kashyap Chamarthy wrote:

[...]

> So, in short, the central issue seems to be this: the custom 'gate64'
> model is not being translated by libvirt into a model that QEMU can
> recognize.

An update:

Upstream libvirt points out that this turns out to be a regression, and
bisected it to commit 1.2.9-31-g445a09b (in libvirt Git) -- "qemu:
Don't compare CPU against host for TCG".

So, I expect there's going to be a fix pretty soon in upstream libvirt.

> I could reproduce it with upstream libvirt
> (libvirt-1.3.4-2.fc25.x86_64), and filed this bug:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1339680 -- libvirt CPU
> driver fails to translate a custom CPU model into something that
> QEMU recognizes

[...]

-- 
/kashyap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-25 Thread Kashyap Chamarthy
On Tue, May 24, 2016 at 01:59:17PM -0400, Sean Dague wrote:

Thanks for the summary, Sean.

[...]

> It turns out it works fine because libvirt *actually* seems to take the
> data from cpu_map.xml and do a translation to what it believes qemu will
> understand. On these systems apparently this turns into "-cpu
> Opteron_G1,-pse36"
> (http://logs.openstack.org/29/42529/24/check/gate-tempest-dsvm-multinode-full/5f504c5/logs/libvirt/qemu/instance-000b.txt.gz)
> 
> At some point between libvirt 1.2.2 and 1.3.1, this changed. Now libvirt
> seems to be passing our cpu_model directly to qemu, and assumes that as
> a user you will be responsible for writing all the <feature> stanzas to
> add/remove yourself. When libvirt sends 'gate64' to qemu, this explodes,
> as qemu has no idea what we are talking about.
> http://logs.openstack.org/34/319934/2/experimental/gate-tempest-dsvm-multinode-live-migration/b87d689/logs/screen-n-cpu.txt.gz#_2016-05-24_15_59_12_531

[...]

So, in short, the central issue seems to be this: the custom 'gate64'
model is not being translated by libvirt into a model that QEMU can
recognize.

I could reproduce it with upstream libvirt
(libvirt-1.3.4-2.fc25.x86_64), and filed this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1339680 -- libvirt CPU
driver fails to translate a custom CPU model into something that
QEMU recognizes

Some discussion from libvirt migration developers (comment #3):

"So it looks like the whole code which computes the right CPU model
is skipped. The reason is . Our code avoids
comparing guest CPU definition to host CPU for TCG mode (since the
host CPU is irrelevant in this case). And as a side effect the code
that would translate the gate64 CPU model into something that is
supported by QEMU is skipped too."
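
For context, the kind of entry devstack's cpu_map_update.py injects into
cpu_map.xml looks roughly like this (a sketch -- the exact vendor and
feature list are an assumption on my part; the real definition lives in
that script):

    <model name='gate64'>
      <!-- least-common-denominator flags available on every provider -->
      <vendor name='AMD'/>
      <feature name='lm'/>
      <feature name='sse2'/>
      <feature name='syscall'/>
      <feature name='nx'/>
    </model>

With the regression above, this model name gets passed straight through
to QEMU instead of being translated into a base model plus feature
tweaks.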

> So, the existing cpu_map.xml workaround for our testing situation will
> no longer work.

[...]

-- 
/kashyap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev