Re: [Openstack-operators] Fwd: Nova hypervisor uuid

2018-11-29 Thread Matt Riedemann

On 11/29/2018 10:27 AM, Ignazio Cassano wrote:

I did in the DB directly.
I am using queens now.
Any python client command to delete hold records or I must use api ?


You can use the CLI:

https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-service-delete

https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/compute-service.html#compute-service-delete
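For reference, those CLI commands boil down to a single DELETE against the compute API's os-services resource. A minimal sketch of that request (the endpoint, token, and service id below are placeholders, not real values):

```python
# Hypothetical sketch of the request behind "openstack compute service
# delete": a DELETE /os-services/{service_id} call against the compute
# API endpoint, authenticated with a keystone token.
from urllib.request import Request

def build_service_delete_request(compute_endpoint, token, service_id):
    """Build the DELETE /os-services/{service_id} request the CLI issues."""
    return Request(
        url="%s/os-services/%s" % (compute_endpoint.rstrip("/"), service_id),
        method="DELETE",
        headers={"X-Auth-Token": token},
    )

req = build_service_delete_request(
    "http://controller:8774/v2.1", "PLACEHOLDER_TOKEN", "1d9c8d01")
print(req.get_method(), req.full_url)
```

The point of going through the API rather than the DB is covered later in the thread: the API cleans up the related records too.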

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Re: [Openstack-operators] Fwd: Nova hypervisor uuid

2018-11-29 Thread Matt Riedemann

On 11/29/2018 12:49 AM, Ignazio Cassano wrote:

Hello Matt,
Yes, I mean sometimes I have the same host/node names with different 
uuids in the compute_nodes table in the nova database.
I must delete the records whose uuid does not match the output of the 
nova hypervisor-list command.

At this time I have the following:
MariaDB [nova]> select hypervisor_hostname,uuid,deleted from compute_nodes;
+---------------------+--------------------------------------+---------+
| hypervisor_hostname | uuid                                 | deleted |
+---------------------+--------------------------------------+---------+
| tst2-kvm02          | 802b21c2-11fb-4426-86b9-bf25c8a5ae1d |       0 |
| tst2-kvm01          | ce27803b-06cd-44a7-b927-1fa42c813b0f |       0 |
+---------------------+--------------------------------------+---------+
2 rows in set (0,00 sec)


But sometimes old uuids are inserted in the table again, so I deleted 
them once more.
I restarted the kvm nodes and now the table is ok.
I also restarted each controller and the table is still ok.
I do not know why 3 days ago I had the same compute node names with 
different uuids.


Thanks and Regards
Ignazio


OK I guess if it happens again, please get the 
host/hypervisor_hostname/uuid/deleted values from the compute_nodes 
table before you clean up any entries.


Also, when you're deleting the resources from the DB, are you doing it 
in the DB directly or via the DELETE /os-services/{service_id} API? 
Because the latter cleans up other related resources to the nova-compute 
service (the services table record, the compute_nodes table record, the 
related resource_providers table record in placement, and the 
host_mappings table record in the nova API DB). The resource 
provider/host mappings cleanup when deleting a compute service is a more 
recent bug fix though which depending on your release you might not have:


https://review.openstack.org/#/q/I7b8622b178d5043ed1556d7bdceaf60f47e5ac80

--

Thanks,

Matt


Re: [Openstack-operators] Fwd: Nova hypervisor uuid

2018-11-28 Thread Matt Riedemann

On 11/28/2018 4:19 AM, Ignazio Cassano wrote:

Hi Matt, sorry but I lost your answer and Gianpiero forwarded it to me.
I am sure the kvm node names have not changed.
The tables where uuids are duplicated are:
resource_providers in the nova_api db
compute_nodes in the nova db
Regards
Ignazio


It would be easier if you simply dumped the result of a select query on 
the compute_nodes table where the duplicate nodes exist (you said 
duplicate UUIDs but I think you mean duplicate host/node names with 
different UUIDs, correct?).


There is a unique constraint on host/hypervisor_hostname (nodename)/deleted:

schema.UniqueConstraint(
'host', 'hypervisor_hostname', 'deleted',
name="uniq_compute_nodes0host0hypervisor_hostname0deleted"),

So I'm wondering if the deleted field is not 0 on one of those because 
if one is marked as deleted, then the compute service will create a new 
compute_nodes table record on startup (and associated resource provider).
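That constraint behaviour can be illustrated with a small standalone sketch (using sqlite in place of MariaDB, with the schema trimmed to the three constrained columns):

```python
import sqlite3

# Sketch of the compute_nodes unique constraint: a second record for the
# same host/node pair is only possible once the first one is soft-deleted
# (nova sets deleted to the row id, so it no longer collides with 0).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY,
        host TEXT, hypervisor_hostname TEXT, deleted INTEGER DEFAULT 0,
        UNIQUE (host, hypervisor_hostname, deleted)
    )""")
conn.execute("INSERT INTO compute_nodes (host, hypervisor_hostname) "
             "VALUES ('tst2-kvm01', 'tst2-kvm01')")

# A duplicate live (deleted=0) record violates the constraint...
try:
    conn.execute("INSERT INTO compute_nodes (host, hypervisor_hostname) "
                 "VALUES ('tst2-kvm01', 'tst2-kvm01')")
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)

# ...but after soft-deleting the first record, a fresh record (with a
# new uuid and resource provider in real nova) can be created on startup.
conn.execute("UPDATE compute_nodes SET deleted = id WHERE id = 1")
conn.execute("INSERT INTO compute_nodes (host, hypervisor_hostname) "
             "VALUES ('tst2-kvm01', 'tst2-kvm01')")
print(conn.execute("SELECT COUNT(*) FROM compute_nodes").fetchone()[0])
```

This is why checking the deleted column on the duplicate rows is the first diagnostic step.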


--

Thanks,

Matt


Re: [Openstack-operators] Nova hypervisor uuid

2018-11-27 Thread Matt Riedemann

On 11/27/2018 11:32 AM, Ignazio Cassano wrote:

Hi All,
Does anyone know where the hypervisor uuid is retrieved from?
Sometimes, after updating kvm nodes with yum update, it changes, and in 
the nova database 2 uuids are assigned to the same node.

regards
Ignazio







To be clear, do you mean the compute_nodes.uuid column value in the 
cell database? Which is also used for the GET /os-hypervisors response 
'id' value if using microversion >= 2.53. If so, that is generated 
randomly* when the compute_nodes table record is created:


https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/compute/resource_tracker.py#L588

https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/objects/compute_node.py#L312
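A minimal sketch of that default-uuid behaviour (mirroring what the linked resource_tracker/compute_node code does, not taken from it; the helper name is made up):

```python
import uuid

# Sketch: nova fills in a random uuid4 when a compute_nodes record is
# created without an existing uuid, so a brand new record always gets a
# brand new uuid -- which is how duplicates with different uuids appear.
def default_compute_node_uuid(existing_uuid=None):
    return existing_uuid if existing_uuid is not None else str(uuid.uuid4())

print(default_compute_node_uuid())  # fresh random uuid
print(default_compute_node_uuid("802b21c2-11fb-4426-86b9-bf25c8a5ae1d"))
```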

When you hit this problem, are you sure the hostname on the compute host 
is not changing? Because when nova-compute starts up, it should look for 
the existing compute node record by host name and node name, which for 
the libvirt driver should be the same. That lookup code is here:


https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/compute/resource_tracker.py#L815

So the only way nova-compute should create a new compute_nodes table 
record for the same host is if the host/node name changes during the 
upgrade. Is the deleted value in the database the same (0) for both of 
those records?


* The exception to this is for the ironic driver which re-uses the 
ironic node uuid as of this change: https://review.openstack.org/#/c/571535/


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] about filter the flavor

2018-11-20 Thread Matt Riedemann

On 11/19/2018 9:32 PM, Rambo wrote:
       I have an idea. Currently we can't filter flavors by their extra 
spec properties. Can we achieve that? If we did, we could filter the 
flavors according to a property's key and value. What do you think of 
the idea? Can you tell me more about this? Thank you very 
much.


To be clear, you want to filter flavors by extra spec key and/or value? 
So something like:


GET /flavors?key=hw%3Acpu_policy

would return all flavors with an extra spec with key "hw:cpu_policy".

And:

GET /flavors?key=hw%3Acpu_policy&value=dedicated

would return all flavors with extra spec "hw:cpu_policy" with value 
"dedicated".


The query parameter semantics are probably what gets messiest about 
this. Because I could see wanting to couple the key and value together, 
but I'm not sure how you do that, because I don't think you can do this:


GET /flavors?spec=hw%3Acpu_policy=dedicated

Maybe you'd do:

GET /flavors?hw%3Acpu_policy=dedicated

The problem with that is we wouldn't be able to perform any kind of 
request schema validation of it, especially since flavor extra specs are 
not standardized.
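To make the proposed semantics concrete, here is a sketch of the filtering logic applied to a list of flavors; the key/value parameter names are hypothetical, since no API contract exists yet:

```python
# Sketch of the proposed extra-spec filtering semantics: filter flavors
# by spec key, and optionally by value. The flavor data is illustrative.
def filter_flavors(flavors, key, value=None):
    result = []
    for flavor in flavors:
        specs = flavor.get("extra_specs", {})
        if key in specs and (value is None or specs[key] == value):
            result.append(flavor["name"])
    return result

flavors = [
    {"name": "m1.pinned", "extra_specs": {"hw:cpu_policy": "dedicated"}},
    {"name": "m1.shared", "extra_specs": {"hw:cpu_policy": "shared"}},
    {"name": "m1.plain", "extra_specs": {}},
]
print(filter_flavors(flavors, "hw:cpu_policy"))               # key only
print(filter_flavors(flavors, "hw:cpu_policy", "dedicated"))  # key + value
```

The schema-validation problem in the email is exactly that the set of valid keys here is open-ended.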


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [Openstack-sigs] Dropping lazy translation support

2018-11-06 Thread Matt Riedemann

On 11/6/2018 5:24 PM, Rochelle Grober wrote:

Maybe the fastest way to get info would be to turn it off and see where the 
code barfs in a long run (to catch as many projects as possible)?


There is zero integration testing for lazy translation, so "turning it 
off and seeing what breaks" wouldn't result in anything breaking.


--

Thanks,

Matt



Re: [Openstack-operators] [Openstack-sigs] Dropping lazy translation support

2018-11-05 Thread Matt Riedemann

On 11/5/2018 1:36 PM, Doug Hellmann wrote:

I think the lazy stuff was all about the API responses. The log
translations worked a completely different way.


Yeah maybe. And if so, I came across this in one of the blueprints:

https://etherpad.openstack.org/p/disable-lazy-translation

Which says that because of a critical bug, the lazy translation was 
disabled in Havana to be fixed in Icehouse but I don't think that ever 
happened before IBM developers dropped it upstream, which is further 
justification for nuking this code from the various projects.


--

Thanks,

Matt



[Openstack-operators] Dropping lazy translation support

2018-11-05 Thread Matt Riedemann
This is a follow up to a dev ML email [1] where I noticed that some 
implementations of the upgrade-checkers goal were failing because some 
projects still use the oslo_i18n.enable_lazy() hook for lazy log message 
translation (and maybe API responses?).


The very old blueprints related to this can be found here [2][3][4].

If memory serves me correctly from my time working at IBM on this, this 
was needed to:


1. Generate logs translated in other languages.

2. Return REST API responses if the "Accept-Language" header was used 
and a suitable translation existed for that language.


#1 is a dead horse since I think at least the Ocata summit when we 
agreed to no longer translate logs since no one used them.


#2 is probably something no one knows about. I can't find end-user 
documentation about it anywhere. It's not tested and therefore I have no 
idea if it actually works anymore.
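As a conceptual sketch of what "lazy" means in #2: instead of translating at call time, a message object carries the msgid and parameters so the translation can be chosen later, e.g. per the request's Accept-Language header. This is purely illustrative, not oslo_i18n's actual implementation:

```python
# Illustrative lazy-translation message: interpolation and catalog
# lookup are deferred until the response is serialized, at which point
# the right language catalog (if any) can be applied.
class LazyMessage:
    def __init__(self, msgid, params=None):
        self.msgid = msgid
        self.params = params or {}

    def translate(self, catalog):
        # Pick the translated template at serialization time.
        template = catalog.get(self.msgid, self.msgid)
        return template % self.params

msg = LazyMessage("Instance %(id)s not found", {"id": "abc"})
print(msg.translate({}))  # no catalog: falls back to English
print(msg.translate(
    {"Instance %(id)s not found": "Istanza %(id)s non trovata"}))
```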


I would like to (1) deprecate the oslo_i18n.enable_lazy() function so 
new projects don't use it and (2) start removing the enable_lazy() usage 
from existing projects like keystone, glance and cinder.


Are there any users, deployments or vendor distributions that still rely 
on this feature? If so, please speak up now.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-November/136285.html

[2] https://blueprints.launchpad.net/oslo-incubator/+spec/i18n-messages
[3] https://blueprints.launchpad.net/nova/+spec/i18n-messages
[4] https://blueprints.launchpad.net/nova/+spec/user-locale-api

--

Thanks,

Matt



[Openstack-operators] [nova] Is anyone running their own script to purge old instance_faults table entries?

2018-11-01 Thread Matt Riedemann
I came across this bug [1] in triage today and I thought this was fixed 
already [2] but either something regressed or there is more to do here.


I'm mostly just wondering, are operators already running any kind of 
script which purges old instance_faults table records before an instance 
is deleted and archived/purged? Because if so, that might be something 
we want to add as a nova-manage command.


[1] https://bugs.launchpad.net/nova/+bug/1800755
[2] https://review.openstack.org/#/c/409943/

--

Thanks,

Matt



Re: [Openstack-operators] [nova] Removing the CachingScheduler

2018-10-24 Thread Matt Riedemann

On 10/18/2018 5:07 PM, Matt Riedemann wrote:

It's been deprecated since Pike, and the time has come to remove it [1].

mgagne has been the most vocal CachingScheduler operator I know and he 
has tested out the "nova-manage placement heal_allocations" CLI, added 
in Rocky, and said it will work for migrating his deployment from the 
CachingScheduler to the FilterScheduler + Placement.


If you are using the CachingScheduler and have a problem with its 
removal, now is the time to speak up or forever hold your peace.


[1] https://review.openstack.org/#/c/611723/1


This is your last chance to speak up if you are using the 
CachingScheduler and object to it being removed from nova in Stein. I 
have removed the -W pin from the review since a series of feature work 
is now stacked on top of it.


--

Thanks,

Matt



[Openstack-operators] [nova] Removing the CachingScheduler

2018-10-18 Thread Matt Riedemann

It's been deprecated since Pike, and the time has come to remove it [1].

mgagne has been the most vocal CachingScheduler operator I know and he 
has tested out the "nova-manage placement heal_allocations" CLI, added 
in Rocky, and said it will work for migrating his deployment from the 
CachingScheduler to the FilterScheduler + Placement.


If you are using the CachingScheduler and have a problem with its 
removal, now is the time to speak up or forever hold your peace.


[1] https://review.openstack.org/#/c/611723/1

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [horizon][nova][cinder][keystone][glance][neutron][swift] Horizon feature gaps

2018-10-17 Thread Matt Riedemann

On 10/17/2018 9:24 AM, Ivan Kolodyazhny wrote:


As you may know, unfortunately, Horizon doesn't support all features 
provided by APIs. That's why we created feature gaps list [1].


I had a lot of great conversations with project teams during the PTG 
and we tried to figure out what should be done to prioritize these tasks. 
It's really helpful for Horizon to get feedback from other teams to 
understand which features should be implemented next.


While I'm filling launchpad with new bugs and blueprints for [1], it 
would be good to review this list again and find some volunteers to 
decrease feature gaps.


[1] https://etherpad.openstack.org/p/horizon-feature-gap

Thanks everybody for any of your contributions to Horizon.


+openstack-sigs
+openstack-operators

I've left some notes for nova. This looks very similar to the compute 
API OSC gap analysis I did [1]. Unfortunately it's hard to prioritize 
what to really work on without some user/operator feedback - maybe we 
can get the user work group involved in trying to help prioritize what 
people really want that is missing from horizon, at least for compute?


[1] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc

--

Thanks,

Matt



Re: [Openstack-operators] nova_api resource_providers table issues on ocata

2018-10-17 Thread Matt Riedemann

On 10/17/2018 9:13 AM, Ignazio Cassano wrote:

Hello Sylvain, here the output of some selects:
MariaDB [nova]> select host,hypervisor_hostname from compute_nodes;
+--------------+---------------------+
| host         | hypervisor_hostname |
+--------------+---------------------+
| podto1-kvm01 | podto1-kvm01        |
| podto1-kvm02 | podto1-kvm02        |
| podto1-kvm03 | podto1-kvm03        |
| podto1-kvm04 | podto1-kvm04        |
| podto1-kvm05 | podto1-kvm05        |
+--------------+---------------------+

MariaDB [nova]> select host from compute_nodes where host='podto1-kvm01' 
and hypervisor_hostname='podto1-kvm01';

+--------------+
| host         |
+--------------+
| podto1-kvm01 |
+--------------+


Does your upgrade tooling run a db archive/purge at all? It's possible 
that the actual services table record was deleted via the os-services 
REST API for some reason, which would delete the compute_nodes table 
record, and then a restart of the nova-compute process would recreate 
the services and compute_nodes table records, but with a new compute 
node uuid and thus a new resource provider.


Maybe query your shadow_services and shadow_compute_nodes tables for 
"podto1-kvm01" and see if a record existed at one point, was deleted and 
then archived to the shadow tables.


--

Thanks,

Matt



[Openstack-operators] [goals][upgrade-checkers] Week R-26 Update

2018-10-12 Thread Matt Riedemann
The big update this week is version 0.1.0 of oslo.upgradecheck was 
released. The documentation along with usage examples can be found here 
[1]. A big thanks to Ben Nemec for getting that done since a few 
projects were waiting for it.


In other updates, some changes were proposed in other projects [2].

And finally, Lance Bragstad and I had a discussion this week [3] about 
the validity of upgrade checks looking for deleted configuration 
options. The main scenario I'm thinking about here is FFU where someone 
is going from Mitaka to Pike. Let's say a config option was deprecated 
in Newton and then removed in Ocata. As the operator is rolling through 
from Mitaka to Pike, they might have missed the deprecation signal in 
Newton and removal in Ocata. Does that mean we should have upgrade 
checks that look at the configuration for deleted options, or options 
where the deprecated alias is removed? My thought is that if things will 
not work once they get to the target release and restart the service 
code, which would definitely impact the upgrade, then checking for those 
scenarios is probably OK. If on the other hand the removed options were 
just tied to functionality that was removed and are otherwise not 
causing any harm then I don't think we need a check for that. It was 
noted that oslo.config has a new validation tool [4] so that would take 
care of some of this same work if run during upgrades. So I think 
whether or not an upgrade check should be looking for config option 
removal ultimately depends on the severity of what happens if the manual 
intervention to handle that removed option is not performed. That's 
pretty broad, but these upgrade checks aren't really set in stone for 
what is applied to them. I'd like to get input from others on this, 
especially operators and if they would find these types of checks useful.
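As a sketch of the kind of check being discussed, here is a minimal scan of loaded configuration for removed options. The removed-option entry and result tuple are illustrative; a real check would be wired into the oslo.upgradecheck framework described in [1]:

```python
# Illustrative upgrade check: flag config options that were removed in
# the target release, since the service would ignore (or choke on) them
# after the upgrade. REMOVED_OPTS here is made-up example data.
REMOVED_OPTS = {
    ("DEFAULT", "verbose"): "removed; use 'debug' instead",
}

def check_removed_options(config):
    failures = []
    for (section, opt), advice in REMOVED_OPTS.items():
        if opt in config.get(section, {}):
            failures.append("[%s] %s: %s" % (section, opt, advice))
    return ("FAILURE", failures) if failures else ("SUCCESS", [])

print(check_removed_options({"DEFAULT": {"verbose": "true"}}))
print(check_removed_options({"DEFAULT": {"debug": "true"}}))
```

Whether such a check returns FAILURE or just a WARNING is exactly the severity question raised above.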


[1] https://docs.openstack.org/oslo.upgradecheck/latest/
[2] https://storyboard.openstack.org/#!/story/2003657
[3] 
http://eavesdrop.openstack.org/irclogs/%23openstack-dev/%23openstack-dev.2018-10-10.log.html#t2018-10-10T15:17:17
[4] 
http://lists.openstack.org/pipermail/openstack-dev/2018-October/135688.html


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Matt Riedemann

On 10/10/2018 7:46 AM, Jay Pipes wrote:

2) in the old microversions change the blind allocation copy to gather
every resource from a nested source RPs too and try to allocate that
from the destination root RP. In nested allocation cases putting this
allocation to placement will fail and nova will fail the migration /
evacuation. However it will succeed if the server does not need nested
allocation neither on the source nor on the destination host (a.k.a the
legacy case). Or if the server has nested allocation on the source host
but does not need nested allocation on the destination host (for
example the dest host does not have nested RP tree yet).


I disagree on this. I'd rather just do a simple check for >1 provider in 
the allocations on the source and if True, fail hard.


The reverse (going from a non-nested source to a nested destination) 
will hard fail anyway on the destination because the POST /allocations 
won't work due to capacity exceeded (or failure to have any inventory at 
all for certain resource classes on the destination's root compute node).


I agree with Jay here. If we know the source has allocations on >1 
provider, just fail fast, why even walk the tree and try to claim those 
against the destination - the nested providers aren't going to be the 
same UUIDs on the destination, *and* trying to squash all of the source 
nested allocations into the single destination root provider and hope it 
works is super hacky and I don't think we should attempt that. Just fail 
if being forced and nested allocations exist on the source.
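The fail-fast check could be as simple as counting the providers in the source allocations; a sketch with made-up provider names (allocations keyed by resource provider uuid, as in the placement allocations response):

```python
# Sketch of the "fail fast" check: before forcing a live migration or
# evacuation, refuse if the server has allocations against more than one
# resource provider (i.e. a nested provider tree on the source).
def assert_no_nested_allocations(allocations):
    if len(allocations) > 1:
        raise ValueError(
            "Cannot force the move: server has allocations against %d "
            "resource providers (nested allocations are not supported)"
            % len(allocations))

flat = {"root-rp-uuid": {"resources": {"VCPU": 4, "MEMORY_MB": 8192}}}
nested = dict(flat, **{"child-rp-uuid": {"resources": {"VGPU": 1}}})

assert_no_nested_allocations(flat)  # single provider: allowed
try:
    assert_no_nested_allocations(nested)
except ValueError as exc:
    print(exc)
```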


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Matt Riedemann

On 10/9/2018 10:08 AM, Balázs Gibizer wrote:

Question for you as well: if we remove (or change) the force flag in a
new microversion then how should the old microversions behave when
nested allocations would be required?


Fail fast if we can detect we have nested. We don't support forcing 
those types of servers.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-27 Thread Matt Riedemann

On 9/27/2018 3:02 PM, Jay Pipes wrote:
A great example of this would be the proposed "deploy template" from 
[2]. This is nothing more than abusing the placement traits API in order 
to allow passthrough of instance configuration data from the nova flavor 
extra spec directly into the nodes.instance_info field in the Ironic 
database. It's a hack that is abusing the entire concept of the 
placement traits concept, IMHO.


We should have a way *in Nova* of allowing instance configuration 
key/value information to be passed through to the virt driver's spawn() 
method, much the same way we provide for user_data that gets exposed 
after boot to the guest instance via configdrive or the metadata service 
API. What this deploy template thing is is just a hack to get around the 
fact that nova doesn't have a basic way of passing through some collated 
instance configuration key/value information, which is a darn shame and 
I'm really kind of annoyed with myself for not noticing this sooner. :(


We talked about this in Dublin though, right? We said a good thing to do 
would be to have some kind of template/profile/config/whatever stored 
off in glare where schema could be registered on that thing, and then 
you pass a handle (ID reference) to that to nova when creating the 
(baremetal) server, nova pulls it down from glare and hands it off to 
the virt driver. It's just that no one is doing that work.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [goals][tc][ptl][uc] starting goal selection for T series

2018-09-26 Thread Matt Riedemann
s before OSC was created (nova/cinder/glance/keystone). For 
newer projects, like placement, it's not a problem because they never 
created any other CLI outside of OSC.


[1] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc
[2] https://etherpad.openstack.org/p/nova-ptg-stein (~L721)

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-25 Thread Matt Riedemann

On 9/25/2018 8:36 AM, John Garbutt wrote:

Another thing is about existing flavors configured for these
capabilities-scoped specs. Are you saying during the deprecation we'd
continue to use those even if the filter is disabled? In the review I
had suggested that we add a pre-upgrade check which inspects the
flavors
and if any of these are found, we report a warning meaning those
flavors
need to be updated to use traits rather than capabilities. Would
that be
reasonable?


I like the idea of a warning, but there are features that have not yet 
moved to traits:

https://specs.openstack.org/openstack/ironic-specs/specs/juno-implemented/uefi-boot-for-ironic.html

There is a more general plan that will help, but its not quite ready yet:
https://review.openstack.org/#/c/504952/

As such, I think we can't yet pull the plug on flavors including 
capabilities and passing them to Ironic, but (after a cycle of 
deprecation) I think we can now stop pushing capabilities from Ironic 
into Nova and using them for placement.


Forgive my ignorance, but if traits are not on par with capabilities, 
why are we deprecating the capabilities filter?


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] Forum Topic Submission Period

2018-09-20 Thread Matt Riedemann

On 9/20/2018 10:23 AM, Jimmy McArthur wrote:
This is basically the CFP equivalent: 
https://www.openstack.org/summit/berlin-2018/vote-for-speakers  Voting 
isn't necessary, of course, but it should allow you to see submissions 
as they roll in.


Does this work for your purposes?


Yup, that should do it, thanks!

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-20 Thread Matt Riedemann

On 9/20/2018 4:16 AM, John Garbutt wrote:
Following on from the PTG discussions, I wanted to bring everyone's 
attention to Nova's plans to deprecate the ComputeCapabilitiesFilter, 
including most of the integration with Ironic Capabilities.


To be specific, this is my proposal in code form:
https://review.openstack.org/#/c/603102/

Once the code we propose to deprecate is removed we will stop using 
capabilities pushed up from Ironic for 'scheduling', but we would still 
pass capabilities request in the flavor down to Ironic (until we get 
some standard traits and/or deploy templates sorted for things like UEFI).


Functionally, we believe all use cases can be replaced by using the 
simpler placement traits (this is more efficient than post placement 
filtering done using capabilities):

https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/ironic-driver-traits.html

Please note the recent addition of forbidden traits that helps improve 
the usefulness of the above approach:

https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-forbidden-traits.html

For example, a flavor request for GPUs >= 2 could be replaced by a 
custom trait that reports whether a given Ironic node has 
CUSTOM_MORE_THAN_2_GPUS. That is a bad example (longer term we don't 
want to use traits for this, but that is a discussion for another day), 
but it is the example that keeps being raised in discussions on this topic.
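For concreteness, a sketch of how such a required-trait extra spec is matched against a node's traits. The "trait:<NAME>=required" extra spec syntax is the one referenced in the linked specs; the matching helper below is illustrative, not nova's actual scheduling code:

```python
# Sketch: a flavor requires a trait via a "trait:<NAME>=required" extra
# spec, and placement matches it against the traits the Ironic node
# exposes. The trait and flavor data here are example values.
flavor_extra_specs = {"trait:CUSTOM_MORE_THAN_2_GPUS": "required"}

node_traits = {"CUSTOM_MORE_THAN_2_GPUS", "COMPUTE_NET_ATTACH_INTERFACE"}

def node_satisfies_traits(extra_specs, traits):
    required = {k.split(":", 1)[1] for k, v in extra_specs.items()
                if k.startswith("trait:") and v == "required"}
    return required <= traits

print(node_satisfies_traits(flavor_extra_specs, node_traits))  # True
```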


The main reason for reaching out in this email is to ask if anyone has 
needs that the ResourceClass and Traits scheme does not currently 
address, or can think of a problem with a transition to the newer approach.


I left a few comments in the change, but I'm assuming as part of the 
deprecation we'd remove the filter from the default enabled_filters list 
so new installs don't automatically get warnings during scheduling?


Another thing is about existing flavors configured for these 
capabilities-scoped specs. Are you saying during the deprecation we'd 
continue to use those even if the filter is disabled? In the review I 
had suggested that we add a pre-upgrade check which inspects the flavors 
and if any of these are found, we report a warning meaning those flavors 
need to be updated to use traits rather than capabilities. Would that be 
reasonable?
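The pre-upgrade check suggested here could be a simple scan of flavor extra specs; a sketch with illustrative flavor data (the helper name and data are made up, not an existing nova-status check):

```python
# Sketch of a pre-upgrade check: flag flavors that still carry
# "capabilities:"-scoped extra specs, which should be migrated to
# traits before the ComputeCapabilitiesFilter goes away.
def find_capabilities_flavors(flavors):
    flagged = []
    for flavor in flavors:
        caps = [k for k in flavor.get("extra_specs", {})
                if k.startswith("capabilities:")]
        if caps:
            flagged.append((flavor["name"], caps))
    return flagged

flavors = [
    {"name": "bm.uefi", "extra_specs": {"capabilities:boot_mode": "uefi"}},
    {"name": "bm.plain", "extra_specs": {"trait:CUSTOM_FOO": "required"}},
]
print(find_capabilities_flavors(flavors))
```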


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] Forum Topic Submission Period

2018-09-20 Thread Matt Riedemann

On 9/17/2018 11:13 AM, Jimmy McArthur wrote:
The Forum Topic Submission session started September 12 and will run 
through September 26th.  Now is the time to wrangle the topics you 
gathered during your Brainstorming Phase and start pushing forum topics 
through. Don't rely only on a PTL to make the agenda... step on up and 
place the items you consider important front and center.


As you may have noticed on the Forum Wiki 
(https://wiki.openstack.org/wiki/Forum), we're reusing the normal CFP 
tool this year. We did our best to remove Summit specific language, but 
if you notice something, just know that you are submitting to the 
Forum.  URL is here:


https://www.openstack.org/summit/berlin-2018/call-for-presentations

Looking forward to seeing everyone's submissions!

If you have questions or concerns about the process, please don't 
hesitate to reach out.


Another question. In the before times, when we just had that simple form 
to submit forum sessions and then the TC/UC/Foundation reviewed the list 
and picked the sessions, it was very simple to see what other sessions 
were proposed and say, "oh good someone is covering this already, I 
don't need to worry about it". With the move to the CFP forms like the 
summit sessions, that is no longer available, as far as I know. There 
have been at least a few cases this week where someone has said, "this 
might be a good topic, but keystone is probably already covering it, or 
$FOO SIG is probably already covering it", but without herding the cats 
to ask and find out who is all doing what, it's hard to know.


Is there some way we can get back to having a public view of what has 
been proposed for the forum so we an avoid overlap, or at worst not 
proposing something because people assume someone else is going to cover it?


--

Thanks,

Matt



[Openstack-operators] Are we ready to put stable/ocata into extended maintenance mode?

2018-09-18 Thread Matt Riedemann
The release page says Ocata is planned to go into extended maintenance 
mode on Aug 27 [1]. There really isn't much to this except it means we 
don't do releases for Ocata anymore [2]. There is a caveat that project 
teams that do not wish to maintain stable/ocata after this point can 
immediately end of life the branch for their project [3]. We can still 
run CI using tags, e.g. if keystone goes ocata-eol, devstack on 
stable/ocata can still continue to install from stable/ocata for nova 
and the ocata-eol tag for keystone. Having said that, if there is no 
undue burden on the project team keeping the lights on for stable/ocata, 
I would recommend not tagging the stable/ocata branch end of life at 
this point.


So, questions that need answering are:

1. Should we cut a final release for projects with stable/ocata branches 
before going into extended maintenance mode? I tend to think "yes" to 
flush the queue of backports. In fact, [3] doesn't mention it, but the 
resolution said we'd tag the branch [4] to indicate it has entered the 
EM phase.


2. Are there any projects that would want to skip EM and go directly to 
EOL (yes this feels like a Monopoly question)?


[1] https://releases.openstack.org/
[2] 
https://docs.openstack.org/project-team-guide/stable-branches.html#maintenance-phases
[3] 
https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance
[4] 
https://governance.openstack.org/tc/resolutions/20180301-stable-branch-eol.html#end-of-life


--

Thanks,

Matt



[Openstack-operators] [nova][publiccloud-wg] Proposal to shelve on stop/suspend

2018-09-14 Thread Matt Riedemann
tl;dr: I'm proposing a new parameter to the server stop (and suspend?) 
APIs to control if nova shelve offloads the server.


Long form: This came up during the public cloud WG session this week 
based on a couple of feature requests [1][2]. When a user stops/suspends 
a server, the hypervisor frees up resources on the host but nova 
continues to track those resources as being used on the host so the 
scheduler can't put more servers there. What operators would like to do 
is that when a user stops a server, nova actually shelve offloads the 
server from the host so they can schedule new servers on that host. On 
start/resume of the server, nova would find a new host for the server. 
This also came up in Vancouver where operators would like to free up 
limited expensive resources like GPUs when the server is stopped. This 
is also the behavior in AWS.


The problem with shelve is that it's great for operators but users just 
don't use it, maybe because they don't know what it is and stop works 
just fine. So how do you get users to opt into shelving their server?


I've proposed a high-level blueprint [3] where we'd add a new 
(microversioned) parameter to the stop API with three options:


* auto
* offload
* retain

Naming is obviously up for debate. The point is we would default to auto 
and if auto is used, the API checks a config option to determine the 
behavior - offload or retain. By default we would retain for backward 
compatibility. For users that don't care, they get auto and it's fine. 
For users that do care, they either (1) don't opt into the microversion 
or (2) specify the specific behavior they want. I don't think we need to 
expose what the cloud's configuration for auto is because again, if you 
don't care then it doesn't matter and if you do care, you can opt out of 
this.
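To make the proposed semantics concrete, here is a minimal sketch of how the server side might resolve the new parameter. All names (the constants, the config option, the function) are illustrative stand-ins, not actual nova code:

```python
# Hypothetical sketch of resolving the proposed stop-API parameter.
# Names are illustrative only; none of this is real nova code.

AUTO, OFFLOAD, RETAIN = "auto", "offload", "retain"

# Stand-in for the operator-set config option that 'auto' defers to.
# Defaults to False, i.e. retain, for backward compatibility.
CONF_SHELVE_OFFLOAD_ON_STOP = False


def resolve_stop_behavior(requested, conf_offload=CONF_SHELVE_OFFLOAD_ON_STOP):
    """Map the user-requested stop behavior to a concrete action."""
    if requested == AUTO:
        # 'auto' means "whatever the cloud is configured to do".
        return OFFLOAD if conf_offload else RETAIN
    if requested in (OFFLOAD, RETAIN):
        # An explicit request always wins over the config.
        return requested
    raise ValueError("invalid stop behavior: %s" % requested)


# A user who doesn't care gets the cloud's configured default...
assert resolve_stop_behavior(AUTO) == RETAIN
# ...while an explicit choice is honored regardless of config.
assert resolve_stop_behavior(OFFLOAD, conf_offload=False) == OFFLOAD
```

The point of the 'auto' indirection is that the cloud's choice stays invisible to users who don't opt into a specific behavior.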


"How do we get users to use the new microversion?" I'm glad you asked.

Well, nova CLI defaults to using the latest available microversion 
negotiated between the client and the server, so by default, anyone 
using "nova stop" would get the 'auto' behavior (assuming the client and 
server are new enough to support it). Long-term, openstack client plans 
on doing the same version negotiation.
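The negotiation described above boils down to "use the highest microversion both sides support". A toy sketch, with made-up version numbers (not the real client code):

```python
# Toy model of client/server microversion negotiation: the client asks
# for the newest version it knows, bounded by what the server supports.
# Version numbers here are illustrative, not tied to any real release.

def _key(v):
    # Compare "2.60" numerically, not lexically ("2.9" < "2.10").
    major, minor = v.split(".")
    return (int(major), int(minor))


def negotiate(client_max, server_min, server_max):
    if _key(client_max) < _key(server_min):
        raise RuntimeError("client too old for this server")
    # Highest version supported by both sides.
    return min(client_max, server_max, key=_key)


# Old server: the client falls back to the server's maximum.
assert negotiate("2.65", "2.1", "2.60") == "2.60"
# Old client: its own maximum caps the negotiated version.
assert negotiate("2.53", "2.1", "2.60") == "2.53"
```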


As for the server status changes, if the server is stopped and shelved, 
the status would be 'SHELVED_OFFLOADED' rather than 'SHUTDOWN'. I 
believe this is fine especially if a user is not being specific and 
doesn't care about the actual backend behavior. On start, the API would 
allow starting (unshelving) shelved offloaded (rather than just stopped) 
instances. Trying to hide shelved servers as stopped in the API would be 
overly complex IMO so I don't want to try and mask that.


It is possible that a user that stopped and shelved their server could 
hit a NoValidHost when starting (unshelving) the server, but that really 
shouldn't happen in a cloud that's configuring nova to shelve by default 
because if they are doing this, their SLA needs to reflect they have the 
capacity to unshelve the server. If you can't honor that SLA, don't 
shelve by default.


So, what are the general feelings on this before I go off and start 
writing up a spec?


[1] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791681
[2] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791679
[3] https://blueprints.launchpad.net/nova/+spec/shelve-on-stop

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Hard fail if you try to rename an AZ with instances in it?

2018-09-14 Thread Matt Riedemann

On 3/28/2018 4:35 PM, Jay Pipes wrote:

On 03/28/2018 03:35 PM, Matt Riedemann wrote:

On 3/27/2018 10:37 AM, Jay Pipes wrote:


If we want to actually fix the issue once and for all, we need to 
make availability zones a real thing that has a permanent identifier 
(UUID) and store that permanent identifier in the instance (not the 
instance metadata).


Or we can continue to paper over major architectural weaknesses like 
this.


Stepping back a second from the rest of this thread, what if we do the 
hard fail bug fix thing, which could be backported to stable branches, 
and then we have the option of completely re-doing this with aggregate 
UUIDs as the key rather than the aggregate name? Because I think the 
former could get done in Rocky, but the latter probably not.


I'm fine with that (and was fine with it before, just stating that 
solving the problem long-term requires different thinking).


Best,
-jay


Just FYI for anyone that cared about this thread, we agreed at the Stein 
PTG to resolve the immediate bug [1] by blocking AZ renames while the AZ 
has instances in it. There won't be a microversion for that change and 
we'll be able to backport it (with a release note I suppose).


[1] https://bugs.launchpad.net/nova/+bug/1782539

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [Openstack-sigs] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 5:32 PM, Melvin Hillsman wrote:
We basically spent the day focusing on two things specific to what you 
bring up and are in agreement with you regarding action not just talk 
around feedback and outreach. [1]
We wiped the agenda clean, discussed our availability (set reasonable 
expectations), and revisited how we can be more diligent and successful 
around these two principles which target your first comment, "...get 
their RFE/bug list ranked from the operator community (because some of 
the requests are not exclusive to public cloud), and then put pressure 
on the TC to help project manage the delivery of the top issue..."


I will not get into much detail because again this response is specific 
to a portion of your email so in keeping with feedback and outreach the 
UC is making it a point to be intentional. We have already got action 
items [2] which target the concern you raise. We have agreed to hold 
each other accountable and adjusted our meeting structure to facilitate 
being successful.


Not that the UC (elected members) are the only ones who can do this but 
we believe it is our responsibility to; regardless of what anyone else 
does. The UC is also expected to enlist others and hopefully through our 
efforts others are encouraged to participate and enlist others.


[1] https://etherpad.openstack.org/p/uc-stein-ptg
[2] https://etherpad.openstack.org/p/UC-Election-Qualifications


Awesome, thank you Melvin and others on the UC.

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [Openstack-sigs] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 5:13 PM, Jeremy Stanley wrote:

Sure, and I'm saying that instead I think the influence of TC
members_can_  be more valuable in finding and helping additional
people to do these things rather than doing it all themselves, and
it's not just about the limited number of available hours in the day
for one person versus many. The successes goal champions experience,
the connections they make and the elevated reputation they gain
throughout the community during the process of these efforts builds
new leaders for us all.


Again, I'm not saying TC members should be doing all of the work 
themselves. That's not realistic, especially when critical parts of any 
major effort are going to involve developers from projects on which none 
of the TC members are active contributors (e.g. nova). I want to see TC 
members herd cats, for lack of a better analogy, and help out 
technically (with code) where possible.


Given the repeated mention of how the "help wanted" list continues to 
not draw in contributors, I think the recruiting role of the TC should 
take a back seat to actually stepping in and helping work on those items 
directly. For example, Sean McGinnis is taking an active role in the 
operators guide and other related docs that continue to be discussed at 
every face to face event since those docs were dropped from 
openstack-manuals (in Pike).


I think it's fair to say that the people generally elected to the TC are 
those most visible in the community (it's a popularity contest) and 
those people are generally the most visible because they have the luxury 
of working upstream the majority of their time. As such, it's their duty 
to oversee and spend time working on the hard cross-project technical 
deliverables that operators and users are asking for, rather than think 
of an infinite number of ways to try and draw *others* to help work on 
those gaps. As I think it's the role of a PTL within a given project to 
have a finger on the pulse of the technical priorities of that project 
and manage the developers involved (of which the PTL certainly may be 
one), it's the role of the TC to do the same across openstack as a 
whole. If a PTL doesn't have the time or willingness to do that within 
their project, they shouldn't be the PTL. The same goes for TC members IMO.


--

Thanks,

Matt



Re: [Openstack-operators] [Openstack-sigs] [openstack-dev] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 4:14 PM, Jeremy Stanley wrote:

I think Doug's work leading the Python 3 First effort is a great
example. He has helped find and enable several other goal champions
to collaborate on this. I appreciate the variety of other things
Doug already does with his available time and would rather he not
stop doing those things to spend all his time acting as a project
manager.


I specifically called out what Doug is doing as an example of things I 
want to see the TC doing. I want more/all TC members doing that.


--

Thanks,

Matt



Re: [Openstack-operators] [Openstack-sigs] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 3:55 PM, Jeremy Stanley wrote:

I almost agree with you. I think the OpenStack TC members should be
actively engaged in recruiting and enabling interested people in the
community to do those things, but I don't think such work should be
solely the domain of the TC and would hate to give the impression
that you must be on the TC to have such an impact.


See my reply to Thierry. This isn't what I'm saying. But I expect the 
elected TC members to be *much* more *directly* involved in managing and 
driving hard cross-project technical deliverables.


--

Thanks,

Matt



[Openstack-operators] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann
Rather than take a tangent on Kristi's candidacy thread [1], I'll bring 
this up separately.


Kristi said:

"Ultimately, this list isn’t exclusive and I’d love to hear your and 
other people's opinions about what you think I should focus on."


Well since you asked...

Some feedback I gave to the public cloud work group yesterday was to get 
their RFE/bug list ranked from the operator community (because some of 
the requests are not exclusive to public cloud), and then put pressure 
on the TC to help project manage the delivery of the top issue. I would 
like all of the SIGs to do this. The upgrades SIG should rank and 
socialize their #1 issue that needs attention from the developer 
community - maybe that's better upgrade CI testing for deployment 
projects, maybe it's getting the pre-upgrade checks goal done for Stein. 
The UC should also be doing this; maybe that's the UC saying, "we need 
help on closing feature gaps in openstack client and/or the SDK". I 
don't want SIGs to bombard the developers with *all* of their 
requirements, but I want to get past *talking* about the *same* issues 
*every* time we get together. I want each group to say, "this is our top 
issue and we want developers to focus on it." For example, the extended 
maintenance resolution [2] was purely birthed from frustration about 
talking about LTS and stable branch EOL every time we get together. It's 
also the responsibility of the operator and user communities to weigh in 
on proposed release goals, but the TC should be actively trying to get 
feedback from those communities about proposed goals, because I bet 
operators and users don't care about mox removal [3].


I want to see the TC be more of a cross-project project management 
group, like a group of Ildikos and what she did between nova and cinder 
to get volume multi-attach done, which took persistent supervision to 
herd the cats and get it delivered. Lance is already trying to do this 
with unified limits. Doug is doing this with the python3 goal. I want my 
elected TC members to be pushing tangible technical deliverables forward.


I don't find any value in the TC debating ad nauseam about visions and 
constellations and "what is openstack?". Scope will change over time 
depending on who is contributing to openstack, we should just accept 
this. And we need to realize that if we are failing to deliver value to 
operators and users, they aren't going to use openstack and then "what 
is openstack?" won't matter because no one will care.


So I encourage all elected TC members to work directly with the various 
SIGs to figure out their top issue and then work on managing those 
deliverables across the community because the TC is particularly well 
suited to do so given the elected position. I realize political and 
bureaucratic "how should openstack deal with x?" things will come up, 
but those should not be the priority of the TC. So instead of 
philosophizing about things like, "should all compute agents be in a 
single service with a REST API" for hours and hours, every few months - 
immediately ask, "would doing that get us any closer to achieving top 
technical priority x?" Because if not, or it's so fuzzy in scope that no 
one sees the way forward, document a decision and then drop it.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/134490.html
[2] 
https://governance.openstack.org/tc/resolutions/20180301-stable-branch-eol.html

[3] https://governance.openstack.org/tc/goals/rocky/mox_removal.html

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [upgrade] request for pre-upgrade check for db purge

2018-09-11 Thread Matt Riedemann

On 9/11/2018 9:01 AM, Dan Smith wrote:

I dunno, adding something to nova.conf that is only used for nova-status
like that seems kinda weird to me. It's just a warning/informational
sort of thing so it just doesn't seem worth the complication to me.


It doesn't seem complicated to me, I'm not sure why the config is weird, 
but maybe just because it's config-driven CLI behavior?




Moving it to an age thing set at one year seems okay, and better than
making the absolute limit more configurable.

Any reason why this wouldn't just be a command line flag to status if
people want it to behave in a specific way from a specific tool?


I always think of the pre-upgrade checks as release-specific and we 
could drop the old ones at some point, so that's why I wasn't thinking 
about adding check-specific options to the command - but since we also 
say it's OK to run "nova-status upgrade check" to verify a green 
install, it's probably good to leave the old checks in place, i.e. 
you're likely always going to want those cells v2 and placement checks 
we added in ocata even long after ocata EOL.


--

Thanks,

Matt



[Openstack-operators] [upgrade] request for pre-upgrade check for db purge

2018-09-10 Thread Matt Riedemann
I created a nova bug [1] to track a request that came up in the upgrades 
SIG room at the PTG today [2] and would like to see if there is any 
feedback from other operators/developers that weren't part of the 
discussion.


The basic problem is that failing to archive/purge deleted records* from 
the database can make upgrades much slower during schema migrations. 
Anecdotes from the room mentioned that it can be literally impossible to 
complete upgrades for keystone and heat in certain scenarios if you 
don't purge the database first.


The request was that a configurable limit gets added to each service 
which is checked as part of the service's pre-upgrade check routine [3] 
and warn if the number of records to purge is over that limit.


For example, the nova-status upgrade check could warn if there are over 
10 deleted records total across all cells databases. Maybe cinder 
would have something similar for deleted volumes. Keystone could have 
something for revoked tokens.


Another idea in the room was flagging on records over a certain age 
limit. For example, if there are deleted instances in nova that were 
deleted >1 year ago.


How do people feel about this? It seems pretty straight-forward to me. 
If people are generally in favor of this, then the question is what 
would be sane defaults - or should we not assume a default and force 
operators to opt into this?


* nova delete doesn't actually delete the record from the instances 
table, it flips a value to hide it - you have to archive/purge those 
records to get them out of the main table.


[1] https://bugs.launchpad.net/nova/+bug/1791824
[2] https://etherpad.openstack.org/p/upgrade-sig-ptg-stein
[3] https://governance.openstack.org/tc/goals/stein/upgrade-checkers.html
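The soft-delete convention and the proposed check can be sketched end to end with an in-memory database. This is illustrative only (the real instances table has many more columns, and a real check would span all cells databases):

```python
# Sketch of the proposed pre-upgrade check: count soft-deleted rows and
# warn above a configurable limit. Mirrors nova's convention where a
# "deleted" row has deleted == id rather than being removed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances "
             "(id INTEGER PRIMARY KEY, uuid TEXT, deleted INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO instances (uuid) VALUES (?)",
                 [("u%d" % i,) for i in range(5)])
# "nova delete" flips the deleted column instead of removing the row.
conn.execute("UPDATE instances SET deleted = id WHERE id <= 3")


def check_db_purge(conn, limit):
    """Return ('WARNING', msg) if pending soft-deleted rows exceed limit."""
    pending = conn.execute(
        "SELECT COUNT(*) FROM instances WHERE deleted != 0").fetchone()[0]
    if pending > limit:
        return ("WARNING",
                "%d soft-deleted rows; archive/purge before upgrading" % pending)
    return ("OK", "")


assert check_db_purge(conn, limit=2)[0] == "WARNING"
assert check_db_purge(conn, limit=10)[0] == "OK"
```

An age-based variant would simply add a `deleted_at < cutoff` predicate to the count instead of (or alongside) the absolute limit.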

--

Thanks,

Matt



Re: [Openstack-operators] leaving Openstack mailing lists

2018-09-07 Thread Matt Riedemann

On 9/6/2018 6:42 AM, Saverio Proto wrote:

Hello,

I will be leaving this mailing list in a few days.

I am going to a new job and I will not be involved with Openstack at
least in the short term future.
Still, it was great working with the Openstack community in the past few years.

If you need to reach me about any bug/patch/review that I submitted in
the past, just write directly to my email. I will try to give answers.

Cheers

Saverio


Good luck on the new thing. From a developer perspective, I appreciated 
you putting the screws to us from time to time, since it helps re-align 
priorities.


--

Thanks,

Matt



[Openstack-operators] [nova][placement][upgrade][qa] Some upgrade-specific news on extraction

2018-09-06 Thread Matt Riedemann
I wanted to recap some upgrade-specific stuff from today outside of the 
other [1] technical extraction thread.


Chris has a change up for review [2] which prompted the discussion.

That change makes placement only work with placement.conf, not 
nova.conf, but does get a passing tempest run in the devstack patch [3].


The main issue here is upgrades. If you think of this like deprecating 
config options, the old config options continue to work for a release 
and then are dropped after a full release (or 3 months across boundaries 
for CDers) [4]. Given that, Chris's patch would break the standard 
deprecation policy. Clearly one simple way outside of code to make that 
work is just copy and rename nova.conf to placement.conf and voila. But 
that depends on *all* deployment/config tooling to get that right out of 
the gate.


The other obvious thing is the database. The placement repo code as-is 
today still has the check for whether or not it should use the placement 
database but falls back to using the nova_api database [5]. So 
technically you could point the extracted placement at the same nova_api 
database and it should work. However, at some point deployers will 
clearly need to copy the placement-related tables out of the nova_api DB 
to a new placement DB and make sure the 'migrate_version' table is 
dropped so that placement DB schema versions can reset to 1.
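The shape of that one-time data move can be sketched with sqlite stand-ins for the two databases. The table list below is an assumed illustrative subset, not the full set of placement-owned tables, and a real migration would of course run against MySQL with proper downtime and backups:

```python
# Sketch of copying placement tables out of nova_api into a fresh
# placement DB, deliberately leaving migrate_version behind so the
# placement schema versioning can restart at 1. Table names are an
# illustrative subset only.
import sqlite3

PLACEMENT_TABLES = ["resource_providers", "inventories", "allocations"]

nova_api = sqlite3.connect(":memory:")
placement = sqlite3.connect(":memory:")

# Pretend nova_api already holds placement data plus the
# sqlalchemy-migrate version-tracking table.
for t in PLACEMENT_TABLES:
    nova_api.execute("CREATE TABLE %s (id INTEGER PRIMARY KEY, data TEXT)" % t)
    nova_api.execute("INSERT INTO %s (data) VALUES ('x')" % t)
nova_api.execute("CREATE TABLE migrate_version (version INTEGER)")

# Copy each placement table into the new database...
for t in PLACEMENT_TABLES:
    placement.execute("CREATE TABLE %s (id INTEGER PRIMARY KEY, data TEXT)" % t)
    rows = nova_api.execute("SELECT id, data FROM %s" % t).fetchall()
    placement.executemany("INSERT INTO %s (id, data) VALUES (?, ?)" % t, rows)

# ...and verify migrate_version was NOT carried over.
tables = {r[0] for r in placement.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
assert tables == set(PLACEMENT_TABLES)
```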


With respect to grenade and making this work in our own upgrade CI 
testing, we have I think two options (which might not be mutually 
exclusive):


1. Make placement support using nova.conf if placement.conf isn't found 
for Stein with lots of big warnings that it's going away in T. Then 
Rocky nova.conf with the nova_api database configuration just continues 
to work for placement in Stein. I don't think we then have any grenade 
changes to make, at least in Stein for upgrading *from* Rocky. Assuming 
fresh devstack installs in Stein use placement.conf and a 
placement-specific database, then upgrades from Stein to T should also 
be OK with respect to grenade, but likely punts the cut-over issue for 
all other deployment projects (because we don't CI with grenade doing 
Rocky->Stein->T, or FFU in other words).
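The fallback in option 1 amounts to a small amount of config-file discovery logic. A minimal sketch, with assumed paths and warning text (not actual placement code):

```python
# Sketch of option 1: prefer placement.conf, fall back to nova.conf
# with a loud deprecation warning. Paths and messages are illustrative.
import os
import warnings


def pick_config_file(etc_dir="/etc"):
    placement_conf = os.path.join(etc_dir, "placement", "placement.conf")
    nova_conf = os.path.join(etc_dir, "nova", "nova.conf")
    if os.path.exists(placement_conf):
        return placement_conf
    if os.path.exists(nova_conf):
        # The "big warning" from option 1: works in Stein, gone in T.
        warnings.warn("Running placement with nova.conf is deprecated and "
                      "support will be removed in the T release; create a "
                      "placement.conf.", DeprecationWarning)
        return nova_conf
    raise RuntimeError("no configuration file found")
```

Under this scheme a Rocky-style deployment keeps working unmodified for one cycle, and the warning gives deployment tooling a full release to cut over.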


2. If placement doesn't support nova.conf in Stein, then grenade will 
require an (exceptional) [6] from-rocky upgrade script which will (a) 
write out placement.conf fresh and (b) run a DB migration script, likely 
housed in the placement repo, to create the placement database and copy 
the placement-specific tables out of the nova_api database. Any script 
like this is likely needed regardless of what we do in grenade because 
deployers will need to eventually do this once placement would drop 
support for using nova.conf (if we went with option 1).


That's my attempt at a summary. It's going to be very important that 
operators and deployment project contributors weigh in here if they have 
strong preferences either way, and note that we can likely do both 
options above - grenade could do the fresh cutover from rocky to stein 
but we allow running with nova.conf and nova_api DB in placement in 
stein with plans to drop that support in T.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/subject.html#134184

[2] https://review.openstack.org/#/c/600157/
[3] https://review.openstack.org/#/c/600162/
[4] 
https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html#requirements
[5] 
https://github.com/openstack/placement/blob/fb7c1909/placement/db_api.py#L27

[6] https://docs.openstack.org/grenade/latest/readme.html#theory-of-upgrade

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] OpenStack Summit Forum in Berlin: Topic Selection Process

2018-09-06 Thread Matt Riedemann

On 9/6/2018 2:56 PM, Jeremy Stanley wrote:

On 2018-09-06 14:31:01 -0500 (-0500), Matt Riedemann wrote:

On 8/29/2018 1:08 PM, Jim Rollenhagen wrote:

On Wed, Aug 29, 2018 at 12:51 PM, Jimmy McArthur mailto:ji...@openstack.org>> wrote:


 Examples of typical sessions that make for a great Forum:

 Strategic, whole-of-community discussions, to think about the big
 picture, including beyond just one release cycle and new technologies

 e.g. OpenStack One Platform for containers/VMs/Bare Metal (Strategic
 session) the entire community congregates to share opinions on how
 to make OpenStack achieve its integration engine goal


Just to clarify some speculation going on in IRC: this is an example,
right? Not a new thing being announced?

// jim

FYI for those that didn't see this on the other ML:

http://lists.openstack.org/pipermail/foundation/2018-August/002617.html

[...]

While I agree that's a great post to point out to all corners of the
community, I don't see what it has to do with whether "OpenStack One
Platform for containers/VMs/Bare Metal" was an example forum topic.


Because if I'm not mistaken it was the impetus for the hullabaloo in the 
tc channel that was related to the foundation ML post.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] OpenStack Summit Forum in Berlin: Topic Selection Process

2018-09-06 Thread Matt Riedemann

On 8/29/2018 1:08 PM, Jim Rollenhagen wrote:
On Wed, Aug 29, 2018 at 12:51 PM, Jimmy McArthur <ji...@openstack.org> wrote:


Examples of typical sessions that make for a great Forum:

Strategic, whole-of-community discussions, to think about the big
picture, including beyond just one release cycle and new technologies

e.g. OpenStack One Platform for containers/VMs/Bare Metal (Strategic
session) the entire community congregates to share opinions on how
to make OpenStack achieve its integration engine goal


Just to clarify some speculation going on in IRC: this is an example, 
right? Not a new thing being announced?


// jim


FYI for those that didn't see this on the other ML:

http://lists.openstack.org/pipermail/foundation/2018-August/002617.html

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] [placement] extraction (technical) update

2018-09-05 Thread Matt Riedemann

On 9/5/2018 8:47 AM, Mohammed Naser wrote:

Could placement not do what happened for a while when the nova_api
database was created?


Can you be more specific? I'm having a brain fart here and not 
remembering what you are referring to with respect to the nova_api DB.




I say this because I know that moving the database is a huge task for
us, considering how big it can be in certain cases for us, and it
means control plane outage too


I'm pretty sure you were in the room in YVR when we talked about how 
operators were going to do the database migration and were mostly OK 
with what was discussed, which was a lot will just copy and take the 
downtime (I think CERN said around 10 minutes for them, but they aren't 
a public cloud either), but others might do something more sophisticated 
and nova shouldn't try to pick the best fit for all.


I'm definitely interested in what you do plan to do for the database 
migration to minimize downtime.


+openstack-operators ML since this is an operators discussion now.

--

Thanks,

Matt



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Matt Riedemann

On 8/29/2018 3:21 PM, Tim Bell wrote:

Sounds like a good topic for PTG/Forum?


Yeah it's already on the PTG agenda [1][2]. I started the thread because 
I wanted to get the ball rolling as early as possible, and with people 
that won't attend the PTG and/or the Forum, to weigh in on not only the 
known issues with cross-cell migration but also the things I'm not 
thinking about.


[1] https://etherpad.openstack.org/p/nova-ptg-stein
[2] https://etherpad.openstack.org/p/nova-ptg-stein-cells

--

Thanks,

Matt



[Openstack-operators] [nova] Deprecating Core/Disk/RamFilter

2018-08-24 Thread Matt Riedemann
This is just an FYI that I have proposed that we deprecate the 
core/ram/disk filters [1]. We should have probably done this back in 
Pike when we removed them from the default enabled_filters list and also 
deprecated the CachingScheduler, which is the only in-tree scheduler 
driver that benefits from enabling these filters. With the 
heal_allocations CLI, added in Rocky, we can probably drop the 
CachingScheduler in Stein so the pieces are falling into place. As we 
saw in a recent bug [2], having these enabled in Stein now causes 
blatantly incorrect filtering on ironic nodes.
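For context, the check these filters perform is simple overcommit-adjusted capacity accounting, which placement now does natively during scheduling. A hedged sketch of the idea (not the real nova filter class; the 16.0 ratio is just the traditional CPU overcommit default used for illustration):

```python
# Illustrative model of what a CoreFilter-style check does: a host
# passes only if the requested vCPUs fit under its overcommit-adjusted
# capacity. Placement performs equivalent accounting natively, which is
# why these filters are redundant outside the CachingScheduler.
def core_filter_passes(host_vcpus, vcpus_used, requested,
                       cpu_allocation_ratio=16.0):
    limit = host_vcpus * cpu_allocation_ratio
    return vcpus_used + requested <= limit


# 8 physical cores at 16x overcommit allow up to 128 vCPUs.
assert core_filter_passes(host_vcpus=8, vcpus_used=120, requested=8)
assert not core_filter_passes(host_vcpus=8, vcpus_used=128, requested=1)
```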


Comments are welcome here, the review, or in IRC.

[1] https://review.openstack.org/#/c/596502/
[2] https://bugs.launchpad.net/tripleo/+bug/1787910

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][cinder] Disabling nova volume-update (aka swap volume; aka cinder live migration)

2018-08-24 Thread Matt Riedemann

On 8/21/2018 5:36 AM, Lee Yarwood wrote:

I'm definitely in favor of hiding this from users eventually but
wouldn't this require some form of deprecation cycle?

Warnings within the API documentation would also be useful and even
something we could backport to stable to highlight just how fragile this
API is ahead of any policy change.


The swap volume API in nova defaults to admin-only policy rules by 
default, so for any users that are using it directly, they are (1) 
admins knowingly shooting themselves, or their users, in the foot or (2) 
operators have opened up the policy to non-admins (or some other role of 
user) to hit the API directly. I would ask why that is.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][cinder][neutron] Cross-cell cold migration

2018-08-24 Thread Matt Riedemann

+operators

On 8/24/2018 4:08 PM, Matt Riedemann wrote:

On 8/23/2018 10:22 AM, Sean McGinnis wrote:
I haven't gone through the workflow, but I thought shelve/unshelve 
could detach
the volume on shelving and reattach it on unshelve. In that workflow, 
assuming
the networking is in place to provide the connectivity, the nova 
compute host
would be connecting to the volume just like any other attach and 
should work

fine. The unknown or tricky part is making sure that there is the network
connectivity or routing in place for the compute host to be able to 
log in to

the storage target.


Yeah that's also why I like shelve/unshelve as a start since it's doing 
volume detach from the source host in the source cell and volume attach 
to the target host in the target cell.


Host aggregates in Nova, as a grouping concept, are not restricted to 
cells at all, so you could have hosts in the same aggregate which span 
cells, so I'd think that's what operators would be doing if they have 
network/storage spanning multiple cells. Having said that, host 
aggregates are not exposed to non-admin end users, so again, if we rely 
on a normal user to do this move operation via resize, the only way we 
can restrict the instance to another host in the same aggregate is via 
availability zones, which is the user-facing aggregate construct in 
nova. I know Sam would care about this because NeCTAR sets 
[cinder]/cross_az_attach=False in nova.conf so servers/volumes are 
restricted to the same AZ, but that's not the default, and specifying an 
AZ when you create a server is not required (although there is a config 
option in nova which allows operators to define a default AZ for the 
instance if the user didn't specify one).


Anyway, my point is, there are a lot of "ifs" if it's not an 
operator/admin explicitly telling nova where to send the server if it's 
moving across cells.




If it's the other scenario mentioned where the volume needs to be 
migrated from
one storage backend to another storage backend, then that may require 
a little
more work. The volume would need to be retype'd or migrated (storage 
migration)

from the original backend to the new backend.


Yeah, the thing with retype/volume migration that isn't great is it 
triggers the swap_volume callback to the source host in nova, so if nova 
was orchestrating the volume retype/move, we'd need to wait for the swap 
volume to be done (not impossible) before proceeding, and only the 
libvirt driver implements the swap volume API. I've always wondered, 
what the hell do non-libvirt deployments do with respect to the volume 
retype/migration APIs in Cinder? Just disable them via policy?





--

Thanks,

Matt



[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-22 Thread Matt Riedemann

Hi everyone,

I have started an etherpad for cells topics at the Stein PTG [1]. The 
main issue in there right now is dealing with cross-cell cold migration 
in nova.


At a high level, I am going off these requirements:

* Cells can shard across flavors (and hardware type) so operators would 
like to move users off the old flavors/hardware (old cell) to new 
flavors in a new cell.


* There is network isolation between compute hosts in different cells, 
so no ssh'ing the disk around like we do today. But the image service is 
global to all cells.


Based on this, for the initial support for cross-cell cold migration, I 
am proposing that we leverage something like shelve offload/unshelve 
masquerading as resize. We shelve offload from the source cell and 
unshelve in the target cell. This should work for both volume-backed and 
non-volume-backed servers (we use snapshots for shelved offloaded 
non-volume-backed servers).
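To make the proposed flow concrete, here is a minimal stand-alone sketch of the state transitions involved - the helper names, cell names, and dict-based "server" are purely illustrative and are not nova's actual implementation:

```python
# Hypothetical sketch of the cross-cell "resize via shelve/unshelve"
# sequence described above. Cell/host names and the helper structure
# are illustrative only; nova's real code paths are far more involved.

def cross_cell_cold_migrate(server, source_cell, target_cell):
    """Move a server between cells via shelve offload + unshelve."""
    events = []

    # 1. Shelve offload in the source cell: detach volumes/ports,
    #    snapshot the root disk if not volume-backed, free the host.
    events.append(('shelve_offload', source_cell))
    server['host'] = None
    server['status'] = 'SHELVED_OFFLOADED'

    # 2. Unshelve in the target cell: schedule to a host there,
    #    re-attach volumes/ports, spawn from the snapshot or volume.
    events.append(('unshelve', target_cell))
    server['host'] = 'compute-1@%s' % target_cell
    server['status'] = 'ACTIVE'
    return events

srv = {'name': 'vm1', 'host': 'compute-9@cell1', 'status': 'ACTIVE'}
log = cross_cell_cold_migrate(srv, 'cell1', 'cell2')
print(log)
print(srv['host'], srv['status'])
```

The open questions below (volumes, ports, revert) all live inside those two steps.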


There are, of course, some complications. The main ones that I need help 
with right now are what happens with volumes and ports attached to the 
server. Today we detach from the source and attach at the target, but 
that's assuming the storage backend and network are available to both 
hosts involved in the move of the server. Will that be the case across 
cells? I am assuming that depends on the network topology (are routed 
networks being used?) and storage backend (routed storage?). If the 
network and/or storage backend are not available across cells, how do we 
migrate volumes and ports? Cinder has a volume migrate API for admins 
but I do not know how nova would know the proper affinity per-cell to 
migrate the volume to the proper host (cinder does not have a routed 
storage concept like routed provider networks in neutron, correct?). And 
as far as I know, there is no such thing as port migration in Neutron.


Could Placement help with the volume/port migration stuff? Neutron 
routed provider networks rely on placement aggregates to schedule the VM 
to a compute host in the same network segment as the port used to create 
the VM, however, if that segment does not span cells we are kind of 
stuck, correct?


To summarize the issues as I see them (today):

* How to deal with the targeted cell during scheduling? This is so we 
can even get out of the source cell in nova.


* How does the API deal with the same instance being in two DBs at the 
same time during the move?


* How to handle revert resize?

* How are volumes and ports handled?

I can get feedback from my company's operators based on what their 
deployment will look like for this, but that does not mean it will work 
for others, so I need as much feedback from operators, especially those 
running with multiple cells today, as possible. Thanks in advance.


[1] https://etherpad.openstack.org/p/nova-ptg-stein-cells

--

Thanks,

Matt



[Openstack-operators] Reminder to take the User Survey

2018-08-20 Thread Matt Van Winkle
Hi everyone,

The deadline for the 2018 OpenStack User Survey is *tomorrow,
August 21 at 11:59pm UTC. *The User Survey is your annual opportunity to
provide direct feedback to the OpenStack community, so we can better
understand your environment and needs. We send all feedback directly to the
project teams who work to improve how we provide value to you.

By completing a deployment in the User Survey, you qualify as an Active
User Contributor (AUC) and will receive a discount for the Berlin Summit -
only $300 USD!

The survey will take less than 20 minutes, and there’s not much time left!

Please complete your User Survey by *tomorrow*, *Tuesday, August 21 at 11:59pm UTC.*

Get started now: https://www.openstack.org/user-survey

Let me know if you have any questions.

Thank you,
VW

-- 
Matt Van Winkle
Senior Manager, Software Engineering | Salesforce


Re: [Openstack-operators] [openstack-dev] [nova] deployment question consultation

2018-08-18 Thread Matt Riedemann

+ops list

On 8/18/2018 10:20 PM, Matt Riedemann wrote:

On 8/13/2018 9:30 PM, Rambo wrote:
        1. In a single-region deployment, what happens in the cloud as 
the cluster grows, and how do you handle it? Is there a limit on the 
number of physical nodes in one region? How many nodes would be best in 
one region?


This question seems a bit too open-ended and completely subjective.


        2. When is CellsV2 most suitable in a cloud?


When this has been asked in the past, the best answer I've heard is, 
"whatever your current DB and MQ limits are for nova". So if that's 
about 200 hosts before the DB/MQ are struggling, then that could be a cell. 
For reference, CERN has 70 cells with ~200 hosts per cell. However, at 
least one public cloud is approaching cells with fewer cells and 
thousands of hosts per cell. So it varies based on where your 
limitations lie. Also note that cells do not have to be defined by DB/MQ 
limits, they can also be used as a way to shard hardware and instance 
(flavor) types. For example, generation 1 hardware in cell1, gen2 
hardware in cell2, etc.



        3. How can batch creation of instances be made faster?


This again is completely subjective. It would depend on the 
configuration, size of nova deployment, size of hardware, available 
capacity, etc. Have you done profiling to point out *specific* problem 
areas during multi-create, for example, are you packing VMs onto as few 
hosts as possible to reduce costs? And if so, are you hitting problems 
with that due to rescheduling the server build because you have multiple 
scheduler workers picking the same host(s) for a subset of the VMs in 
the request? Or are you hitting RPC timeouts during select_destinations? 
If so, that might be related to the problem described in [1].


[1] https://review.openstack.org/#/c/510235/




--

Thanks,

Matt



Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-18 Thread Matt Riedemann

On 8/11/2018 12:50 AM, Chris Apsey wrote:
This sounds promising and there seems to be a feasible way to do this, 
but it also sounds like a decent amount of effort and would be a new 
feature in a future release rather than a bugfix - am I correct in that 
assessment?


Yes I'd say it's a blueprint and not a bug fix - it's not something we'd 
backport to stable branches upstream, for example.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] Speaker Selection Process: OpenStack Summit Berlin

2018-08-13 Thread Matt Joyce
CFP work is hard as hell.  Much respect to the review panel members.  It's
a thankless difficult job.

So, in lieu of being thankless,  THANK YOU

-Matt

On Mon, Aug 13, 2018 at 9:59 AM, Allison Price 
wrote:

> Hi everyone,
>
> One quick clarification. The speakers will be announced on* August 14 at
> 1300 UTC / 4:00 AM PDT.*
>
> Cheers,
> Allison
>
>
> On Aug 13, 2018, at 8:53 AM, Jimmy McArthur  wrote:
>
> Greetings!
>
> The speakers for the OpenStack Summit Berlin will be announced August 14,
> at 4:00 AM UTC. Ahead of that, we want to take this opportunity to thank
> our Programming Committee!  They have once again taken time out of their
> busy schedules to help create another round of outstanding content for the
> OpenStack Summit.
>
> The OpenStack Foundation relies on the community-nominated Programming
> Committee, along with your Community Votes to select the content of the
> summit.  If you're curious about this process, you can read more about it
> here
> <https://www.openstack.org/summit/berlin-2018/call-for-presentations/selection-process>
> where we have also listed the Programming Committee members.
>
> If you'd like to nominate yourself or someone you know for the OpenStack
> Summit Denver Programming Committee, you can do so here:
> https://openstackfoundation.formstack.com/forms/openstackdenver2019_
> programmingcommitteenom
>
> Thanks a bunch and we look forward to seeing everyone in Berlin!
>
> Cheers,
> Jimmy
>
>
>
>


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-08 Thread Matt Riedemann

On 8/8/2018 2:42 PM, Chris Apsey wrote:
qemu-system-arm, qemu-system-ppc64, etc. in our environment are all x86 
packages, but they perform system-mode emulation (via dynamic 
instruction translation) for those target environments.  So, you run 
qemu-system-ppc64 on an x86 host in order to get a ppc64-emulated VM. 
Our use case is specifically directed at reverse engineering binaries 
and fuzzing for vulnerabilities inside of those architectures for things 
that aren't built for x86, but there are others.


If you were to apt-get install qemu-system and then hit autocomplete, 
you'd get a list of architectures that qemu can emulate on x86 hardware - 
that's what we're trying to incorporate.  We still want to run normal 
qemu-x86 with KVM virtualization extensions, but we ALSO want to run the 
other emulators without the KVM virtualization extensions in order to 
have more choice for target environments.


So to me, openstack would interpret this by checking to see if a target 
host supports the architecture specified in the image (it does this 
correctly), then it would choose the correct qemu-system-xx for spawning 
the instance based on the architecture flag of the image, which it 
currently does not (it always choose qemu-system-x86_64).


Does that make sense?


OK yeah now I'm following you - running ppc guests on an x86 host 
(virt_type=qemu rather than kvm right?).


I would have thought the hw_architecture image property was used for 
this somehow to configure the arch in the guest xml properly, like it's 
used in a few places [1][2][3].


See [4], I'd think we'd set the guest.arch but don't see that happening. 
We do set the guest.os_type though [5].


[1] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L4649
[2] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L4927
[3] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/blockinfo.py#L257

[4] https://libvirt.org/formatcaps.html#elementGuest
[5] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L5196
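As a stand-alone illustration of what honoring hw_architecture when 
building the guest XML might look like (the helper function and 
fallback behavior here are assumptions; nova's real config generation 
lives in nova.virt.libvirt.config):

```python
# Illustrative only: building the <os><type arch=...> element from an
# image's hw_architecture property, falling back to the host arch.
# This is a sketch, not nova's actual libvirt config code.

def build_os_type_xml(os_type, image_props, host_arch='x86_64'):
    # Use the image's declared architecture when present; otherwise
    # assume the guest matches the host.
    arch = image_props.get('hw_architecture', host_arch)
    return "<os><type arch='%s'>%s</type></os>" % (arch, os_type)

# A ppc64 image on an x86_64 host should request the ppc64 emulator.
xml = build_os_type_xml('hvm', {'hw_architecture': 'ppc64'})
print(xml)
```

libvirt then maps that guest arch to the matching qemu-system-* binary 
it discovered in its capabilities.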


--

Thanks,

Matt



Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-08 Thread Matt Riedemann

On 8/7/2018 8:54 AM, Chris Apsey wrote:
We don't actually have any non-x86 hardware at the moment - we're just 
looking to run certain workloads in qemu full emulation mode sans KVM 
extensions (we know there is a huge performance hit - it's just for a 
few very specific things).  The hosts I'm talking about are normal 
intel-based compute nodes with several different qemu packages installed 
(arm, ppc, mips, x86_64 w/ kvm extensions, etc.).


Is nova designed to work in this kind of scenario?  It seems like many 
pieces are there, but they're just not quite tied together quite right, 
or there is some config option I'm missing.


As far as I know, nova doesn't make anything arch-specific for QEMU. 
Nova will execute some qemu commands like qemu-img but as far as the 
virt driver, it goes through the libvirt-python API bindings which wrap 
over libvirtd which interfaces with QEMU. I would expect that if you're 
on an x86_64 arch host, that you can't have non-x86_64 packages 
installed on there (or they are noarch packages). Like, I don't know how 
your packaging works (are these rpms or debs, or other?) but how do you 
have ppc packages installed on an x86 system?


--

Thanks,

Matt



Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-07 Thread Matt Riedemann

On 8/5/2018 1:43 PM, Chris Apsey wrote:
Trying to enable some alternate (non-x86) architectures on xenial + 
queens.  I can load up images and set the property correctly according 
to the supported values 
(https://docs.openstack.org/nova/queens/configuration/config.html) in 
image_properties_default_architecture.  From what I can tell, the 
scheduler works correctly and instances are only scheduled on nodes that 
have the correct qemu binary installed.  However, when the instance 
request lands on this node, it always starts it with qemu-system-x86_64 
rather than qemu-system-arm, qemu-system-ppc, etc.  If I manually set 
the correct binary, everything works as expected.


Am I missing something here, or is this a bug in nova-compute?


image_properties_default_architecture is only used in the scheduler 
filter to pick a compute host, it doesn't do anything about the qemu 
binary used in nova-compute. mnaser added the config option so maybe he 
can share what he's done on his computes.


Do you have qemu-system-x86_64 on non-x86 systems? Seems like a 
package/deploy issue since I'd expect x86 packages shouldn't install on 
a ppc system and vice versa, and only one qemu package should provide 
the binary.
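For context, the scheduler-side architecture match amounts to something 
like the following simplified stand-in (the real logic is in nova's 
ImagePropertiesFilter, which compares against the hypervisor's reported 
supported_instances; the data structures here are made up):

```python
# Simplified stand-in for the scheduler-side check: keep only hosts
# whose hypervisor reports support for the image's architecture.

def hosts_for_image(hosts, image_props, default_arch='x86_64'):
    wanted = image_props.get('hw_architecture', default_arch)
    return [h for h, archs in hosts.items() if wanted in archs]

hosts = {
    'node1': {'x86_64'},                      # plain KVM node
    'node2': {'x86_64', 'ppc64', 'armv7l'},   # qemu emulators installed
}
print(hosts_for_image(hosts, {'hw_architecture': 'ppc64'}))
```

Nothing in that filtering step influences which qemu binary the chosen 
compute host later runs, which is the gap being discussed.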


--

Thanks,

Matt



Re: [Openstack-operators] [nova] StarlingX diff analysis

2018-08-07 Thread Matt Riedemann

On 8/7/2018 1:10 AM, Flint WALRUS wrote:
I didn’t have time to check StarlingX code quality; what was your 
impression while you were doing your analysis?


I didn't dig into the test diffs themselves, but it was my impression 
that from what I was poking around in the local git repo, there were 
several changes which didn't have any test coverage.


For the really big full stack changes (L3 CAT, CPU scaling and 
shared/pinned CPUs on same host), toward the end I just started glossing 
over a lot of that because it's so much code in so many places, so I 
can't really speak very well to how it was written or how well it is 
tested (maybe WindRiver had a more robust CI system running integration 
tests, I don't know).


There were also some things which would have been caught in code review 
upstream. For example, they ignore the "force" parameter for live 
migration so that live migration requests always go through the 
scheduler. However, the "force" parameter is only on newer 
microversions. Before that, if you specified a host at all it would 
bypass the scheduler, but the change didn't take that into account, so 
they still have gaps in some of the things they were trying to 
essentially disable in the API.


On the whole I think the quality is OK. It's not really possible to 
accurately judge that when looking at a single diff this large.


--

Thanks,

Matt



Re: [Openstack-operators] Live-migration experiences?

2018-08-06 Thread Matt Riedemann

On 8/6/2018 8:12 AM, Clint Byrum wrote:

First a few facts about our installation:

* We're using kolla-ansible and basically leaving most nova settings at 
the default, meaning libvirt+kvm
* We will be using block migration, as we have no shared storage of any 
kind.
* We use routed networks to set up L2 segments per-rack. Each rack is 
basically an island unto itself. The VMs on one rack cannot be migrated 
to another rack  because of this.
* Our main resource limitation is disk, followed closely by RAM. As 
such, our main motivation for wanting to do live migration is to be able 
to move VMs off of machines where over-subscribed disk users start to 
threaten the free space of the others.


What release are you on?



* Do people have feedback on live_migrate_permit_auto_convergence? It 
seems like a reasonable trade-off, but since it is defaulted to false, I 
wonder if there are some hidden gotchas there.


You might want to read through [1] and [2]. Those were written by the 
OSIC dev team when that still existed. But there are some (somewhat 
mysterious) mentions to caveats with post-copy you should be aware of. 
At this point, John Garbutt is probably the best person to talk to about 
those since all of the other OSIC devs that worked on this spec are long 
gone.


>
> * General pointers to excellent guides, white papers, etc, that might 
help us avoid doing all of our learning via trial/error.


Check out [3]. I've specifically been meaning to watch the one from 
Boston that John was in.


[1] 
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/live-migration-force-after-timeout.html
[2] 
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/live-migration-per-instance-timeout.html

[3] https://www.openstack.org/videos/search?search=live%20migration

--

Thanks,

Matt



[Openstack-operators] [nova] StarlingX diff analysis

2018-08-06 Thread Matt Riedemann
In case you haven't heard, there was this StarlingX thing announced at 
the last summit. I have gone through the enormous nova diff in their 
repo and the results are in a spreadsheet [1]. Given the enormous 
spreadsheet (see a pattern?), I have further refined that into a set of 
high-level charts [2].


I suspect there might be some negative reactions to even doing this type 
of analysis lest it might seem like promoting throwing a huge pile of 
code over the wall and expecting the OpenStack (or more specifically the 
nova) community to pick it up. That's not my intention at all, nor do I 
expect nova maintainers to be responsible for upstreaming any of this.


This is all educational to figure out what the major differences and 
overlaps are and what could be constructively upstreamed from the 
starlingx staging repo since it's not all NFV and Edge dragons in here, 
there are some legitimate bug fixes and good ideas. I'm sharing it 
because I want to feel like my time spent on this in the last week 
wasn't all for nothing.


[1] 
https://docs.google.com/spreadsheets/d/1ugp1FVWMsu4x3KgrmPf7HGX8Mh1n80v-KVzweSDZunU/edit?usp=sharing
[2] 
https://docs.google.com/presentation/d/1P-__JnxCFUbSVlEoPX26Jz6VaOyNg-jZbBsmmKA2f0c/edit?usp=sharing


--

Thanks,

Matt



[Openstack-operators] UC Candidacy

2018-08-03 Thread Matt Van Winkle
Greetings OpenStack Operators and Users,

I’d like to take the opportunity to state my candidacy in the upcoming
UC election. I have enjoyed the work we have been able to accomplish
these last 12 months and I would like to serve another term to help
continue the momentum.

After 6 years in Operations and Engineering for Rackspace’s public
cloud, I have recently joined Salesforce to help with their OpenStack
efforts. At both companies, I’ve had the distinct pleasure of serving
a number of talented engineers and teams as they have worked to scale
and manage the infrastructure. During this time, I’ve also enjoyed
sharing ideas with and learning from other Operators running large
OpenStack clouds in order to find new and creative ways to solve
challenges.

With respect to community involvement, my first summit was Portland
and I have made all but two since. I’ve also been very active in the
Operators community since helping plan the very first meet-up in San
Jose. I’ve given a few talks in the past and have served as track
chair many times. After Paris, I began chairing the Large Deployments
Team. This team, while inactive now, was a long running group of
operators that shared many ideas on scaling OpenStack and has had some
successes running feature requests to ground with dev teams. It’s been
a distinct pleasure to work with such smart folks from around the
community. Chairing LDT also led to an opportunity to join the Ops
Meetup Team - working with others on planning Operator mid-cycles and
Ops related Summit/Forum sessions.

I was fortunate enough to be part of the group that helped the old UC
craft the bylaw changes that have expanded the committee and made it
the elected body it is today. After serving as an election official in
the first election, I chose to run for an open spot a year ago.
Regardless of the outcome of this election, it is really awesome to
see the evolution of the UC and how it’s able to better coordinate
Operator and User efforts in guiding the community and the development
cycle.

If re-elected, I hope to keep helping more Users and Operators
understand how to take better advantage of the various events and
dev cycle to drive improvement and change in the software. The UC has
a vision of seeing conversations at an Operators mid-cycle or from an
OpenStack Days OPs session become specific topic submissions at the
next summit. Conversely, we'd love this pattern to be regular enough
that the Dev teams start proposing session ideas for certain feedback
at upcoming OPs gatherings to complete the cycle. While there is still
plenty of work to do to make these things a reality, the UC has been
laying the ground work since the Dublin PTG. I'd like to serve another
term so I can do my part to help keep making progress. Beyond that, I
want to continue the great work of the UC members to date on being an
advocate for the User with the Board, TC and community at large.

I appreciate the time and the consideration.
Thanks!
VW



-- 
Matt Van Winkle
Senior Manager, Software Engineering | Salesforce
Mobile: 210-445-4183


Re: [Openstack-operators] [openstack-ansible] How to manage system upgrades ?

2018-07-30 Thread Matt Riedemann

On 7/27/2018 3:34 AM, Gilles Mocellin wrote:

- for compute nodes : disable compute node and live-evacuate instances...


To be clear, what do you mean exactly by "live-evacuate"? I assume you 
mean live migration of all instances off each (disabled) compute node 
*before* you upgrade it. I wanted to ask because "evacuate" as a server 
operation is something else entirely (it's rebuild on another host which 
is definitely disruptive to the workload on that server).


http://www.danplanet.com/blog/2016/03/03/evacuate-in-nova-one-command-to-confuse-us-all/

--

Thanks,

Matt



Re: [Openstack-operators] [nova] Couple of CellsV2 questions

2018-07-23 Thread Matt Riedemann
I'll try to help a bit inline. Also cross-posting to openstack-dev and 
tagging with [nova] to highlight it.


On 7/23/2018 10:43 AM, Jonathan Mills wrote:
I am looking at implementing CellsV2 with multiple cells, and there's a 
few things I'm seeking clarification on:


1) How does a superconductor know that it is a superconductor?  Is its 
operation different in any fundamental way?  Is there any explicit 
configuration or a setting in the database required? Or does it simply 
not care one way or another?


It's a topology term, not really anything in config or the database that 
distinguishes the "super" conductor. I assume you've gone over the 
service layout in the docs:


https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#service-layout

There are also some summit talks from Dan about the topology linked here:

https://docs.openstack.org/nova/latest/user/cells.html#cells-v2

The superconductor is the conductor service at the "top" of the tree 
which interacts with the API and scheduler (controller) services and 
routes operations to the cell. Then once in a cell, the operation should 
ideally be confined there. So, for example, reschedules during a build 
would be confined to the cell. The cell conductor doesn't go back "up" 
to the scheduler to get a new set of hosts for scheduling. This of 
course depends on which release you're using and your configuration, see 
the caveats section in the cellsv2-layout doc.




2) When I ran the command "nova-manage cell_v2 create_cell --name=cell1 
--verbose", the entry created for cell1 in the api database includes 
only one rabbitmq server, but I have three of them as an HA cluster.  
Does it only support talking to one rabbitmq server in this 
configuration? Or can I just update the cell1 transport_url in the 
database to point to all three? Is that a supported configuration?


First, don't update stuff directly in the database if you don't have to. 
:) What you set on the transport_url should be whatever oslo.messaging 
can handle:


https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.transport_url
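oslo.messaging transport URLs accept a comma-separated list of 
host:port pairs, each with its own credentials, so all three cluster 
members can go in one URL. A quick sketch of assembling such a URL 
(hostnames and credentials here are made up):

```python
# Build an oslo.messaging transport URL listing every rabbit node in
# the HA cluster; the client fails over between them. Hostnames and
# credentials below are invented for the example.

def make_transport_url(user, password, hosts, vhost=''):
    netlocs = ','.join('%s:%s@%s:5672' % (user, password, h) for h in hosts)
    return 'rabbit://%s/%s' % (netlocs, vhost)

url = make_transport_url('nova', 'secret', ['rmq1', 'rmq2', 'rmq3'])
print(url)
```

You would pass a URL in that shape to `nova-manage cell_v2 create_cell` 
(or `update_cell`) rather than editing the cell mapping row by hand.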

There is at least one reported bug for this but I'm not sure I fully 
grok it or what its status is at this point:


https://bugs.launchpad.net/nova/+bug/1717915



3) Is there anything wrong with having one cell share the amqp bus with 
your control plane, while having additional cells use their own amqp 
buses? Certainly I realize that the point of CellsV2 is to shard the 
amqp bus for greater horizontal scalability.  But in my case, my first 
cell is on the smaller side, and happens to be colocated with the 
control plane hardware (whereas other cells will be in other parts of 
the datacenter, or in other datacenters with high-speed links).  I was 
thinking of just pointing that first cell back at the same rabbitmq 
servers used by the control plane, but perhaps directing them at their 
own rabbitmq vhost. Is that a terrible idea?


Would need to get input from operators and/or Dan Smith's opinion on 
this one, but I'd say it's no worse than having a flat single cell 
deployment. However, if you're going to do multi-cell long-term anyway, 
then it would be best to get in the mindset and discipline of not 
relying on shared MQ between the controller services and the cells. In 
other words, just do the right thing from the start rather than have to 
worry about maybe changing the deployment / configuration for that one 
cell down the road when it's harder.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-community] Running instance snapshot

2018-07-16 Thread Matt Riedemann

On 7/12/2018 10:09 AM, Alfredo De Luca wrote:
​I tried with glance image-create or nova backup but I got the 
following


Neither of those are server snapshot operations (well backup is, but 
it's probably not what you're looking for).


glance image-create is creating an image in glance, not creating a 
snapshot from a server. That would be 'nova image-create':


https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-image-create

What is the error message in the 400 response? It should be in the CLI 
output but if not, what's in the nova-api logs?


--

Thanks,

Matt



Re: [Openstack-operators] [nova] Cinder cross_az_attach=False changes/fixes

2018-07-15 Thread Matt Riedemann
Just an update on an old thread, but I've been working on the 
cross_az_attach=False issues again this past week and I think I have a 
couple of decent fixes.


On 5/31/2017 6:08 PM, Matt Riedemann wrote:

This is a request for any operators out there that configure nova to set:

[cinder]
cross_az_attach=False

To check out these two bug fixes:

1. https://review.openstack.org/#/c/366724/

This is a case where nova is creating the volume during boot from volume 
and providing an AZ to cinder during the volume create request. Today we 
just pass the instance.availability_zone which is None if the instance 
was created without an AZ set. It's unclear to me if that causes the 
volume creation to fail (someone in IRC was showing the volume going 
into ERROR state while Nova was waiting for it to be available), but I 
think it will cause the later attach to fail here [1] because the 
instance AZ (defaults to None) and volume AZ (defaults to nova) may not 
match. I'm still looking for more details on the actual failure in that 
one though.


The proposed fix in this case is pass the AZ associated with any host 
aggregate that the instance is in.


This was indirectly fixed by change 
https://review.openstack.org/#/c/446053/ in Pike where we now set the 
instance.availability_zone in conductor after we get a selected host 
from the scheduler (we get the AZ for the host and set that on the 
instance before sending the instance to compute to build it).


While investigating this on master, I found a new bug where we do an 
up-call to the API DB which fails in a split MQ setup, and I have a fix 
here:


https://review.openstack.org/#/c/582342/



2. https://review.openstack.org/#/c/469675/

This is similar, but rather than checking the AZ when we're on the 
compute and the instance has a host, we're in the API and doing a boot 
from volume where an existing volume is provided during server create. 
By default, the volume's AZ is going to be 'nova'. The code doing the 
check here is getting the AZ for the instance, and since the instance 
isn't on a host yet, it's not in any aggregate, so the only AZ we can 
get is from the server create request itself. If an AZ isn't provided 
during the server create request, then we're comparing 
instance.availability_zone (None) to volume['availability_zone'] 
("nova") and that results in a 400.


My proposed fix is in the case of BFV checks from the API, we default 
the AZ if one wasn't requested when comparing against the volume. By 
default this is going to compare "nova" for nova and "nova" for cinder, 
since CONF.default_availability_zone is "nova" by default in both projects.


I've refined this fix a bit to be more flexible:

https://review.openstack.org/#/c/469675/

So now if doing boot from volume and we're checking 
cross_az_attach=False in the API and the user didn't explicitly request 
an AZ for the instance, we do a few checks:


1. If [DEFAULT]/default_schedule_zone is not None (the default), we use 
that to compare against the volume AZ.


2. If the volume AZ is equal to the [DEFAULT]/default_availability_zone 
(nova by default in both nova and cinder), we're OK - no issues.


3. If the volume AZ is not equal to [DEFAULT]/default_availability_zone, 
it means either the volume was created with a specific AZ or cinder's 
default AZ is configured differently from nova's. In that case, I take 
the volume AZ and put it into the instance RequestSpec so that during 
scheduling, the nova scheduler picks a host in the same AZ as the volume 
- if that AZ isn't in nova, we fail to schedule (NoValidHost) (but that 
shouldn't really happen, why would one have cross_az_attach=False 
without mirrored AZs in both cinder and nova?).




--

I'm requesting help from any operators that are setting 
cross_az_attach=False because I have to imagine your users have run into 
this and you're patching around it somehow, so I'd like input on how you 
or your users are dealing with this.


I'm also trying to recreate these in upstream CI [2] which I was already 
able to do with the 2nd bug.


This devstack patch has recreated both issues above and I'm adding the 
fixes to it as dependencies to show the problems are resolved.




Having said all of this, I really hate cross_az_attach as it's 
config-driven API behavior which is not interoperable across clouds. 
Long-term I'd really love to deprecate this option but we need a 
replacement first, and I'm hoping placement with compute/volume resource 
providers in a shared aggregate can maybe make that happen.


[1] 
https://github.com/openstack/nova/blob/f278784ccb06e16ee12a42a585c5615abe65edfe/nova/virt/block_device.py#L368 


[2] https://review.openstack.org/#/c/467674/



--

Thanks,

Matt



Re: [Openstack-operators] [openstack-client] - missing commands?

2018-06-13 Thread Matt Riedemann

On 6/13/2018 1:42 PM, Flint WALRUS wrote:
Hi guys, I have used the «new» openstack-client command as much as 
possible for a couple of years now, but recently I had a hard time 
finding equivalents for the following commands:


nova force-delete 
&
The swift command that permits recursively uploading the contents of a 
directory, automatically creating the same directory structure using 
pseudo-folders.


Did I miss something somewhere or are those commands missing?

On the nova part I think it's not that important, as a classic openstack 
server delete seems to do the same, but I'm not quite sure.


Oh wow, great timing:

http://lists.openstack.org/pipermail/openstack-dev/2018-June/131308.html

I've also queued that up for the upcoming bug smash in China next week.

--

Thanks,

Matt



Re: [Openstack-operators] large high-performance ephemeral storage

2018-06-13 Thread Matt Riedemann

On 6/13/2018 10:54 AM, Chris Friesen wrote:
Also, migration and resize are not supported for LVM-backed instances.  
I proposed a patch to support them 
(https://review.openstack.org/#/c/337334/) but hit issues and never got 
around to fixing them up.


Yup, I guess I should have read the entire thread first.

--

Thanks,

Matt



Re: [Openstack-operators] large high-performance ephemeral storage

2018-06-13 Thread Matt Riedemann

On 6/13/2018 8:58 AM, Blair Bethwaite wrote:
Though we have not used LVM based instance storage before, are there any 
significant gotchas?


I know you can't resize/cold migrate lvm-backed ephemeral root disk 
instances:


https://github.com/openstack/nova/blob/343c2bee234568855fd9e6ba075a05c2e70f3388/nova/virt/libvirt/driver.py#L8136

However, StarlingX has a patch for that (pretty sure anyway, I know 
WindRiver had one):


https://review.openstack.org/#/c/337334/

--

Thanks,

Matt



[Openstack-operators] Reminder to add "nova-status upgrade check" to deployment tooling

2018-06-13 Thread Matt Riedemann
I was going through some recently reported nova bugs and came across [1] 
which I opened at the Summit during one of the FFU sessions where I 
realized the nova upgrade docs don't mention the nova-status upgrade 
check CLI [2] (added in Ocata).


As a result, I was wondering how many deployment tools out there support 
upgrades and from those, which are actually integrating that upgrade 
status check command.


I'm not really familiar with most of them, but I've dabbled in OSA 
enough to know where the code lived for nova upgrades, so I posted a 
patch [3].


I'm hoping this can serve as a template for other deployment projects to 
integrate similar checks into their upgrade (and install verification) 
flows.


[1] https://bugs.launchpad.net/nova/+bug/1772973
[2] https://docs.openstack.org/nova/latest/cli/nova-status.html
[3] https://review.openstack.org/#/c/575125/

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26

2018-06-07 Thread Matt Riedemann

On 6/7/2018 1:54 PM, Jay Pipes wrote:


If Cinder tracks volume attachments as consumable resources, then this 
would be my preference.


Cinder does:

https://developer.openstack.org/api-ref/block-storage/v3/#attachments

However, there is no limit in Cinder on those as far as I know.

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26

2018-06-07 Thread Matt Riedemann

+operators (I forgot)

On 6/7/2018 1:07 PM, Matt Riedemann wrote:

On 6/7/2018 12:56 PM, melanie witt wrote:
Recently, we've received interest about increasing the maximum number 
of allowed volumes to attach to a single instance > 26. The limit of 
26 is because of a historical limitation in libvirt (if I remember 
correctly) and is no longer limited at the libvirt level in the 
present day. So, we're looking at providing a way to attach more than 
26 volumes to a single instance and we want your feedback.


The 26 volumes thing is a libvirt driver restriction.

There was a bug at one point because powervm (or powervc) was capping 
out at 80 volumes per instance because of restrictions in the 
build_requests table in the API DB:


https://bugs.launchpad.net/nova/+bug/1621138

They wanted to get to 128, because that's how power rolls.



We'd like to hear from operators and users about their use cases for 
wanting to be able to attach a large number of volumes to a single 
instance. If you could share your use cases, it would help us greatly 
in moving forward with an approach for increasing the maximum.


Some ideas that have been discussed so far include:

A) Selecting a new, higher maximum that still yields reasonable 
performance on a single compute host (64 or 128, for example). Pros: 
helps prevent the potential for poor performance on a compute host 
from attaching too many volumes. Cons: doesn't let anyone opt-in to a 
higher maximum if their environment can handle it.


B) Creating a config option to let operators choose how many volumes 
are allowed to attach to a single instance. Pros: lets operators opt-in 
to a maximum that works in their environment. Cons: it's not 
discoverable for those calling the API.


I'm not a fan of a non-discoverable config option which will impact API 
behavior indirectly, i.e. on cloud A I can boot from volume with 64 
volumes but not on cloud B.




C) Create a configurable API limit for maximum number of volumes to 
attach to a single instance that is either a quota or similar to a 
quota. Pros: lets operators opt-in to a maximum that works in their 
environment. Cons: it's yet another quota?


This seems the most reasonable to me if we're going to do this, but I'm 
probably in the minority. Yes more quota limits sucks, but it's (1) 
discoverable by API users and therefore (2) interoperable.


If we did the quota thing, I'd probably default to unlimited and let the 
cinder volume quota cap it for the project as it does today. Then admins 
can tune it as needed.
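Option C with an unlimited default could look something like the helper 
below. This is purely illustrative, not nova's actual quota machinery; 
the function and parameter names are made up.

```python
def can_attach(current_attachments, max_volumes_per_instance=None):
    """Check a hypothetical per-instance volume-attach cap.

    With the default of None (unlimited), the cinder volume quota is
    the only effective cap, matching the suggestion above.
    """
    if max_volumes_per_instance is None:
        return True  # unlimited by default; cinder quota still applies
    return current_attachments < max_volumes_per_instance
```

Admins could then tune the limit per project without the API behavior 
becoming undiscoverable, since a quota is visible to API users.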





--

Thanks,

Matt



Re: [Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-06-07 Thread Matt Riedemann

On 2/6/2018 6:44 PM, Matt Riedemann wrote:

On 2/6/2018 2:14 PM, Chris Apsey wrote:
but we would rather have intermittent build failures rather than 
compute nodes falling over in the future.


Note that once a compute has a successful build, the consecutive build 
failures counter is reset. So if your limit is the default (10) and you 
have 10 failures in a row, the compute service is auto-disabled. But if 
you have say 5 failures and then a pass, it's reset to 0 failures.
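The counter behaviour described above can be sketched like this. It is a 
minimal stand-in (a plain dict instead of the real nova service model), 
only the reset-on-success and threshold logic mirror the description.

```python
def record_build_result(service, succeeded, threshold=10):
    """Track consecutive build failures for a compute service.

    The default threshold of 10 matches the default of the
    consecutive_build_service_disable_threshold option.
    """
    if succeeded:
        service['failures'] = 0          # a successful build resets the counter
        return service
    service['failures'] += 1
    if threshold and service['failures'] >= threshold:
        service['disabled'] = True       # auto-disable at the threshold
    return service
```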


Obviously if you're doing a pack-first scheduling strategy rather than 
spreading instances across the deployment, a burst of failures could 
easily disable a compute, especially if that host is overloaded like you 
saw. I'm not sure if rescheduling is helping you or not - that would be 
useful information since we consider the need to reschedule off a failed 
compute host as a bad thing. At the Forum in Boston when this idea came 
up, it was specifically for the case that operators in the room didn't 
want a bad compute to become a "black hole" in their deployment causing 
lots of reschedules until they get that one fixed.


Just an update on this. There is a change merged in Rocky [1] which is 
also going through backports to Queens and Pike. If you've already 
disabled the "consecutive_build_service_disable_threshold" config option 
then it's a no-op. If you haven't, 
"consecutive_build_service_disable_threshold" is now used to count build 
failures but no longer auto-disables the compute service when the 
configured threshold is met (10 by default). The build failure count is 
then used by a new weigher (enabled by default) to sort hosts with build 
failures to the back of the list of candidate hosts for new builds. Once 
there is a successful build on a given host, the failure count is reset. 
The idea here is that hosts which are failing are given lower priority 
during scheduling.
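The effect of that weigher can be approximated with a one-liner. The 
field name below is illustrative, not the real HostState attribute; the 
point is only that hosts with more recent build failures sort to the 
back of the candidate list.

```python
def sort_candidates(hosts):
    """Order candidate hosts so failing ones get lower priority.

    Fewer build failures -> earlier in the list; ties keep input order
    (Python's sort is stable).
    """
    return sorted(hosts, key=lambda h: h['build_failures'])
```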


[1] https://review.openstack.org/#/c/572195/

--

Thanks,

Matt



[Openstack-operators] [nova] Need feedback on spec for handling down cells in the API

2018-06-07 Thread Matt Riedemann
We have a nova spec [1] which is at the point that it needs some API 
user (and operator) feedback on what nova API should be doing when 
listing servers and there are down cells (unable to reach the cell DB or 
it times out).


tl;dr: the spec proposes to return "shell" instances which have the 
server uuid and created_at fields set, and maybe some other fields we 
can set, but otherwise a bunch of fields in the server response would be 
set to UNKNOWN sentinel values. This would be unversioned, and therefore 
could wreak havoc on existing client side code that expects fields like 
'config_drive' and 'updated' to be of a certain format.
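For illustration only, a "shell" server record might look something 
like the dict below. The exact field list is up to the spec, not fixed 
here, and the uuid/timestamp values are made up.

```python
# A "shell" server as proposed: real values only for fields the API DB
# can supply (uuid, created_at), UNKNOWN sentinels everywhere else.
shell_server = {
    'id': '00000000-0000-0000-0000-000000000000',
    'created': '2018-06-07T12:00:00Z',
    'status': 'UNKNOWN',
    'config_drive': 'UNKNOWN',   # normally a boolean-like string
    'updated': 'UNKNOWN',        # normally an ISO 8601 timestamp
}
```

Client code that parses 'updated' as a timestamp or 'config_drive' as a 
boolean would break on such a record, which is the UX concern above.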


There are alternatives listed in the spec so please read this over and 
provide feedback since this is a pretty major UX change.


Oh, and no pressure, but today is the spec freeze deadline for Rocky.

[1] https://review.openstack.org/#/c/557369/

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [TC] Stein Goal Selection

2018-06-04 Thread Matt Riedemann
+openstack-operators since we need to have more operator feedback in our 
community-wide goals decisions.


+Melvin as my elected user committee person for the same reasons as 
adding operators into the discussion.


On 6/4/2018 3:38 PM, Matt Riedemann wrote:

On 6/4/2018 1:07 PM, Sean McGinnis wrote:

Python 3 First
==

One of the things brought up in the session was picking things that bring
excitement and are obvious benefits to deployers and users of OpenStack
services. While this one is maybe not as immediately obvious, I think 
this
is something that will end up helping deployers and also falls into 
the tech

debt reduction category that will help us move quicker long term.

Python 2 is going away soon, so I think we need something to help 
compel folks
to work on making sure we are ready to transition. This will also be a 
good

point to help switch the mindset over to Python 3 being the default used
everywhere, with our Python 2 compatibility being just to continue legacy
support.


I still don't really know what this goal means - we have python 3 
support across the projects for the most part don't we? Based on that, 
this doesn't seem like much to take an entire "goal slot" for the release.




Cold Upgrade Support


The other suggestion in the Forum session related to upgrades was the 
addition
of "upgrade check" CLIs for each project, and I was tempted to suggest 
that as
my second strawman choice. For some projects that would be a very 
minimal or
NOOP check, so it would probably be easy to complete the goal. But 
ultimately
what I think would bring the most value would be the work on 
supporting cold
upgrade, even if it will be more of a stretch for some projects to 
accomplish.


I think you might be mixing two concepts here.

The cold upgrade support, per my understanding, is about getting the 
assert:supports-upgrade tag:


https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html 



Which to me basically means the project runs a grenade job. There was 
discussion in the room about grenade not being a great tool for all 
projects, but no one is working on a replacement for that, so I don't 
think it's really justification at this point for *not* making it a goal.


The "upgrade check" CLIs is a different thing though, which is more 
about automating as much of the upgrade release notes as possible. See 
the nova docs for examples on how we have used it:


https://docs.openstack.org/nova/latest/cli/nova-status.html

I'm not sure what projects you had in mind when you said, "For some 
projects that would be a very minimal or NOOP check, so it would 
probably be easy to complete the goal." I would expect that projects 
aren't meeting the goal if they are noop'ing everything. But what can be 
automated like this isn't necessarily black and white either.




Upgrades have been a major focus of discussion lately, especially as our
operators have been trying to get closer to the latest work upstream. 
This has

been an ongoing challenge.

There has also been a lot of talk about LTS releases. We've landed on 
fast
forward upgrade to get between several releases, but I think improving 
upgrades
eases the way both for easier and more frequent upgrades and also 
getting to

the point some day where maybe we can think about upgrading over several
releases to be able to do something like an LTS to LTS upgrade.

Neither one of these upgrade goals really has a clearly defined plan that
projects can pick up now and start working on, but I think with those 
involved

in these areas we should be able to come up with a perscriptive plan for
projects to follow.

And it would really move our fast forward upgrade story forward.


Agreed. In the FFU Forum session at the summit I mentioned the 
'nova-status upgrade check' CLI and a lot of people in the room had 
never heard of it because they are still on Mitaka before we added that 
CLI (new in Ocata). But they sounded really interested in it and said 
they wished other projects were doing that to help ease upgrades so they 
won't be stuck on older unmaintained releases for so long. So anything 
we can do to improve upgrades, including our testing for them, will help 
make FFU better.




Next Steps
==

I'm hoping with a strawman proposal we have a basis for debating the 
merits of
these and getting closer to being able to officially select Stein 
goals. We
still have some time, but I would like to avoid making late-cycle 
selections so

teams can start planning ahead for what will need to be done in Stein.

Please feel free to promote other ideas for goals. That would be a 
good way for
us to weigh the pro's and con's between these and whatever else you 
have in
mind. Then hopefully we can come to some consensus and work towards 
clearly
defining what needs to be done and getting things well documented for 
teams to

pick up as soon as they wrap up Rocky.

Re: [Openstack-operators] [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point

2018-06-04 Thread Matt Riedemann
download extension point? Should I work to get the 
code for this RBD download into the upstream repository?




I think you should propose your changes upstream with a blueprint, the 
docs for the blueprint process are here:


https://docs.openstack.org/nova/latest/contributor/blueprints.html

Since it's not an API change, this might just be a specless blueprint, 
but you'd need to write up the blueprint and probably post the PoC code 
to Gerrit and then bring it up during the "Open Discussion" section of 
the weekly nova meeting.


Once we can take a look at the code change, we can go from there on 
whether or not to add that in-tree or go some alternative route.


Until that happens, I think we'll just say we won't remove that 
deprecated image download extension code, but that's not going to be an 
unlimited amount of time if you don't propose your changes upstream.


Is there going to be anything blocking or slowing you down on your end 
with regard to contributing this change, like legal approval, license 
agreements, etc? If so, please be up front about that.


--

Thanks,

Matt



Re: [Openstack-operators] [nova] isolate hypervisor to project

2018-06-04 Thread Matt Riedemann

On 6/4/2018 6:43 AM, Tobias Urdin wrote:

I have received a question about a more specialized use case where we 
need to isolate several hypervisors to a specific project. My first 
thought was to use nova flavors for only that project and add extra 
specs properties to use a specific host aggregate, but this means I 
need to assign values to all other flavors so they do not use those 
hosts, which seems weird.

How could I go about solving this the easiest/best way or, from the 
history of the mailing lists, the most supported way, since there are a 
lot of changes to the scheduler/placement part right now?


Depending on which release you're on, it sounds like you want to use this:

https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation

In Rocky we have a replacement for that filter which does pre-filtering 
in Placement which should give you a performance gain when it comes time 
to do the host filtering:


https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement

Note that even if you use AggregateMultiTenancyIsolation for the one 
project, other projects can still randomly land on the hosts in that 
aggregate unless you also assign those to their own aggregates.


It sounds like you might be looking for a dedicated hosts feature? 
There is an RFE from the public cloud work group for that:


https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523

--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-06-03 Thread Matt Riedemann

On 6/2/2018 1:37 AM, Chris Apsey wrote:
This is great.  I would even go so far as to say the install docs should 
be updated to capture this as the default; as far as I know there is no 
negative impact when running in daemon mode, even on very small 
deployments.  I would imagine that there are operators out there who 
have run into this issue but didn't know how to work through it - making 
stuff like this less painful is key to breaking the 'openstack is hard' 
stigma.


I think changing the default on the root_helper_daemon option is a good 
idea if everyone is setting that anyway. There are some comments in the 
code next to the option that make me wonder if there are edge cases 
where it might not be a good idea, but I don't really know the details, 
someone from the neutron team that knows more about it would have to 
speak up.


Also, I wonder if converting to privsep in the neutron agent would 
eliminate the need for this option altogether and still gain the 
performance benefits.


--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-31 Thread Matt Riedemann

On 5/30/2018 9:30 AM, Matt Riedemann wrote:


I can start pushing some docs patches and report back here for review help.


Here are the docs patches in both nova and neutron:

https://review.openstack.org/#/q/topic:bug/1774217+(status:open+OR+status:merged)

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] proposal to postpone nova-network core functionality removal to Stein

2018-05-31 Thread Matt Riedemann

+openstack-operators

On 5/31/2018 3:04 PM, Matt Riedemann wrote:

On 5/31/2018 1:35 PM, melanie witt wrote:


This cycle at the PTG, we had decided to start making some progress 
toward removing nova-network [1] (thanks to those who have helped!) 
and so far, we've landed some patches to extract common network 
utilities from nova-network core functionality into separate utility 
modules. And we've started proposing removal of nova-network REST APIs 
[2].


At the cells v2 sync with operators forum session at the summit [3], 
we learned that CERN is in the middle of migrating from nova-network 
to neutron and that holding off on removal of nova-network core 
functionality until Stein would help them out a lot to have a safety 
net as they continue progressing through the migration.


If we recall correctly, they did say that removal of the nova-network 
REST APIs would not impact their migration and Surya Seetharaman is 
double-checking about that and will get back to us. If so, we were 
thinking we can go ahead and work on nova-network REST API removals 
this cycle to make some progress while holding off on removing the 
core functionality of nova-network until Stein.


I wanted to send this to the ML to let everyone know what we were 
thinking about this and to receive any additional feedback folks might 
have about this plan.


Thanks,
-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-rocky L301
[2] https://review.openstack.org/567682
[3] https://etherpad.openstack.org/p/YVR18-cellsv2-migration-sync-with-operators L30


As a reminder, this is the etherpad I started to document the nova-net 
specific compute REST APIs which are candidates for removal:


https://etherpad.openstack.org/p/nova-network-removal-rocky




--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-30 Thread Matt Riedemann

On 5/30/2018 9:41 AM, Matt Riedemann wrote:
Thanks for your patience in debugging this Massimo! I'll get a bug 
reported and patch posted to fix it.


I'm tracking the problem with this bug:

https://bugs.launchpad.net/nova/+bug/1774205

I found that this has actually been fixed since Pike:

https://review.openstack.org/#/c/449640/

But I've got a patch up for another related issue, and a functional test 
to avoid regressions which I can also use when backporting the fix to 
stable/ocata.


--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-30 Thread Matt Riedemann

On 5/30/2018 5:21 AM, Massimo Sgaravatto wrote:

The problem is indeed with the tenant_id

When I create a VM, tenant_id is ee1865a76440481cbcff08544c7d580a 
(SgaraPrj1), as expected


But when, as admin, I run the "nova migrate" command to migrate the very 
same instance, the tenant_id is 56c3f5c047e74a78a71438c4412e6e13 (admin) !


OK that's good information.

Tracing the code for cold migrate in ocata, we get the request spec that 
was created when the instance was created here:


https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L3339

As I mentioned earlier, if it was cold migrating an instance created 
before Newton and the online data migration wasn't run on it, we'd 
create a temporary request spec here:


https://github.com/openstack/nova/blob/stable/ocata/nova/conductor/manager.py#L263

But that shouldn't be the case in your scenario.

Right before we call the scheduler, for some reason, we completely 
ignore the request spec retrieved in the API, and re-create it from 
local scope variables in conductor:


https://github.com/openstack/nova/blob/stable/ocata/nova/conductor/tasks/migrate.py#L50

And *that* is precisely where this breaks down and takes the project_id 
from the current context (admin) rather than the instance:


https://github.com/openstack/nova/blob/stable/ocata/nova/objects/request_spec.py#L407
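The project_id mix-up traced above can be illustrated with a toy 
version of the spec-building step. The shapes below are simplified 
stand-ins for nova's context/RequestSpec objects; only the choice of 
which project_id to use mirrors the bug and its fix.

```python
def build_request_spec(context, instance, use_instance_project=True):
    """Toy request-spec builder showing the project_id source bug.

    Buggy behaviour (use_instance_project=False): project_id comes from
    the admin context performing the migration. Fixed behaviour: it is
    taken from the instance being moved.
    """
    project_id = (instance['project_id'] if use_instance_project
                  else context['project_id'])
    return {'project_id': project_id}
```

With the buggy path, a tenant-isolation filter then compares the admin 
project against the aggregate's filter_tenant_id and rejects the hosts, 
which matches the failure Massimo saw.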

Thanks for your patience in debugging this Massimo! I'll get a bug 
reported and patch posted to fix it.


--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-30 Thread Matt Riedemann

On 5/29/2018 8:23 PM, Chris Apsey wrote:
I want to echo the effectiveness of this change - we had vif failures 
when launching more than 50 or so cirros instances simultaneously, but 
moving to daemon mode made this issue disappear and we've tested 5x that 
amount.  This has been the single biggest scalability improvement to 
date.  This option should be the default in the official docs.


This is really good feedback. I'm not sure if there is any kind of 
centralized performance/scale-related documentation, does the LCOO team 
[1] have something that's current? There are also the performance docs 
[2] but that looks pretty stale.


We could add a note to the neutron rootwrap configuration option such 
that if you're running into timeout issues you could consider running 
that in daemon mode, but it's probably not very discoverable. In fact, I 
couldn't find anything about it in the neutron docs, I only found this 
[3] because I know it's defined in oslo.rootwrap (I don't expect 
everyone to know where this is defined).


I found root_helper_daemon in the neutron docs [4] but it doesn't 
mention anything about performance or related options, and it just makes 
it sound like it matters for xenserver, which I'd gloss over if I were 
using libvirt. The root_helper_daemon config option help in neutron 
should probably refer to the neutron-rootwrap-daemon which is in the 
setup.cfg [5].


For better discoverability of this, probably the best place to mention 
it is in the nova vif_plugging_timeout configuration option, since I 
expect that's the first place operators will be looking when they start 
hitting timeouts during vif plugging at scale.


I can start pushing some docs patches and report back here for review help.

[1] https://wiki.openstack.org/wiki/LCOO
[2] https://docs.openstack.org/developer/performance-docs/
[3] 
https://docs.openstack.org/oslo.rootwrap/latest/user/usage.html#daemon-mode
[4] 
https://docs.openstack.org/neutron/latest/configuration/neutron.html#agent.root_helper_daemon

[5] https://github.com/openstack/neutron/blob/f486f0/setup.cfg#L54

--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-29 Thread Matt Riedemann

On 5/29/2018 3:07 PM, Massimo Sgaravatto wrote:
The VM that I am trying to migrate was created when the Cloud was 
already running Ocata


OK, I'd added the tenant_id variable in scope to the log message here:

https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L50

And make sure when it fails, it matches what you'd expect. If it's None 
or '' or something weird then we have a bug.


--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-29 Thread Matt Riedemann

On 5/29/2018 12:44 PM, Jay Pipes wrote:
Either that, or the wrong project_id is being used when attempting to 
migrate? Maybe the admin project_id is being used instead of the 
original project_id who launched the instance?


Could be, but we should be pulling the request spec from the database 
which was created when the instance was created. There is some shim code 
from Newton which will create an essentially fake request spec on-demand 
when doing a move operation if the instance was created before Newton, 
which could go back to that bug I was referring to.


Massimo - can you clarify if this is a new server created in your Ocata 
test environment that you're trying to move? Or is this a server created 
before Ocata?


--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-29 Thread Matt Riedemann

On 5/29/2018 11:10 AM, Jay Pipes wrote:
The hosts you are attempting to migrate *to* do not have the 
filter_tenant_id property set to the same tenant ID as the compute host 
2 that originally hosted the instance.


That is why you see this in the scheduler logs when evaluating the 
fitness of compute host 1 and compute host 3:


"fails tenant id"

Best,
-jay


Hmm, I'm not sure about that. This is the aggregate right?

# nova aggregate-show 52
+----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+
| Id | Name      | Availability Zone | Hosts                                                        | Metadata                                                                                     | UUID                                 |
+----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+
| 52 | SgaraPrj1 | nova              | 'compute-01.cloud.pd.infn.it', 'compute-02.cloud.pd.infn.it' | 'availability_zone=nova', 'filter_tenant_id=ee1865a76440481cbcff08544c7d580a', 'size=normal' | 675f6291-6997-470d-87e1-e9ea199a379f |
+----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+


So compute-01 and compute-02 are in that aggregate for the same tenant 
ee1865a76440481cbcff08544c7d580a.


From the logs, it skips compute-02 since the instance is already on 
that host.


> 2018-05-29 11:12:56.375 19428 INFO nova.scheduler.host_manager 
[req-45b8afd5-9683-40a6-8416-295563e37e34 
9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a71438c4412e6e13 - - -] 
Host filter ignoring hosts: compute-02.cloud.pd.infn.it


So it processes compute-01 and compute-03. It should accept compute-01 
since it's in the same tenant-specific aggregate and reject compute-03. 
But the filter rejects both hosts.


It would be useful to know what the tenant_id is when comparing against 
the aggregate metadata:


https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L50

I'm wondering if the RequestSpec.project_id is null? Like, I wonder if 
you're hitting this bug:


https://bugs.launchpad.net/nova/+bug/1739318

Although if this is a clean Ocata environment with new instances, you 
shouldn't have that problem.


--

Thanks,

Matt



Re: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI

2018-05-29 Thread Matt Riedemann

On 5/28/2018 7:31 AM, Sylvain Bauza wrote:
That said, given I'm now working on using Nested Resource Providers for 
VGPU inventories, I wonder about a possible upgrade problem with VGPU 
allocations. Given that :
  - in Queens, VGPU inventories are for the root RP (ie. the compute 
node RP), but,
  - in Rocky, VGPU inventories will be for children RPs (ie. against a 
specific VGPU type), then


if we have VGPU allocations in Queens, when upgrading to Rocky, we 
should maybe recreate the allocations to a specific other inventory ?


For how the heal_allocations CLI works today, if the instance has any 
allocations in placement, it skips that instance. So this scenario 
wouldn't be a problem.




Hope you see the problem with upgrading by creating nested RPs ?


Yes, the CLI doesn't attempt to have any knowledge about nested resource 
providers, it just takes the flavor embedded in the instance and creates 
allocations against the compute node provider using the flavor. It has 
no explicit knowledge about granular request groups or more advanced 
features like that.


--

Thanks,

Matt



[Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI

2018-05-24 Thread Matt Riedemann
I've written a nova-manage placement heal_allocations CLI [1] which was 
a TODO from the PTG in Dublin as a step toward getting existing 
CachingScheduler users to roll off that (which is deprecated).


During the CERN cells v1 upgrade talk it was pointed out that CERN was 
able to go from placement-per-cell to centralized placement in Ocata 
because the nova-computes in each cell would automatically recreate the 
allocations in Placement in a periodic task, but that code is gone once 
you're upgraded to Pike or later.


In various other talks during the summit this week, we've talked about 
things during upgrades where, for instance, if placement is down for 
some reason during an upgrade, a user deletes an instance and the 
allocation doesn't get cleaned up from placement so it's going to 
continue counting against resource usage on that compute node even 
though the server instance in nova is gone. So this CLI could be 
expanded to help clean up situations like that, e.g. provide it a 
specific server ID and the CLI can figure out if it needs to clean 
things up in placement.


So there are plenty of things we can build into this, but the patch is 
already quite large. I expect we'll also be backporting this to stable 
branches to help operators upgrade/fix allocation issues. It already has 
several things listed in a code comment inline about things to build 
into this later.


My question is, is this good enough for a first iteration or is there 
something severely missing before we can merge this, like the automatic 
marker tracking mentioned in the code (that will probably be a 
non-trivial amount of code to add). I could really use some operator 
feedback on this to just take a look at what it already is capable of 
and if it's not going to be useful in this iteration, let me know what's 
missing and I can add that in to the patch.


[1] https://review.openstack.org/#/c/565886/
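As a rough illustration of the per-instance behavior described above (a sketch only — the real CLI talks to the placement API, pages through cells with a marker, and handles more edge cases): skip any instance that already has allocations, otherwise derive allocations from the embedded flavor against the compute node provider. The vcpus/ram/root_gb/ephemeral_gb field names mirror a nova flavor.

```python
def heal_instance(existing_allocations, flavor):
    """Return the allocations to write for one instance, or None to skip.

    Sketch of the core decision in 'nova-manage placement
    heal_allocations': anything that already has allocations in
    placement is left alone; otherwise the amounts come straight from
    the flavor, with no awareness of nested providers or granular
    request groups.
    """
    if existing_allocations:
        return None  # already has allocations in placement: skip
    return {
        "VCPU": flavor["vcpus"],
        "MEMORY_MB": flavor["ram"],
        "DISK_GB": flavor["root_gb"] + flavor.get("ephemeral_gb", 0),
    }
```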

--

Thanks,

Matt



Re: [Openstack-operators] Multiple Ceph pools for Nova?

2018-05-21 Thread Matt Riedemann

On 5/21/2018 11:51 AM, Smith, Eric wrote:
I have 2 Ceph pools, one backed by SSDs and one backed by spinning disks 
(Separate roots within the CRUSH hierarchy). I’d like to run all 
instances in a single project / tenant on SSDs and the rest on spinning 
disks. How would I go about setting this up?


As mentioned elsewhere, host aggregate would work for the compute hosts 
connected to each storage pool. Then you can have different flavors per 
aggregate and charge more for the SSD flavors or restrict the aggregates 
based on tenant [1].


Alternatively, if this is something you plan to eventually scale to a 
larger size, you could even separate the pools with separate cells and 
use resource provider aggregates in placement to mirror the host 
aggregates for tenant-per-cell filtering [2]. It sounds like this is 
very similar to what CERN does (cells per hardware characteristics and 
projects assigned to specific cells). So Belmiro could probably help 
give some guidance here too. Check out the talk he gave today at the 
summit [3].


[1] https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation
[2] https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement
[3] https://www.openstack.org/videos/vancouver-2018/moving-from-cellsv1-to-cellsv2-at-cern
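To make the flavors-per-aggregate idea concrete, here is a sketch of the matching the AggregateInstanceExtraSpecs-style filtering does (illustrative only — the real filter also supports operators in the values; the 'disk=ssd' key below is a made-up example property):

```python
def flavor_matches_host(flavor_extra_specs, aggregate_metadata):
    """Every scoped extra spec on the flavor must match the aggregate
    metadata of the candidate host; flavors with no scoped specs pass
    everywhere. Sketch of AggregateInstanceExtraSpecsFilter."""
    prefix = "aggregate_instance_extra_specs:"
    for key, wanted in flavor_extra_specs.items():
        if not key.startswith(prefix):
            continue  # unscoped specs are not aggregate constraints
        if aggregate_metadata.get(key[len(prefix):]) != wanted:
            return False
    return True
```

With an 'ssd' flavor carrying aggregate_instance_extra_specs:disk=ssd, only hosts in the aggregate with disk=ssd metadata pass the filter.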


--

Thanks,

Matt



[Openstack-operators] [nova] FYI on changes that might impact out of tree scheduler filters

2018-05-17 Thread Matt Riedemann
CERN has upgraded to Cells v2 and is doing performance testing of the 
scheduler and were reporting some things today which got us back to this 
bug [1]. So I've starting pushing some patches related to this but also 
related to an older blueprint I created [2]. In summary, we do quite a 
bit of DB work just to load up a list of instance objects per host that 
the in-tree filters don't even use.


The first change [3] is a simple optimization to avoid the default joins 
on the instance_info_caches and security_groups tables. If you have out 
of tree filters that, for whatever reason, rely on the 
HostState.instances objects to have info_cache or security_groups set, 
they'll continue to work, but will have to round-trip to the DB to 
lazy-load the fields, which is going to be a performance penalty on that 
filter. See the change for details.


The second change in the series [4] is more drastic in that we'll do 
away with pulling the full Instance object per host, which means only a 
select set of optional fields can be lazy-loaded [5], and the rest will 
result in an exception. The patch currently has a workaround config 
option to continue doing things the old way if you have out of tree 
filters that rely on this, but for good citizens with only in-tree 
filters, you will get a performance improvement during scheduling.
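A toy model of the lazy-load trade-off described above (nothing nova-specific here — it just shows why an out-of-tree filter touching a field the initial query didn't load turns one query into extra per-access round trips):

```python
class HostStateInstance:
    """Fields fetched by the first query are free to read; any other
    attribute access simulates an extra DB round trip, like object
    lazy-loading does."""

    def __init__(self, **loaded):
        self.db_round_trips = 0
        for name, value in loaded.items():
            setattr(self, name, value)  # fields from the initial query

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. the field was
        # not part of the initial query.
        self.db_round_trips += 1  # simulated lazy-load round trip
        return None
```

If a filter iterates HostState.instances and touches an unloaded field on each one, that is one extra round trip per instance per host — exactly the scheduling penalty being warned about.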


There are some other things we can do to optimize more of this flow, but 
this email is just about the ones that have patches up right now.


[1] https://bugs.launchpad.net/nova/+bug/1737465
[2] https://blueprints.launchpad.net/nova/+spec/put-host-manager-instance-info-on-a-diet

[3] https://review.openstack.org/#/c/569218/
[4] https://review.openstack.org/#/c/569247/
[5] https://github.com/openstack/nova/blob/de52fefa1fd52ccaac6807e5010c5f2a2dcbaab5/nova/objects/instance.py#L66


--

Thanks,

Matt



Re: [Openstack-operators] Need feedback for nova aborting cold migration function

2018-05-17 Thread Matt Riedemann

On 5/15/2018 3:48 AM, saga...@nttdata.co.jp wrote:

We store the service logs which are created by VM on that storage.


I don't mean to be glib, but have you considered maybe not doing that?

--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-17 Thread Matt Riedemann

On 5/17/2018 9:46 AM, George Mihaiescu wrote:

and large rally tests of 500 instances complete with no issues.


Sure, except you can't ssh into the guests.

The whole reason the vif plugging is fatal and timeout and callback code 
was because the upstream CI was unstable without it. The server would 
report as ACTIVE but the ports weren't wired up so ssh would fail. 
Having an ACTIVE guest that you can't actually do anything with is kind 
of pointless.
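For reference, the options in play here live in nova.conf on the computes; the values below are the upstream defaults:

```ini
[DEFAULT]
# Fail the build if Neutron never confirms the port is wired up,
# instead of reporting an ACTIVE guest with no working network.
vif_plugging_is_fatal = true
# Seconds to wait for the network-vif-plugged event from Neutron.
vif_plugging_timeout = 300
```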


--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-16 Thread Matt Riedemann

On 5/16/2018 10:30 AM, Radu Popescu | eMAG, Technology wrote:

but I can see nova attaching the interface after a huge amount of time.


What specifically are you looking for in the logs when you see this?

Are you passing pre-created ports to attach to nova or are you passing a 
network ID so nova will create the port for you during the attach call?


This is where the ComputeManager calls the driver to plug the vif on the 
host:


https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L5187

Assuming you're using the libvirt driver, the host vif plug happens here:

https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L1463

And the guest is updated here:

https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L1472

vif_plugging_is_fatal and vif_plugging_timeout don't come into play here 
because we're attaching an interface to an existing server - or are you 
talking about during the initial creation of the guest, i.e. this code 
in the driver?


https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L5257

Are you seeing this in the logs for the given port?

https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6875

If not, it could mean that neutron-server never send the event to nova, 
so nova-compute timed out waiting for the vif plug callback event to 
tell us that the port is ready and the server can be changed to ACTIVE 
status.


The neutron-server logs should log when external events are being sent 
to nova for the given port, you probably need to trace the requests and 
compare the nova-compute and neutron logs for a given server create request.


--

Thanks,

Matt



Re: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud

2018-05-10 Thread Matt Riedemann

On 5/10/2018 6:30 PM, Jean-Philippe Méthot wrote:
1.I was talking about the region-name parameter underneath 
keystone_authtoken. That is in the pike doc you linked, but I am unaware 
if this is only used for token generation or not. Anyhow, it doesn’t 
seem to have any impact on the issue at hand.


The [keystone]/region_name config option in nova is used to pick the 
identity service endpoint, so in that case region_name will matter if 
there are multiple identity endpoints in the service catalog. The only 
thing is you're on Pike, where [keystone]/region_name isn't in 
nova.conf and isn't used; it was added in Queens for this lookup:


https://review.openstack.org/#/c/507693/

So that might be why it doesn't seem to make a difference if you set it 
in nova.conf - because the nova code isn't actually using it.


You could try backporting that patch into your pike deployment, set 
region_name to RegionOne and see if it makes a difference (although I 
thought RegionOne was the default if not specified?).
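If you do backport it, note the setting being discussed lives under the [keystone] section of nova.conf, not [keystone_authtoken] (RegionOne here is just the region name from this thread's example):

```ini
[keystone]
# Region whose identity endpoint nova should pick from the catalog.
region_name = RegionOne
```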


--

Thanks,

Matt



Re: [Openstack-operators] Need feedback for nova aborting cold migration function

2018-05-10 Thread Matt Riedemann

On 5/9/2018 9:33 PM, saga...@nttdata.co.jp wrote:

We always do the maintenance work on midnight during limited time-slot to 
minimize impact to our users.


Also, why are you doing maintenance with cold migration? Why not do live 
migration for your maintenance (which already supports the abort function).


--

Thanks,

Matt



Re: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud

2018-05-10 Thread Matt Riedemann

On 5/9/2018 8:11 PM, Jean-Philippe Méthot wrote:
I currently operate a multi-region cloud split between 2 geographic 
locations. I have updated it to Pike not too long ago, but I've been 
running into a peculiar issue. Ever since the Pike release, Nova now 
asks Keystone if a new project exists in Keystone before configuring the 
project’s quotas. However, there doesn’t seem to be any region 
restriction regarding which endpoint Nova will query Keystone on. So, 
right now, if I create a new project in region one, Nova will query 
Keystone in region two. Because my keystone databases are not synched in 
real time between each region, the region two Keystone will tell it that 
the new project doesn't exist, while it exists in region one Keystone.


Thinking that this could be a configuration error, I tried setting the 
region_name in keystone_authtoken, but that didn’t change much of 
anything. Right now I am thinking this may be a bug. Could someone 
confirm that this is indeed a bug and not a configuration error?


To circumvent this issue, I am considering either modifying the database 
by hand or trying to implement realtime replication between both 
Keystone databases. Would there be another solution? (beside modifying 
the code for the Nova check)


This is the specific code you're talking about:

https://github.com/openstack/nova/blob/stable/pike/nova/api/openstack/identity.py#L35

I don't see region_name as a config option for talking to keystone in Pike:

https://docs.openstack.org/nova/pike/configuration/config.html#keystone

But it is in Queens:

https://docs.openstack.org/nova/queens/configuration/config.html#keystone

That was added in this change:

https://review.openstack.org/#/c/507693/

But I think what you're saying is: since you have multiple regions, the 
project could be in any of them at any given time until they 
synchronize, so configuring nova for a specific region probably isn't 
going to help in this case, right?


Isn't this somehow resolved with keystone federation? Granted, I'm not 
at all a keystone person, but I'd think this isn't a unique problem.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann

On 5/2/2018 12:39 PM, Matt Riedemann wrote:
FWIW, I think we can also backport the data migration CLI to stable 
branches once we have it available so you can do your migration in, 
let's say, Queens before getting to Rocky.


FYI, here is the start on the data migration CLI:

https://review.openstack.org/#/c/565886/

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-05-02 Thread Matt Riedemann

On 5/2/2018 5:39 PM, Jay Pipes wrote:
My personal preference is to add less technical debt and go with a 
solution that checks if image traits have changed in nova-api and if so, 
simply refuse to perform a rebuild.


So, what if when I created my server, the image I used, let's say 
image1, had required trait A and that fit the host.


Then some external service removes (or somehow changes) trait A from the 
compute node resource provider (because people can and will do this, 
there are a few vmware specs up that rely on being able to manage traits 
out of band from nova), and then I rebuild my server with image2 that 
has required trait A. That would match the original trait A in image1 
and we'd say, "yup, lgtm!" and do the rebuild even though the compute 
node resource provider wouldn't have trait A anymore.


Having said that, it could technically happen before traits if the 
operator changed something on the underlying compute host which 
invalidated instances running on that host, but I'd think if that 
happened the operator would be migrating everything off the host and 
disabling it from scheduling before making whatever that kind of change 
would be, let's say they change the hypervisor or something less drastic 
but still image property invalidating.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann

On 5/2/2018 12:00 PM, Mathieu Gagné wrote:

If one can still run CachingScheduler (even if it's deprecated), I
think we shouldn't remove the above options.
As you can end up with a broken setup and IIUC no way to migrate to
placement since migration script has yet to be written.


You're currently on cells v1 on mitaka right? So you have some time to 
get this sorted out before getting to Rocky where the IronicHostManager 
is dropped.


I know you're just one case, but I don't know how many people are really 
running the CachingScheduler with ironic either, so it might be rare. It 
would be nice to get other operator input here, like I'm guessing CERN 
has their cells carved up so that certain cells are only serving 
baremetal requests while other cells are only VMs?


FWIW, I think we can also backport the data migration CLI to stable 
branches once we have it available so you can do your migration in let's 
say Queens before getting to Rocky.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann

On 5/2/2018 11:40 AM, Mathieu Gagné wrote:

What's the state of caching_scheduler which could still be using those configs?


The CachingScheduler has been deprecated since Pike [1]. We discussed 
the CachingScheduler at the Rocky PTG in Dublin [2] and have a TODO to 
write a nova-manage data migration tool to create allocations in 
Placement for instances that were scheduled using the CachingScheduler 
(since Pike) which don't have their own resource allocations set in 
Placement (remember that starting in Pike the FilterScheduler started 
creating allocations in Placement rather than the ResourceTracker in 
nova-compute).


If you're running computes that are Ocata or Newton, then the 
ResourceTracker in the nova-compute service should be creating the 
allocations in Placement for you, assuming you have the compute service 
configured to talk to Placement (optional in Newton, required in Ocata).


[1] https://review.openstack.org/#/c/492210/
[2] https://etherpad.openstack.org/p/nova-ptg-rocky-placement

--

Thanks,

Matt



[Openstack-operators] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann
The baremetal scheduling options were deprecated in Pike [1] and the 
ironic_host_manager was deprecated in Queens [2] and is now being 
removed [3]. Deployments must use resource classes now for baremetal 
scheduling. [4]


The large host subset size value is also no longer needed. [5]

I've gone through all of the references to "ironic_host_manager" that I 
could find in codesearch.o.o and updated projects accordingly [6].


Please reply ASAP to this thread and/or [3] if you have issues with this.

[1] https://review.openstack.org/#/c/493052/
[2] https://review.openstack.org/#/c/521648/
[3] https://review.openstack.org/#/c/565805/
[4] https://docs.openstack.org/ironic/latest/install/configure-nova-flavors.html#scheduling-based-on-resource-classes

[5] https://review.openstack.org/565736/
[6] https://review.openstack.org/#/q/topic:exact-filters+(status:open+OR+status:merged)


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-05-02 Thread Matt Riedemann

On 5/1/2018 5:26 PM, Arvind N wrote:
In cases of rebuilding of an instance using a different image where the 
image traits have changed between the original launch and the rebuild, 
is it reasonable to ask to just re-launch a new instance with the new image?


The argument for this approach is that given that the requirements have 
changed, we want the scheduler to pick and allocate the appropriate host 
for the instance.


We don't know if the requirements have changed with the new image until 
we check them.


Here is another option:

What if the API compares the original image required traits against the 
new image required traits, and if the new image has required traits 
which weren't in the original image, then (punt) fail in the API? Then 
you would at least have a chance to rebuild with a new image that has 
required traits as long as those required traits are less than or equal 
to the originally validated traits for the host on which the instance is 
currently running.
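That comparison is cheap to express. A sketch of the API-side check being proposed (illustrative — in practice the required traits would first have to be extracted from the image's trait-style properties):

```python
def rebuild_allowed(original_required_traits, new_required_traits):
    """Punt option: allow the rebuild only when the new image requires
    no traits beyond those already validated when the server was
    originally scheduled to its current host."""
    return set(new_required_traits) <= set(original_required_traits)
```

So rebuilding to an image with equal or fewer required traits goes through, and anything introducing a new required trait fails fast in the API.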




The approach above also gives you consistent results vs the other 
approaches where the rebuild may or may not succeed depending on how the 
original allocation of resources went.




Consistently frustrating, I agree. :) Because as a user, I can rebuild 
with some images (that don't have required traits) and can't rebuild 
with other images (that do have required traits).


I see no difference with this and being able to rebuild (with a new 
image) some instances (image-backed) and not others (volume-backed). 
Given that, I expect if we punt on this, someone will just come along 
asking for the support later. Could be a couple of years from now when 
everyone has moved on and it then becomes someone else's problem.


For example(from Alex Xu) ,if you launched an instance on a host which 
has two SRIOV nic. One is normal SRIOV nic(A), another one with some 
kind of offload feature(B).


So, the original request is: resources=SRIOV_VF:1 The instance gets a VF 
from the normal SRIOV nic(A).


But with a new image, the new request is: resources=SRIOV_VF:1 
traits=HW_NIC_OFFLOAD_XX


With all the solutions discussed in the thread, a rebuild request like 
above may or may not succeed depending on whether during the initial 
launch whether nic A or nic B was allocated.


Remember that in rebuild new allocations don't happen; we have to reuse 
the existing allocations.


Given the above background, there seems to be 2 competing options.

1. Fail in the API saying you can't rebuild with a new image with new 
required traits.


2. Look at the current allocations for the instance and try to match the 
new requirement from the image with the allocations.


With #1, we get consistent results in regards to how rebuilds are 
treated when the image traits changed.


With #2, the rebuild may or may not succeed, depending on how well the 
original allocations match up with the new requirements.


#2 will also need to account for handling preferred traits or granular 
resource traits if we decide to implement them for images at some 
point...


Option 10: Don't support image-defined traits at all. I know that won't 
happen though.


At this point I'm exhausted with this entire issue and conversation and 
will probably bow out and need someone else to step in with different 
perspective, like melwitt or dansmith.


All of the solutions are bad in their own way, either because they add 
technical debt and poor user experience, or because they make rebuild 
more complicated and harder to maintain for the developers.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey

2018-04-30 Thread Matt Riedemann

On 4/30/2018 11:41 AM, Mathieu Gagné wrote:

[6] Used to filter Ironic nodes based on the 'reserved_for_user_id'
Ironic node property.
 This is mainly used when enrolling existing nodes already living
on a different system.
 We reserve the node to a special internal user so the customer
cannot reserve
 the node by mistake until the process is completed.
 Latest version of Nova dropped user_id from RequestSpec. We had to
add it back.


See https://review.openstack.org/#/c/565340/ for context on the 
regression mentioned about RequestSpec.user_id.


Thanks Mathieu for jumping in #openstack-nova and discussing it.

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey

2018-04-27 Thread Matt Riedemann

On 4/27/2018 4:02 AM, Tomáš Vondra wrote:
Also, Windows host isolation is done using image metadata. I have filed 
a bug somewhere that it does not work correctly with Boot from Volume.


Likely because for boot from volume the instance.image_id is ''. The 
request spec, which the filter has access to, also likely doesn't have 
the backing image metadata for the volume because the instance isn't 
creating with an image directly. But nova could fetch the image metadata 
from the volume and put that into the request spec. We fixed a similar 
bug recently for the IsolatedHostsFilter:


https://review.openstack.org/#/c/543263/

If you can find the bug, or report a new one, I could take a look.

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Concern about trusted certificates API change

2018-04-18 Thread Matt Riedemann

On 4/18/2018 11:57 AM, Jay Pipes wrote:
There is a compute REST API change proposed [1] which will allow users 
to pass trusted certificate IDs to be used with validation of images 
when creating or rebuilding a server. The trusted cert IDs are based 
on certificates stored in some key manager, e.g. Barbican.


The full nova spec is here [2].

The main concern I have is that trusted certs will not be supported 
for volume-backed instances, and some clouds only support 
volume-backed instances.


Yes. And some clouds only support VMWare vCenter virt driver. And some 
only support Hyper-V. I don't believe we should delay adding good 
functionality to (large percentage of) clouds because it doesn't yet 
work with one virt driver or one piece of (badly-designed) functionality.


Maybe it wasn't clear but I'm not advocating that we block the change 
until volume-backed instances are supported with trusted certs. I'm 
suggesting we add a policy rule which allows deployers to at least 
disable it via policy if it's not supported for their cloud.



> The way the patch is written is that if the user attempts to boot 
> from volume with trusted certs, it will fail.


And... I think that's perfectly fine.


I agree. I'm the one that noticed the issue and pointed out in the code 
review that we should explicitly fail the request if we can't honor it.




In thinking about a semi-discoverable/configurable solution, I'm 
thinking we should add a policy rule around trusted certs to indicate 
if they can be used or not. Beyond the boot from volume issue, the 
only virt driver that supports trusted cert image validation is the 
libvirt driver, so any cloud that's not using the libvirt driver 
simply cannot support this feature, regardless of boot from volume. We 
have added similar policy rules in the past for backend-dependent 
features like volume extend and volume multi-attach, so I don't think 
this is a new issue.


Alternatively we can block the change in nova until it supports boot 
from volume, but that would mean needing to add trusted cert image 
validation support into cinder along with API changes, effectively 
killing the chance of this getting done in nova in Rocky, and this 
blueprint has been around since at least Ocata so it would be good to 
make progress if possible.


As mentioned above, I don't want to derail progress until (if ever?) 
trusted certs achieves this magical 
works-for-every-driver-and-functionality state. It's not realistic to 
expect this to be done, IMHO, and just keeps good functionality out of 
the hands of many cloud users.


Again, I'm not advocating that we block until boot from volume is 
supported. However, we have a lot of technical debt for "good 
functionality" added over the years that failed to consider 
volume-backed instances (rebuild, rescue, backup, etc.), and it's 
painful to deal with that after the fact, as can be seen from the 
various specs proposed for adding that support to those APIs.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Concern about trusted certificates API change

2018-04-18 Thread Matt Riedemann
There is a compute REST API change proposed [1] which will allow users 
to pass trusted certificate IDs to be used with validation of images 
when creating or rebuilding a server. The trusted cert IDs are based on 
certificates stored in some key manager, e.g. Barbican.


The full nova spec is here [2].

The main concern I have is that trusted certs will not be supported for 
volume-backed instances, and some clouds only support volume-backed 
instances. The way the patch is written is that if the user attempts to 
boot from volume with trusted certs, it will fail.
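For concreteness, here is a minimal sketch of the server-create request body such a change implies: the user supplies a list of trusted certificate IDs alongside the usual image reference. The field name follows the proposed API change; the IDs and references below are placeholders, not real Barbican or Glance identifiers.

```python
# Sketch of a server-create request carrying trusted certificate IDs.
# Field name follows the proposed API change; the IDs below are
# placeholders, not real Barbican certificate references.
import json

server_request = {
    "server": {
        "name": "trusted-server",
        "imageRef": "cirros-image-uuid",
        "flavorRef": "m1.small-flavor-uuid",
        "trusted_image_certificates": [
            "cert-uuid-1",
            "cert-uuid-2",
        ],
    }
}

# A volume-backed request would replace imageRef with a
# block_device_mapping_v2 entry; combining that with
# trusted_image_certificates is the case the patch rejects.
print(json.dumps(server_request, indent=2))
```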


In thinking about a semi-discoverable/configurable solution, I'm 
thinking we should add a policy rule around trusted certs to indicate if 
they can be used or not. Beyond the boot from volume issue, the only 
virt driver that supports trusted cert image validation is the libvirt 
driver, so any cloud that's not using the libvirt driver simply cannot 
support this feature, regardless of boot from volume. We have added 
similar policy rules in the past for backend-dependent features like 
volume extend and volume multi-attach, so I don't think this is a new issue.
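As a sketch of what such a policy knob could look like: a deployer whose backend cannot honor trusted certs could deny the action outright in policy.yaml. The rule name below is illustrative, following nova's os_compute_api naming convention, not a confirmed rule; "!" is oslo.policy's "deny everyone" check.

```yaml
# Hypothetical policy.yaml fragment: "!" denies the action for all
# users, so requests passing trusted cert IDs are rejected up front.
"os_compute_api:servers:create:trusted_certs": "!"
```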


Alternatively we can block the change in nova until it supports boot 
from volume, but that would mean needing to add trusted cert image 
validation support into cinder along with API changes, effectively 
killing the chance of this getting done in nova in Rocky, and this 
blueprint has been around since at least Ocata so it would be good to 
make progress if possible.


[1] https://review.openstack.org/#/c/486204/
[2] 
https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/nova-validate-certificates.html


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] RFC: Next minimum libvirt / QEMU versions for "Stein" release

2018-04-09 Thread Matt Riedemann

On 4/9/2018 4:58 AM, Kashyap Chamarthy wrote:

Keep in mind that Matt has a tendency to sometimes unfairly
over-simplify others' views ;-).  More seriously, c'mon Matt; I went out
of my way to spend time learning about Debian's packaging structure and
trying to get the details right by talking to folks on
#debian-backports.  And as you may have seen, I marked the patch[*] as
"RFC", and repeatedly said that I'm working on an agreeable lowest
common denominator.


Sorry Kashyap, I didn't mean to offend. I was hoping "delicious bugs" 
would have made that obvious, but I can see how it's not. You've done a 
great, thorough job on sorting this all out.


Since I didn't know what "RFC" meant until googling it today, how about 
dropping that from the patch so I can +2 it?


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] RFC: Next minimum libvirt / QEMU versions for "Stein" release

2018-04-06 Thread Matt Riedemann

On 4/6/2018 12:07 PM, Kashyap Chamarthy wrote:

FWIW, I'd suggest so, if it's not too much maintenance.  It'll just
spare you additional bug reports in that area, and the overall default
experience when dealing with CPU models would be relatively much better.
(Another way to look at it is, multiple other "conservative" long-term
stable distributions also provide libvirt 3.2.0 and QEMU 2.9.0, so that
should give you confidence.)

Again, I don't want to push too hard on this.  If that'll be messy from
a package maintenance POV for you / Debian maintainers, then we could 
settle with whatever is in 'Stretch'.


Keep in mind that Kashyap has a tendency to want the latest and greatest 
of libvirt and qemu at all times for all of those delicious bug fixes. 
But we also know that new code brings new, not-yet-fixed bugs.


Keep in mind the big picture here, we're talking about bumping from 
minimum required (in Rocky) libvirt 1.3.1 to at least 3.0.0 (in Stein) 
and qemu 2.5.0 to at least 2.8.0, so I think that's already covering 
some good ground. Let's not get greedy. :)


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] RFC: Next minimum libvirt / QEMU versions for "Stein" release

2018-04-05 Thread Matt Riedemann

On 4/5/2018 3:32 PM, Thomas Goirand wrote:

If you don't absolutely need new features from libvirt 3.2.0 and 3.0.0
is fine, please choose 3.0.0 as minimum.

If you don't absolutely need new features from qemu 2.9.0 and 2.8.0 is
fine, please choose 2.8.0 as minimum.

If you don't absolutely need new features from libguestfs 1.36 and 1.34
is fine, please choose 1.34 as minimum.


New features in the libvirt driver which depend on minimum versions of 
libvirt/qemu/libguestfs (or arch for that matter) are always 
conditional, so I think it's reasonable to go with the lower bound for 
Debian. We can still support the features for the newer versions if 
you're running a system with those versions, but not penalize people 
with slightly older versions if not.


--

Thanks,

Matt



  1   2   3   4   5   6   7   >