Re: [Openstack-operators] Fwd: Nova hypervisor uuid

2018-11-29 Thread Matt Riedemann

On 11/29/2018 10:27 AM, Ignazio Cassano wrote:

I did in the DB directly.
I am using queens now.
Any python client command to delete hold records or I must use api ?


You can use the CLI:

https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-service-delete

https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/compute-service.html#compute-service-delete
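For reference, those CLI commands boil down to a single DELETE against the compute API's os-services resource. A minimal sketch of that request (the endpoint, token, and service id below are placeholders, not real values):

```python
# Hypothetical sketch of the request behind "openstack compute service
# delete": a DELETE /os-services/{service_id} call against the compute
# API endpoint, authenticated with a keystone token.
from urllib.request import Request

def build_service_delete_request(compute_endpoint, token, service_id):
    """Build the DELETE /os-services/{service_id} request the CLI issues."""
    return Request(
        url="%s/os-services/%s" % (compute_endpoint.rstrip("/"), service_id),
        method="DELETE",
        headers={"X-Auth-Token": token},
    )

req = build_service_delete_request(
    "http://controller:8774/v2.1", "PLACEHOLDER_TOKEN", "1d9c8d01")
print(req.get_method(), req.full_url)
```

The point of going through the API rather than the DB is covered later in the thread: the API cleans up the related records too.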

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Re: [Openstack-operators] Fwd: Nova hypervisor uuid

2018-11-29 Thread Matt Riedemann

On 11/29/2018 12:49 AM, Ignazio Cassano wrote:

Hello Matt,
Yes, I mean sometimes I have the same host/node names with different 
uuids in the compute_nodes table in the nova database.
I must delete the records whose uuid does not match the output of the 
nova hypervisor-list command.

At this time I have the following:
MariaDB [nova]> select hypervisor_hostname,uuid,deleted from compute_nodes;
+---------------------+--------------------------------------+---------+
| hypervisor_hostname | uuid                                 | deleted |
+---------------------+--------------------------------------+---------+
| tst2-kvm02          | 802b21c2-11fb-4426-86b9-bf25c8a5ae1d |       0 |
| tst2-kvm01          | ce27803b-06cd-44a7-b927-1fa42c813b0f |       0 |
+---------------------+--------------------------------------+---------+
2 rows in set (0,00 sec)


But sometimes old uuids are inserted in the table again, so I deleted 
them once more.
I restarted the kvm nodes and now the table is ok.
I also restarted each controller and the table is still ok.
I do not know why 3 days ago I had the same compute node names with 
different uuids.


Thanks and Regards
Ignazio


OK I guess if it happens again, please get the 
host/hypervisor_hostname/uuid/deleted values from the compute_nodes 
table before you clean up any entries.


Also, when you're deleting the resources from the DB, are you doing it 
in the DB directly or via the DELETE /os-services/{service_id} API? 
Because the latter cleans up other related resources to the nova-compute 
service (the services table record, the compute_nodes table record, the 
related resource_providers table record in placement, and the 
host_mappings table record in the nova API DB). The resource 
provider/host mappings cleanup when deleting a compute service is a more 
recent bug fix though which depending on your release you might not have:


https://review.openstack.org/#/q/I7b8622b178d5043ed1556d7bdceaf60f47e5ac80

--

Thanks,

Matt


Re: [Openstack-operators] Fwd: Nova hypervisor uuid

2018-11-28 Thread Matt Riedemann

On 11/28/2018 4:19 AM, Ignazio Cassano wrote:

Hi Matt, sorry but I lost your answer and Gianpiero forwarded it to me.
I am sure the kvm node names have not changed.
The tables where uuids are duplicated are:
resource_providers in the nova_api db
compute_nodes in the nova db
Regards
Ignazio


It would be easier if you simply dumped the result of a select query on 
the compute_nodes table where the duplicate nodes exist (you said 
duplicate UUIDs but I think you mean duplicate host/node names with 
different UUIDs, correct?).


There is a unique constraint on host/hypervisor_hostname (nodename)/deleted:

schema.UniqueConstraint(
'host', 'hypervisor_hostname', 'deleted',
name="uniq_compute_nodes0host0hypervisor_hostname0deleted"),

So I'm wondering if the deleted field is not 0 on one of those because 
if one is marked as deleted, then the compute service will create a new 
compute_nodes table record on startup (and associated resource provider).
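That constraint behaviour can be illustrated with a small standalone sketch (using sqlite in place of MariaDB, with the schema trimmed to the three constrained columns):

```python
import sqlite3

# Sketch of the compute_nodes unique constraint: a second record for the
# same host/node pair is only possible once the first one is soft-deleted
# (nova sets deleted to the row id, so it no longer collides with 0).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY,
        host TEXT, hypervisor_hostname TEXT, deleted INTEGER DEFAULT 0,
        UNIQUE (host, hypervisor_hostname, deleted)
    )""")
conn.execute("INSERT INTO compute_nodes (host, hypervisor_hostname) "
             "VALUES ('tst2-kvm01', 'tst2-kvm01')")

# A duplicate live (deleted=0) record violates the constraint...
try:
    conn.execute("INSERT INTO compute_nodes (host, hypervisor_hostname) "
                 "VALUES ('tst2-kvm01', 'tst2-kvm01')")
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)

# ...but after soft-deleting the first record, a fresh record (with a
# new uuid and resource provider in real nova) can be created on startup.
conn.execute("UPDATE compute_nodes SET deleted = id WHERE id = 1")
conn.execute("INSERT INTO compute_nodes (host, hypervisor_hostname) "
             "VALUES ('tst2-kvm01', 'tst2-kvm01')")
print(conn.execute("SELECT COUNT(*) FROM compute_nodes").fetchone()[0])
```

This is why checking the deleted column on the duplicate rows is the first diagnostic step.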


--

Thanks,

Matt


Re: [Openstack-operators] Nova hypervisor uuid

2018-11-27 Thread Matt Riedemann

On 11/27/2018 11:32 AM, Ignazio Cassano wrote:

Hi All,
Does anyone know where the hypervisor uuid is retrieved from?
Sometimes, after updating kvm nodes with yum update, it changes, and in 
the nova database 2 uuids are assigned to the same node.

regards
Ignazio







To be clear, do you mean the compute_nodes.uuid column value in the 
cell database? Which is also used for the GET /os-hypervisors response 
'id' value if using microversion >= 2.53. If so, that is generated 
randomly* when the compute_nodes table record is created:


https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/compute/resource_tracker.py#L588

https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/objects/compute_node.py#L312
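A minimal sketch of that default-uuid behaviour (mirroring what the linked resource_tracker/compute_node code does, not taken from it; the helper name is made up):

```python
import uuid

# Sketch: nova fills in a random uuid4 when a compute_nodes record is
# created without an existing uuid, so a brand new record always gets a
# brand new uuid -- which is how duplicates with different uuids appear.
def default_compute_node_uuid(existing_uuid=None):
    return existing_uuid if existing_uuid is not None else str(uuid.uuid4())

print(default_compute_node_uuid())  # fresh random uuid
print(default_compute_node_uuid("802b21c2-11fb-4426-86b9-bf25c8a5ae1d"))
```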

When you hit this problem, are you sure the hostname on the compute host 
is not changing? Because when nova-compute starts up, it should look for 
the existing compute node record by host name and node name, which for 
the libvirt driver should be the same. That lookup code is here:


https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/compute/resource_tracker.py#L815

So the only way nova-compute should create a new compute_nodes table 
record for the same host is if the host/node name changes during the 
upgrade. Is the deleted value in the database the same (0) for both of 
those records?


* The exception to this is for the ironic driver which re-uses the 
ironic node uuid as of this change: https://review.openstack.org/#/c/571535/


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] about filter the flavor

2018-11-20 Thread Matt Riedemann

On 11/19/2018 9:32 PM, Rambo wrote:
       I have an idea. Currently we can't filter flavors by their extra 
spec properties. Can we achieve that? If we did, we could filter the 
flavors according to a property's key and value. What do you think of 
the idea? Can you tell me more about this? Thank you very 
much.


To be clear, you want to filter flavors by extra spec key and/or value? 
So something like:


GET /flavors?key=hw%3Acpu_policy

would return all flavors with an extra spec with key "hw:cpu_policy".

And:

GET /flavors?key=hw%3Acpu_policy&value=dedicated

would return all flavors with extra spec "hw:cpu_policy" with value 
"dedicated".


The query parameter semantics are probably what gets messiest about 
this. Because I could see wanting to couple the key and value together, 
but I'm not sure how you do that, because I don't think you can do this:


GET /flavors?spec=hw%3Acpu_policy=dedicated

Maybe you'd do:

GET /flavors?hw%3Acpu_policy=dedicated

The problem with that is we wouldn't be able to perform any kind of 
request schema validation of it, especially since flavor extra specs are 
not standardized.
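To make the proposed semantics concrete, here is a sketch of the filtering logic applied to a list of flavors; the key/value parameter names are hypothetical, since no API contract exists yet:

```python
# Sketch of the proposed extra-spec filtering semantics: filter flavors
# by spec key, and optionally by value. The flavor data is illustrative.
def filter_flavors(flavors, key, value=None):
    result = []
    for flavor in flavors:
        specs = flavor.get("extra_specs", {})
        if key in specs and (value is None or specs[key] == value):
            result.append(flavor["name"])
    return result

flavors = [
    {"name": "m1.pinned", "extra_specs": {"hw:cpu_policy": "dedicated"}},
    {"name": "m1.shared", "extra_specs": {"hw:cpu_policy": "shared"}},
    {"name": "m1.plain", "extra_specs": {}},
]
print(filter_flavors(flavors, "hw:cpu_policy"))               # key only
print(filter_flavors(flavors, "hw:cpu_policy", "dedicated"))  # key + value
```

The schema-validation problem in the email is exactly that the set of valid keys here is open-ended.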


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [Openstack-sigs] Dropping lazy translation support

2018-11-06 Thread Matt Riedemann

On 11/6/2018 5:24 PM, Rochelle Grober wrote:

Maybe the fastest way to get info would be to turn it off and see where the 
code barfs in a long run (to catch as many projects as possible)?


There is zero integration testing for lazy translation, so "turning it 
off and seeing what breaks" wouldn't result in anything breaking.


--

Thanks,

Matt



Re: [Openstack-operators] [Openstack-sigs] Dropping lazy translation support

2018-11-05 Thread Matt Riedemann

On 11/5/2018 1:36 PM, Doug Hellmann wrote:

I think the lazy stuff was all about the API responses. The log
translations worked a completely different way.


Yeah maybe. And if so, I came across this in one of the blueprints:

https://etherpad.openstack.org/p/disable-lazy-translation

Which says that because of a critical bug, the lazy translation was 
disabled in Havana to be fixed in Icehouse but I don't think that ever 
happened before IBM developers dropped it upstream, which is further 
justification for nuking this code from the various projects.


--

Thanks,

Matt



[Openstack-operators] Dropping lazy translation support

2018-11-05 Thread Matt Riedemann
This is a follow up to a dev ML email [1] where I noticed that some 
implementations of the upgrade-checkers goal were failing because some 
projects still use the oslo_i18n.enable_lazy() hook for lazy log message 
translation (and maybe API responses?).


The very old blueprints related to this can be found here [2][3][4].

If memory serves me correctly from my time working at IBM on this, this 
was needed to:


1. Generate logs translated in other languages.

2. Return REST API responses if the "Accept-Language" header was used 
and a suitable translation existed for that language.


#1 is a dead horse since I think at least the Ocata summit when we 
agreed to no longer translate logs since no one used them.


#2 is probably something no one knows about. I can't find end-user 
documentation about it anywhere. It's not tested and therefore I have no 
idea if it actually works anymore.
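As a conceptual sketch of what "lazy" means in #2: instead of translating at call time, a message object carries the msgid and parameters so the translation can be chosen later, e.g. per the request's Accept-Language header. This is purely illustrative, not oslo_i18n's actual implementation:

```python
# Illustrative lazy-translation message: interpolation and catalog
# lookup are deferred until the response is serialized, at which point
# the right language catalog (if any) can be applied.
class LazyMessage:
    def __init__(self, msgid, params=None):
        self.msgid = msgid
        self.params = params or {}

    def translate(self, catalog):
        # Pick the translated template at serialization time.
        template = catalog.get(self.msgid, self.msgid)
        return template % self.params

msg = LazyMessage("Instance %(id)s not found", {"id": "abc"})
print(msg.translate({}))  # no catalog: falls back to English
print(msg.translate(
    {"Instance %(id)s not found": "Istanza %(id)s non trovata"}))
```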


I would like to (1) deprecate the oslo_i18n.enable_lazy() function so 
new projects don't use it and (2) start removing the enable_lazy() usage 
from existing projects like keystone, glance and cinder.


Are there any users, deployments or vendor distributions that still rely 
on this feature? If so, please speak up now.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-November/136285.html

[2] https://blueprints.launchpad.net/oslo-incubator/+spec/i18n-messages
[3] https://blueprints.launchpad.net/nova/+spec/i18n-messages
[4] https://blueprints.launchpad.net/nova/+spec/user-locale-api

--

Thanks,

Matt



[Openstack-operators] [nova] Is anyone running their own script to purge old instance_faults table entries?

2018-11-01 Thread Matt Riedemann
I came across this bug [1] in triage today and I thought this was fixed 
already [2] but either something regressed or there is more to do here.


I'm mostly just wondering, are operators already running any kind of 
script which purges old instance_faults table records before an instance 
is deleted and archived/purged? Because if so, that might be something 
we want to add as a nova-manage command.


[1] https://bugs.launchpad.net/nova/+bug/1800755
[2] https://review.openstack.org/#/c/409943/

--

Thanks,

Matt



Re: [Openstack-operators] [nova] Removing the CachingScheduler

2018-10-24 Thread Matt Riedemann

On 10/18/2018 5:07 PM, Matt Riedemann wrote:

It's been deprecated since Pike, and the time has come to remove it [1].

mgagne has been the most vocal CachingScheduler operator I know and he 
has tested out the "nova-manage placement heal_allocations" CLI, added 
in Rocky, and said it will work for migrating his deployment from the 
CachingScheduler to the FilterScheduler + Placement.


If you are using the CachingScheduler and have a problem with its 
removal, now is the time to speak up or forever hold your peace.


[1] https://review.openstack.org/#/c/611723/1


This is your last chance to speak up if you are using the 
CachingScheduler and object to it being removed from nova in Stein. I 
have removed the -W pin from the review since a series of feature work 
is now stacked on top of it.


--

Thanks,

Matt



[Openstack-operators] [nova] Removing the CachingScheduler

2018-10-18 Thread Matt Riedemann

It's been deprecated since Pike, and the time has come to remove it [1].

mgagne has been the most vocal CachingScheduler operator I know and he 
has tested out the "nova-manage placement heal_allocations" CLI, added 
in Rocky, and said it will work for migrating his deployment from the 
CachingScheduler to the FilterScheduler + Placement.


If you are using the CachingScheduler and have a problem with its 
removal, now is the time to speak up or forever hold your peace.


[1] https://review.openstack.org/#/c/611723/1

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [horizon][nova][cinder][keystone][glance][neutron][swift] Horizon feature gaps

2018-10-17 Thread Matt Riedemann

On 10/17/2018 9:24 AM, Ivan Kolodyazhny wrote:


As you may know, unfortunately, Horizon doesn't support all features 
provided by APIs. That's why we created feature gaps list [1].


I had a lot of great conversations with project teams during the PTG 
and we tried to figure out what should be done to prioritize these tasks. 
It's really helpful for Horizon to get feedback from other teams to 
understand which features should be implemented next.


While I'm filling launchpad with new bugs and blueprints for [1], it 
would be good to review this list again and find some volunteers to 
decrease feature gaps.


[1] https://etherpad.openstack.org/p/horizon-feature-gap

Thanks everybody for any of your contributions to Horizon.


+openstack-sigs
+openstack-operators

I've left some notes for nova. This looks very similar to the compute 
API OSC gap analysis I did [1]. Unfortunately it's hard to prioritize 
what to really work on without some user/operator feedback - maybe we 
can get the user work group involved in trying to help prioritize what 
people really want that is missing from horizon, at least for compute?


[1] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc

--

Thanks,

Matt



Re: [Openstack-operators] nova_api resource_providers table issues on ocata

2018-10-17 Thread Matt Riedemann

On 10/17/2018 9:13 AM, Ignazio Cassano wrote:

Hello Sylvain, here the output of some selects:
MariaDB [nova]> select host,hypervisor_hostname from compute_nodes;
+--------------+---------------------+
| host         | hypervisor_hostname |
+--------------+---------------------+
| podto1-kvm01 | podto1-kvm01        |
| podto1-kvm02 | podto1-kvm02        |
| podto1-kvm03 | podto1-kvm03        |
| podto1-kvm04 | podto1-kvm04        |
| podto1-kvm05 | podto1-kvm05        |
+--------------+---------------------+

MariaDB [nova]> select host from compute_nodes where host='podto1-kvm01' 
and hypervisor_hostname='podto1-kvm01';

+--------------+
| host         |
+--------------+
| podto1-kvm01 |
+--------------+


Does your upgrade tooling run a db archive/purge at all? It's possible 
that the actual services table record was deleted via the os-services 
REST API for some reason, which would delete the compute_nodes table 
record, and then a restart of the nova-compute process would recreate 
the services and compute_nodes table records, but with a new compute 
node uuid and thus a new resource provider.


Maybe query your shadow_services and shadow_compute_nodes tables for 
"podto1-kvm01" and see if a record existed at one point, was deleted and 
then archived to the shadow tables.


--

Thanks,

Matt



[Openstack-operators] [goals][upgrade-checkers] Week R-26 Update

2018-10-12 Thread Matt Riedemann
The big update this week is version 0.1.0 of oslo.upgradecheck was 
released. The documentation along with usage examples can be found here 
[1]. A big thanks to Ben Nemec for getting that done since a few 
projects were waiting for it.


In other updates, some changes were proposed in other projects [2].

And finally, Lance Bragstad and I had a discussion this week [3] about 
the validity of upgrade checks looking for deleted configuration 
options. The main scenario I'm thinking about here is FFU where someone 
is going from Mitaka to Pike. Let's say a config option was deprecated 
in Newton and then removed in Ocata. As the operator is rolling through 
from Mitaka to Pike, they might have missed the deprecation signal in 
Newton and removal in Ocata. Does that mean we should have upgrade 
checks that look at the configuration for deleted options, or options 
where the deprecated alias is removed? My thought is that if things will 
not work once they get to the target release and restart the service 
code, which would definitely impact the upgrade, then checking for those 
scenarios is probably OK. If on the other hand the removed options were 
just tied to functionality that was removed and are otherwise not 
causing any harm then I don't think we need a check for that. It was 
noted that oslo.config has a new validation tool [4] so that would take 
care of some of this same work if run during upgrades. So I think 
whether or not an upgrade check should be looking for config option 
removal ultimately depends on the severity of what happens if the manual 
intervention to handle that removed option is not performed. That's 
pretty broad, but these upgrade checks aren't really set in stone for 
what is applied to them. I'd like to get input from others on this, 
especially operators and if they would find these types of checks useful.
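As a sketch of the kind of check being discussed, here is a minimal scan of loaded configuration for removed options. The removed-option entry and result tuple are illustrative; a real check would be wired into the oslo.upgradecheck framework described in [1]:

```python
# Illustrative upgrade check: flag config options that were removed in
# the target release, since the service would ignore (or choke on) them
# after the upgrade. REMOVED_OPTS here is made-up example data.
REMOVED_OPTS = {
    ("DEFAULT", "verbose"): "removed; use 'debug' instead",
}

def check_removed_options(config):
    failures = []
    for (section, opt), advice in REMOVED_OPTS.items():
        if opt in config.get(section, {}):
            failures.append("[%s] %s: %s" % (section, opt, advice))
    return ("FAILURE", failures) if failures else ("SUCCESS", [])

print(check_removed_options({"DEFAULT": {"verbose": "true"}}))
print(check_removed_options({"DEFAULT": {"debug": "true"}}))
```

Whether such a check returns FAILURE or just a WARNING is exactly the severity question raised above.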


[1] https://docs.openstack.org/oslo.upgradecheck/latest/
[2] https://storyboard.openstack.org/#!/story/2003657
[3] 
http://eavesdrop.openstack.org/irclogs/%23openstack-dev/%23openstack-dev.2018-10-10.log.html#t2018-10-10T15:17:17
[4] 
http://lists.openstack.org/pipermail/openstack-dev/2018-October/135688.html


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Matt Riedemann

On 10/10/2018 7:46 AM, Jay Pipes wrote:

2) in the old microversions change the blind allocation copy to gather
every resource from a nested source RPs too and try to allocate that
from the destination root RP. In nested allocation cases putting this
allocation to placement will fail and nova will fail the migration /
evacuation. However it will succeed if the server does not need nested
allocation neither on the source nor on the destination host (a.k.a the
legacy case). Or if the server has nested allocation on the source host
but does not need nested allocation on the destination host (for
example the dest host does not have nested RP tree yet).


I disagree on this. I'd rather just do a simple check for >1 provider in 
the allocations on the source and if True, fail hard.


The reverse (going from a non-nested source to a nested destination) 
will hard fail anyway on the destination because the POST /allocations 
won't work due to capacity exceeded (or failure to have any inventory at 
all for certain resource classes on the destination's root compute node).


I agree with Jay here. If we know the source has allocations on >1 
provider, just fail fast, why even walk the tree and try to claim those 
against the destination - the nested providers aren't going to be the 
same UUIDs on the destination, *and* trying to squash all of the source 
nested allocations into the single destination root provider and hope it 
works is super hacky and I don't think we should attempt that. Just fail 
if being forced and nested allocations exist on the source.
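The fail-fast check could be as simple as counting the providers in the source allocations; a sketch with made-up provider names (allocations keyed by resource provider uuid, as in the placement allocations response):

```python
# Sketch of the "fail fast" check: before forcing a live migration or
# evacuation, refuse if the server has allocations against more than one
# resource provider (i.e. a nested provider tree on the source).
def assert_no_nested_allocations(allocations):
    if len(allocations) > 1:
        raise ValueError(
            "Cannot force the move: server has allocations against %d "
            "resource providers (nested allocations are not supported)"
            % len(allocations))

flat = {"root-rp-uuid": {"resources": {"VCPU": 4, "MEMORY_MB": 8192}}}
nested = dict(flat, **{"child-rp-uuid": {"resources": {"VGPU": 1}}})

assert_no_nested_allocations(flat)  # single provider: allowed
try:
    assert_no_nested_allocations(nested)
except ValueError as exc:
    print(exc)
```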


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Matt Riedemann

On 10/9/2018 10:08 AM, Balázs Gibizer wrote:

Question for you as well: if we remove (or change) the force flag in a
new microversion then how should the old microversions behave when
nested allocations would be required?


Fail fast if we can detect we have nested. We don't support forcing 
those types of servers.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-27 Thread Matt Riedemann

On 9/27/2018 3:02 PM, Jay Pipes wrote:
A great example of this would be the proposed "deploy template" from 
[2]. This is nothing more than abusing the placement traits API in order 
to allow passthrough of instance configuration data from the nova flavor 
extra spec directly into the nodes.instance_info field in the Ironic 
database. It's a hack that is abusing the entire concept of the 
placement traits concept, IMHO.


We should have a way *in Nova* of allowing instance configuration 
key/value information to be passed through to the virt driver's spawn() 
method, much the same way we provide for user_data that gets exposed 
after boot to the guest instance via configdrive or the metadata service 
API. What this deploy template thing is is just a hack to get around the 
fact that nova doesn't have a basic way of passing through some collated 
instance configuration key/value information, which is a darn shame and 
I'm really kind of annoyed with myself for not noticing this sooner. :(


We talked about this in Dublin though, right? We said a good thing to do 
would be to have some kind of template/profile/config/whatever stored 
off in glare where schema could be registered on that thing, and then 
you pass a handle (ID reference) to that to nova when creating the 
(baremetal) server, nova pulls it down from glare and hands it off to 
the virt driver. It's just that no one is doing that work.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [goals][tc][ptl][uc] starting goal selection for T series

2018-09-26 Thread Matt Riedemann
s before OSC was created (nova/cinder/glance/keystone). For 
newer projects, like placement, it's not a problem because they never 
created any other CLI outside of OSC.


[1] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc
[2] https://etherpad.openstack.org/p/nova-ptg-stein (~L721)

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-25 Thread Matt Riedemann

On 9/25/2018 8:36 AM, John Garbutt wrote:

Another thing is about existing flavors configured for these
capabilities-scoped specs. Are you saying during the deprecation we'd
continue to use those even if the filter is disabled? In the review I
had suggested that we add a pre-upgrade check which inspects the
flavors
and if any of these are found, we report a warning meaning those
flavors
need to be updated to use traits rather than capabilities. Would
that be
reasonable?


I like the idea of a warning, but there are features that have not yet 
moved to traits:

https://specs.openstack.org/openstack/ironic-specs/specs/juno-implemented/uefi-boot-for-ironic.html

There is a more general plan that will help, but its not quite ready yet:
https://review.openstack.org/#/c/504952/

As such, I think we can't yet pull the plug on flavors including 
capabilities and passing them to Ironic, but (after a cycle of 
deprecation) I think we can now stop pushing capabilities from Ironic 
into Nova and using them for placement.


Forgive my ignorance, but if traits are not on par with capabilities, 
why are we deprecating the capabilities filter?


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] Forum Topic Submission Period

2018-09-20 Thread Matt Riedemann

On 9/20/2018 10:23 AM, Jimmy McArthur wrote:
This is basically the CFP equivalent: 
https://www.openstack.org/summit/berlin-2018/vote-for-speakers  Voting 
isn't necessary, of course, but it should allow you to see submissions 
as they roll in.


Does this work for your purposes?


Yup, that should do it, thanks!

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-20 Thread Matt Riedemann

On 9/20/2018 4:16 AM, John Garbutt wrote:
Following on from the PTG discussions, I wanted to bring everyone's 
attention to Nova's plans to deprecate the ComputeCapabilitiesFilter, 
including most of the integration with Ironic Capabilities.


To be specific, this is my proposal in code form:
https://review.openstack.org/#/c/603102/

Once the code we propose to deprecate is removed we will stop using 
capabilities pushed up from Ironic for 'scheduling', but we would still 
pass capabilities request in the flavor down to Ironic (until we get 
some standard traits and/or deploy templates sorted for things like UEFI).


Functionally, we believe all use cases can be replaced by using the 
simpler placement traits (this is more efficient than post placement 
filtering done using capabilities):

https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/ironic-driver-traits.html

Please note the recent addition of forbidden traits that helps improve 
the usefulness of the above approach:

https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-forbidden-traits.html

For example, a flavor request for GPUs >= 2 could be replaced by a 
custom trait that reports whether a given Ironic node has 
CUSTOM_MORE_THAN_2_GPUS. That is a bad example (longer term we don't 
want to use traits for this, but that is a discussion for another day), 
but it is the example that keeps being raised in discussions on this topic.
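For concreteness, a sketch of how such a required-trait extra spec is matched against a node's traits. The "trait:<NAME>=required" extra spec syntax is the one referenced in the linked specs; the matching helper below is illustrative, not nova's actual scheduling code:

```python
# Sketch: a flavor requires a trait via a "trait:<NAME>=required" extra
# spec, and placement matches it against the traits the Ironic node
# exposes. The trait and flavor data here are example values.
flavor_extra_specs = {"trait:CUSTOM_MORE_THAN_2_GPUS": "required"}

node_traits = {"CUSTOM_MORE_THAN_2_GPUS", "COMPUTE_NET_ATTACH_INTERFACE"}

def node_satisfies_traits(extra_specs, traits):
    required = {k.split(":", 1)[1] for k, v in extra_specs.items()
                if k.startswith("trait:") and v == "required"}
    return required <= traits

print(node_satisfies_traits(flavor_extra_specs, node_traits))  # True
```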


The main reason for reaching out in this email is to ask if anyone has 
needs that the ResourceClass and Traits scheme does not currently 
address, or can think of a problem with a transition to the newer approach.


I left a few comments in the change, but I'm assuming as part of the 
deprecation we'd remove the filter from the default enabled_filters list 
so new installs don't automatically get warnings during scheduling?


Another thing is about existing flavors configured for these 
capabilities-scoped specs. Are you saying during the deprecation we'd 
continue to use those even if the filter is disabled? In the review I 
had suggested that we add a pre-upgrade check which inspects the flavors 
and if any of these are found, we report a warning meaning those flavors 
need to be updated to use traits rather than capabilities. Would that be 
reasonable?
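The pre-upgrade check suggested here could be a simple scan of flavor extra specs; a sketch with illustrative flavor data (the helper name and data are made up, not an existing nova-status check):

```python
# Sketch of a pre-upgrade check: flag flavors that still carry
# "capabilities:"-scoped extra specs, which should be migrated to
# traits before the ComputeCapabilitiesFilter goes away.
def find_capabilities_flavors(flavors):
    flagged = []
    for flavor in flavors:
        caps = [k for k in flavor.get("extra_specs", {})
                if k.startswith("capabilities:")]
        if caps:
            flagged.append((flavor["name"], caps))
    return flagged

flavors = [
    {"name": "bm.uefi", "extra_specs": {"capabilities:boot_mode": "uefi"}},
    {"name": "bm.plain", "extra_specs": {"trait:CUSTOM_FOO": "required"}},
]
print(find_capabilities_flavors(flavors))
```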


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] Forum Topic Submission Period

2018-09-20 Thread Matt Riedemann

On 9/17/2018 11:13 AM, Jimmy McArthur wrote:
The Forum Topic Submission session started September 12 and will run 
through September 26th.  Now is the time to wrangle the topics you 
gathered during your Brainstorming Phase and start pushing forum topics 
through. Don't rely only on a PTL to make the agenda... step on up and 
place the items you consider important front and center.


As you may have noticed on the Forum Wiki 
(https://wiki.openstack.org/wiki/Forum), we're reusing the normal CFP 
tool this year. We did our best to remove Summit specific language, but 
if you notice something, just know that you are submitting to the 
Forum.  URL is here:


https://www.openstack.org/summit/berlin-2018/call-for-presentations

Looking forward to seeing everyone's submissions!

If you have questions or concerns about the process, please don't 
hesitate to reach out.


Another question. In the before times, when we just had that simple form 
to submit forum sessions and then the TC/UC/Foundation reviewed the list 
and picked the sessions, it was very simple to see what other sessions 
were proposed and say, "oh good someone is covering this already, I 
don't need to worry about it". With the move to the CFP forms like the 
summit sessions, that is no longer available, as far as I know. There 
have been at least a few cases this week where someone has said, "this 
might be a good topic, but keystone is probably already covering it, or 
$FOO SIG is probably already covering it", but without herding the cats 
to ask and find out who is all doing what, it's hard to know.


Is there some way we can get back to having a public view of what has 
been proposed for the forum so we an avoid overlap, or at worst not 
proposing something because people assume someone else is going to cover it?


--

Thanks,

Matt



[Openstack-operators] Are we ready to put stable/ocata into extended maintenance mode?

2018-09-18 Thread Matt Riedemann
The release page says Ocata is planned to go into extended maintenance 
mode on Aug 27 [1]. There really isn't much to this except it means we 
don't do releases for Ocata anymore [2]. There is a caveat that project 
teams that do not wish to maintain stable/ocata after this point can 
immediately end of life the branch for their project [3]. We can still 
run CI using tags, e.g. if keystone goes ocata-eol, devstack on 
stable/ocata can still continue to install from stable/ocata for nova 
and the ocata-eol tag for keystone. Having said that, if there is no 
undue burden on the project team keeping the lights on for stable/ocata, 
I would recommend not tagging the stable/ocata branch end of life at 
this point.


So, questions that need answering are:

1. Should we cut a final release for projects with stable/ocata branches 
before going into extended maintenance mode? I tend to think "yes" to 
flush the queue of backports. In fact, [3] doesn't mention it, but the 
resolution said we'd tag the branch [4] to indicate it has entered the 
EM phase.


2. Are there any projects that would want to skip EM and go directly to 
EOL (yes this feels like a Monopoly question)?


[1] https://releases.openstack.org/
[2] 
https://docs.openstack.org/project-team-guide/stable-branches.html#maintenance-phases
[3] 
https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance
[4] 
https://governance.openstack.org/tc/resolutions/20180301-stable-branch-eol.html#end-of-life


--

Thanks,

Matt



[Openstack-operators] [nova][publiccloud-wg] Proposal to shelve on stop/suspend

2018-09-14 Thread Matt Riedemann
tl;dr: I'm proposing a new parameter to the server stop (and suspend?) 
APIs to control if nova shelve offloads the server.


Long form: This came up during the public cloud WG session this week 
based on a couple of feature requests [1][2]. When a user stops/suspends 
a server, the hypervisor frees up resources on the host but nova 
continues to track those resources as being used on the host so the 
scheduler can't put more servers there. What operators would like to do 
is that when a user stops a server, nova actually shelve offloads the 
server from the host so they can schedule new servers on that host. On 
start/resume of the server, nova would find a new host for the server. 
This also came up in Vancouver where operators would like to free up 
limited expensive resources like GPUs when the server is stopped. This 
is also the behavior in AWS.


The problem with shelve is that it's great for operators but users just 
don't use it, maybe because they don't know what it is and stop works 
just fine. So how do you get users to opt into shelving their server?


I've proposed a high-level blueprint [3] where we'd add a new 
(microversioned) parameter to the stop API with three options:


* auto
* offload
* retain

Naming is obviously up for debate. The point is we would default to auto 
and if auto is used, the API checks a config option to determine the 
behavior - offload or retain. By default we would retain for backward 
compatibility. For users that don't care, they get auto and it's fine. 
For users that do care, they either (1) don't opt into the microversion 
or (2) specify the specific behavior they want. I don't think we need to 
expose what the cloud's configuration for auto is because again, if you 
don't care then it doesn't matter and if you do care, you can opt out of 
this.
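To make the proposed semantics concrete, here is a minimal sketch of how the server side might resolve the new parameter. All names (the constants, the config option, the function) are illustrative stand-ins, not actual nova code:

```python
# Hypothetical sketch of resolving the proposed stop-API parameter.
# Names are illustrative only; none of this is real nova code.

AUTO, OFFLOAD, RETAIN = "auto", "offload", "retain"

# Stand-in for the operator-set config option that 'auto' defers to.
# Defaults to False, i.e. retain, for backward compatibility.
CONF_SHELVE_OFFLOAD_ON_STOP = False


def resolve_stop_behavior(requested, conf_offload=CONF_SHELVE_OFFLOAD_ON_STOP):
    """Map the user-requested stop behavior to a concrete action."""
    if requested == AUTO:
        # 'auto' means "whatever the cloud is configured to do".
        return OFFLOAD if conf_offload else RETAIN
    if requested in (OFFLOAD, RETAIN):
        # An explicit request always wins over the config.
        return requested
    raise ValueError("invalid stop behavior: %s" % requested)


# A user who doesn't care gets the cloud's configured default...
assert resolve_stop_behavior(AUTO) == RETAIN
# ...while an explicit choice is honored regardless of config.
assert resolve_stop_behavior(OFFLOAD, conf_offload=False) == OFFLOAD
```

The point of the 'auto' indirection is that the cloud's choice stays invisible to users who don't opt into a specific behavior.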


"How do we get users to use the new microversion?" I'm glad you asked.

Well, nova CLI defaults to using the latest available microversion 
negotiated between the client and the server, so by default, anyone 
using "nova stop" would get the 'auto' behavior (assuming the client and 
server are new enough to support it). Long-term, openstack client plans 
on doing the same version negotiation.
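The negotiation described above boils down to "use the highest microversion both sides support". A toy sketch, with made-up version numbers (not the real client code):

```python
# Toy model of client/server microversion negotiation: the client asks
# for the newest version it knows, bounded by what the server supports.
# Version numbers here are illustrative, not tied to any real release.

def _key(v):
    # Compare "2.60" numerically, not lexically ("2.9" < "2.10").
    major, minor = v.split(".")
    return (int(major), int(minor))


def negotiate(client_max, server_min, server_max):
    if _key(client_max) < _key(server_min):
        raise RuntimeError("client too old for this server")
    # Highest version supported by both sides.
    return min(client_max, server_max, key=_key)


# Old server: the client falls back to the server's maximum.
assert negotiate("2.65", "2.1", "2.60") == "2.60"
# Old client: its own maximum caps the negotiated version.
assert negotiate("2.53", "2.1", "2.60") == "2.53"
```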


As for the server status changes, if the server is stopped and shelved, 
the status would be 'SHELVED_OFFLOADED' rather than 'SHUTDOWN'. I 
believe this is fine especially if a user is not being specific and 
doesn't care about the actual backend behavior. On start, the API would 
allow starting (unshelving) shelved offloaded (rather than just stopped) 
instances. Trying to hide shelved servers as stopped in the API would be 
overly complex IMO so I don't want to try and mask that.


It is possible that a user that stopped and shelved their server could 
hit a NoValidHost when starting (unshelving) the server, but that really 
shouldn't happen in a cloud that's configuring nova to shelve by default 
because if they are doing this, their SLA needs to reflect they have the 
capacity to unshelve the server. If you can't honor that SLA, don't 
shelve by default.


So, what are the general feelings on this before I go off and start 
writing up a spec?


[1] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791681
[2] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791679
[3] https://blueprints.launchpad.net/nova/+spec/shelve-on-stop

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Hard fail if you try to rename an AZ with instances in it?

2018-09-14 Thread Matt Riedemann

On 3/28/2018 4:35 PM, Jay Pipes wrote:

On 03/28/2018 03:35 PM, Matt Riedemann wrote:

On 3/27/2018 10:37 AM, Jay Pipes wrote:


If we want to actually fix the issue once and for all, we need to 
make availability zones a real thing that has a permanent identifier 
(UUID) and store that permanent identifier in the instance (not the 
instance metadata).


Or we can continue to paper over major architectural weaknesses like 
this.


Stepping back a second from the rest of this thread, what if we do the 
hard fail bug fix thing, which could be backported to stable branches, 
and then we have the option of completely re-doing this with aggregate 
UUIDs as the key rather than the aggregate name? Because I think the 
former could get done in Rocky, but the latter probably not.


I'm fine with that (and was fine with it before, just stating that 
solving the problem long-term requires different thinking).


Best,
-jay


Just FYI for anyone that cared about this thread, we agreed at the Stein 
PTG to resolve the immediate bug [1] by blocking AZ renames while the AZ 
has instances in it. There won't be a microversion for that change and 
we'll be able to backport it (with a release note I suppose).


[1] https://bugs.launchpad.net/nova/+bug/1782539

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [Openstack-sigs] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 5:32 PM, Melvin Hillsman wrote:
We basically spent the day focusing on two things specific to what you 
bring up and are in agreement with you regarding action not just talk 
around feedback and outreach. [1]
We wiped the agenda clean, discussed our availability (set reasonable 
expectations), and revisited how we can be more diligent and successful 
around these two principles which target your first comment, "...get 
their RFE/bug list ranked from the operator community (because some of 
the requests are not exclusive to public cloud), and then put pressure 
on the TC to help project manage the delivery of the top issue..."


I will not get into much detail because again this response is specific 
to a portion of your email so in keeping with feedback and outreach the 
UC is making it a point to be intentional. We have already got action 
items [2] which target the concern you raise. We have agreed to hold 
each other accountable and adjusted our meeting structure to facilitate 
being successful.


Not that the UC (elected members) are the only ones who can do this but 
we believe it is our responsibility to; regardless of what anyone else 
does. The UC is also expected to enlist others and hopefully through our 
efforts others are encouraged to participate and enlist others.


[1] https://etherpad.openstack.org/p/uc-stein-ptg
[2] https://etherpad.openstack.org/p/UC-Election-Qualifications


Awesome, thank you Melvin and others on the UC.

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [Openstack-sigs] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 5:13 PM, Jeremy Stanley wrote:

Sure, and I'm saying that instead I think the influence of TC
members_can_  be more valuable in finding and helping additional
people to do these things rather than doing it all themselves, and
it's not just about the limited number of available hours in the day
for one person versus many. The successes goal champions experience,
the connections they make and the elevated reputation they gain
throughout the community during the process of these efforts builds
new leaders for us all.


Again, I'm not saying TC members should be doing all of the work 
themselves. That's not realistic, especially when critical parts of any 
major effort are going to involve developers from projects on which none 
of the TC members are active contributors (e.g. nova). I want to see TC 
members herd cats, for lack of a better analogy, and help out 
technically (with code) where possible.


Given the repeated mention of how the "help wanted" list continues to 
not draw in contributors, I think the recruiting role of the TC should 
take a back seat to actually stepping in and helping work on those items 
directly. For example, Sean McGinnis is taking an active role in the 
operators guide and other related docs that continue to be discussed at 
every face to face event since those docs were dropped from 
openstack-manuals (in Pike).


I think it's fair to say that the people generally elected to the TC are 
those most visible in the community (it's a popularity contest) and 
those people are generally the most visible because they have the luxury 
of working upstream the majority of their time. As such, it's their duty 
to oversee and spend time working on the hard cross-project technical 
deliverables that operators and users are asking for, rather than think 
of an infinite number of ways to try and draw *others* to help work on 
those gaps. As I think it's the role of a PTL within a given project to 
have a finger on the pulse of the technical priorities of that project 
and manage the developers involved (of which the PTL certainly may be 
one), it's the role of the TC to do the same across openstack as a 
whole. If a PTL doesn't have the time or willingness to do that within 
their project, they shouldn't be the PTL. The same goes for TC members IMO.


--

Thanks,

Matt



Re: [Openstack-operators] [Openstack-sigs] [openstack-dev] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 4:14 PM, Jeremy Stanley wrote:

I think Doug's work leading the Python 3 First effort is a great
example. He has helped find and enable several other goal champions
to collaborate on this. I appreciate the variety of other things
Doug already does with his available time and would rather he not
stop doing those things to spend all his time acting as a project
manager.


I specifically called out what Doug is doing as an example of things I 
want to see the TC doing. I want more/all TC members doing that.


--

Thanks,

Matt



Re: [Openstack-operators] [Openstack-sigs] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 3:55 PM, Jeremy Stanley wrote:

I almost agree with you. I think the OpenStack TC members should be
actively engaged in recruiting and enabling interested people in the
community to do those things, but I don't think such work should be
solely the domain of the TC and would hate to give the impression
that you must be on the TC to have such an impact.


See my reply to Thierry. This isn't what I'm saying. But I expect the 
elected TC members to be *much* more *directly* involved in managing and 
driving hard cross-project technical deliverables.


--

Thanks,

Matt



[Openstack-operators] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann
Rather than take a tangent on Kristi's candidacy thread [1], I'll bring 
this up separately.


Kristi said:

"Ultimately, this list isn’t exclusive and I’d love to hear your and 
other people's opinions about what you think I should focus on."


Well since you asked...

Some feedback I gave to the public cloud work group yesterday was to get 
their RFE/bug list ranked from the operator community (because some of 
the requests are not exclusive to public cloud), and then put pressure 
on the TC to help project manage the delivery of the top issue. I would 
like all of the SIGs to do this. The upgrades SIG should rank and 
socialize their #1 issue that needs attention from the developer 
community - maybe that's better upgrade CI testing for deployment 
projects, maybe it's getting the pre-upgrade checks goal done for Stein. 
The UC should also be doing this; maybe that's the UC saying, "we need 
help on closing feature gaps in openstack client and/or the SDK". I 
don't want SIGs to bombard the developers with *all* of their 
requirements, but I want to get past *talking* about the *same* issues 
*every* time we get together. I want each group to say, "this is our top 
issue and we want developers to focus on it." For example, the extended 
maintenance resolution [2] was purely birthed from frustration about 
talking about LTS and stable branch EOL every time we get together. It's 
also the responsibility of the operator and user communities to weigh in 
on proposed release goals, but the TC should be actively trying to get 
feedback from those communities about proposed goals, because I bet 
operators and users don't care about mox removal [3].


I want to see the TC be more of a cross-project project management 
group, like a group of Ildikos and what she did between nova and cinder 
to get volume multi-attach done, which took persistent supervision to 
herd the cats and get it delivered. Lance is already trying to do this 
with unified limits. Doug is doing this with the python3 goal. I want my 
elected TC members to be pushing tangible technical deliverables forward.


I don't find any value in the TC debating ad nauseam about visions and 
constellations and "what is openstack?". Scope will change over time 
depending on who is contributing to openstack, we should just accept 
this. And we need to realize that if we are failing to deliver value to 
operators and users, they aren't going to use openstack and then "what 
is openstack?" won't matter because no one will care.


So I encourage all elected TC members to work directly with the various 
SIGs to figure out their top issue and then work on managing those 
deliverables across the community because the TC is particularly well 
suited to do so given the elected position. I realize political and 
bureaucratic "how should openstack deal with x?" things will come up, 
but those should not be the priority of the TC. So instead of 
philosophizing about things like, "should all compute agents be in a 
single service with a REST API" for hours and hours, every few months - 
immediately ask, "would doing that get us any closer to achieving top 
technical priority x?" Because if not, or it's so fuzzy in scope that no 
one sees the way forward, document a decision and then drop it.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/134490.html
[2] 
https://governance.openstack.org/tc/resolutions/20180301-stable-branch-eol.html

[3] https://governance.openstack.org/tc/goals/rocky/mox_removal.html

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [upgrade] request for pre-upgrade check for db purge

2018-09-11 Thread Matt Riedemann

On 9/11/2018 9:01 AM, Dan Smith wrote:

I dunno, adding something to nova.conf that is only used for nova-status
like that seems kinda weird to me. It's just a warning/informational
sort of thing so it just doesn't seem worth the complication to me.


It doesn't seem complicated to me, I'm not sure why the config is weird, 
but maybe just because it's config-driven CLI behavior?




Moving it to an age thing set at one year seems okay, and better than
making the absolute limit more configurable.

Any reason why this wouldn't just be a command line flag to status if
people want it to behave in a specific way from a specific tool?


I always think of the pre-upgrade checks as release-specific and we 
could drop the old ones at some point, so that's why I wasn't thinking 
about adding check-specific options to the command - but since we also 
say it's OK to run "nova-status upgrade check" to verify a green 
install, it's probably good to leave the old checks in place, i.e. 
you're likely always going to want those cells v2 and placement checks 
we added in ocata even long after ocata EOL.


--

Thanks,

Matt



[Openstack-operators] [upgrade] request for pre-upgrade check for db purge

2018-09-10 Thread Matt Riedemann
I created a nova bug [1] to track a request that came up in the upgrades 
SIG room at the PTG today [2] and would like to see if there is any 
feedback from other operators/developers that weren't part of the 
discussion.


The basic problem is that failing to archive/purge deleted records* from 
the database can make upgrades much slower during schema migrations. 
Anecdotes from the room mentioned that it can be literally impossible to 
complete upgrades for keystone and heat in certain scenarios if you 
don't purge the database first.


The request was that a configurable limit gets added to each service 
which is checked as part of the service's pre-upgrade check routine [3] 
and warn if the number of records to purge is over that limit.


For example, the nova-status upgrade check could warn if there are over 
10 deleted records total across all cells databases. Maybe cinder 
would have something similar for deleted volumes. Keystone could have 
something for revoked tokens.


Another idea in the room was flagging on records over a certain age 
limit. For example, if there are deleted instances in nova that were 
deleted >1 year ago.


How do people feel about this? It seems pretty straight-forward to me. 
If people are generally in favor of this, then the question is what 
would be sane defaults - or should we not assume a default and force 
operators to opt into this?


* nova delete doesn't actually delete the record from the instances 
table, it flips a value to hide it - you have to archive/purge those 
records to get them out of the main table.


[1] https://bugs.launchpad.net/nova/+bug/1791824
[2] https://etherpad.openstack.org/p/upgrade-sig-ptg-stein
[3] https://governance.openstack.org/tc/goals/stein/upgrade-checkers.html
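The soft-delete convention and the proposed check can be sketched end to end with an in-memory database. This is illustrative only (the real instances table has many more columns, and a real check would span all cells databases):

```python
# Sketch of the proposed pre-upgrade check: count soft-deleted rows and
# warn above a configurable limit. Mirrors nova's convention where a
# "deleted" row has deleted == id rather than being removed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances "
             "(id INTEGER PRIMARY KEY, uuid TEXT, deleted INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO instances (uuid) VALUES (?)",
                 [("u%d" % i,) for i in range(5)])
# "nova delete" flips the deleted column instead of removing the row.
conn.execute("UPDATE instances SET deleted = id WHERE id <= 3")


def check_db_purge(conn, limit):
    """Return ('WARNING', msg) if pending soft-deleted rows exceed limit."""
    pending = conn.execute(
        "SELECT COUNT(*) FROM instances WHERE deleted != 0").fetchone()[0]
    if pending > limit:
        return ("WARNING",
                "%d soft-deleted rows; archive/purge before upgrading" % pending)
    return ("OK", "")


assert check_db_purge(conn, limit=2)[0] == "WARNING"
assert check_db_purge(conn, limit=10)[0] == "OK"
```

An age-based variant would simply add a `deleted_at < cutoff` predicate to the count instead of (or alongside) the absolute limit.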

--

Thanks,

Matt



Re: [Openstack-operators] leaving Openstack mailing lists

2018-09-07 Thread Matt Riedemann

On 9/6/2018 6:42 AM, Saverio Proto wrote:

Hello,

I will be leaving this mailing list in a few days.

I am going to a new job and I will not be involved with Openstack at
least in the short term future.
Still, it was great working with the Openstack community in the past few years.

If you need to reach me about any bug/patch/review that I submitted in
the past, just write directly to my email. I will try to give answers.

Cheers

Saverio


Good luck on the new thing. From a developer perspective, I appreciated 
you putting the screws to us from time to time, since it helps re-align 
priorities.


--

Thanks,

Matt



[Openstack-operators] [nova][placement][upgrade][qa] Some upgrade-specific news on extraction

2018-09-06 Thread Matt Riedemann
I wanted to recap some upgrade-specific stuff from today outside of the 
other [1] technical extraction thread.


Chris has a change up for review [2] which prompted the discussion.

That change makes placement only work with placement.conf, not 
nova.conf, but does get a passing tempest run in the devstack patch [3].


The main issue here is upgrades. If you think of this like deprecating 
config options, the old config options continue to work for a release 
and then are dropped after a full release (or 3 months across boundaries 
for CDers) [4]. Given that, Chris's patch would break the standard 
deprecation policy. Clearly one simple way outside of code to make that 
work is just copy and rename nova.conf to placement.conf and voila. But 
that depends on *all* deployment/config tooling to get that right out of 
the gate.


The other obvious thing is the database. The placement repo code as-is 
today still has the check for whether or not it should use the placement 
database but falls back to using the nova_api database [5]. So 
technically you could point the extracted placement at the same nova_api 
database and it should work. However, at some point deployers will 
clearly need to copy the placement-related tables out of the nova_api DB 
to a new placement DB and make sure the 'migrate_version' table is 
dropped so that placement DB schema versions can reset to 1.
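The shape of that one-time data move can be sketched with sqlite stand-ins for the two databases. The table list below is an assumed illustrative subset, not the full set of placement-owned tables, and a real migration would of course run against MySQL with proper downtime and backups:

```python
# Sketch of copying placement tables out of nova_api into a fresh
# placement DB, deliberately leaving migrate_version behind so the
# placement schema versioning can restart at 1. Table names are an
# illustrative subset only.
import sqlite3

PLACEMENT_TABLES = ["resource_providers", "inventories", "allocations"]

nova_api = sqlite3.connect(":memory:")
placement = sqlite3.connect(":memory:")

# Pretend nova_api already holds placement data plus the
# sqlalchemy-migrate version-tracking table.
for t in PLACEMENT_TABLES:
    nova_api.execute("CREATE TABLE %s (id INTEGER PRIMARY KEY, data TEXT)" % t)
    nova_api.execute("INSERT INTO %s (data) VALUES ('x')" % t)
nova_api.execute("CREATE TABLE migrate_version (version INTEGER)")

# Copy each placement table into the new database...
for t in PLACEMENT_TABLES:
    placement.execute("CREATE TABLE %s (id INTEGER PRIMARY KEY, data TEXT)" % t)
    rows = nova_api.execute("SELECT id, data FROM %s" % t).fetchall()
    placement.executemany("INSERT INTO %s (id, data) VALUES (?, ?)" % t, rows)

# ...and verify migrate_version was NOT carried over.
tables = {r[0] for r in placement.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
assert tables == set(PLACEMENT_TABLES)
```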


With respect to grenade and making this work in our own upgrade CI 
testing, we have I think two options (which might not be mutually 
exclusive):


1. Make placement support using nova.conf if placement.conf isn't found 
for Stein with lots of big warnings that it's going away in T. Then 
Rocky nova.conf with the nova_api database configuration just continues 
to work for placement in Stein. I don't think we then have any grenade 
changes to make, at least in Stein for upgrading *from* Rocky. Assuming 
fresh devstack installs in Stein use placement.conf and a 
placement-specific database, then upgrades from Stein to T should also 
be OK with respect to grenade, but likely punts the cut-over issue for 
all other deployment projects (because we don't CI with grenade doing 
Rocky->Stein->T, or FFU in other words).
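The fallback in option 1 amounts to a small amount of config-file discovery logic. A minimal sketch, with assumed paths and warning text (not actual placement code):

```python
# Sketch of option 1: prefer placement.conf, fall back to nova.conf
# with a loud deprecation warning. Paths and messages are illustrative.
import os
import warnings


def pick_config_file(etc_dir="/etc"):
    placement_conf = os.path.join(etc_dir, "placement", "placement.conf")
    nova_conf = os.path.join(etc_dir, "nova", "nova.conf")
    if os.path.exists(placement_conf):
        return placement_conf
    if os.path.exists(nova_conf):
        # The "big warning" from option 1: works in Stein, gone in T.
        warnings.warn("Running placement with nova.conf is deprecated and "
                      "support will be removed in the T release; create a "
                      "placement.conf.", DeprecationWarning)
        return nova_conf
    raise RuntimeError("no configuration file found")
```

Under this scheme a Rocky-style deployment keeps working unmodified for one cycle, and the warning gives deployment tooling a full release to cut over.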


2. If placement doesn't support nova.conf in Stein, then grenade will 
require an (exceptional) [6] from-rocky upgrade script which will (a) 
write out placement.conf fresh and (b) run a DB migration script, likely 
housed in the placement repo, to create the placement database and copy 
the placement-specific tables out of the nova_api database. Any script 
like this is likely needed regardless of what we do in grenade because 
deployers will need to eventually do this once placement would drop 
support for using nova.conf (if we went with option 1).


That's my attempt at a summary. It's going to be very important that 
operators and deployment project contributors weigh in here if they have 
strong preferences either way, and note that we can likely do both 
options above - grenade could do the fresh cutover from rocky to stein 
but we allow running with nova.conf and nova_api DB in placement in 
stein with plans to drop that support in T.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/subject.html#134184

[2] https://review.openstack.org/#/c/600157/
[3] https://review.openstack.org/#/c/600162/
[4] 
https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html#requirements
[5] 
https://github.com/openstack/placement/blob/fb7c1909/placement/db_api.py#L27

[6] https://docs.openstack.org/grenade/latest/readme.html#theory-of-upgrade

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] OpenStack Summit Forum in Berlin: Topic Selection Process

2018-09-06 Thread Matt Riedemann

On 9/6/2018 2:56 PM, Jeremy Stanley wrote:

On 2018-09-06 14:31:01 -0500 (-0500), Matt Riedemann wrote:

On 8/29/2018 1:08 PM, Jim Rollenhagen wrote:

On Wed, Aug 29, 2018 at 12:51 PM, Jimmy McArthur mailto:ji...@openstack.org>> wrote:


 Examples of typical sessions that make for a great Forum:

 Strategic, whole-of-community discussions, to think about the big
 picture, including beyond just one release cycle and new technologies

 e.g. OpenStack One Platform for containers/VMs/Bare Metal (Strategic
 session) the entire community congregates to share opinions on how
 to make OpenStack achieve its integration engine goal


Just to clarify some speculation going on in IRC: this is an example,
right? Not a new thing being announced?

// jim

FYI for those that didn't see this on the other ML:

http://lists.openstack.org/pipermail/foundation/2018-August/002617.html

[...]

While I agree that's a great post to point out to all corners of the
community, I don't see what it has to do with whether "OpenStack One
Platform for containers/VMs/Bare Metal" was an example forum topic.


Because if I'm not mistaken it was the impetus for the hullabaloo in the 
tc channel that was related to the foundation ML post.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] OpenStack Summit Forum in Berlin: Topic Selection Process

2018-09-06 Thread Matt Riedemann

On 8/29/2018 1:08 PM, Jim Rollenhagen wrote:
On Wed, Aug 29, 2018 at 12:51 PM, Jimmy McArthur <ji...@openstack.org> wrote:


Examples of typical sessions that make for a great Forum:

Strategic, whole-of-community discussions, to think about the big
picture, including beyond just one release cycle and new technologies

e.g. OpenStack One Platform for containers/VMs/Bare Metal (Strategic
session) the entire community congregates to share opinions on how
to make OpenStack achieve its integration engine goal


Just to clarify some speculation going on in IRC: this is an example, 
right? Not a new thing being announced?


// jim


FYI for those that didn't see this on the other ML:

http://lists.openstack.org/pipermail/foundation/2018-August/002617.html

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] [placement] extraction (technical) update

2018-09-05 Thread Matt Riedemann

On 9/5/2018 8:47 AM, Mohammed Naser wrote:

Could placement not do what happened for a while when the nova_api
database was created?


Can you be more specific? I'm having a brain fart here and not 
remembering what you are referring to with respect to the nova_api DB.




I say this because I know that moving the database is a huge task for
us, considering how big it can be in certain cases for us, and it
means control plane outage too


I'm pretty sure you were in the room in YVR when we talked about how 
operators were going to do the database migration and were mostly OK 
with what was discussed, which was a lot will just copy and take the 
downtime (I think CERN said around 10 minutes for them, but they aren't 
a public cloud either), but others might do something more sophisticated 
and nova shouldn't try to pick the best fit for all.


I'm definitely interested in what you do plan to do for the database 
migration to minimize downtime.


+openstack-operators ML since this is an operators discussion now.

--

Thanks,

Matt



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Matt Riedemann

On 8/29/2018 3:21 PM, Tim Bell wrote:

Sounds like a good topic for PTG/Forum?


Yeah it's already on the PTG agenda [1][2]. I started the thread because 
I wanted to get the ball rolling as early as possible, and with people 
that won't attend the PTG and/or the Forum, to weigh in on not only the 
known issues with cross-cell migration but also the things I'm not 
thinking about.


[1] https://etherpad.openstack.org/p/nova-ptg-stein
[2] https://etherpad.openstack.org/p/nova-ptg-stein-cells

--

Thanks,

Matt



[Openstack-operators] [nova] Deprecating Core/Disk/RamFilter

2018-08-24 Thread Matt Riedemann
This is just an FYI that I have proposed that we deprecate the 
core/ram/disk filters [1]. We should have probably done this back in 
Pike when we removed them from the default enabled_filters list and also 
deprecated the CachingScheduler, which is the only in-tree scheduler 
driver that benefits from enabling these filters. With the 
heal_allocations CLI, added in Rocky, we can probably drop the 
CachingScheduler in Stein so the pieces are falling into place. As we 
saw in a recent bug [2], having these enabled in Stein now causes 
blatantly incorrect filtering on ironic nodes.
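For context, the check these filters perform is simple overcommit-adjusted capacity accounting, which placement now does natively during scheduling. A hedged sketch of the idea (not the real nova filter class; the 16.0 ratio is just the traditional CPU overcommit default used for illustration):

```python
# Illustrative model of what a CoreFilter-style check does: a host
# passes only if the requested vCPUs fit under its overcommit-adjusted
# capacity. Placement performs equivalent accounting natively, which is
# why these filters are redundant outside the CachingScheduler.
def core_filter_passes(host_vcpus, vcpus_used, requested,
                       cpu_allocation_ratio=16.0):
    limit = host_vcpus * cpu_allocation_ratio
    return vcpus_used + requested <= limit


# 8 physical cores at 16x overcommit allow up to 128 vCPUs.
assert core_filter_passes(host_vcpus=8, vcpus_used=120, requested=8)
assert not core_filter_passes(host_vcpus=8, vcpus_used=128, requested=1)
```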


Comments are welcome here, the review, or in IRC.

[1] https://review.openstack.org/#/c/596502/
[2] https://bugs.launchpad.net/tripleo/+bug/1787910

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][cinder] Disabling nova volume-update (aka swap volume; aka cinder live migration)

2018-08-24 Thread Matt Riedemann

On 8/21/2018 5:36 AM, Lee Yarwood wrote:

I'm definitely in favor of hiding this from users eventually but
wouldn't this require some form of deprecation cycle?

Warnings within the API documentation would also be useful and even
something we could backport to stable to highlight just how fragile this
API is ahead of any policy change.


The swap volume API in nova defaults to admin-only policy rules by 
default, so for any users that are using it directly, they are (1) 
admins knowingly shooting themselves, or their users, in the foot or (2) 
operators have opened up the policy to non-admins (or some other role of 
user) to hit the API directly. I would ask why that is.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][cinder][neutron] Cross-cell cold migration

2018-08-24 Thread Matt Riedemann

+operators

On 8/24/2018 4:08 PM, Matt Riedemann wrote:

On 8/23/2018 10:22 AM, Sean McGinnis wrote:
I haven't gone through the workflow, but I thought shelve/unshelve 
could detach
the volume on shelving and reattach it on unshelve. In that workflow, 
assuming
the networking is in place to provide the connectivity, the nova 
compute host
would be connecting to the volume just like any other attach and 
should work

fine. The unknown or tricky part is making sure that there is the network
connectivity or routing in place for the compute host to be able to 
log in to

the storage target.


Yeah that's also why I like shelve/unshelve as a start since it's doing 
volume detach from the source host in the source cell and volume attach 
to the target host in the target cell.


Host aggregates in Nova, as a grouping concept, are not restricted to 
cells at all, so you could have hosts in the same aggregate which span 
cells, so I'd think that's what operators would be doing if they have 
network/storage spanning multiple cells. Having said that, host 
aggregates are not exposed to non-admin end users, so again, if we rely 
on a normal user to do this move operation via resize, the only way we 
can restrict the instance to another host in the same aggregate is via 
availability zones, which is the user-facing aggregate construct in 
nova. I know Sam would care about this because NeCTAR sets 
[cinder]/cross_az_attach=False in nova.conf so servers/volumes are 
restricted to the same AZ, but that's not the default, and specifying an 
AZ when you create a server is not required (although there is a config 
option in nova which allows operators to define a default AZ for the 
instance if the user didn't specify one).


Anyway, my point is, there are a lot of "ifs" if it's not an 
operator/admin explicitly telling nova where to send the server if it's 
moving across cells.




If it's the other scenario mentioned where the volume needs to be 
migrated from
one storage backend to another storage backend, then that may require 
a little
more work. The volume would need to be retype'd or migrated (storage 
migration)

from the original backend to the new backend.


Yeah, the thing with retype/volume migration that isn't great is it 
triggers the swap_volume callback to the source host in nova, so if nova 
was orchestrating the volume retype/move, we'd need to wait for the swap 
volume to be done (not impossible) before proceeding, and only the 
libvirt driver implements the swap volume API. I've always wondered, 
what the hell do non-libvirt deployments do with respect to the volume 
retype/migration APIs in Cinder? Just disable them via policy?





--

Thanks,

Matt



[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-22 Thread Matt Riedemann

Hi everyone,

I have started an etherpad for cells topics at the Stein PTG [1]. The 
main issue in there right now is dealing with cross-cell cold migration 
in nova.


At a high level, I am going off these requirements:

* Cells can shard across flavors (and hardware type) so operators would 
like to move users off the old flavors/hardware (old cell) to new 
flavors in a new cell.


* There is network isolation between compute hosts in different cells, 
so no ssh'ing the disk around like we do today. But the image service is 
global to all cells.


Based on this, for the initial support for cross-cell cold migration, I 
am proposing that we leverage something like shelve offload/unshelve 
masquerading as resize. We shelve offload from the source cell and 
unshelve in the target cell. This should work for both volume-backed and 
non-volume-backed servers (we use snapshots for shelved offloaded 
non-volume-backed servers).
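To make the proposed flow concrete, here is a minimal stand-alone sketch of the state transitions involved - the helper names, cell names, and dict-based "server" are purely illustrative and are not nova's actual implementation:

```python
# Hypothetical sketch of the cross-cell "resize via shelve/unshelve"
# sequence described above. Cell/host names and the helper structure
# are illustrative only; nova's real code paths are far more involved.

def cross_cell_cold_migrate(server, source_cell, target_cell):
    """Move a server between cells via shelve offload + unshelve."""
    events = []

    # 1. Shelve offload in the source cell: detach volumes/ports,
    #    snapshot the root disk if not volume-backed, free the host.
    events.append(('shelve_offload', source_cell))
    server['host'] = None
    server['status'] = 'SHELVED_OFFLOADED'

    # 2. Unshelve in the target cell: schedule to a host there,
    #    re-attach volumes/ports, spawn from the snapshot or volume.
    events.append(('unshelve', target_cell))
    server['host'] = 'compute-1@%s' % target_cell
    server['status'] = 'ACTIVE'
    return events

srv = {'name': 'vm1', 'host': 'compute-9@cell1', 'status': 'ACTIVE'}
log = cross_cell_cold_migrate(srv, 'cell1', 'cell2')
print(log)
print(srv['host'], srv['status'])
```

The open questions below (volumes, ports, revert) all live inside those two steps.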


There are, of course, some complications. The main ones that I need help 
with right now are what happens with volumes and ports attached to the 
server. Today we detach from the source and attach at the target, but 
that's assuming the storage backend and network are available to both 
hosts involved in the move of the server. Will that be the case across 
cells? I am assuming that depends on the network topology (are routed 
networks being used?) and storage backend (routed storage?). If the 
network and/or storage backend are not available across cells, how do we 
migrate volumes and ports? Cinder has a volume migrate API for admins 
but I do not know how nova would know the proper affinity per-cell to 
migrate the volume to the proper host (cinder does not have a routed 
storage concept like routed provider networks in neutron, correct?). And 
as far as I know, there is no such thing as port migration in Neutron.


Could Placement help with the volume/port migration stuff? Neutron 
routed provider networks rely on placement aggregates to schedule the VM 
to a compute host in the same network segment as the port used to create 
the VM, however, if that segment does not span cells we are kind of 
stuck, correct?


To summarize the issues as I see them (today):

* How to deal with the targeted cell during scheduling? This is so we 
can even get out of the source cell in nova.


* How does the API deal with the same instance being in two DBs at the 
same time during the move?


* How to handle revert resize?

* How are volumes and ports handled?

I can get feedback from my company's operators based on what their 
deployment will look like for this, but that does not mean it will work 
for others, so I need as much feedback from operators, especially those 
running with multiple cells today, as possible. Thanks in advance.


[1] https://etherpad.openstack.org/p/nova-ptg-stein-cells

--

Thanks,

Matt



[Openstack-operators] Reminder to take the User Survey

2018-08-20 Thread Matt Van Winkle
Hi everyone,

The deadline for the 2018 OpenStack User Survey is *tomorrow,
August 21 at 11:59pm UTC. *The User Survey is your annual opportunity to
provide direct feedback to the OpenStack community, so we can better
understand your environment and needs. We send all feedback directly to the
project teams who work to improve how we provide value to you.

By completing a deployment in the User Survey, you qualify as an Active
User Contributor (AUC) and will receive a discount for the Berlin Summit -
only $300 USD!

The survey will take less than 20 minutes, and there’s not much time left!

Please complete your User Survey by *tomorrow*, *Tuesday, August 21 at 11:59pm UTC.*

Get started now: https://www.openstack.org/user-survey

Let me know if you have any questions.

Thank you,
VW

-- 
Matt Van Winkle
Senior Manager, Software Engineering | Salesforce


Re: [Openstack-operators] [openstack-dev] [nova] deployment question consultation

2018-08-18 Thread Matt Riedemann

+ops list

On 8/18/2018 10:20 PM, Matt Riedemann wrote:

On 8/13/2018 9:30 PM, Rambo wrote:
        1. In a single-region deployment, what happens in the cloud as 
the cluster grows, and how do you handle it? Is there a limit on the 
number of physical nodes in one region? How many nodes would be best in 
one region?


This question seems a bit too open-ended and completely subjective.


        2. When is CellsV2 most suitable in a cloud?


When this has been asked in the past, the best answer I've heard is, 
"whatever your current DB and MQ limits are for nova". So if that's 
about 200 hosts before the DB/MQ are struggling, then that could be a cell. 
For reference, CERN has 70 cells with ~200 hosts per cell. However, at 
least one public cloud is approaching cells with fewer cells and 
thousands of hosts per cell. So it varies based on where your 
limitations lie. Also note that cells do not have to be defined by DB/MQ 
limits, they can also be used as a way to shard hardware and instance 
(flavor) types. For example, generation 1 hardware in cell1, gen2 
hardware in cell2, etc.



        3. How can batch creation of instances be made faster?


This again is completely subjective. It would depend on the 
configuration, size of nova deployment, size of hardware, available 
capacity, etc. Have you done profiling to point out *specific* problem 
areas during multi-create, for example, are you packing VMs onto as few 
hosts as possible to reduce costs? And if so, are you hitting problems 
with that due to rescheduling the server build because you have multiple 
scheduler workers picking the same host(s) for a subset of the VMs in 
the request? Or are you hitting RPC timeouts during select_destinations? 
If so, that might be related to the problem described in [1].


[1] https://review.openstack.org/#/c/510235/




--

Thanks,

Matt



Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-18 Thread Matt Riedemann

On 8/11/2018 12:50 AM, Chris Apsey wrote:
This sounds promising and there seems to be a feasible way to do this, 
but it also sounds like a decent amount of effort and would be a new 
feature in a future release rather than a bugfix - am I correct in that 
assessment?


Yes I'd say it's a blueprint and not a bug fix - it's not something we'd 
backport to stable branches upstream, for example.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] Speaker Selection Process: OpenStack Summit Berlin

2018-08-13 Thread Matt Joyce
CFP work is hard as hell.  Much respect to the review panel members.  It's
a thankless difficult job.

So, in lieu of being thankless,  THANK YOU

-Matt

On Mon, Aug 13, 2018 at 9:59 AM, Allison Price 
wrote:

> Hi everyone,
>
> One quick clarification. The speakers will be announced on* August 14 at
> 1300 UTC / 4:00 AM PDT.*
>
> Cheers,
> Allison
>
>
> On Aug 13, 2018, at 8:53 AM, Jimmy McArthur  wrote:
>
> Greetings!
>
> The speakers for the OpenStack Summit Berlin will be announced August 14,
> at 4:00 AM UTC. Ahead of that, we want to take this opportunity to thank
> our Programming Committee!  They have once again taken time out of their
> busy schedules to help create another round of outstanding content for the
> OpenStack Summit.
>
> The OpenStack Foundation relies on the community-nominated Programming
> Committee, along with your Community Votes to select the content of the
> summit.  If you're curious about this process, you can read more about it
> here
> <https://www.openstack.org/summit/berlin-2018/call-for-presentations/selection-process>
> where we have also listed the Programming Committee members.
>
> If you'd like to nominate yourself or someone you know for the OpenStack
> Summit Denver Programming Committee, you can do so here:
> https://openstackfoundation.formstack.com/forms/openstackdenver2019_
> programmingcommitteenom
>
> Thanks a bunch and we look forward to seeing everyone in Berlin!
>
> Cheers,
> Jimmy
>
>
>
>


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-08 Thread Matt Riedemann

On 8/8/2018 2:42 PM, Chris Apsey wrote:
qemu-system-arm, qemu-system-ppc64, etc. in our environment are all x86 
packages, but they perform system-mode emulation (via dynamic 
instruction translation) for those target environments.  So, you run 
qemu-system-ppc64 on an x86 host in order to get a ppc64-emulated VM. 
Our use case is specifically directed at reverse engineering binaries 
and fuzzing for vulnerabilities inside of those architectures for things 
that aren't built for x86, but there are others.


If you were to apt-get install qemu-system and then hit autocomplete, 
you'd get a list of architectures that qemu can emulate on x86 hardware - 
that's what we're trying to incorporate.  We still want to run normal 
qemu-x86 with KVM virtualization extensions, but we ALSO want to run the 
other emulators without the KVM virtualization extensions in order to 
have more choice for target environments.


So to me, openstack would interpret this by checking to see if a target 
host supports the architecture specified in the image (it does this 
correctly), then it would choose the correct qemu-system-xx for spawning 
the instance based on the architecture flag of the image, which it 
currently does not (it always choose qemu-system-x86_64).


Does that make sense?


OK yeah now I'm following you - running ppc guests on an x86 host 
(virt_type=qemu rather than kvm right?).


I would have thought the hw_architecture image property was used for 
this somehow to configure the arch in the guest xml properly, like it's 
used in a few places [1][2][3].


See [4], I'd think we'd set the guest.arch but don't see that happening. 
We do set the guest.os_type though [5].


[1] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L4649
[2] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L4927
[3] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/blockinfo.py#L257

[4] https://libvirt.org/formatcaps.html#elementGuest
[5] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L5196
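As a stand-alone illustration of what honoring hw_architecture when 
building the guest XML might look like (the helper function and 
fallback behavior here are assumptions; nova's real config generation 
lives in nova.virt.libvirt.config):

```python
# Illustrative only: building the <os><type arch=...> element from an
# image's hw_architecture property, falling back to the host arch.
# This is a sketch, not nova's actual libvirt config code.

def build_os_type_xml(os_type, image_props, host_arch='x86_64'):
    # Use the image's declared architecture when present; otherwise
    # assume the guest matches the host.
    arch = image_props.get('hw_architecture', host_arch)
    return "<os><type arch='%s'>%s</type></os>" % (arch, os_type)

# A ppc64 image on an x86_64 host should request the ppc64 emulator.
xml = build_os_type_xml('hvm', {'hw_architecture': 'ppc64'})
print(xml)
```

libvirt then maps that guest arch to the matching qemu-system-* binary 
it discovered in its capabilities.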


--

Thanks,

Matt



Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-08 Thread Matt Riedemann

On 8/7/2018 8:54 AM, Chris Apsey wrote:
We don't actually have any non-x86 hardware at the moment - we're just 
looking to run certain workloads in qemu full emulation mode sans KVM 
extensions (we know there is a huge performance hit - it's just for a 
few very specific things).  The hosts I'm talking about are normal 
intel-based compute nodes with several different qemu packages installed 
(arm, ppc, mips, x86_64 w/ kvm extensions, etc.).


Is nova designed to work in this kind of scenario?  It seems like many 
pieces are there, but they're just not quite tied together quite right, 
or there is some config option I'm missing.


As far as I know, nova doesn't make anything arch-specific for QEMU. 
Nova will execute some qemu commands like qemu-img but as far as the 
virt driver, it goes through the libvirt-python API bindings which wrap 
over libvirtd which interfaces with QEMU. I would expect that if you're 
on an x86_64 arch host, that you can't have non-x86_64 packages 
installed on there (or they are noarch packages). Like, I don't know how 
your packaging works (are these rpms or debs, or other?) but how do you 
have ppc packages installed on an x86 system?


--

Thanks,

Matt



Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-07 Thread Matt Riedemann

On 8/5/2018 1:43 PM, Chris Apsey wrote:
Trying to enable some alternate (non-x86) architectures on xenial + 
queens.  I can load up images and set the property correctly according 
to the supported values 
(https://docs.openstack.org/nova/queens/configuration/config.html) in 
image_properties_default_architecture.  From what I can tell, the 
scheduler works correctly and instances are only scheduled on nodes that 
have the correct qemu binary installed.  However, when the instance 
request lands on this node, it always starts it with qemu-system-x86_64 
rather than qemu-system-arm, qemu-system-ppc, etc.  If I manually set 
the correct binary, everything works as expected.


Am I missing something here, or is this a bug in nova-compute?


image_properties_default_architecture is only used in the scheduler 
filter to pick a compute host, it doesn't do anything about the qemu 
binary used in nova-compute. mnaser added the config option so maybe he 
can share what he's done on his computes.


Do you have qemu-system-x86_64 on non-x86 systems? Seems like a 
package/deploy issue since I'd expect x86 packages shouldn't install on 
a ppc system and vice versa, and only one qemu package should provide 
the binary.
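For context, the scheduler-side architecture match amounts to something 
like the following simplified stand-in (the real logic is in nova's 
ImagePropertiesFilter, which compares against the hypervisor's reported 
supported_instances; the data structures here are made up):

```python
# Simplified stand-in for the scheduler-side check: keep only hosts
# whose hypervisor reports support for the image's architecture.

def hosts_for_image(hosts, image_props, default_arch='x86_64'):
    wanted = image_props.get('hw_architecture', default_arch)
    return [h for h, archs in hosts.items() if wanted in archs]

hosts = {
    'node1': {'x86_64'},                      # plain KVM node
    'node2': {'x86_64', 'ppc64', 'armv7l'},   # qemu emulators installed
}
print(hosts_for_image(hosts, {'hw_architecture': 'ppc64'}))
```

Nothing in that filtering step influences which qemu binary the chosen 
compute host later runs, which is the gap being discussed.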


--

Thanks,

Matt



Re: [Openstack-operators] [nova] StarlingX diff analysis

2018-08-07 Thread Matt Riedemann

On 8/7/2018 1:10 AM, Flint WALRUS wrote:
I didn’t have time to check StarlingX code quality; what was your 
impression while you were doing your analysis?


I didn't dig into the test diffs themselves, but it was my impression 
that from what I was poking around in the local git repo, there were 
several changes which didn't have any test coverage.


For the really big full stack changes (L3 CAT, CPU scaling and 
shared/pinned CPUs on same host), toward the end I just started glossing 
over a lot of that because it's so much code in so many places, so I 
can't really speak very well to how it was written or how well it is 
tested (maybe WindRiver had a more robust CI system running integration 
tests, I don't know).


There were also some things which would have been caught in code review 
upstream. For example, they ignore the "force" parameter for live 
migration so that live migration requests always go through the 
scheduler. However, the "force" parameter is only on newer 
microversions. Before that, if you specified a host at all it would 
bypass the scheduler, but the change didn't take that into account, so 
they still have gaps in some of the things they were trying to 
essentially disable in the API.


On the whole I think the quality is OK. It's not really possible to 
accurately judge that when looking at a single diff this large.


--

Thanks,

Matt



Re: [Openstack-operators] Live-migration experiences?

2018-08-06 Thread Matt Riedemann

On 8/6/2018 8:12 AM, Clint Byrum wrote:

First a few facts about our installation:

* We're using kolla-ansible and basically leaving most nova settings at 
the default, meaning libvirt+kvm
* We will be using block migration, as we have no shared storage of any 
kind.
* We use routed networks to set up L2 segments per-rack. Each rack is 
basically an island unto itself. The VMs on one rack cannot be migrated 
to another rack  because of this.
* Our main resource limitation is disk, followed closely by RAM. As 
such, our main motivation for wanting to do live migration is to be able 
to move VMs off of machines where over-subscribed disk users start to 
threaten the free space of the others.


What release are you on?



* Do people have feedback on live_migrate_permit_auto_convergence? It 
seems like a reasonable trade-off, but since it is defaulted to false, I 
wonder if there are some hidden gotchas there.


You might want to read through [1] and [2]. Those were written by the 
OSIC dev team when that still existed. But there are some (somewhat 
mysterious) mentions to caveats with post-copy you should be aware of. 
At this point, John Garbutt is probably the best person to talk to about 
those since all of the other OSIC devs that worked on this spec are long 
gone.


>
> * General pointers to excellent guides, white papers, etc, that might 
help us avoid doing all of our learning via trial/error.


Check out [3]. I've specifically been meaning to watch the one from 
Boston that John was in.


[1] 
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/live-migration-force-after-timeout.html
[2] 
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/live-migration-per-instance-timeout.html

[3] https://www.openstack.org/videos/search?search=live%20migration

--

Thanks,

Matt



[Openstack-operators] [nova] StarlingX diff analysis

2018-08-06 Thread Matt Riedemann
In case you haven't heard, there was this StarlingX thing announced at 
the last summit. I have gone through the enormous nova diff in their 
repo and the results are in a spreadsheet [1]. Given the enormous 
spreadsheet (see a pattern?), I have further refined that into a set of 
high-level charts [2].


I suspect there might be some negative reactions to even doing this type 
of analysis lest it might seem like promoting throwing a huge pile of 
code over the wall and expecting the OpenStack (or more specifically the 
nova) community to pick it up. That's not my intention at all, nor do I 
expect nova maintainers to be responsible for upstreaming any of this.


This is all educational to figure out what the major differences and 
overlaps are and what could be constructively upstreamed from the 
starlingx staging repo since it's not all NFV and Edge dragons in here, 
there are some legitimate bug fixes and good ideas. I'm sharing it 
because I want to feel like my time spent on this in the last week 
wasn't all for nothing.


[1] 
https://docs.google.com/spreadsheets/d/1ugp1FVWMsu4x3KgrmPf7HGX8Mh1n80v-KVzweSDZunU/edit?usp=sharing
[2] 
https://docs.google.com/presentation/d/1P-__JnxCFUbSVlEoPX26Jz6VaOyNg-jZbBsmmKA2f0c/edit?usp=sharing


--

Thanks,

Matt



[Openstack-operators] UC Candidacy

2018-08-03 Thread Matt Van Winkle
Greetings OpenStack Operators and Users,

I’d like to take the opportunity to state my candidacy in the upcoming
UC election. I have enjoyed the work we have been able to accomplish
these last 12 months and I would like to serve another term to help
continue the momentum.

After 6 years in Operations and Engineering for Rackspace’s public
cloud, I have recently joined Salesforce to help with their OpenStack
efforts. At both companies, I’ve had the distinct pleasure of serving
a number of talented engineers and teams as they have worked to scale
and manage the infrastructure. During this time, I’ve also enjoyed
sharing ideas with and learning from other Operators running large
OpenStack clouds in order to find new and creative ways to solve
challenges.

With respect to community involvement, my first summit was Portland
and I have made all but two since. I’ve also been very active in the
Operators community since helping plan the very first meet-up in San
Jose. I’ve given a few talks in the past and have served as track
chair many times. After Paris, I began chairing the Large Deployments
Team. This team, while inactive now, was a long running group of
operators that shared many ideas on scaling OpenStack and has had some
successes running feature requests to ground with dev teams. It’s been
a distinct pleasure to work with such smart folks from around the
community. Chairing LDT also led to an opportunity to join the Ops
Meetup Team - working with others on planning Operator mid-cycles and
Ops related Summit/Forum sessions.

I was fortunate enough to be part of the group that helped the old UC
craft the bylaw changes that have expanded the committee and made it
the elected body it is today. After serving as an election official in
the first election, I chose to run for an open spot a year ago.
Regardless of the outcome of this election, it is really awesome to
see the evolution of the UC and how it’s able to better coordinate
Operator and User efforts in guiding the community and the development
cycle.

If re-elected, I hope to keep helping more Users and Operators
understand how to take better advantage of the various events and
dev cycle to drive improvement and change in the software. The UC has
a vision of seeing conversations at an Operators mid-cycle or from an
OpenStack Days OPs session become specific topic submissions at the
next summit. Conversely, we'd love this pattern to be regular enough
that the Dev teams start proposing session ideas for certain feedback
at upcoming OPs gatherings to complete the cycle. While there is still
plenty of work to do to make these things a reality, the UC has been
laying the ground work since the Dublin PTG. I'd like to serve another
term so I can do my part to help keep making progress. Beyond that, I
want to continue the great work of the UC members to date on being an
advocate for the User with the Board, TC and community at large.

I appreciate the time and the consideration.
Thanks!
VW



-- 
Matt Van Winkle
Senior Manager, Software Engineering | Salesforce
Mobile: 210-445-4183


Re: [Openstack-operators] [openstack-ansible] How to manage system upgrades ?

2018-07-30 Thread Matt Riedemann

On 7/27/2018 3:34 AM, Gilles Mocellin wrote:

- for compute nodes : disable compute node and live-evacuate instances...


To be clear, what do you mean exactly by "live-evacuate"? I assume you 
mean live migration of all instances off each (disabled) compute node 
*before* you upgrade it. I wanted to ask because "evacuate" as a server 
operation is something else entirely (it's rebuild on another host which 
is definitely disruptive to the workload on that server).


http://www.danplanet.com/blog/2016/03/03/evacuate-in-nova-one-command-to-confuse-us-all/

--

Thanks,

Matt



Re: [Openstack-operators] [nova] Couple of CellsV2 questions

2018-07-23 Thread Matt Riedemann
I'll try to help a bit inline. Also cross-posting to openstack-dev and 
tagging with [nova] to highlight it.


On 7/23/2018 10:43 AM, Jonathan Mills wrote:
I am looking at implementing CellsV2 with multiple cells, and there's a 
few things I'm seeking clarification on:


1) How does a superconductor know that it is a superconductor?  Is its 
operation different in any fundamental way?  Is there any explicit 
configuration or a setting in the database required? Or does it simply 
not care one way or another?


It's a topology term, not really anything in config or the database that 
distinguishes the "super" conductor. I assume you've gone over the 
service layout in the docs:


https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#service-layout

There are also some summit talks from Dan about the topology linked here:

https://docs.openstack.org/nova/latest/user/cells.html#cells-v2

The superconductor is the conductor service at the "top" of the tree 
which interacts with the API and scheduler (controller) services and 
routes operations to the cell. Then once in a cell, the operation should 
ideally be confined there. So, for example, reschedules during a build 
would be confined to the cell. The cell conductor doesn't go back "up" 
to the scheduler to get a new set of hosts for scheduling. This of 
course depends on which release you're using and your configuration, see 
the caveats section in the cellsv2-layout doc.




2) When I ran the command "nova-manage cell_v2 create_cell --name=cell1 
--verbose", the entry created for cell1 in the api database includes 
only one rabbitmq server, but I have three of them as an HA cluster.  
Does it only support talking to one rabbitmq server in this 
configuration? Or can I just update the cell1 transport_url in the 
database to point to all three? Is that a supported configuration?


First, don't update stuff directly in the database if you don't have to. 
:) What you set on the transport_url should be whatever oslo.messaging 
can handle:


https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.transport_url
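oslo.messaging transport URLs accept a comma-separated list of 
host:port pairs, each with its own credentials, so all three cluster 
members can go in one URL. A quick sketch of assembling such a URL 
(hostnames and credentials here are made up):

```python
# Build an oslo.messaging transport URL listing every rabbit node in
# the HA cluster; the client fails over between them. Hostnames and
# credentials below are invented for the example.

def make_transport_url(user, password, hosts, vhost=''):
    netlocs = ','.join('%s:%s@%s:5672' % (user, password, h) for h in hosts)
    return 'rabbit://%s/%s' % (netlocs, vhost)

url = make_transport_url('nova', 'secret', ['rmq1', 'rmq2', 'rmq3'])
print(url)
```

You would pass a URL in that shape to `nova-manage cell_v2 create_cell` 
(or `update_cell`) rather than editing the cell mapping row by hand.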

There is at least one reported bug for this but I'm not sure I fully 
grok it or what its status is at this point:


https://bugs.launchpad.net/nova/+bug/1717915



3) Is there anything wrong with having one cell share the amqp bus with 
your control plane, while having additional cells use their own amqp 
buses? Certainly I realize that the point of CellsV2 is to shard the 
amqp bus for greater horizontal scalability.  But in my case, my first 
cell is on the smaller side, and happens to be colocated with the 
control plane hardware (whereas other cells will be in other parts of 
the datacenter, or in other datacenters with high-speed links).  I was 
thinking of just pointing that first cell back at the same rabbitmq 
servers used by the control plane, but perhaps directing them at their 
own rabbitmq vhost. Is that a terrible idea?


Would need to get input from operators and/or Dan Smith's opinion on 
this one, but I'd say it's no worse than having a flat single cell 
deployment. However, if you're going to do multi-cell long-term anyway, 
then it would be best to get in the mindset and discipline of not 
relying on shared MQ between the controller services and the cells. In 
other words, just do the right thing from the start rather than have to 
worry about maybe changing the deployment / configuration for that one 
cell down the road when it's harder.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-community] Running instance snapshot

2018-07-16 Thread Matt Riedemann

On 7/12/2018 10:09 AM, Alfredo De Luca wrote:
​I tried with glance image-create or nova backup but I got the 
following


Neither of those are server snapshot operations (well backup is, but 
it's probably not what you're looking for).


glance image-create is creating an image in glance, not creating a 
snapshot from a server. That would be 'nova image-create':


https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-image-create

What is the error message in the 400 response? It should be in the CLI 
output but if not, what's in the nova-api logs?


--

Thanks,

Matt



Re: [Openstack-operators] [nova] Cinder cross_az_attach=False changes/fixes

2018-07-15 Thread Matt Riedemann
Just an update on an old thread, but I've been working on the 
cross_az_attach=False issues again this past week and I think I have a 
couple of decent fixes.


On 5/31/2017 6:08 PM, Matt Riedemann wrote:

This is a request for any operators out there that configure nova to set:

[cinder]
cross_az_attach=False

To check out these two bug fixes:

1. https://review.openstack.org/#/c/366724/

This is a case where nova is creating the volume during boot from volume 
and providing an AZ to cinder during the volume create request. Today we 
just pass the instance.availability_zone which is None if the instance 
was created without an AZ set. It's unclear to me if that causes the 
volume creation to fail (someone in IRC was showing the volume going 
into ERROR state while Nova was waiting for it to be available), but I 
think it will cause the later attach to fail here [1] because the 
instance AZ (defaults to None) and volume AZ (defaults to nova) may not 
match. I'm still looking for more details on the actual failure in that 
one though.


The proposed fix in this case is pass the AZ associated with any host 
aggregate that the instance is in.


This was indirectly fixed by change 
https://review.openstack.org/#/c/446053/ in Pike where we now set the 
instance.availability_zone in conductor after we get a selected host 
from the scheduler (we get the AZ for the host and set that on the 
instance before sending the instance to compute to build it).


While investigating this on master, I found a new bug where we do an 
up-call to the API DB which fails in a split MQ setup, and I have a fix 
here:


https://review.openstack.org/#/c/582342/



2. https://review.openstack.org/#/c/469675/

This is similar, but rather than checking the AZ when we're on the 
compute and the instance has a host, we're in the API and doing a boot 
from volume where an existing volume is provided during server create. 
By default, the volume's AZ is going to be 'nova'. The code doing the 
check here is getting the AZ for the instance, and since the instance 
isn't on a host yet, it's not in any aggregate, so the only AZ we can 
get is from the server create request itself. If an AZ isn't provided 
during the server create request, then we're comparing 
instance.availability_zone (None) to volume['availability_zone'] 
("nova") and that results in a 400.


My proposed fix is in the case of BFV checks from the API, we default 
the AZ if one wasn't requested when comparing against the volume. By 
default this is going to compare "nova" for nova and "nova" for cinder, 
since CONF.default_availability_zone is "nova" by default in both projects.


I've refined this fix a bit to be more flexible:

https://review.openstack.org/#/c/469675/

So now if doing boot from volume and we're checking 
cross_az_attach=False in the API and the user didn't explicitly request 
an AZ for the instance, we do a few checks:


1. If [DEFAULT]/default_schedule_zone is not None (the default), we use 
that to compare against the volume AZ.


2. If the volume AZ is equal to the [DEFAULT]/default_availability_zone 
(nova by default in both nova and cinder), we're OK - no issues.


3. If the volume AZ is not equal to [DEFAULT]/default_availability_zone, 
it means either the volume was created with a specific AZ or cinder's 
default AZ is configured differently from nova's. In that case, I take 
the volume AZ and put it into the instance RequestSpec so that during 
scheduling, the nova scheduler picks a host in the same AZ as the volume 
- if that AZ isn't in nova, we fail to schedule (NoValidHost) (but that 
shouldn't really happen, why would one have cross_az_attach=False 
without mirrored AZs in both cinder and nova?).




--

I'm requesting help from any operators that are setting 
cross_az_attach=False because I have to imagine your users have run into 
this and you're patching around it somehow, so I'd like input on how you 
or your users are dealing with this.


I'm also trying to recreate these in upstream CI [2] which I was already 
able to do with the 2nd bug.


This devstack patch has recreated both issues above and I'm adding the 
fixes to it as dependencies to show the problems are resolved.




Having said all of this, I really hate cross_az_attach as it's 
config-driven API behavior which is not interoperable across clouds. 
Long-term I'd really love to deprecate this option but we need a 
replacement first, and I'm hoping placement with compute/volume resource 
providers in a shared aggregate can maybe make that happen.


[1] 
https://github.com/openstack/nova/blob/f278784ccb06e16ee12a42a585c5615abe65edfe/nova/virt/block_device.py#L368 


[2] https://review.openstack.org/#/c/467674/



--

Thanks,

Matt



Re: [Openstack-operators] [openstack-client] - missing commands?

2018-06-13 Thread Matt Riedemann

On 6/13/2018 1:42 PM, Flint WALRUS wrote:
Hi guys, I have used the «new» openstack-client command as much as 
possible for a couple of years now, but recently I had a hard time 
finding equivalents for the following commands:


nova force-delete 
&
The swift command that permits recursively uploading the contents of a 
directory, automatically creating the same directory structure using 
pseudo-folders.


Did I miss something somewhere or are those commands missing?

On the nova part I think it's not that important, as a classic openstack 
server delete seems to do the same, but I'm not quite sure.


Oh wow, great timing:

http://lists.openstack.org/pipermail/openstack-dev/2018-June/131308.html

I've also queued that up for the upcoming bug smash in China next week.

--

Thanks,

Matt



Re: [Openstack-operators] large high-performance ephemeral storage

2018-06-13 Thread Matt Riedemann

On 6/13/2018 10:54 AM, Chris Friesen wrote:
Also, migration and resize are not supported for LVM-backed instances.  
I proposed a patch to support them 
(https://review.openstack.org/#/c/337334/) but hit issues and never got 
around to fixing them up.


Yup, I guess I should have read the entire thread first.

--

Thanks,

Matt



Re: [Openstack-operators] large high-performance ephemeral storage

2018-06-13 Thread Matt Riedemann

On 6/13/2018 8:58 AM, Blair Bethwaite wrote:
Though we have not used LVM based instance storage before, are there any 
significant gotchas?


I know you can't resize/cold migrate lvm-backed ephemeral root disk 
instances:


https://github.com/openstack/nova/blob/343c2bee234568855fd9e6ba075a05c2e70f3388/nova/virt/libvirt/driver.py#L8136

However, StarlingX has a patch for that (pretty sure anyway, I know 
WindRiver had one):


https://review.openstack.org/#/c/337334/

--

Thanks,

Matt



[Openstack-operators] Reminder to add "nova-status upgrade check" to deployment tooling

2018-06-13 Thread Matt Riedemann
I was going through some recently reported nova bugs and came across [1] 
which I opened at the Summit during one of the FFU sessions where I 
realized the nova upgrade docs don't mention the nova-status upgrade 
check CLI [2] (added in Ocata).


As a result, I was wondering how many deployment tools out there support 
upgrades and from those, which are actually integrating that upgrade 
status check command.


I'm not really familiar with most of them, but I've dabbled in OSA 
enough to know where the code lived for nova upgrades, so I posted a 
patch [3].


I'm hoping this can serve as a template for other deployment projects to 
integrate similar checks into their upgrade (and install verification) 
flows.


[1] https://bugs.launchpad.net/nova/+bug/1772973
[2] https://docs.openstack.org/nova/latest/cli/nova-status.html
[3] https://review.openstack.org/#/c/575125/

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26

2018-06-07 Thread Matt Riedemann

On 6/7/2018 1:54 PM, Jay Pipes wrote:


If Cinder tracks volume attachments as consumable resources, then this 
would be my preference.


Cinder does:

https://developer.openstack.org/api-ref/block-storage/v3/#attachments

However, there is no limit in Cinder on those as far as I know.

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26

2018-06-07 Thread Matt Riedemann

+operators (I forgot)

On 6/7/2018 1:07 PM, Matt Riedemann wrote:

On 6/7/2018 12:56 PM, melanie witt wrote:
Recently, we've received interest about increasing the maximum number 
of allowed volumes to attach to a single instance > 26. The limit of 
26 is because of a historical limitation in libvirt (if I remember 
correctly) and is no longer limited at the libvirt level in the 
present day. So, we're looking at providing a way to attach more than 
26 volumes to a single instance and we want your feedback.


The 26 volumes thing is a libvirt driver restriction.

There was a bug at one point because powervm (or powervc) was capping 
out at 80 volumes per instance because of restrictions in the 
build_requests table in the API DB:


https://bugs.launchpad.net/nova/+bug/1621138

They wanted to get to 128, because that's how power rolls.



We'd like to hear from operators and users about their use cases for 
wanting to be able to attach a large number of volumes to a single 
instance. If you could share your use cases, it would help us greatly 
in moving forward with an approach for increasing the maximum.


Some ideas that have been discussed so far include:

A) Selecting a new, higher maximum that still yields reasonable 
performance on a single compute host (64 or 128, for example). Pros: 
helps prevent the potential for poor performance on a compute host 
from attaching too many volumes. Cons: doesn't let anyone opt-in to a 
higher maximum if their environment can handle it.


B) Creating a config option to let operators choose how many volumes 
are allowed to attach to a single instance. Pros: lets operators opt-in 
to a maximum that works in their environment. Cons: it's not 
discoverable for those calling the API.


I'm not a fan of a non-discoverable config option which will impact API 
behavior indirectly, i.e. on cloud A I can boot from volume with 64 
volumes but not on cloud B.




C) Create a configurable API limit for maximum number of volumes to 
attach to a single instance that is either a quota or similar to a 
quota. Pros: lets operators opt-in to a maximum that works in their 
environment. Cons: it's yet another quota?


This seems the most reasonable to me if we're going to do this, but I'm 
probably in the minority. Yes more quota limits sucks, but it's (1) 
discoverable by API users and therefore (2) interoperable.


If we did the quota thing, I'd probably default to unlimited and let the 
cinder volume quota cap it for the project as it does today. Then admins 
can tune it as needed.
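Option C with an unlimited default could look something like the helper 
below. This is purely illustrative, not nova's actual quota machinery; 
the function and parameter names are made up.

```python
def can_attach(current_attachments, max_volumes_per_instance=None):
    """Check a hypothetical per-instance volume-attach cap.

    With the default of None (unlimited), the cinder volume quota is
    the only effective cap, matching the suggestion above.
    """
    if max_volumes_per_instance is None:
        return True  # unlimited by default; cinder quota still applies
    return current_attachments < max_volumes_per_instance
```

Admins could then tune the limit per project without the API behavior 
becoming undiscoverable, since a quota is visible to API users.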





--

Thanks,

Matt



Re: [Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-06-07 Thread Matt Riedemann

On 2/6/2018 6:44 PM, Matt Riedemann wrote:

On 2/6/2018 2:14 PM, Chris Apsey wrote:
but we would rather have intermittent build failures rather than 
compute nodes falling over in the future.


Note that once a compute has a successful build, the consecutive build 
failures counter is reset. So if your limit is the default (10) and you 
have 10 failures in a row, the compute service is auto-disabled. But if 
you have say 5 failures and then a pass, it's reset to 0 failures.
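The counter behaviour described above can be sketched like this. It is a 
minimal stand-in (a plain dict instead of the real nova service model), 
only the reset-on-success and threshold logic mirror the description.

```python
def record_build_result(service, succeeded, threshold=10):
    """Track consecutive build failures for a compute service.

    The default threshold of 10 matches the default of the
    consecutive_build_service_disable_threshold option.
    """
    if succeeded:
        service['failures'] = 0          # a successful build resets the counter
        return service
    service['failures'] += 1
    if threshold and service['failures'] >= threshold:
        service['disabled'] = True       # auto-disable at the threshold
    return service
```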


Obviously if you're doing a pack-first scheduling strategy rather than 
spreading instances across the deployment, a burst of failures could 
easily disable a compute, especially if that host is overloaded like you 
saw. I'm not sure if rescheduling is helping you or not - that would be 
useful information since we consider the need to reschedule off a failed 
compute host as a bad thing. At the Forum in Boston when this idea came 
up, it was specifically for the case that operators in the room didn't 
want a bad compute to become a "black hole" in their deployment causing 
lots of reschedules until they get that one fixed.


Just an update on this. There is a change merged in Rocky [1] which is 
also going through backports to Queens and Pike. If you've already 
disabled the "consecutive_build_service_disable_threshold" config option 
then it's a no-op. If you haven't, 
"consecutive_build_service_disable_threshold" is now used to count build 
failures but no longer auto-disables the compute service when the 
configured threshold is met (10 by default). The build failure count is 
then used by a new weigher (enabled by default) to sort hosts with build 
failures to the back of the list of candidate hosts for new builds. Once 
there is a successful build on a given host, the failure count is reset. 
The idea here is that hosts which are failing are given lower priority 
during scheduling.
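The effect of that weigher can be approximated with a one-liner. The 
field name below is illustrative, not the real HostState attribute; the 
point is only that hosts with more recent build failures sort to the 
back of the candidate list.

```python
def sort_candidates(hosts):
    """Order candidate hosts so failing ones get lower priority.

    Fewer build failures -> earlier in the list; ties keep input order
    (Python's sort is stable).
    """
    return sorted(hosts, key=lambda h: h['build_failures'])
```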


[1] https://review.openstack.org/#/c/572195/

--

Thanks,

Matt



[Openstack-operators] [nova] Need feedback on spec for handling down cells in the API

2018-06-07 Thread Matt Riedemann
We have a nova spec [1] which is at the point that it needs some API 
user (and operator) feedback on what nova API should be doing when 
listing servers and there are down cells (unable to reach the cell DB or 
it times out).


tl;dr: the spec proposes to return "shell" instances which have the 
server uuid and created_at fields set, and maybe some other fields we 
can set, but otherwise a bunch of fields in the server response would be 
set to UNKNOWN sentinel values. This would be unversioned, and therefore 
could wreak havoc on existing client side code that expects fields like 
'config_drive' and 'updated' to be of a certain format.
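For illustration only, a "shell" server record might look something 
like the dict below. The exact field list is up to the spec, not fixed 
here, and the uuid/timestamp values are made up.

```python
# A "shell" server as proposed: real values only for fields the API DB
# can supply (uuid, created_at), UNKNOWN sentinels everywhere else.
shell_server = {
    'id': '00000000-0000-0000-0000-000000000000',
    'created': '2018-06-07T12:00:00Z',
    'status': 'UNKNOWN',
    'config_drive': 'UNKNOWN',   # normally a boolean-like string
    'updated': 'UNKNOWN',        # normally an ISO 8601 timestamp
}
```

Client code that parses 'updated' as a timestamp or 'config_drive' as a 
boolean would break on such a record, which is the UX concern above.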


There are alternatives listed in the spec so please read this over and 
provide feedback since this is a pretty major UX change.


Oh, and no pressure, but today is the spec freeze deadline for Rocky.

[1] https://review.openstack.org/#/c/557369/

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [TC] Stein Goal Selection

2018-06-04 Thread Matt Riedemann
+openstack-operators since we need to have more operator feedback in our 
community-wide goals decisions.


+Melvin as my elected user committee person for the same reasons as 
adding operators into the discussion.


On 6/4/2018 3:38 PM, Matt Riedemann wrote:

On 6/4/2018 1:07 PM, Sean McGinnis wrote:

Python 3 First
==

One of the things brought up in the session was picking things that bring
excitement and are obvious benefits to deployers and users of OpenStack
services. While this one is maybe not as immediately obvious, I think 
this
is something that will end up helping deployers and also falls into 
the tech

debt reduction category that will help us move quicker long term.

Python 2 is going away soon, so I think we need something to help 
compel folks
to work on making sure we are ready to transition. This will also be a 
good

point to help switch the mindset over to Python 3 being the default used
everywhere, with our Python 2 compatibility being just to continue legacy
support.


I still don't really know what this goal means - we have python 3 
support across the projects for the most part don't we? Based on that, 
this doesn't seem like much to take an entire "goal slot" for the release.




Cold Upgrade Support


The other suggestion in the Forum session related to upgrades was the 
addition
of "upgrade check" CLIs for each project, and I was tempted to suggest 
that as
my second strawman choice. For some projects that would be a very 
minimal or
NOOP check, so it would probably be easy to complete the goal. But 
ultimately
what I think would bring the most value would be the work on 
supporting cold
upgrade, even if it will be more of a stretch for some projects to 
accomplish.


I think you might be mixing two concepts here.

The cold upgrade support, per my understanding, is about getting the 
assert:supports-upgrade tag:


https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html 



Which to me basically means the project runs a grenade job. There was 
discussion in the room about grenade not being a great tool for all 
projects, but no one is working on a replacement for that, so I don't 
think it's really justification at this point for *not* making it a goal.


The "upgrade check" CLIs is a different thing though, which is more 
about automating as much of the upgrade release notes as possible. See 
the nova docs for examples on how we have used it:


https://docs.openstack.org/nova/latest/cli/nova-status.html

I'm not sure what projects you had in mind when you said, "For some 
projects that would be a very minimal or NOOP check, so it would 
probably be easy to complete the goal." I would expect that projects 
aren't meeting the goal if they are noop'ing everything. But what can be 
automated like this isn't necessarily black and white either.




Upgrades have been a major focus of discussion lately, especially as our
operators have been trying to get closer to the latest work upstream. 
This has

been an ongoing challenge.

There has also been a lot of talk about LTS releases. We've landed on 
fast
forward upgrade to get between several releases, but I think improving 
upgrades
eases the way both for easier and more frequent upgrades and also 
getting to

the point some day where maybe we can think about upgrading over several
releases to be able to do something like an LTS to LTS upgrade.

Neither one of these upgrade goals really has a clearly defined plan that
projects can pick up now and start working on, but I think with those 
involved

in these areas we should be able to come up with a perscriptive plan for
projects to follow.

And it would really move our fast forward upgrade story forward.


Agreed. In the FFU Forum session at the summit I mentioned the 
'nova-status upgrade check' CLI and a lot of people in the room had 
never heard of it because they are still on Mitaka before we added that 
CLI (new in Ocata). But they sounded really interested in it and said 
they wished other projects were doing that to help ease upgrades so they 
won't be stuck on older unmaintained releases for so long. So anything 
we can do to improve upgrades, including our testing for them, will help 
make FFU better.




Next Steps
==

I'm hoping with a strawman proposal we have a basis for debating the 
merits of
these and getting closer to being able to officially select Stein 
goals. We
still have some time, but I would like to avoid making late-cycle 
selections so

teams can start planning ahead for what will need to be done in Stein.

Please feel free to promote other ideas for goals. That would be a 
good way for
us to weigh the pro's and con's between these and whatever else you 
have in
mind. Then hopefully we can come to some consensus and work towards 
clearly
defining what needs to be done and getting things well documented for 
teams to

pick up as soon as they wrap up Rocky.

Re: [Openstack-operators] [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point

2018-06-04 Thread Matt Riedemann
download extension point? Should I work to get the 
code for this RBD download into the upstream repository?




I think you should propose your changes upstream with a blueprint, the 
docs for the blueprint process are here:


https://docs.openstack.org/nova/latest/contributor/blueprints.html

Since it's not an API change, this might just be a specless blueprint, 
but you'd need to write up the blueprint and probably post the PoC code 
to Gerrit and then bring it up during the "Open Discussion" section of 
the weekly nova meeting.


Once we can take a look at the code change, we can go from there on 
whether or not to add that in-tree or go some alternative route.


Until that happens, I think we'll just say we won't remove that 
deprecated image download extension code, but that's not going to be an 
unlimited amount of time if you don't propose your changes upstream.


Is there going to be anything blocking or slowing you down on your end 
with regard to contributing this change, like legal approval, license 
agreements, etc? If so, please be up front about that.


--

Thanks,

Matt



Re: [Openstack-operators] [nova] isolate hypervisor to project

2018-06-04 Thread Matt Riedemann

On 6/4/2018 6:43 AM, Tobias Urdin wrote:

I have received a question about a more specialized use case where we 
need to isolate several hypervisors to a specific project. My first 
thought was to use nova flavors for only that project and add extra 
specs properties to use a specific host aggregate, but this means I 
need to assign values to all other flavors so they do not use those 
hosts, which seems weird.

How could I go about solving this the easiest/best way or, from the 
history of the mailing lists, the most supported way, since there are a 
lot of changes to the scheduler/placement part right now?


Depending on which release you're on, it sounds like you want to use this:

https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation

In Rocky we have a replacement for that filter which does pre-filtering 
in Placement which should give you a performance gain when it comes time 
to do the host filtering:


https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement

Note that even if you use AggregateMultiTenancyIsolation for the one 
project, other projects can still randomly land on the hosts in that 
aggregate unless you also assign those to their own aggregates.


It sounds like you might be looking for a dedicated hosts feature? 
There is an RFE from the public cloud work group for that:


https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523

--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-06-03 Thread Matt Riedemann

On 6/2/2018 1:37 AM, Chris Apsey wrote:
This is great.  I would even go so far as to say the install docs should 
be updated to capture this as the default; as far as I know there is no 
negative impact when running in daemon mode, even on very small 
deployments.  I would imagine that there are operators out there who 
have run into this issue but didn't know how to work through it - making 
stuff like this less painful is key to breaking the 'openstack is hard' 
stigma.


I think changing the default on the root_helper_daemon option is a good 
idea if everyone is setting that anyway. There are some comments in the 
code next to the option that make me wonder if there are edge cases 
where it might not be a good idea, but I don't really know the details, 
someone from the neutron team that knows more about it would have to 
speak up.


Also, I wonder if converting to privsep in the neutron agent would 
eliminate the need for this option altogether and still gain the 
performance benefits.


--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-31 Thread Matt Riedemann

On 5/30/2018 9:30 AM, Matt Riedemann wrote:


I can start pushing some docs patches and report back here for review help.


Here are the docs patches in both nova and neutron:

https://review.openstack.org/#/q/topic:bug/1774217+(status:open+OR+status:merged)

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] proposal to postpone nova-network core functionality removal to Stein

2018-05-31 Thread Matt Riedemann

+openstack-operators

On 5/31/2018 3:04 PM, Matt Riedemann wrote:

On 5/31/2018 1:35 PM, melanie witt wrote:


This cycle at the PTG, we had decided to start making some progress 
toward removing nova-network [1] (thanks to those who have helped!) 
and so far, we've landed some patches to extract common network 
utilities from nova-network core functionality into separate utility 
modules. And we've started proposing removal of nova-network REST APIs 
[2].


At the cells v2 sync with operators forum session at the summit [3], 
we learned that CERN is in the middle of migrating from nova-network 
to neutron and that holding off on removal of nova-network core 
functionality until Stein would help them out a lot to have a safety 
net as they continue progressing through the migration.


If we recall correctly, they did say that removal of the nova-network 
REST APIs would not impact their migration and Surya Seetharaman is 
double-checking about that and will get back to us. If so, we were 
thinking we can go ahead and work on nova-network REST API removals 
this cycle to make some progress while holding off on removing the 
core functionality of nova-network until Stein.


I wanted to send this to the ML to let everyone know what we were 
thinking about this and to receive any additional feedback folks might 
have about this plan.


Thanks,
-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-rocky L301
[2] https://review.openstack.org/567682
[3] https://etherpad.openstack.org/p/YVR18-cellsv2-migration-sync-with-operators L30


As a reminder, this is the etherpad I started to document the nova-net 
specific compute REST APIs which are candidates for removal:


https://etherpad.openstack.org/p/nova-network-removal-rocky




--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-30 Thread Matt Riedemann

On 5/30/2018 9:41 AM, Matt Riedemann wrote:
Thanks for your patience in debugging this Massimo! I'll get a bug 
reported and patch posted to fix it.


I'm tracking the problem with this bug:

https://bugs.launchpad.net/nova/+bug/1774205

I found that this has actually been fixed since Pike:

https://review.openstack.org/#/c/449640/

But I've got a patch up for another related issue, and a functional test 
to avoid regressions which I can also use when backporting the fix to 
stable/ocata.


--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-30 Thread Matt Riedemann

On 5/30/2018 5:21 AM, Massimo Sgaravatto wrote:

The problem is indeed with the tenant_id

When I create a VM, tenant_id is ee1865a76440481cbcff08544c7d580a 
(SgaraPrj1), as expected


But when, as admin, I run the "nova migrate" command to migrate the very 
same instance, the tenant_id is 56c3f5c047e74a78a71438c4412e6e13 (admin) !


OK that's good information.

Tracing the code for cold migrate in ocata, we get the request spec that 
was created when the instance was created here:


https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L3339

As I mentioned earlier, if it was cold migrating an instance created 
before Newton and the online data migration wasn't run on it, we'd 
create a temporary request spec here:


https://github.com/openstack/nova/blob/stable/ocata/nova/conductor/manager.py#L263

But that shouldn't be the case in your scenario.

Right before we call the scheduler, for some reason, we completely 
ignore the request spec retrieved in the API, and re-create it from 
local scope variables in conductor:


https://github.com/openstack/nova/blob/stable/ocata/nova/conductor/tasks/migrate.py#L50

And *that* is precisely where this breaks down and takes the project_id 
from the current context (admin) rather than the instance:


https://github.com/openstack/nova/blob/stable/ocata/nova/objects/request_spec.py#L407
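The project_id mix-up traced above can be illustrated with a toy 
version of the spec-building step. The shapes below are simplified 
stand-ins for nova's context/RequestSpec objects; only the choice of 
which project_id to use mirrors the bug and its fix.

```python
def build_request_spec(context, instance, use_instance_project=True):
    """Toy request-spec builder showing the project_id source bug.

    Buggy behaviour (use_instance_project=False): project_id comes from
    the admin context performing the migration. Fixed behaviour: it is
    taken from the instance being moved.
    """
    project_id = (instance['project_id'] if use_instance_project
                  else context['project_id'])
    return {'project_id': project_id}
```

With the buggy path, a tenant-isolation filter then compares the admin 
project against the aggregate's filter_tenant_id and rejects the hosts, 
which matches the failure Massimo saw.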

Thanks for your patience in debugging this Massimo! I'll get a bug 
reported and patch posted to fix it.


--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-30 Thread Matt Riedemann

On 5/29/2018 8:23 PM, Chris Apsey wrote:
I want to echo the effectiveness of this change - we had vif failures 
when launching more than 50 or so cirros instances simultaneously, but 
moving to daemon mode made this issue disappear and we've tested 5x that 
amount.  This has been the single biggest scalability improvement to 
date.  This option should be the default in the official docs.


This is really good feedback. I'm not sure if there is any kind of 
centralized performance/scale-related documentation, does the LCOO team 
[1] have something that's current? There are also the performance docs 
[2] but that looks pretty stale.


We could add a note to the neutron rootwrap configuration option such 
that if you're running into timeout issues you could consider running 
that in daemon mode, but it's probably not very discoverable. In fact, I 
couldn't find anything about it in the neutron docs, I only found this 
[3] because I know it's defined in oslo.rootwrap (I don't expect 
everyone to know where this is defined).


I found root_helper_daemon in the neutron docs [4] but it doesn't 
mention anything about performance or related options, and it just makes 
it sound like it matters for xenserver, which I'd gloss over if I were 
using libvirt. The root_helper_daemon config option help in neutron 
should probably refer to the neutron-rootwrap-daemon which is in the 
setup.cfg [5].


For better discoverability of this, probably the best place to mention 
it is in the nova vif_plugging_timeout configuration option, since I 
expect that's the first place operators will be looking when they start 
hitting timeouts during vif plugging at scale.


I can start pushing some docs patches and report back here for review help.

[1] https://wiki.openstack.org/wiki/LCOO
[2] https://docs.openstack.org/developer/performance-docs/
[3] 
https://docs.openstack.org/oslo.rootwrap/latest/user/usage.html#daemon-mode
[4] 
https://docs.openstack.org/neutron/latest/configuration/neutron.html#agent.root_helper_daemon

[5] https://github.com/openstack/neutron/blob/f486f0/setup.cfg#L54

--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-29 Thread Matt Riedemann

On 5/29/2018 3:07 PM, Massimo Sgaravatto wrote:
The VM that I am trying to migrate was created when the Cloud was 
already running Ocata


OK, I'd added the tenant_id variable in scope to the log message here:

https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L50

And make sure when it fails, it matches what you'd expect. If it's None 
or '' or something weird then we have a bug.


--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-29 Thread Matt Riedemann

On 5/29/2018 12:44 PM, Jay Pipes wrote:
Either that, or the wrong project_id is being used when attempting to 
migrate? Maybe the admin project_id is being used instead of the 
original project_id who launched the instance?


Could be, but we should be pulling the request spec from the database 
which was created when the instance was created. There is some shim code 
from Newton which will create an essentially fake request spec on-demand 
when doing a move operation if the instance was created before Newton, 
which could go back to that bug I was referring to.


Massimo - can you clarify if this is a new server created in your Ocata 
test environment that you're trying to move? Or is this a server created 
before Ocata?


--

Thanks,

Matt



Re: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance

2018-05-29 Thread Matt Riedemann

On 5/29/2018 11:10 AM, Jay Pipes wrote:
The hosts you are attempting to migrate *to* do not have the 
filter_tenant_id property set to the same tenant ID as the compute host 
2 that originally hosted the instance.


That is why you see this in the scheduler logs when evaluating the 
fitness of compute host 1 and compute host 3:


"fails tenant id"

Best,
-jay


Hmm, I'm not sure about that. This is the aggregate right?

# nova aggregate-show 52
+----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+
| Id | Name      | Availability Zone | Hosts                                                        | Metadata                                                                                     | UUID                                 |
+----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+
| 52 | SgaraPrj1 | nova              | 'compute-01.cloud.pd.infn.it', 'compute-02.cloud.pd.infn.it' | 'availability_zone=nova', 'filter_tenant_id=ee1865a76440481cbcff08544c7d580a', 'size=normal' | 675f6291-6997-470d-87e1-e9ea199a379f |
+----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+


So compute-01 and compute-02 are in that aggregate for the same tenant 
ee1865a76440481cbcff08544c7d580a.


From the logs, it skips compute-02 since the instance is already on 
that host.


> 2018-05-29 11:12:56.375 19428 INFO nova.scheduler.host_manager 
[req-45b8afd5-9683-40a6-8416-295563e37e34 
9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a71438c4412e6e13 - - -] 
Host filter ignoring hosts: compute-02.cloud.pd.infn.it


So it processes compute-01 and compute-03. It should accept compute-01 
since it's in the same tenant-specific aggregate and reject compute-03. 
But the filter rejects both hosts.


It would be useful to know what the tenant_id is when comparing against 
the aggregate metadata:


https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L50

I'm wondering if the RequestSpec.project_id is null? Like, I wonder if 
you're hitting this bug:


https://bugs.launchpad.net/nova/+bug/1739318

Although if this is a clean Ocata environment with new instances, you 
shouldn't have that problem.


--

Thanks,

Matt



Re: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI

2018-05-29 Thread Matt Riedemann

On 5/28/2018 7:31 AM, Sylvain Bauza wrote:
That said, given I'm now working on using Nested Resource Providers for 
VGPU inventories, I wonder about a possible upgrade problem with VGPU 
allocations. Given that :
  - in Queens, VGPU inventories are for the root RP (ie. the compute 
node RP), but,
  - in Rocky, VGPU inventories will be for children RPs (ie. against a 
specific VGPU type), then


if we have VGPU allocations in Queens, when upgrading to Rocky, we 
should maybe recreate the allocations to a specific other inventory ?


For how the heal_allocations CLI works today, if the instance has any 
allocations in placement, it skips that instance. So this scenario 
wouldn't be a problem.




Hope you see the problem with upgrading by creating nested RPs ?


Yes, the CLI doesn't attempt to have any knowledge about nested resource 
providers, it just takes the flavor embedded in the instance and creates 
allocations against the compute node provider using the flavor. It has 
no explicit knowledge about granular request groups or more advanced 
features like that.


--

Thanks,

Matt



[Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI

2018-05-24 Thread Matt Riedemann
I've written a nova-manage placement heal_allocations CLI [1] which was 
a TODO from the PTG in Dublin as a step toward getting existing 
CachingScheduler users to roll off that (which is deprecated).


During the CERN cells v1 upgrade talk it was pointed out that CERN was 
able to go from placement-per-cell to centralized placement in Ocata 
because the nova-computes in each cell would automatically recreate the 
allocations in Placement in a periodic task, but that code is gone once 
you're upgraded to Pike or later.


In various other talks during the summit this week, we've talked about 
things during upgrades where, for instance, if placement is down for 
some reason during an upgrade, a user deletes an instance and the 
allocation doesn't get cleaned up from placement so it's going to 
continue counting against resource usage on that compute node even 
though the server instance in nova is gone. So this CLI could be 
expanded to help clean up situations like that, e.g. provide it a 
specific server ID and the CLI can figure out if it needs to clean 
things up in placement.


So there are plenty of things we can build into this, but the patch is 
already quite large. I expect we'll also be backporting this to stable 
branches to help operators upgrade/fix allocation issues. It already has 
several things listed in a code comment inline about things to build 
into this later.


My question is, is this good enough for a first iteration or is there 
something severely missing before we can merge this, like the automatic 
marker tracking mentioned in the code (that will probably be a 
non-trivial amount of code to add). I could really use some operator 
feedback on this to just take a look at what it already is capable of 
and if it's not going to be useful in this iteration, let me know what's 
missing and I can add that in to the patch.


[1] https://review.openstack.org/#/c/565886/
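As a rough illustration of the per-instance behavior described above (a sketch only — the real CLI talks to the placement API, pages through cells with a marker, and handles more edge cases): skip any instance that already has allocations, otherwise derive allocations from the embedded flavor against the compute node provider. The vcpus/ram/root_gb/ephemeral_gb field names mirror a nova flavor.

```python
def heal_instance(existing_allocations, flavor):
    """Return the allocations to write for one instance, or None to skip.

    Sketch of the core decision in 'nova-manage placement
    heal_allocations': anything that already has allocations in
    placement is left alone; otherwise the amounts come straight from
    the flavor, with no awareness of nested providers or granular
    request groups.
    """
    if existing_allocations:
        return None  # already has allocations in placement: skip
    return {
        "VCPU": flavor["vcpus"],
        "MEMORY_MB": flavor["ram"],
        "DISK_GB": flavor["root_gb"] + flavor.get("ephemeral_gb", 0),
    }
```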

--

Thanks,

Matt



Re: [Openstack-operators] Multiple Ceph pools for Nova?

2018-05-21 Thread Matt Riedemann

On 5/21/2018 11:51 AM, Smith, Eric wrote:
I have 2 Ceph pools, one backed by SSDs and one backed by spinning disks 
(Separate roots within the CRUSH hierarchy). I’d like to run all 
instances in a single project / tenant on SSDs and the rest on spinning 
disks. How would I go about setting this up?


As mentioned elsewhere, host aggregate would work for the compute hosts 
connected to each storage pool. Then you can have different flavors per 
aggregate and charge more for the SSD flavors or restrict the aggregates 
based on tenant [1].


Alternatively, if this is something you plan to eventually scale to a 
larger size, you could even separate the pools with separate cells and 
use resource provider aggregates in placement to mirror the host 
aggregates for tenant-per-cell filtering [2]. It sounds like this is 
very similar to what CERN does (cells per hardware characteristics and 
projects assigned to specific cells). So Belmiro could probably help 
give some guidance here too. Check out the talk he gave today at the 
summit [3].


[1] https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation
[2] https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement
[3] https://www.openstack.org/videos/vancouver-2018/moving-from-cellsv1-to-cellsv2-at-cern
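To make the flavors-per-aggregate idea concrete, here is a sketch of the matching the AggregateInstanceExtraSpecs-style filtering does (illustrative only — the real filter also supports operators in the values; the 'disk=ssd' key below is a made-up example property):

```python
def flavor_matches_host(flavor_extra_specs, aggregate_metadata):
    """Every scoped extra spec on the flavor must match the aggregate
    metadata of the candidate host; flavors with no scoped specs pass
    everywhere. Sketch of AggregateInstanceExtraSpecsFilter."""
    prefix = "aggregate_instance_extra_specs:"
    for key, wanted in flavor_extra_specs.items():
        if not key.startswith(prefix):
            continue  # unscoped specs are not aggregate constraints
        if aggregate_metadata.get(key[len(prefix):]) != wanted:
            return False
    return True
```

With an 'ssd' flavor carrying aggregate_instance_extra_specs:disk=ssd, only hosts in the aggregate with disk=ssd metadata pass the filter.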


--

Thanks,

Matt



[Openstack-operators] [nova] FYI on changes that might impact out of tree scheduler filters

2018-05-17 Thread Matt Riedemann
CERN has upgraded to Cells v2 and is doing performance testing of the 
scheduler and were reporting some things today which got us back to this 
bug [1]. So I've starting pushing some patches related to this but also 
related to an older blueprint I created [2]. In summary, we do quite a 
bit of DB work just to load up a list of instance objects per host that 
the in-tree filters don't even use.


The first change [3] is a simple optimization to avoid the default joins 
on the instance_info_caches and security_groups tables. If you have out 
of tree filters that, for whatever reason, rely on the 
HostState.instances objects to have info_cache or security_groups set, 
they'll continue to work, but will have to round-trip to the DB to 
lazy-load the fields, which is going to be a performance penalty on that 
filter. See the change for details.


The second change in the series [4] is more drastic in that we'll do 
away with pulling the full Instance object per host, which means only a 
select set of optional fields can be lazy-loaded [5], and the rest will 
result in an exception. The patch currently has a workaround config 
option to continue doing things the old way if you have out of tree 
filters that rely on this, but for good citizens with only in-tree 
filters, you will get a performance improvement during scheduling.
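A toy model of the lazy-load trade-off described above (nothing nova-specific here — it just shows why an out-of-tree filter touching a field the initial query didn't load turns one query into extra per-access round trips):

```python
class HostStateInstance:
    """Fields fetched by the first query are free to read; any other
    attribute access simulates an extra DB round trip, like object
    lazy-loading does."""

    def __init__(self, **loaded):
        self.db_round_trips = 0
        for name, value in loaded.items():
            setattr(self, name, value)  # fields from the initial query

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. the field was
        # not part of the initial query.
        self.db_round_trips += 1  # simulated lazy-load round trip
        return None
```

If a filter iterates HostState.instances and touches an unloaded field on each one, that is one extra round trip per instance per host — exactly the scheduling penalty being warned about.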


There are some other things we can do to optimize more of this flow, but 
this email is just about the ones that have patches up right now.


[1] https://bugs.launchpad.net/nova/+bug/1737465
[2] https://blueprints.launchpad.net/nova/+spec/put-host-manager-instance-info-on-a-diet

[3] https://review.openstack.org/#/c/569218/
[4] https://review.openstack.org/#/c/569247/
[5] https://github.com/openstack/nova/blob/de52fefa1fd52ccaac6807e5010c5f2a2dcbaab5/nova/objects/instance.py#L66


--

Thanks,

Matt



Re: [Openstack-operators] Need feedback for nova aborting cold migration function

2018-05-17 Thread Matt Riedemann

On 5/15/2018 3:48 AM, saga...@nttdata.co.jp wrote:

We store the service logs which are created by VM on that storage.


I don't mean to be glib, but have you considered maybe not doing that?

--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-17 Thread Matt Riedemann

On 5/17/2018 9:46 AM, George Mihaiescu wrote:

and large rally tests of 500 instances complete with no issues.


Sure, except you can't ssh into the guests.

The whole reason the vif plugging is fatal and timeout and callback code 
was because the upstream CI was unstable without it. The server would 
report as ACTIVE but the ports weren't wired up so ssh would fail. 
Having an ACTIVE guest that you can't actually do anything with is kind 
of pointless.
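For reference, the options in play here live in nova.conf on the computes; the values below are the upstream defaults:

```ini
[DEFAULT]
# Fail the build if Neutron never confirms the port is wired up,
# instead of reporting an ACTIVE guest with no working network.
vif_plugging_is_fatal = true
# Seconds to wait for the network-vif-plugged event from Neutron.
vif_plugging_timeout = 300
```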


--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-16 Thread Matt Riedemann

On 5/16/2018 10:30 AM, Radu Popescu | eMAG, Technology wrote:

but I can see nova attaching the interface after a huge amount of time.


What specifically are you looking for in the logs when you see this?

Are you passing pre-created ports to attach to nova or are you passing a 
network ID so nova will create the port for you during the attach call?


This is where the ComputeManager calls the driver to plug the vif on the 
host:


https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L5187

Assuming you're using the libvirt driver, the host vif plug happens here:

https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L1463

And the guest is updated here:

https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L1472

vif_plugging_is_fatal and vif_plugging_timeout don't come into play here 
because we're attaching an interface to an existing server - or are you 
talking about during the initial creation of the guest, i.e. this code 
in the driver?


https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L5257

Are you seeing this in the logs for the given port?

https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6875

If not, it could mean that neutron-server never send the event to nova, 
so nova-compute timed out waiting for the vif plug callback event to 
tell us that the port is ready and the server can be changed to ACTIVE 
status.


The neutron-server logs should log when external events are being sent 
to nova for the given port, you probably need to trace the requests and 
compare the nova-compute and neutron logs for a given server create request.


--

Thanks,

Matt



Re: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud

2018-05-10 Thread Matt Riedemann

On 5/10/2018 6:30 PM, Jean-Philippe Méthot wrote:
1.I was talking about the region-name parameter underneath 
keystone_authtoken. That is in the pike doc you linked, but I am unaware 
if this is only used for token generation or not. Anyhow, it doesn’t 
seem to have any impact on the issue at hand.


The [keystone]/region_name config option in nova is used to pick the 
identity service endpoint, so in that case region_name will matter if 
there are multiple identity endpoints in the service catalog. The only 
thing is you're on Pike, where [keystone]/region_name isn't in 
nova.conf and isn't used; it was added in Queens for this lookup:


https://review.openstack.org/#/c/507693/

So that might be why it doesn't seem to make a difference if you set it 
in nova.conf - because the nova code isn't actually using it.


You could try backporting that patch into your pike deployment, set 
region_name to RegionOne and see if it makes a difference (although I 
thought RegionOne was the default if not specified?).
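If you do backport it, note the setting being discussed lives under the [keystone] section of nova.conf, not [keystone_authtoken] (RegionOne here is just the region name from this thread's example):

```ini
[keystone]
# Region whose identity endpoint nova should pick from the catalog.
region_name = RegionOne
```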


--

Thanks,

Matt



Re: [Openstack-operators] Need feedback for nova aborting cold migration function

2018-05-10 Thread Matt Riedemann

On 5/9/2018 9:33 PM, saga...@nttdata.co.jp wrote:

We always do the maintenance work on midnight during limited time-slot to 
minimize impact to our users.


Also, why are you doing maintenance with cold migration? Why not do live 
migration for your maintenance (which already supports the abort function).


--

Thanks,

Matt



Re: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud

2018-05-10 Thread Matt Riedemann

On 5/9/2018 8:11 PM, Jean-Philippe Méthot wrote:
I currently operate a multi-region cloud split between 2 geographic 
locations. I have updated it to Pike not too long ago, but I've been 
running into a peculiar issue. Ever since the Pike release, Nova now 
asks Keystone if a new project exists in Keystone before configuring the 
project’s quotas. However, there doesn’t seem to be any region 
restriction regarding which endpoint Nova will query Keystone on. So, 
right now, if I create a new project in region one, Nova will query 
Keystone in region two. Because my keystone databases are not synched in 
real time between each region, the region two Keystone will tell it that 
the new project doesn't exist, while it exists in region one Keystone.


Thinking that this could be a configuration error, I tried setting the 
region_name in keystone_authtoken, but that didn’t change much of 
anything. Right now I am thinking this may be a bug. Could someone 
confirm that this is indeed a bug and not a configuration error?


To circumvent this issue, I am considering either modifying the database 
by hand or trying to implement realtime replication between both 
Keystone databases. Would there be another solution? (beside modifying 
the code for the Nova check)


This is the specific code you're talking about:

https://github.com/openstack/nova/blob/stable/pike/nova/api/openstack/identity.py#L35

I don't see region_name as a config option for talking to keystone in Pike:

https://docs.openstack.org/nova/pike/configuration/config.html#keystone

But it is in Queens:

https://docs.openstack.org/nova/queens/configuration/config.html#keystone

That was added in this change:

https://review.openstack.org/#/c/507693/

But I think what you're saying is: since you have multiple regions, the 
project could be in any of them at any given time until they 
synchronize, so configuring nova for a specific region probably isn't 
going to help in this case, right?


Isn't this somehow resolved with keystone federation? Granted, I'm not 
at all a keystone person, but I'd think this isn't a unique problem.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann

On 5/2/2018 12:39 PM, Matt Riedemann wrote:
FWIW, I think we can also backport the data migration CLI to stable 
branches once we have it available so you can do your migration in, 
let's say, Queens before getting to Rocky.


FYI, here is the start on the data migration CLI:

https://review.openstack.org/#/c/565886/

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-05-02 Thread Matt Riedemann

On 5/2/2018 5:39 PM, Jay Pipes wrote:
My personal preference is to add less technical debt and go with a 
solution that checks if image traits have changed in nova-api and if so, 
simply refuse to perform a rebuild.


So, what if when I created my server, the image I used, let's say 
image1, had required trait A and that fit the host.


Then some external service removes (or somehow changes) trait A from the 
compute node resource provider (because people can and will do this, 
there are a few vmware specs up that rely on being able to manage traits 
out of band from nova), and then I rebuild my server with image2 that 
has required trait A. That would match the original trait A in image1 
and we'd say, "yup, lgtm!" and do the rebuild even though the compute 
node resource provider wouldn't have trait A anymore.


Having said that, it could technically happen before traits if the 
operator changed something on the underlying compute host which 
invalidated instances running on that host, but I'd think if that 
happened the operator would be migrating everything off the host and 
disabling it from scheduling before making whatever that kind of change 
would be, let's say they change the hypervisor or something less drastic 
but still image property invalidating.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann

On 5/2/2018 12:00 PM, Mathieu Gagné wrote:

If one can still run CachingScheduler (even if it's deprecated), I
think we shouldn't remove the above options.
As you can end up with a broken setup and IIUC no way to migrate to
placement since migration script has yet to be written.


You're currently on cells v1 on mitaka right? So you have some time to 
get this sorted out before getting to Rocky where the IronicHostManager 
is dropped.


I know you're just one case, but I don't know how many people are really 
running the CachingScheduler with ironic either, so it might be rare. It 
would be nice to get other operator input here, like I'm guessing CERN 
has their cells carved up so that certain cells are only serving 
baremetal requests while other cells are only VMs?


FWIW, I think we can also backport the data migration CLI to stable 
branches once we have it available so you can do your migration in let's 
say Queens before getting to Rocky.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann

On 5/2/2018 11:40 AM, Mathieu Gagné wrote:

What's the state of caching_scheduler which could still be using those configs?


The CachingScheduler has been deprecated since Pike [1]. We discussed 
the CachingScheduler at the Rocky PTG in Dublin [2] and have a TODO to 
write a nova-manage data migration tool to create allocations in 
Placement for instances that were scheduled using the CachingScheduler 
(since Pike) which don't have their own resource allocations set in 
Placement (remember that starting in Pike the FilterScheduler started 
creating allocations in Placement rather than the ResourceTracker in 
nova-compute).


If you're running computes that are Ocata or Newton, then the 
ResourceTracker in the nova-compute service should be creating the 
allocations in Placement for you, assuming you have the compute service 
configured to talk to Placement (optional in Newton, required in Ocata).


[1] https://review.openstack.org/#/c/492210/
[2] https://etherpad.openstack.org/p/nova-ptg-rocky-placement

--

Thanks,

Matt



[Openstack-operators] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann
The baremetal scheduling options were deprecated in Pike [1] and the 
ironic_host_manager was deprecated in Queens [2] and is now being 
removed [3]. Deployments must use resource classes now for baremetal 
scheduling. [4]


The large host subset size value is also no longer needed. [5]

I've gone through all of the references to "ironic_host_manager" that I 
could find in codesearch.o.o and updated projects accordingly [6].


Please reply ASAP to this thread and/or [3] if you have issues with this.

[1] https://review.openstack.org/#/c/493052/
[2] https://review.openstack.org/#/c/521648/
[3] https://review.openstack.org/#/c/565805/
[4] https://docs.openstack.org/ironic/latest/install/configure-nova-flavors.html#scheduling-based-on-resource-classes

[5] https://review.openstack.org/565736/
[6] https://review.openstack.org/#/q/topic:exact-filters+(status:open+OR+status:merged)


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-05-02 Thread Matt Riedemann

On 5/1/2018 5:26 PM, Arvind N wrote:
In cases of rebuilding of an instance using a different image where the 
image traits have changed between the original launch and the rebuild, 
is it reasonable to ask to just re-launch a new instance with the new image?


The argument for this approach is that given that the requirements have 
changed, we want the scheduler to pick and allocate the appropriate host 
for the instance.


We don't know if the requirements have changed with the new image until 
we check them.


Here is another option:

What if the API compares the original image required traits against the 
new image required traits, and if the new image has required traits 
which weren't in the original image, then (punt) fail in the API? Then 
you would at least have a chance to rebuild with a new image that has 
required traits as long as those required traits are less than or equal 
to the originally validated traits for the host on which the instance is 
currently running.
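That comparison is cheap to express. A sketch of the API-side check being proposed (illustrative — in practice the required traits would first have to be extracted from the image's trait-style properties):

```python
def rebuild_allowed(original_required_traits, new_required_traits):
    """Punt option: allow the rebuild only when the new image requires
    no traits beyond those already validated when the server was
    originally scheduled to its current host."""
    return set(new_required_traits) <= set(original_required_traits)
```

So rebuilding to an image with equal or fewer required traits goes through, and anything introducing a new required trait fails fast in the API.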




The approach above also gives you consistent results vs the other 
approaches where the rebuild may or may not succeed depending on how the 
original allocation of resources went.




Consistently frustrating, I agree. :) Because as a user, I can rebuild 
with some images (that don't have required traits) and can't rebuild 
with other images (that do have required traits).


I see no difference with this and being able to rebuild (with a new 
image) some instances (image-backed) and not others (volume-backed). 
Given that, I expect if we punt on this, someone will just come along 
asking for the support later. Could be a couple of years from now when 
everyone has moved on and it then becomes someone else's problem.


For example(from Alex Xu) ,if you launched an instance on a host which 
has two SRIOV nic. One is normal SRIOV nic(A), another one with some 
kind of offload feature(B).


So, the original request is: resources=SRIOV_VF:1 The instance gets a VF 
from the normal SRIOV nic(A).


But with a new image, the new request is: resources=SRIOV_VF:1 
traits=HW_NIC_OFFLOAD_XX


With all the solutions discussed in the thread, a rebuild request like 
above may or may not succeed depending on whether during the initial 
launch whether nic A or nic B was allocated.


Remember that in rebuild new allocations don't happen; we have to reuse 
the existing allocations.


Given the above background, there seems to be 2 competing options.

1. Fail in the API saying you can't rebuild with a new image with new 
required traits.


2. Look at the current allocations for the instance and try to match the 
new requirement from the image with the allocations.


With #1, we get consistent results in regards to how rebuilds are 
treated when the image traits changed.


With #2, the rebuild may or may not succeed, depending on how well the 
original allocations match up with the new requirements.


#2 will also need to account for handling preferred traits or granular 
resource traits if we decide to implement them for images at some 
point...


Option 10: Don't support image-defined traits at all. I know that won't 
happen though.


At this point I'm exhausted with this entire issue and conversation and 
will probably bow out and need someone else to step in with different 
perspective, like melwitt or dansmith.


All of the solutions are bad in their own way, either because they add 
technical debt and poor user experience, or because they make rebuild 
more complicated and harder to maintain for the developers.


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey

2018-04-30 Thread Matt Riedemann

On 4/30/2018 11:41 AM, Mathieu Gagné wrote:

[6] Used to filter Ironic nodes based on the 'reserved_for_user_id'
Ironic node property.
 This is mainly used when enrolling existing nodes already living
on a different system.
 We reserve the node to a special internal user so the customer
cannot reserve
 the node by mistake until the process is completed.
 Latest version of Nova dropped user_id from RequestSpec. We had to
add it back.


See https://review.openstack.org/#/c/565340/ for context on the 
regression mentioned about RequestSpec.user_id.


Thanks Mathieu for jumping in #openstack-nova and discussing it.

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey

2018-04-27 Thread Matt Riedemann

On 4/27/2018 4:02 AM, Tomáš Vondra wrote:
Also, Windows host isolation is done using image metadata. I have filed 
a bug somewhere that it does not work correctly with Boot from Volume.


Likely because for boot from volume the instance.image_id is ''. The 
request spec, which the filter has access to, also likely doesn't have 
the backing image metadata for the volume because the instance isn't 
creating with an image directly. But nova could fetch the image metadata 
from the volume and put that into the request spec. We fixed a similar 
bug recently for the IsolatedHostsFilter:


https://review.openstack.org/#/c/543263/

If you can find the bug, or report a new one, I could take a look.

--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] [nova] Concern about trusted certificates API change

2018-04-18 Thread Matt Riedemann

On 4/18/2018 11:57 AM, Jay Pipes wrote:
There is a compute REST API change proposed [1] which will allow users 
to pass trusted certificate IDs to be used with validation of images 
when creating or rebuilding a server. The trusted cert IDs are based 
on certificates stored in some key manager, e.g. Barbican.


The full nova spec is here [2].

The main concern I have is that trusted certs will not be supported 
for volume-backed instances, and some clouds only support 
volume-backed instances.


Yes. And some clouds only support VMWare vCenter virt driver. And some 
only support Hyper-V. I don't believe we should delay adding good 
functionality to (large percentage of) clouds because it doesn't yet 
work with one virt driver or one piece of (badly-designed) functionality.


Maybe it wasn't clear but I'm not advocating that we block the change 
until volume-backed instances are supported with trusted certs. I'm 
suggesting we add a policy rule which allows deployers to at least 
disable it via policy if it's not supported for their cloud.



> The way the patch is written is that if the user attempts to boot 
> from volume with trusted certs, it will fail.


And... I think that's perfectly fine.


I agree. I'm the one that noticed the issue and pointed out in the code 
review that we should explicitly fail the request if we can't honor it.




In thinking about a semi-discoverable/configurable solution, I'm 
thinking we should add a policy rule around trusted certs to indicate 
if they can be used or not. Beyond the boot from volume issue, the 
only virt driver that supports trusted cert image validation is the 
libvirt driver, so any cloud that's not using the libvirt driver 
simply cannot support this feature, regardless of boot from volume. We 
have added similar policy rules in the past for backend-dependent 
features like volume extend and volume multi-attach, so I don't think 
this is a new issue.


Alternatively we can block the change in nova until it supports boot 
from volume, but that would mean needing to add trusted cert image 
validation support into cinder along with API changes, effectively 
killing the chance of this getting done in nova in Rocky, and this 
blueprint has been around since at least Ocata so it would be good to 
make progress if possible.


As mentioned above, I don't want to derail progress until (if ever?) 
trusted certs achieves this magical 
works-for-every-driver-and-functionality state. It's not realistic to 
expect this to be done, IMHO, and just keeps good functionality out of 
the hands of many cloud users.


Again, I'm not advocating that we block until boot from volume is 
supported. However, we have a lot of technical debt for "good 
functionality" added over the years that failed to consider 
volume-backed instances (rebuild, rescue, backup, etc.), and it's 
painful to deal with that after the fact, as can be seen from the 
various specs proposed for adding that support to those APIs.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Concern about trusted certificates API change

2018-04-18 Thread Matt Riedemann
There is a compute REST API change proposed [1] which will allow users 
to pass trusted certificate IDs to be used with validation of images 
when creating or rebuilding a server. The trusted cert IDs are based on 
certificates stored in some key manager, e.g. Barbican.


The full nova spec is here [2].

The main concern I have is that trusted certs will not be supported for 
volume-backed instances, and some clouds only support volume-backed 
instances. The way the patch is written is that if the user attempts to 
boot from volume with trusted certs, it will fail.
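For concreteness, here is a minimal sketch of the server-create request body such a change implies: the user supplies a list of trusted certificate IDs alongside the usual image reference. The field name follows the proposed API change; the IDs and references below are placeholders, not real Barbican or Glance identifiers.

```python
# Sketch of a server-create request carrying trusted certificate IDs.
# Field name follows the proposed API change; the IDs below are
# placeholders, not real Barbican certificate references.
import json

server_request = {
    "server": {
        "name": "trusted-server",
        "imageRef": "cirros-image-uuid",
        "flavorRef": "m1.small-flavor-uuid",
        "trusted_image_certificates": [
            "cert-uuid-1",
            "cert-uuid-2",
        ],
    }
}

# A volume-backed request would replace imageRef with a
# block_device_mapping_v2 entry; combining that with
# trusted_image_certificates is the case the patch rejects.
print(json.dumps(server_request, indent=2))
```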


In thinking about a semi-discoverable/configurable solution, I'm 
thinking we should add a policy rule around trusted certs to indicate if 
they can be used or not. Beyond the boot from volume issue, the only 
virt driver that supports trusted cert image validation is the libvirt 
driver, so any cloud that's not using the libvirt driver simply cannot 
support this feature, regardless of boot from volume. We have added 
similar policy rules in the past for backend-dependent features like 
volume extend and volume multi-attach, so I don't think this is a new issue.
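As a sketch of what such a policy knob could look like: a deployer whose backend cannot honor trusted certs could deny the action outright in policy.yaml. The rule name below is illustrative, following nova's os_compute_api naming convention, not a confirmed rule; "!" is oslo.policy's "deny everyone" check.

```yaml
# Hypothetical policy.yaml fragment: "!" denies the action for all
# users, so requests passing trusted cert IDs are rejected up front.
"os_compute_api:servers:create:trusted_certs": "!"
```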


Alternatively we can block the change in nova until it supports boot 
from volume, but that would mean needing to add trusted cert image 
validation support into cinder along with API changes, effectively 
killing the chance of this getting done in nova in Rocky, and this 
blueprint has been around since at least Ocata so it would be good to 
make progress if possible.


[1] https://review.openstack.org/#/c/486204/
[2] 
https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/nova-validate-certificates.html


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] RFC: Next minimum libvirt / QEMU versions for "Stein" release

2018-04-09 Thread Matt Riedemann

On 4/9/2018 4:58 AM, Kashyap Chamarthy wrote:

Keep in mind that Matt has a tendency to sometimes unfairly
over-simplify others' views ;-).  More seriously, c'mon Matt; I went out
of my way to spend time learning about Debian's packaging structure and
trying to get the details right by talking to folks on
#debian-backports.  And as you may have seen, I marked the patch[*] as
"RFC", and repeatedly said that I'm working on an agreeable lowest
common denominator.


Sorry Kashyap, I didn't mean to offend. I was hoping "delicious bugs" 
would have made that obvious, but I can see how it's not. You've done a 
great, thorough job on sorting this all out.


Since I didn't know what "RFC" meant until googling it today, how about 
dropping that from the patch so I can +2 it?


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] RFC: Next minimum libvirt / QEMU versions for "Stein" release

2018-04-06 Thread Matt Riedemann

On 4/6/2018 12:07 PM, Kashyap Chamarthy wrote:

FWIW, I'd suggest so, if it's not too much maintenance.  It'll just
spare you additional bug reports in that area, and the overall default
experience when dealing with CPU models would be relatively much better.
(Another way to look at it is, multiple other "conservative" long-term
stable distributions also provide libvirt 3.2.0 and QEMU 2.9.0, so that
should give you confidence.)

Again, I don't want to push too hard on this.  If that'll be messy from
a package maintenance POV for you / Debian maintainers, then we could 
settle with whatever is in 'Stretch'.


Keep in mind that Kashyap has a tendency to want the latest and greatest 
of libvirt and qemu at all times for all of those delicious bug fixes. 
But we also know that new code brings new, not-yet-fixed bugs.


Keep in mind the big picture here, we're talking about bumping from 
minimum required (in Rocky) libvirt 1.3.1 to at least 3.0.0 (in Stein) 
and qemu 2.5.0 to at least 2.8.0, so I think that's already covering 
some good ground. Let's not get greedy. :)


--

Thanks,

Matt



Re: [Openstack-operators] [openstack-dev] RFC: Next minimum libvirt / QEMU versions for "Stein" release

2018-04-05 Thread Matt Riedemann

On 4/5/2018 3:32 PM, Thomas Goirand wrote:

If you don't absolutely need new features from libvirt 3.2.0 and 3.0.0
is fine, please choose 3.0.0 as minimum.

If you don't absolutely need new features from qemu 2.9.0 and 2.8.0 is
fine, please choose 2.8.0 as minimum.

If you don't absolutely need new features from libguestfs 1.36 and 1.34
is fine, please choose 1.34 as minimum.


New features in the libvirt driver which depend on minimum versions of 
libvirt/qemu/libguestfs (or arch for that matter) are always 
conditional, so I think it's reasonable to go with the lower bound for 
Debian. We can still support the features for the newer versions if 
you're running a system with those versions, but not penalize people 
with slightly older versions if not.


--

Thanks,

Matt



  1   2   3   4   5   6   7   >