Re: [openstack-dev] Development workflow for bunch of patches

2017-04-19 Thread Eric Fried
I've always used rebase rather than cherry-pick in this situation.
Bonus is that sometimes (if no conflicts) I can do the rebase in gerrit
with two clicks rather than locally with a bunch of typing.
@kevinbenton, is there a benefit to using cherry-pick rather than rebase?

Thanks,
Eric Fried (efried)

On 04/19/2017 03:39 AM, Sławek Kapłoński wrote:
> Hello,
> 
> Thanks a lot :)
> 
> — 
> Best regards
> Slawek Kaplonski
> sla...@kaplonski.pl
> 
> 
> 
>> Message written by Kevin Benton <ke...@benton.pub> on 19.04.2017 at 10:25:
>>
>> Whenever you want to work on the second patch you would need to first
>> checkout the latest version of the first patch and then cherry-pick
>> the later patch on top of it. That way when you update the second one
>> it won't affect the first patch.
>>
>> The -R flag can also be used to prevent unexpected rebases of the
>> parent patch. More details here:
>>
>> https://docs.openstack.org/infra/manual/developers.html#adding-a-dependency
>>
>> On Wed, Apr 19, 2017 at 1:11 AM, Sławek Kapłoński <sla...@kaplonski.pl> wrote:
>>
>> Hello,
>>
>> I have a question about how to deal with bunch of patches which
>> depends one on another.
>> I did a patch to neutron (https://review.openstack.org/#/c/449831/)
>> which is not merged yet, but I wanted to start another patch which
>> depends on this one (https://review.openstack.org/#/c/457816/).
>> Currently I was trying to do something like:
>> 1. git review -d <change-number-of-first-patch>
>> 2. git checkout -b new_branch_for_second_patch
>> 3. Make second patch, commit all changes
>> 4. git review <— this will ask me if I really want to push two
>> patches to gerrit so I answered „yes”
>>
>> Everything is easy for me as long as I’m not making more changes to
>> the first patch. How should I work with it if, let’s say, I want to
>> change something in the first patch and later I want to make another
>> change to the second patch? IIRC, when I tried to do something like
>> that and ran „git review” to push changes to the second patch, the
>> first one was also updated (and I lost the changes made to it
>> in another branch).
>> How should I work with something like that? Is there any guide
>> about that (I couldn’t find one)?
>>
>> — 
>> Best regards
>> Slawek Kaplonski
>> sla...@kaplonski.pl


Re: [openstack-dev] [Taskflow] Current state or the project ?

2017-04-19 Thread Eric Fried
Robin-

> Others (with a slightly less bias than I might have, haha), though, I
> think should chime in on their experiences :)

I can tell you we've been using TaskFlow in some fairly nontrivial ways
in the PowerVM compute driver [1][2][3] and pypowervm [4], the library
that supports it.  We've found it to be a boon, especially for automated
cleanup (via revert() chains) when something goes wrong.  Doing this
kind of workflow management is inherently complicated, but we find
TaskFlow makes it about as straightforward as we could reasonably expect
it to be.
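
For anyone curious what that looks like in practice, here's a toy sketch
(names invented, nothing to do with the PowerVM code) of the
execute()/revert() pairing that gives you the automated cleanup:

    from taskflow import engines, task
    from taskflow.patterns import linear_flow

    class CreateThing(task.Task):
        def execute(self):
            # Pretend this allocates something on the hypervisor.
            return 'thing-id'

        def revert(self, result, **kwargs):
            # Runs automatically when a later task in the flow fails.
            print('cleaning up %s' % result)

    class Explode(task.Task):
        def execute(self):
            raise RuntimeError('boom')

    flow = linear_flow.Flow('demo').add(CreateThing(), Explode())
    try:
        engines.run(flow)
    except Exception:
        pass  # Explode failed; CreateThing.revert() has already run.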

Good luck.

Eric Fried (efried)

[1]
https://github.com/openstack/nova-powervm/tree/stable/ocata/nova_powervm/virt/powervm/tasks
[2]
https://github.com/openstack/nova-powervm/blob/stable/ocata/nova_powervm/virt/powervm/driver.py#L380
[3]
https://github.com/openstack/nova-powervm/blob/stable/ocata/nova_powervm/virt/powervm/driver.py#L567
[4]
https://github.com/powervm/pypowervm/blob/release/1.1.2/pypowervm/utils/transaction.py#L498



Re: [openstack-dev] [nova][oslo.utils] Bug-1680130 Check validation of UUID length

2017-04-24 Thread Eric Fried
+1.  Provide a sanitize_uuid() or similar, which may be as simple as:

    import uuid

    def sanitize_uuid(val):
        try:
            return uuid.UUID(val)
        except ValueError:
            raise SomePossiblyNewException(...)

UUID consumers are encouraged, but not required, to use it - so we
retain backward compatibility overall, and fixes like this one can be
implemented individually.
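
For instance, an opt-in caller might look something like this (purely
illustrative -- the helper name and the exception mapping are made up):

    import webob.exc

    def validated_volume_id(volume_id):
        try:
            return sanitize_uuid(volume_id)
        except SomePossiblyNewException:
            raise webob.exc.HTTPBadRequest(
                explanation='volume_id is not a valid UUID')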

On 04/24/2017 08:53 AM, Doug Hellmann wrote:
> Excerpts from Jadhav, Pooja's message of 2017-04-24 13:45:07 +:
>> Hi Devs,
>>
>> I want your opinion about bug: https://bugs.launchpad.net/nova/+bug/1680130
>>
>> When a user passes an incorrectly formatted UUID, e.g. a volume UUID like 
>> -----(please note the double hyphen) for 
>> attaching a volume to an instance using the "volume-attach" API, it results 
>> in a DBDataError with the following error message: "Data too long for column 
>> 'volume_id'". The reason is that in the database the "block_device_mapping" 
>> table has a "volume_id" field of only 36 characters, so while inserting data 
>> into the table through the 'BlockDeviceMapping' object it raises a DBDataError.
>>
>> In the current code, volume_id is of 'UUID' format so it uses the 
>> "is_uuid_like"[4] method of oslo_utils; this method removes all the hyphens, 
>> checks for a 32-character UUID and returns true or false. As a result, 
>> "-----" is treated as a valid UUID and 
>> goes on to the database table for insertion, and since its size is more than 
>> 36 characters it gives a DBDataError.
>>
>> There are various solutions we can apply to validate volume UUID in this 
>> case:
>>
>> Solution 1:
>> We can restrict the length of volume UUID using maxlength property in schema 
>> validation.
>>
>> Advantage:
>> This solution is better than solution 2 and 3 as we can restrict the invalid 
>> UUID at schema [1] level itself by adding 'maxLength'[2].
>>
>> Solution 2:
>> Before creating a volume BDM object, we can check whether the provided volume 
>> actually exists.
>>
>> Advantage:
>> Volume BDM creation can be avoided if the volume does not exist.
>>
>> Disadvantage:
>> IMO this solution is not better because we need to change the current code, 
>> because in the current code the volume's existence is checked only after 
>> creating the volume BDM.
>> We would have to check volume existence before creating the volume BDM object. 
>> For that we need to modify the "_check_attach_and_reserve_volume" method [3]. 
>> But this method is used in 3 places, so we would have to modify all 
>> the occurrences to match the new behavior.
>>
>> Solution 3:
>> We can check UUID in central place means in "is_uuid_like" method of 
>> oslo_utils [4].
>>
>> Advantage:
>> If we change the "is_uuid_like" method then the same issue might be solved for 
>> the rest of the APIs.
>>
>> Disadvantage:
>> IMO this is also not a better solution because if we change the "is_uuid_like" 
>> method it will affect several different projects.
> 
> Another option would be to convert the input value to a canonical form.
> So if is_uuid_like() returns true, then pass the value to a new function
> format_canonical_uuid() which would format it with the proper number of
> hyphens. That value could then be stored correctly.
> 
> Doug
> 
>>
>> Please let me know your opinion for the same.
>>
>> [1] 
>> https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/schemas/volumes.py#L65
>>
>> [2] 
>> https://github.com/openstack/nova/blob/master/nova/api/validation/parameter_types.py#L297
>>
>> [3] https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3721
>>
>> [4] 
>> https://github.com/openstack/oslo.utils/blob/master/oslo_utils/uuidutils.py#L45
>>


Re: [openstack-dev] [nova][oslo.utils] Bug-1680130 Check validation of UUID length

2017-04-24 Thread Eric Fried
That's not the only way you can break this, though.  For example,
'12-3-45-6-78-12-3456-781-234-56-781-234-56-79' still passes the
modified is_uuid_like(), but still manifests the bug.

Trying to get is_uuid_like() to cover all possible formatting snafus
while still allowing the same formats as before (e.g. without any
hyphens at all) is a rabbit hole of mystical depths.
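
For illustration, a stricter check that sidesteps the whack-a-mole would
compare against the canonical rendering instead of trying to enumerate bad
hyphen placements (just a sketch, not a proposed oslo.utils patch; it still
allows the bare 32-hex form but rejects the example above):

    import uuid

    def is_strictly_uuid_like(val):
        try:
            canonical = str(uuid.UUID(val))
        except (TypeError, ValueError, AttributeError):
            return False
        return val.lower() in (canonical, canonical.replace('-', ''))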

On 04/24/2017 09:44 AM, Jay Pipes wrote:
> On 04/24/2017 09:45 AM, Jadhav, Pooja wrote:
>> Solution 3:
>>
>> We can check UUID in central place means in "is_uuid_like" method of
>> oslo_utils [4].
> 
> This gets my vote. It's a bug in the is_uuid_like() function, IMHO, that
> it returns True for badly-formatted UUID values (like having two
> consecutive hyphens).
> 
> FTR, the fix would be pretty simple. Just change this [1] line from this:
> 
> return str(uuid.UUID(val)).replace('-', '') == _format_uuid_string(val)
> 
> to this:
> 
> # Disallow two consecutive hyphens
> if '--' in val:
>     raise TypeError
> return str(uuid.UUID(val)).replace('-', '') == _format_uuid_string(val)
> 
> Fix it there and you fix this issue for all projects that use it.
> 
> Best,
> -jay
> 
> [1]
> https://github.com/openstack/oslo.utils/blob/master/oslo_utils/uuidutils.py#L56


[openstack-dev] [nova][glance] Who needs multiple api_servers?

2017-04-27 Thread Eric Fried
Y'all-

TL;DR: Does glance ever really need/use multiple endpoint URLs?

I'm working on bp use-service-catalog-for-endpoints[1], which intends
to deprecate disparate conf options in various groups, and centralize
acquisition of service endpoint URLs.  The idea is to introduce
nova.utils.get_service_url(group) -- note singular 'url'.
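
To give a flavor of the direction (a sketch only -- the helper name and the
conf plumbing are exactly what the blueprint is still hashing out), the guts
amount to a keystoneauth catalog lookup:

    from keystoneauth1 import loading as ks_loading

    def get_service_url(conf, group, service_type, interface='public'):
        # Assumes the ksa auth/session options are registered under `group`.
        auth = ks_loading.load_auth_from_conf_options(conf, group)
        sess = ks_loading.load_session_from_conf_options(conf, group, auth=auth)
        return sess.get_endpoint(service_type=service_type, interface=interface)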

One affected conf option is [glance]api_servers[2], which currently
accepts a *list* of endpoint URLs.  The new API will only ever return *one*.

Thus, as planned, this blueprint will have the side effect of
deprecating support for multiple glance endpoint URLs in Pike, and
removing said support in Queens.

Some have asserted that there should only ever be one endpoint URL for
a given service_type/interface combo[3].  I'm fine with that - it
simplifies things quite a bit for the bp impl - but wanted to make sure
there were no loudly-dissenting opinions before we get too far down this
path.

[1]
https://blueprints.launchpad.net/nova/+spec/use-service-catalog-for-endpoints
[2]
https://github.com/openstack/nova/blob/7e7bdb198ed6412273e22dea72e37a6371fce8bd/nova/conf/glance.py#L27-L37
[3]
http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2017-04-27.log.html#t2017-04-27T20:38:29

Thanks,
Eric Fried (efried)
.



Re: [openstack-dev] [Openstack-operators] [nova][glance] Who needs multiple api_servers?

2017-04-28 Thread Eric Fried
Blair, Mike-

There will be an endpoint_override that will bypass the service
catalog.  It still only takes one URL, though.

Thanks,
Eric (efried)

On 04/27/2017 11:50 PM, Blair Bethwaite wrote:
> We at Nectar are in the same boat as Mike. Our use-case is a little
> bit more about geo-distributed operations though - our Cells are in
> different States around the country, so the local glance-apis are
> particularly important for caching popular images close to the
> nova-computes. We consider these glance-apis as part of the underlying
> cloud infra rather than user-facing, so I think we'd prefer not to see
> them in the service-catalog returned to users either... is there going
> to be a (standard) way to hide them?
> 
> On 28 April 2017 at 09:15, Mike Dorman  wrote:
>> We make extensive use of the [glance]/api_servers list.  We configure that 
>> on hypervisors to direct them to Glance servers which are more “local” 
>> network-wise (in order to reduce network traffic across security 
>> zones/firewalls/etc.)  This way nova-compute can fail over in case one of 
>> the Glance servers in the list is down, without putting them behind a load 
>> balancer.  We also don’t run https for these “internal” Glance calls, to 
>> save the overhead when transferring images.
>>
>> End-user calls to Glance DO go through a real load balancer and then are 
>> distributed out to the Glance servers on the backend.  From the end-user’s 
>> perspective, I totally agree there should be one, and only one URL.
>>
>> However, we would be disappointed to see the change you’re suggesting 
>> implemented.  We would lose the redundancy we get now by providing a list.  
>> Or we would have to shunt all the calls through the user-facing endpoint, 
>> which would generate a lot of extra traffic (in places where we don’t want 
>> it) for image transfers.
>>
>> Thanks,
>> Mike
>>
>>
>>
>> On 4/27/17, 4:02 PM, "Matt Riedemann"  wrote:
>>
>> On 4/27/2017 4:52 PM, Eric Fried wrote:
>> > Y'all-
>> >
>> >   TL;DR: Does glance ever really need/use multiple endpoint URLs?
>> >
>> >   I'm working on bp use-service-catalog-for-endpoints[1], which intends
>> > to deprecate disparate conf options in various groups, and centralize
>> > acquisition of service endpoint URLs.  The idea is to introduce
>> > nova.utils.get_service_url(group) -- note singular 'url'.
>> >
>> >   One affected conf option is [glance]api_servers[2], which currently
>> > accepts a *list* of endpoint URLs.  The new API will only ever return 
>> *one*.
>> >
>> >   Thus, as planned, this blueprint will have the side effect of
>> > deprecating support for multiple glance endpoint URLs in Pike, and
>> > removing said support in Queens.
>> >
>> >   Some have asserted that there should only ever be one endpoint URL 
>> for
>> > a given service_type/interface combo[3].  I'm fine with that - it
>> > simplifies things quite a bit for the bp impl - but wanted to make sure
>> > there were no loudly-dissenting opinions before we get too far down 
>> this
>> > path.
>> >
>> > [1]
>> > 
>> https://blueprints.launchpad.net/nova/+spec/use-service-catalog-for-endpoints
>> > [2]
>> > 
>> https://github.com/openstack/nova/blob/7e7bdb198ed6412273e22dea72e37a6371fce8bd/nova/conf/glance.py#L27-L37
>> > [3]
>> > 
>> http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2017-04-27.log.html#t2017-04-27T20:38:29
>> >
>> > Thanks,
>> > Eric Fried (efried)
>> > .
>> >
>>
>> +openstack-operators
>>
>> --
>>
>> Thanks,
>>
>> Matt
>>


Re: [openstack-dev] [all][tc][cinder][mistral][manila] A path forward to shiny consistent service types

2017-04-28 Thread Eric Fried
I love this.  Will it be done by July 20th [1] so I can use it in Pike
for [2]?

[1] https://wiki.openstack.org/wiki/Nova/Pike_Release_Schedule
[2] https://review.openstack.org/#/c/458257/4/nova/utils.py@1508

On 04/28/2017 05:26 PM, Monty Taylor wrote:
> Hey everybody!
> 
> Yay! (I'm sure you're all saying this, given the topic. I'll let you
> collect yourself from your exuberant celebration)
> 
> == Background ==
> 
> As I'm sure you all know, we've been trying to make some hearway for a
> while on getting service-types that are registered in the keystone
> service catalog to be consistent. The reason for this is so that API
> Consumers can know how to request a service from the catalog. That might
> sound like a really easy task - but uh-hoh, you'd be so so wrong. :)
> 
> The problem is that we have some services that went down the path of
> suggesting people register a new service in the catalog with a version
> appended. This pattern was actually started by nova for the v3 api but
> which we walked back from - with "computev3". The pattern was picked up
> by at least cinder (volumev2, volumev3) and mistral (workflowv2) that I
> am aware of. We're also suggesting in the service-types-authority that
> manila go by "shared-file-system" instead of "share".
> 
> (Incidentally, this is related to a much larger topic of version
> discovery, which I will not bore you with in this email, but about which
> I have a giant pile of words just waiting for you in a little bit. Get
> excited about that!)
> 
> == Proposed Solution ==
> 
> As a follow up to the consuming version discovery spec, which you should
> absolutely run away from and never read, I wrote these:
> 
> https://review.openstack.org/#/c/460654/ (Consuming historical aliases)
> and
> https://review.openstack.org/#/c/460539/ (Listing historical aliases)
> 
> It's not a particularly clever proposal - but it breaks down like this:
> 
> * Make a list of the known historical aliases we're aware of - in a
> place that isn't just in one of our python libraries (460539)
> * Write down a process for using them as part of finding a service from
> the catalog so that there is a clear method that can be implemented by
> anyone doing libraries or REST interactions. (460654)
> * Get agreement on that process as the "recommended" way to look up
> services by service-type in the catalog.
> * Implement it in the base libraries OpenStack ships.
> * Contact the authors of as many OpenStack API libraries that we can find.
> * Add tempest tests to verify the mappings in both directions.
> * Change things in devstack/deployer guides.
> 
> The process as described is backwards compatible. That is, once
> implemented it means that a user can request "volumev2" or
> "block-storage" with version=2 - and both will return the endpoint the
> user expects. It also means that we're NOT asking existing clouds to run
> out and break their users. New cloud deployments can do the new thing -
> but the old values are handled in both directions.
> 
> There is a hole, which is that people who are not using the base libs
> OpenStack ships may find themselves with a new cloud that has a
> different service-type in the catalog than they have used before. It's
> not idea, to be sure. BUT - hopefully active outreach to the community
> libraries coupled with documentation will keep the issues to a minimum.
> 
> If we can agree on the matching and fallback model, I am volunteering to
> do the work to implement in every client library in which it needs to be
> implemented across OpenStack and to add the tempest tests. (it's
> actually mostly a patch to keystoneauth, so that's actually not _that_
> impressive of a volunteer) I will also reach out to as many of the
> OpenStack API client library authors as I can find, point them at the
> docs and suggest they add the support.
> 
> Thoughts? Anyone violently opposed?
> 
> Thanks for reading...
> 
> Monty


Re: [openstack-dev] [Openstack-operators] [nova][glance] Who needs multiple api_servers?

2017-05-01 Thread Eric Fried
Matt-

Yeah, clearly other projects have the same issue this blueprint is
trying to solve in nova.  I think the idea is that, once the
infrastructure is in place and nova has demonstrated the concept, other
projects can climb aboard.

It's conceivable that the new get_service_url() method could be
moved to a more common lib (ksa or os-client-config perhaps) in the
future to facilitate this.

Eric (efried)

On 05/01/2017 09:17 AM, Matthew Treinish wrote:
> On Mon, May 01, 2017 at 05:00:17AM -0700, Flavio Percoco wrote:
>> On 28/04/17 11:19 -0500, Eric Fried wrote:
>>> If it's *just* glance we're making an exception for, I prefer #1 (don't
>>> deprecate/remove [glance]api_servers).  It's way less code &
>>> infrastructure, and it discourages others from jumping on the
>>> multiple-endpoints bandwagon.  If we provide endpoint_override_list
>>> (handwave), people will think it's okay to use it.
>>>
>>> Anyone aware of any other services that use multiple endpoints?
>> Probably a bit late but yeah, I think this makes sense. I'm not aware of 
>> other
>> projects that have list of api_servers.
> I thought it was just nova too, but it turns out cinder has the same exact
> option as nova: (I hit this in my devstack patch trying to get glance deployed
> as a wsgi app)
>
> https://github.com/openstack/cinder/blob/d47eda3a3ba9971330b27beeeb471e2bc94575ca/cinder/common/config.py#L51-L55
>
> Although from what I can tell you don't have to set it and it will fallback to
> using the catalog, assuming you configured the catalog info for cinder:
>
> https://github.com/openstack/cinder/blob/19d07a1f394c905c23f109c1888c019da830b49e/cinder/image/glance.py#L117-L129
>
>
> -Matt Treinish
>
>
>>> On 04/28/2017 10:46 AM, Mike Dorman wrote:
>>>> Maybe we are talking about two different things here?  I’m a bit confused.
>>>>
>>>> Our Glance config in nova.conf on HV’s looks like this:
>>>>
>>>> [glance]
>>>> api_servers=http://glance1:9292,http://glance2:9292,http://glance3:9292,http://glance4:9292
>>>> glance_api_insecure=True
>>>> glance_num_retries=4
>>>> glance_protocol=http
>>
>> FWIW, this feature is being used as intended. I'm sure there are ways to 
>> achieve
>> this using external tools like haproxy/nginx but that adds an extra burden to
>> OPs that is probably not necessary since this functionality is already there.
>>
>> Flavio
>>
>>>> So we do provide the full URLs, and there is SSL support.  Right?  I am 
>>>> fairly certain we tested this to ensure that if one URL fails, nova goes 
>>>> on to retry the next one.  That failure does not get bubbled up to the 
>>>> user (which is ultimately the goal.)
>>>>
>>>> I don’t disagree with you that the client side choose-a-server-at-random 
>>>> is not a great load balancer.  (But isn’t this roughly the same thing that 
>>>> oslo-messaging does when we give it a list of RMQ servers?)  For us it’s 
>>>> more about the failure handling if one is down than it is about actually 
>>>> equally distributing the load.
>>>>
>>>> In my mind options One and Two are the same, since today we are already 
>>>> providing full URLs and not only server names.  At the end of the day, I 
>>>> don’t feel like there is a compelling argument here to remove this 
>>>> functionality (that people are actively making use of.)
>>>>
>>>> To be clear, I, and I think others, are fine with nova by default getting 
>>>> the Glance endpoint from Keystone.  And that in Keystone there should 
>>>> exist only one Glance endpoint.  What I’d like to see remain is the 
>>>> ability to override that for nova-compute and to target more than one 
>>>> Glance URL for purposes of fail over.
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>>
>>>>
>>>>
>>>> On 4/28/17, 8:20 AM, "Monty Taylor"  wrote:
>>>>
>>>> Thank you both for your feedback - that's really helpful.
>>>>
>>>> Let me say a few more words about what we're trying to accomplish here
>>>> overall so that maybe we can figure out what the right way forward is.
>>>> (it may be keeping the glance api servers setting, but let me at least
>>>> make the case real quick)
>>>>
>>>>  From a 10,000 foot view, the thing we're

Re: [openstack-dev] [Openstack-operators] [nova][glance] Who needs multiple api_servers?

2017-05-01 Thread Eric Fried
Sam-

Under the current design, you can provide a specific endpoint
(singular) via the `endpoint_override` conf option.  Based on feedback
on this thread, we will also be keeping support for
`[glance]api_servers` for consumers who actually need to be able to
specify multiple endpoints.  See latest spec proposal[1] for details.

[1] https://review.openstack.org/#/c/461481/

Thanks,
Eric (efried)

On 05/01/2017 12:20 PM, Sam Morrison wrote:
> 
>> On 1 May 2017, at 4:24 pm, Sean McGinnis  wrote:
>>
>> On Mon, May 01, 2017 at 10:17:43AM -0400, Matthew Treinish wrote:
 
>>>
>>> I thought it was just nova too, but it turns out cinder has the same exact
>>> option as nova: (I hit this in my devstack patch trying to get glance 
>>> deployed
>>> as a wsgi app)
>>>
>>> https://github.com/openstack/cinder/blob/d47eda3a3ba9971330b27beeeb471e2bc94575ca/cinder/common/config.py#L51-L55
>>>
>>> Although from what I can tell you don't have to set it and it will fallback 
>>> to
>>> using the catalog, assuming you configured the catalog info for cinder:
>>>
>>> https://github.com/openstack/cinder/blob/19d07a1f394c905c23f109c1888c019da830b49e/cinder/image/glance.py#L117-L129
>>>
>>>
>>> -Matt Treinish
>>>
>>
>> FWIW, that came with the original fork out of Nova. I do not have any real
>> world data on whether that is used or not.
> 
> Yes this is used in cinder.
> 
> For a lot of the projects you can set the endpoints for them to use. This is 
> extremely useful in a large production OpenStack install where you want to 
> control the traffic.
> 
> I can understand using the catalog in certain situations and feel it’s OK for 
> that to be the default but please don’t prevent operators configuring it 
> differently.
> 
> Glance is the big one as you want to control the data flow efficiently but 
> any service to service configuration should ideally be able to be manually 
> configured.
> 
> Cheers,
> Sam
> 


Re: [openstack-dev] [neutron] neutron-lib impact: neutron.plugins.common.constants are now in neutron-lib

2017-05-03 Thread Eric Fried
Boden-

Can you please point to the change(s) that move the constants?  Or
provide some other way to figure out which are affected?

Thanks,
Eric (efried)

On 05/03/2017 09:30 AM, Boden Russell wrote:
> If your project uses neutron.plugins.common.constants please read on. If
> not, it's probably safe to discard this message.
> 
> A number of the constants from neutron.plugins.common.constants have
> moved into neutron-lib:
> - The service type constants are in neutron_lib.plugins.constants
> - Many of the others are in neutron_lib.constants
> - A few have not yet been rehomed into neutron-lib
> 
> Suggested actions:
> - If you work on a stadium project, you should already be covered [1].
> - Otherwise, please move your project's imports to use the constants in
> neutron-lib rather than neutron. See the work in [1] for reference.
> 
> Once we see sub-projects moved off these constants, we'll begin removing
> them from neutron.plugins.common.constants (likely as a set of patches).
> We can discuss this topic in our weekly neutron meeting.
> 
> Thanks
> 
> 
> [1]
> https://review.openstack.org/#/q/message:%22use+neutron-lib+constants+rather%22


Re: [openstack-dev] [nova] [cinder] Follow up on Nova/Cinder summit sessions from an ops perspective

2017-05-15 Thread Eric Fried
> If there are alternative ideas on how to design or model this, I'm all
> ears.

How about aggregates?
https://www.youtube.com/watch?v=fu6jdGGdYU4&feature=youtu.be&t=1784

On 05/15/2017 05:04 PM, Matt Riedemann wrote:
> On 5/15/2017 2:28 PM, Edmund Rhudy (BLOOMBERG/ 120 PARK) wrote:
>> Hi all,
>>
>> I'd like to follow up on a few discussions that took place last week in
>> Boston, specifically in the Compute Instance/Volume Affinity for HPC
>> session
>> (https://etherpad.openstack.org/p/BOS-forum-compute-instance-volume-affinity-hpc).
>>
>>
>> In this session, the discussions all trended towards adding more
>> complexity to the Nova UX, like adding --near and --distance flags to
>> the nova boot command to have the scheduler figure out how to place an
>> instance near some other resource, adding more fields to flavors or
>> flavor extra specs, etc.
>>
>> My question is: is it the right question to ask how to add more
>> fine-grained complications to the OpenStack user experience to support
>> what seemed like a pretty narrow use case?
> 
> I think we can all agree we don't want to complicate the user experience.
> 
>>
>> The only use case that I remember hearing was an operator not wanting it
>> to be possible for a user to launch an instance in a particular Nova AZ
>> and then not be able to attach a volume from a different Cinder AZ, or
>> they try to boot an instance from a volume in the wrong place and get a
>> failure to launch. This seems okay to me, though - either the user has
>> to rebuild their instance in the right place or Nova will just return an
>> error during instance build. Is it worth adding all sorts of
>> convolutions to Nova to avoid the possibility that somebody might have
>> to build instances a second time?
> 
> We might have gone down this path but it's not the intention or the use
> case as I thought I had presented it, and is in the etherpad. For what
> you're describing, we already have the CONF.cinder.cross_az_attach
> option in nova which prevents you from booting or attaching a volume to
> an instance in a different AZ from the instance. That's not what we're
> talking about though.
> 
> The use case, as I got from the mailing list discussion linked in the
> etherpad, is a user wants their volume attached as close to local
> storage for the instance as possible for performance reasons. If this
> could be on the same physical server, great. But there is the case where
> the operator doesn't want to use any local disk on the compute and wants
> to send everything to Cinder, and the backing storage might not be on
> the same physical server, so that's where we started talking about
> --near or --distance (host, rack, row, data center, etc).
> 
>>
>> The feedback I get from my cloud-experienced users most frequently is
>> that they want to know why the OpenStack user experience in the storage
>> area is so radically different from AWS, which is what they all have
>> experience with. I don't really have a great answer for them, except to
>> admit that in our clouds they just have to know what combination of
>> flavors and Horizon options or BDM structure is going to get them the
>> right tradeoff between storage durability and speed. I was pleased with
>> how the session on expanding Cinder's role for Nova ephemeral storage
>> went because of the suggestion of reducing Nova imagebackend's role to
>> just the file driver and having Cinder take over for everything else.
>> That, to me, is the kind of simplification that's a win-win for both
>> devs and ops: devs get to radically simplify a thorny part of the Nova
>> codebase, storage driver development only has to happen in Cinder,
>> operators get a storage workflow that's easier to explain to users.
>>
>> Am I off base in the view of not wanting to add more options to nova
>> boot and more logic to the scheduler? I know the AWS comparison is a
>> little North America-centric (this came up at the summit a few times
>> that EMEA/APAC operators may have very different ideas of a normal cloud
>> workflow), but I am striving to give my users a private cloud that I can
>> define for them in terms of AWS workflows and vocabulary. AWS by design
>> restricts where your volumes can live (you can use instance store
>> volumes and that data is gone on reboot or terminate, or you can put EBS
>> volumes in a particular AZ and mount them on instances in that AZ), and
>> I don't think that's a bad thing, because it makes it easy for the users
>> to understand the contract they're getting from the platform when it
>> comes to where their data is stored and what instances they can attach
>> it to.
>>
> 
> Again, we don't want to make the UX more complicated, but as noted in
> the etherpad, the solution we have today is if you want the same
> instance and volume on the same host for performance reasons, then you
> need to have a 1:1 relationship for AZs and hosts since AZs are exposed
> to the user. In a public cloud where you've got hundreds of t

Re: [openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

2017-05-16 Thread Eric Fried
> The idea is that a regular user calling into a service should not
> be able to set the request id, but outgoing calls from that service
> to other services as part of the same request would.

Yeah, so can anyone explain to me why this is a real problem?  If a
regular user wanted to be a d*ck and inject a bogus (or worse, I
imagine, duplicated) request-id, can any actual harm come out of it?  Or
does it just cause confusion to the guy reading the logs later?

(I'm assuming, of course, that the format will still be validated
strictly (req-$UUID) to preclude code injection kind of stuff.)

Thanks,
Eric (efried)
.



[openstack-dev] [all] Log coloring under systemd

2017-05-17 Thread Eric Fried
Folks-

As of [1], devstack will include color escapes in the default log
formats under systemd.  Production deployments can emulate as they see fit.

Note that journalctl will strip those color escapes by default, which
is why we thought we lost log coloring with systemd.  Turns out that you
can get the escapes to come through by passing the -a flag to
journalctl.  The doc at [2] has been updated accordingly.  If there are
any other go-to documents that could benefit from similar content,
please let me know (or propose the changes).

Thanks,
Eric (efried)

[1] https://review.openstack.org/#/c/465147/
[2] https://docs.openstack.org/developer/devstack/systemd.html



Re: [openstack-dev] [glance] please approve test patch to fix glanceclient

2017-05-24 Thread Eric Fried
Thanks Nikhil.  This one is also needed to make py35 pass:

https://review.openstack.org/#/c/396816/

E

On 05/24/2017 10:55 AM, Nikhil Komawar wrote:
> thanks for bringing it up. this is done.
> 
> On Wed, May 24, 2017 at 10:54 AM, Sean Dague wrote:
> 
> python-glanceclient patches have been failing for at least a week due to
> a requests change. The fix was posted 5 days ago -
> https://review.openstack.org/#/c/466385
> 
> 
> It would be nice to get that approved so that other patches could be
> considered.
> 
> -Sean
> 
> --
> Sean Dague
> http://dague.net
> 


Re: [openstack-dev] [ironic] using keystone right - catalog, endpoints, tokens and noauth

2017-05-24 Thread Eric Fried
Pavlo-

There's a blueprint [1] whereby we're trying to address a bunch of
these same concerns in nova.  You can see the first part in action here
[2].  However, it has become clear that nova is just one of the many
services that would benefit from get_service_url().  With the full
support of mordred (let's call it The Full Monty), we've got our sights
on moving that method into ksa itself for that purpose.

Please have a look at this blueprint and change set.  Let us know if
your concerns would be addressed if this were available to you from ksa.

[1]
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/use-service-catalog-for-endpoints.html
[2] https://review.openstack.org/#/c/458257/

Thanks,
efried

On 05/24/2017 04:46 AM, Pavlo Shchelokovskyy wrote:
> Hi all,
> 
> There are several problems or inefficiencies in how we are dealing with
> auth to other services. Although it became much better in Newton, some
> things are still to be improved and I like to discuss how to tackle
> those and my ideas for that.
> 
> Keystone endpoints
> ===
> 
> Apparently since February-ish DevStack no longer sets up 'internal'
> endpoints for most of the services in core devstack [0].
> Luckily we were not broken by that right away - although when
> discovering a service endpoint from keystone catalog we default to
> 'internal' endpoint [1], for most services our devstack plugin still
> configures explicit service URL in the corresponding config section, and
> thus the service discovery from keystone never takes place (or that code
> path is not tested by functional/integration testing).
> 
> AFAIK different endpoint types (internal vs public) are still quite used
> by deployments (and IMO rightfully so), so we have to continue
> supporting that. I propose to take the following actions:
> 
> - in our devstack plugin, stop setting up the direct service URLs in
> config, always use keystone catalog for discovery
> - in every conf section related to external service add
> 'endpoint_type=[internal|public]' option, defaulting to 'internal', with
> a warning in option description (and validated on conductor start) that
> it will be changed to 'public' in the next release
> - use those values from CONF wherever we ask for service URL from
> catalog or instantiate client with session.
> - populate these options in our devstack plugin to be 'public'
> - in Queens, switch the default to 'public' and use defaults in devstack
> plugin, remove warnings.
> 
> Unify clients creation
> 
> 
> again, in those config sections related to service clients, we have many
> options to instantiate clients from (especially glance section, see my
> other recent ML about our image service code). Many of those seem to be
> from the time when keystone catalog was missing some functionality or
> not existing at all, and keystoneauth lib abstracting identity and
> client sessions was not there either.
> 
> To simplify setup and unify as much code as possible I'd like to propose
> the following:
> 
> - in each config section for service client add (if missing) a
> '_url' option that should point to the API of given service and
> will be used *only in noauth mode* when there's no Keystone catalog to
> discover the service endpoint from
> - in the code creating service clients, always create a keystoneauth
> session from config sections, using appropriate keystoneauth identity
> plugin - 'token_endpoint' with fake token _url for noauth mode,
> 'password' for service user client, 'token' when using a token from
> incoming request. The latter will have a benefit to make it possible for
> the session to reauth itself when user token is about to expire, but
> might require changes in some public methods to pass in the full
> task.context instead of just token
> - always create clients from sessions. Although AFAIK all clients ironic
> uses already support this, some in ironic code (e.g. glance) still
> always create a client from token and endpoint directly.
> - deprecate some options explicitly registered by ironic in those
> sections that are becoming redundant - including those that relate to
> HTTP session settings (like timeout, retries, SSL certs and settings) as
> those will be used from options registered by keystoneauth Session, and
> those multiple options that piece together a single service URL.
> 
> This will decrease the complexity of service client-related code and
> will make configuring those cleaner.
> 
> Of course all of this has to be done minding proper deprecation process,
> although that might complicate things (as usual :/).
> 
> Legacy auth
> =
> 
> Probably not worth specific mention, but we implemented a proper
> keystoneauth-based loading of client auth options back in Newton almost
> a year ago, so the code attempting to load auth for clients in a
> deprecated way from "[keystone_authtoken]" section can be safely remov

[openstack-dev] [nova][out-of-tree drivers] InstanceInfo/get_info getting a haircut

2017-06-06 Thread Eric Fried
If you don't maintain an out-of-tree nova compute driver, you can
probably hit Delete now.

A proposed change [1] gets rid of some unused fields from
nova.virt.hardware.InstanceInfo, which is the thing returned by
ComputeDriver.get_info().
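
For a concrete picture, the kind of method in play is sketched below; the
assumption (check the review for the authoritative field list) is that the
power state survives the trim while the unused usage counters go away:

    from nova.compute import power_state
    from nova.virt import driver, hardware

    class MyDriver(driver.ComputeDriver):
        # Hypothetical out-of-tree driver, trimmed to the relevant method.
        def get_info(self, instance):
            # A real driver would look up the instance's actual state; only
            # the state is assumed to remain after the change.
            return hardware.InstanceInfo(state=power_state.RUNNING)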

I say "unused" in the context of the nova project.  If you have a
derived project that's affected by this, feel free to respond or reach
out to me (efried) on #openstack-nova to discuss.

This change is planned for Pike only.

[1] https://review.openstack.org/#/c/471146/

Thanks,
Eric
.



Re: [openstack-dev] [nova][scheduler][placement] Trying to understand the proposed direction

2017-06-20 Thread Eric Fried
Nice Stephen!

For those who aren't aware, the rendered version (pretty, so pretty) can
be accessed via the gate-nova-docs-ubuntu-xenial jenkins job:

http://docs-draft.openstack.org/10/475810/1/check/gate-nova-docs-ubuntu-xenial/25e5173//doc/build/html/scheduling.html?highlight=scheduling

On 06/20/2017 09:09 AM, sfinu...@redhat.com wrote:

> 
> I have a document (with a nifty activity diagram in tow) for all the above
> available here:
> 
>   https://review.openstack.org/475810 
> 
> Should be more Google'able that mailing list posts for future us :)
> 
> Stephen
> 


Re: [openstack-dev] [nova] PCI handling (including SR-IOV), Traits, Resource Providers and Placement API - how to proceed?

2017-06-26 Thread Eric Fried
Hi Maciej, thanks for bringing this up.

On 06/26/2017 04:59 AM, Maciej Kucia wrote:
> Hi,
> 
> I have recently spent some time digging in Nova PCI devices handling code.
> I would like to propose some improvements:
> https://review.openstack.org/#/c/474218/ (Extended PCI alias)
> https://review.openstack.org/#/q/status:open+project:openstack/nova+topic:PCI 
> 
> but
> 
> There is an ongoing work on Resource Providers, Traits and Placement:
> https://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/resource-providers.html
> https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/resource-provider-traits.html
> https://github.com/openstack/os-traits
> https://docs.openstack.org/developer/nova/placement.html
> 
> I am willing to contribute some work to the PCI handling in Queens. 
> Given the scope of changes a new spec will be needed.
> 
> The current PCI code has some issues that would be nice to fix. Most
> notably:
>  - Broken single responsibility principle 
>A lot of classes are doing more than the name would suggest
>  - Files and classes naming is not consistent
>  - Mixed SR-IOV and PCI code
>  - PCI Pools provide no real performance advantage and add unnecessary
> complexity

I would like to add for consideration the issue that the current
whitelist/allocation model doesn't work at all for hypervisors like
HyperV and PowerVM that don't directly own/access the devices as Linux
/dev files; and (at least for PowerVM) where VFs can be created on the
fly.  I'm hoping the placement and resource provider work will result in
a world where a compute node can define different kinds of PCI devices
as resource classes against which resources with specific traits can be
claimed.  And hopefully the whitelist goes away (or I can "opt out" of
it) in the process.
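
To make that concrete, here's roughly what that world looks like against
today's placement REST API (a hand-rolled sketch only -- the auth details,
provider UUID, names and inventory numbers are placeholders, and normally
the resource tracker/report client would be doing this for you):

    from keystoneauth1 import adapter, session
    from keystoneauth1.identity import v3

    # Placeholder credentials/URLs.
    auth = v3.Password(auth_url='http://keystone.example.com/v3',
                       username='placement-user', password='secret',
                       project_name='service', user_domain_id='default',
                       project_domain_id='default')
    placement = adapter.Adapter(session=session.Session(auth=auth),
                                service_type='placement',
                                interface='internal')
    hdrs = {'OpenStack-API-Version': 'placement 1.4'}  # custom classes need >= 1.2

    # Define a custom resource class for the VF pool (a 409 just means it
    # already exists, hence raise_exc=False).
    placement.post('/resource_classes', json={'name': 'CUSTOM_PCI_VF'},
                   headers=hdrs, raise_exc=False)

    # Report this compute node's VF inventory against its resource provider.
    rp_uuid = 'REPLACE-WITH-PROVIDER-UUID'
    placement.put('/resource_providers/%s/inventories/CUSTOM_PCI_VF' % rp_uuid,
                  json={'resource_provider_generation': 0,  # current generation
                        'total': 8},
                  headers=hdrs)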

> 
> My questions:
>  - I understand that Nova will remain handling low-level operations
> between OpenStack and hypervisor driver.
>Is this correct?
>  - Will the `placement service` take the responsibility of managing PCI
> devices?
>  - Shall the SR-IOV handling be done by Nova or `placement service` (in
> such case Nova would manage SR-IOV as a regular PCI)?
>  - Where to store PCI configuration?
>For example currently nova.conf PCI Whitelist is responsible for some
> SR-IOV configuration.
>Shall it be stored somewhere alongside `SR-IOV` resource provider?
> 
> Thanks,
> Maciej
> 
> 
> 


Re: [openstack-dev] [heat][ironic][telemetry][dragonflow][freezer][kuryr][manila][mistral][monasca][neutron][ansible][congress][rally][senlin][storlets][zun][docs] repos without signs of migration sta

2017-07-11 Thread Eric Fried
> can't speak regarding ceilometer-powervm since it's vendor specific.

I can confirm that ceilometer-powervm (and nova-powervm and
networking-powervm) shouldn't be tracked with this effort, since they
publish to readthedocs and not docs.openstack.org.

Something something Big Tent something Governance something something
Official.

efried
.



Re: [openstack-dev] [keystone] keystoneauth1 3.0.0 broken keystonemiddleware

2017-07-22 Thread Eric Fried
dims-

SHA was good; just hadn't merged yet.  It has now.  All greens.
Assuming lbragstad/morgan/mordred are on board, let's do it.

Thanks,
efried

On 07/22/2017 10:06 AM, Davanum Srinivas wrote:
> Lance,
> 
> Ack. SHA needs to be fixed (https://review.openstack.org/#/c/486279/)
> 
> Thanks,
> Dims
> 
> On Sat, Jul 22, 2017 at 10:24 AM, Lance Bragstad  wrote:
>> Thanks Dims,
>>
>> Looks like Morgan and Monty have it working through the gate now.
>>
>> On Sat, Jul 22, 2017 at 7:26 AM, Davanum Srinivas  wrote:
>>>
>>> Lance, other keystone cores,
>>>
>>> there's a request for 3.0.1, but one of the reviews that it needs is
>>> not merged yet
>>>
>>> https://review.openstack.org/#/c/486231/
>>>
>>>
>>> Thansk,
>>> Dims
>>>
>>> On Fri, Jul 21, 2017 at 11:40 PM, Lance Bragstad 
>>> wrote:


 On Fri, Jul 21, 2017 at 9:39 PM, Monty Taylor 
 wrote:
>
> On 07/22/2017 07:14 AM, Lance Bragstad wrote:
>>
>> After a little head scratching and a Pantera playlist later, we ended
>> up
>> figuring out the main causes. The failures can be found in the gate
>> [0].
>> The two failures are detailed below:
>>
>> 1.) Keystoneauth version 3.0.0 added a lot of functionality and might
>> return a different url depending on discovery. Keystonemiddleware use
>> to
>> be able to mock urls to keystone in this case because keystoneauth
>> didn't modify the url in between. Keystonemiddleware didn't know how
>> to
>> deal with the new url and the result was a Mock failure. This is
>> something that we can fix in keystonemiddleware once we have a version
>> of keystoneauth that covers all discovery cases and does the right
>> thing. NOTE: If you're mocking requests to keystone and using
>> keystoneauth somewhere in your project's tests, you'll have to deal
>> with
>> this. More on that below.
>
>
> Upon further digging - this one is actually quite a bit easier. There
> are
> cases where keystoneauth finds an unversioned discovery endpoint from a
> versioned endpoint in the catalog. It's done for quite a while, so the
> behavior isn't new. HOWEVER - a bug snuck in that caused the url it
> infers
> to come back without a trailing '/'. So the requests_mock entry in
> keystonemiddleware was for http://keystone.url/admin/ and keystoneauth
> was
> doing a get on http://keystone.url/admin.
>
> It's a behavior change and a bug, so we're working up a fix for it. The
> short story is though that once we fix it it should not cause anyone to
> need
> to update requests_mock entries.


 Ah - thanks for keeping me honest here. Good to know both issues will be
 fixed with the same patch. For context, this was the thought process as
 we
 worked through things earlier [0].

 I appreciate the follow-up!


 [0]

 http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2017-07-21.log.html#t2017-07-21T19:57:30

>
>
>> 2.) The other set of failures were because keystoneauth wasn't
>> expecting
>> a URL without a path [1], causing an index error. I tested the fix [2]
>> against keystonemiddleware and it seems to take care of the issue.
>> Eric
>> is working on a fix. Once that patch is fully tested and vetted we'll
>> roll another keystoneauth release (3.0.1) and use that to test
>> keystonemiddleware to handle the mocking issues described in #1. From
>> there we should be able to safely bump the minimum version to 3.0.1,
>> and
>> avoid 3.0.0 all together.
>
>
> Patch is up for this one, and we've confirmed it fixes this issue.
>
>> Let me know if you see anything else suspicious with respect to
>> keystoneauth. Thanks!
>>
>>
>> [0]
>>
>>
>> http://logs.openstack.org/84/486184/1/check/gate-keystonemiddleware-python27-ubuntu-xenial/7c079da/testr_results.html.gz
>> [1]
>>
>>
>> https://github.com/openstack/keystoneauth/blob/5715035f42780d8979d458e9f7e3c625962b2749/keystoneauth1/discover.py#L947
>> [2] https://review.openstack.org/#/c/486231/1
>>
>> On 07/21/2017 04:43 PM, Lance Bragstad wrote:
>>>
>>> The patch to blacklist version 3.0.0 is working through the moment
>>> [0].
>>> We also have a WIP patch proposed to handled the cases exposed by
>>> keystonemiddleware [1].
>>>
>>>
>>> [0] https://review.openstack.org/#/c/486223/
>>> [1] https://review.openstack.org/#/c/486231/
>>>
>>>
>>> On 07/21/2017 03:58 PM, Lance Bragstad wrote:

 We have a patch up to blacklist version 3.0.0 from
 global-requirements
 [0]. We're also going to hold bumping the minimum version of
 keystoneauth until we have things back to normal [1].


 [0] https://r

Re: [openstack-dev] [nova] Working toward Queens feature freeze and RC1

2018-01-04 Thread Eric Fried
Matt, et al-

> * Nested resource providers: I'm going to need someone closer to this
> work like Jay or Eric to provide an update on where things are at in the
> series of changes and what absolutely needs to get done. I have
> personally found it hard to track what the main focus items are for the
> nested resource providers / traits / granular resource provider request
> changes so I need someone to summarize and lay out the review goals for
> the next two weeks.


Overall goals for nested resource providers in Queens:
(A) Virt drivers should be able to start expressing resource inventory
as a hierarchy, including traits, and have that understood by the
resource tracker and scheduler.
(B) Ops should be able to create flavors requesting resources with
traits, including e.g. same-class resources with different traits.

Whereas many big pieces of the framework are merged:

- Placement-side API changes giving providers parents/roots, allowing
tree representation and querying.
- A rudimentary ProviderTree class on the compute side for
representation of tree structure and inventory; and basic usage thereof
by the report client.
- Traits affordance in the placement API.

...we're still missing the following pieces that actually enable those
goals:

- NRP affordance in GET /allocation_candidates
  . PATCHES: -
  . STATUS: Not proposed
  . PRIORITY: Critical
  . OWNER: jaypipes
  . DESCRIPTION: In the current master branch, the placement API will
report allocation candidates from [(a single non-sharing provider) and
(sharing providers associated via aggregate with that non-sharing
provider)].  It needs to be enhanced to report allocation candidates
from [(non-sharing providers in a tree) and (sharing providers
associated via aggregate with any of those non-sharing providers)].
This is critical for two reasons: 1) Without it, NRP doesn't provide any
interesting use cases; and 2) It is prerequisite to the remainder of the
Queens NRP work, listed below.
  . ACTION: Jay to sling some code

- Granular Resource Requests
  . PATCHES:
Placement side: https://review.openstack.org/#/c/517757/
Report client side: https://review.openstack.org/#/c/515811/
  . STATUS: WIP, blocked on the above
  . PRIORITY: High
  . OWNER: efried
  . DESCRIPTION: Ability to request separate groupings of resources from
GET /allocation_candidates via flavor extra specs.  The groundwork
(ability to parse flavors, construct querystrings, parse querystrings,
etc.) has already merged.  The remaining patches need to do the
appropriate join-fu in a new placement microversion; and flip the switch
to send flavor-parsed request groupings from report client.  The former
needs to be able to make use of NRP affordance in GET
/allocation_candidates, so is blocked on the above work item.  The
latter subsumes parsing of traits from flavors (the non-granular part of
which actually got a separate blueprint, request-traits-in-nova).
  . ACTION: Wait for the above

- ComputeDriver.update_provider_tree()
  . PATCHES: Series starting at https://review.openstack.org/#/c/521685/
  . STATUS: Bottom ready for core reviews; top WIP.
  . PRIORITY: ?
  . OWNER: efried
  . DESCRIPTION: This is the next phase in the evolution of compute
driver inventory reporting (get_available_resource => get_inventory =>
update_provider_tree).  The series includes a bunch of enabling
groundwork in SchedulerReportClient and ProviderTree.
  . ACTION: Reviews on the bottom (core reviewers); address
comments/issues in the middle (efried); finish WIPs on top (efried).
Also write up a mini-spec describing this piece in more detail (efried).

Thanks,
Eric (efried)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Working toward Queens feature freeze and RC1

2018-01-04 Thread Eric Fried
Folks-

>> - NRP affordance in GET /allocation_candidates
>>    . PATCHES: -
>>    . STATUS: Not proposed
>>    . PRIORITY: Critical
>>    . OWNER: jaypipes
>>    . DESCRIPTION: In the current master branch, the placement API will
>> report allocation candidates from [(a single non-sharing provider) and
>> (sharing providers associated via aggregate with that non-sharing
>> provider)].  It needs to be enhanced to report allocation candidates
>> from [(non-sharing providers in a tree) and (sharing providers
>> associated via aggregate with any of those non-sharing providers)].
>> This is critical for two reasons: 1) Without it, NRP doesn't provide any
>> interesting use cases; and 2) It is prerequisite to the remainder of the
>> Queens NRP work, listed below.
>>    . ACTION: Jay to sling some code
> 
> Just as an aside... while I'm currently starting this work, until the
> virt drivers and eventually the generic device manager or PCI device
> manager is populating parent/child information for resource providers,
> there's nothing that will be returned in the GET /allocation_candidates
> response w.r.t. nested providers.
> 
> So, yes, it's kind of a prerequisite, but until inventory records are
> being populated from the compute nodes, the allocation candidates work
> is going to be all academic/tests.
> 
> Best,
> -jay

Agree it's more of a tangled web than a linear sequence.  My thought was
that it doesn't make sense for virt drivers to expose their inventory in
tree form until it's going to afford them some benefit.

But to that point, I did forget to mention that Xen is trying to do just
that in Queens for VGPU support.  They already have a WIP [1] which
would consume the WIPs at the top of the
ComputeDriver.update_provider_tree() series [2].

[1] https://review.openstack.org/#/c/521041/
[2] https://review.openstack.org/#/c/521685/

I also don't necessarily agree that we need PCI manager changes or a
generic device manager for this to work.  As long as the virt driver
knows how to a) expose the resources in its provider tree, b) consume
the allocation candidate coming from the scheduler, and c) create/attach
resources based on that info, those other pieces would just get in the
way.  I'm hoping the Xen VGPU use case proves that.

E

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ResMgmt SIG]Proposal to form Resource Management SIG

2018-01-08 Thread Eric Fried
> I think having a bi-weekly cross-project (or even cross-ecosystem if
> we're talking about OpenStack+k8s) status email reporting any big events
> in the resource tracking world would be useful. As far as regular
> meetings for a resource management SIG, I'm +0 on that. I prefer to have
> targeted topical meetings over regular meetings.

Agree with this.  That said, please include me (efried) in whatever
shakes out.

On 01/08/2018 02:12 PM, Jay Pipes wrote:
> On 01/08/2018 12:26 PM, Zhipeng Huang wrote:
>> Hi all,
>>
>> With the maturing of resource provider/placement feature landing in
>> OpenStack in recent release, and also in light of Kubernetes community
>> increasing attention to the similar effort, I want to propose to form
>> a Resource Management SIG as a contact point for OpenStack community
>> to communicate with Kubernetes Resource Management WG[0] and other
>> related SIGs.
>>
>> The formation of the SIG is to provide a gathering of similar
>> interested parties and establish an official channel. Currently we
>> have already OpenStack developers actively participating in kubernetes
>> discussion (e.g. [1]), we would hope the ResMgmt SIG could further
>> help such activities and better align the resource mgmt mechanism,
>> especially the data modeling between the two communities (or even more
>> communities with similar desire).
>>
>> I have floated the idea with Jay Pipes and Chris Dent and received
>> positive feedback. The SIG will have a co-lead structure so that
>> people could spearheading in the area they are most interested in. For
>> example for me as Cyborg dev, I will mostly lead in the area of
>> acceleration[2].
>>
>> If you are also interested please reply to this thread, and let's find
>> a efficient way to form this SIG. Efficient means no extra unnecessary
>> meetings and other undue burdens.
> 
> +1
> 
> From the Nova perspective, the scheduler meeting (which is Mondays at
> 1400 UTC) is the primary meeting where resource tracking and accounting
> issues are typically discussed.
> 
> Chris Dent has done a fabulous job recording progress on the resource
> providers and placement work over the last couple releases by issuing
> status emails to the openstack-dev@ mailing list each Friday.
> 
> I think having a bi-weekly cross-project (or even cross-ecosystem if
> we're talking about OpenStack+k8s) status email reporting any big events
> in the resource tracking world would be useful. As far as regular
> meetings for a resource management SIG, I'm +0 on that. I prefer to have
> targeted topical meetings over regular meetings.
> 
> Best,
> -jay
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] resource providers update 18-03

2018-01-19 Thread Eric Fried
> Earlier in the week I did some exercising by humans and was confused
> by the state of traits handling on /allocation_candidates (it could be
> the current state is the expected state but the code didn't make that
> clear) so I made a bug on it make sure that confusion didn't get forgotten:
> 
>     https://bugs.launchpad.net/nova/+bug/1743860

I can help with the confusion.  The current state is indeed expected (at
least by me).  There were some WIPs early in the cycle to get just the
?required= part of traits in place, BUT the granular resource requests
effort was a superset of that.  Granular was mostly finished even at
that time, but the final piece of the puzzle relies on code that's in
progress right now (NRP in allocation candidates) so has been on hold.
Whereas I hope it's still possible to tie all that off in Q, we're now
getting to a point where it's prudent to hedge our bets and make sure we
at least support traits on the single (un-numbered) request group.
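
For anyone not steeped in the spec-ese, a hedged sketch of the difference at
the placement API level (the granular syntax below is per the spec draft, so
treat it as illustrative rather than final):

  Un-numbered group - traits apply to the request as a whole:
    GET /allocation_candidates?resources=VCPU:1,MEMORY_MB:2048&required=HW_CPU_X86_AVX2

  Numbered (granular) groups - resources and traits are matched per group:
    GET /allocation_candidates?resources1=VGPU:1&required1=CUSTOM_VGPU_TYPE_A
                              &resources2=VGPU:1&required2=CUSTOM_VGPU_TYPE_B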

TL;DR: Yes, let's move forward with Alex's patch:

> (Looks like Alex is working on the correct fix at
> 
>     https://review.openstack.org/#/c/535642/

...but also make sure we get lots of review focus on Jay's
NRP-in-alloc-cands series to give Granular a fighting chance.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra][all] New Zuul Depends-On syntax

2018-01-25 Thread Eric Fried
For my part, I tried it [1] and it doesn't seem to have worked.  (The
functional test failure is what the dep is supposed to have fixed.)  Did
I do something wrong?

[1] https://review.openstack.org/#/c/533821/12

On 01/25/2018 09:33 PM, Mathieu Gagné wrote:
> On Thu, Jan 25, 2018 at 7:08 PM, James E. Blair  wrote:
>> Mathieu Gagné  writes:
>>
>>> On Thu, Jan 25, 2018 at 3:55 PM, Ben Nemec  wrote:


 I'm curious what this means as far as best practices for inter-patch
 references.  In the past my understanding was the the change id was
 preferred, both because if gerrit changed its URL format the change id 
 links
 would be updated appropriately, and also because change ids can be looked 
 up
 offline in git commit messages.  Would that still be the case for 
 everything
 except depends-on now?
>>
>> Yes, that's a down-side of URLs.  I personally think it's fine to keep
>> using change-ids for anything other than Depends-On, though in many of
>> those cases the commit sha may work as well.
>>
>>> That's my concern too. Also AFAIK, Change-Id is branch agnostic. This
>>> means you can more easily cherry-pick between branches without having
>>> to change the URL to match the new branch for your dependencies.
>>
>> Yes, there is a positive and negative aspect to this issue.
>>
>> On the one hand, for those times where it was convenient to say "depend
>> on this change in all its forms across all branches of all projects",
>> one must now add a URL for each.
>>
>> On the other hand, with URLs, it is now possible to indicate that a
>> change specifically depends on another change targeted to one branch, or
>> targeted to several branches.  Simply list each URL (or don't) as
>> appropriate.  That wasn't possible before -- it wall all or none.
>>
>> -Jim
>>
> 
>> The old syntax will continue to work for a while
> 
> I still believe Change-Id should be supported and not removed as
> suggested. The use of URL assumes you have access to Gerrit to fetch
> more information about the change.
> This might not always be true or possible, especially when Gerrit is
> kept private and only the git repository is replicated publicly and
> you which to cherry-pick something (and its dependencies) from it.
> 
> --
> Mathieu
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][placement] Re: VMWare's resource pool / cluster and nested resource providers

2018-01-27 Thread Eric Fried
Rado-

    [+dev ML.  We're getting pretty general here; maybe others will get
some use out of this.]

> is there a way to make the scheduler allocate only from one specific RP

    "...one specific RP" - is that Resource Provider or Resource Pool?

    And are we talking about scheduling an instance to a specific
compute node, or are we talking about making sure that all the requested
resources are pulled from the same compute node (but it could be any one
of several compute nodes)?  Or just limiting the scheduler to any node in
a specific resource pool?

    To make sure I'm fully grasping the VMWare-specific
ratios/relationships between resource pools and compute nodes, I have
been assuming:

controller 1:many compute "host" (where n-cpu runs)
compute "host"  1:many resource pool
resource pool 1:many compute "node" (where instances can be scheduled)
compute "node" 1:many instance

    (I don't know if this "host" vs. "node" terminology is correct, but
I'm going to keep pretending it is for the purposes of this note.)

    In particular, if that last line is true, then you do *not* want
multiple compute "nodes" in the same provider tree.

> if no custom trait is specified in the request?

    I am not aware of anything current or planned that will allow you to
specify an aggregate you want to deploy from; so the only way I'm aware
of that you could pin a request to a resource pool is to create a custom
trait for that resource pool, tag all compute nodes in the pool with
that trait, and specify that trait in your flavor.  This way you don't
use nested-ness at all.  And in this model, there's also no need to
create resource providers corresponding to resource pools - their
sole manifestation is via traits.

    (Bonus: this model will work with what we've got merged in Queens -
we didn't quiiite finish the piece of NRP that makes them work for
allocation candidates, but we did merge trait support.  We're also
*mostly* there with aggregates, but I wouldn't want to rely on them
working perfectly and we're not claiming full support for them.)

    To be explicit, in the model I'm suggesting, your compute "host",
within update_provider_tree, would create new_root()s for each compute
"node".  So the "tree" isn't really a tree - it's a flat list of
computes, of which one happens to correspond to the `nodename` and
represents the compute "host".  (I assume deploys can happen to the
compute "host" just like they can to a compute "node"?  If not, just
give that guy no inventory and he'll be avoided.)  It would then
update_traits(node, ['CUSTOM_RPOOL_X']) for each.  It would also
update_inventory() for each as appropriate.
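
    A minimal sketch of that model (the helpers _cluster_nodes() and
_inventory_for() are made up for illustration; the ProviderTree calls are the
ones named above):

     def update_provider_tree(self, provider_tree, nodename):
         for node in self._cluster_nodes():
             if not provider_tree.exists(node.name):
                 # Each compute "node" gets its own root provider.
                 provider_tree.new_root(node.name, node.uuid)
             # Tag it with the custom trait for the pool it currently lives in.
             provider_tree.update_traits(
                 node.name, ['CUSTOM_RPOOL_%s' % node.pool])
             # And report that node's own inventory (VCPU, MEMORY_MB, etc.).
             provider_tree.update_inventory(
                 node.name, self._inventory_for(node))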

    Now on your deploys, to get scheduled to a particular resource pool,
you would have to specify required=CUSTOM_RPOOL_X in your flavor.
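
    In flavor extra-spec terms - assuming the trait syntax from the
request-traits work - that would look roughly like:

     trait:CUSTOM_RPOOL_X=required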

    That's it.  You never use new_child().  There are no providers
corresponding to pools.  There are no aggregates.

    Are we making progress, or am I confused/confusing?

Eric


On 01/27/2018 01:50 AM, Radoslav Gerganov wrote:
>
> +Chris
>
>
> Hi Eric,
>
> Thanks a lot for sending this.  I must admit that I am still trying to
> catch up with how the scheduler (will) work when there are nested RPs,
> traits, etc.  I thought mostly about the case when we use a custom
> trait to force allocations only from one resource pool.  However, if
> no trait is specified then we can end up in the situation that you
> describe (allocating different resources from different resource
> pools) and this is not what we want.  If we go with the model that you
> propose, is there a way to make the scheduler allocate only from one
> specific RP if no custom trait is specified in the request?
>
> Thanks,
>
> Rado
>
>
> 
> *From:* Eric Fried 
> *Sent:* Friday, January 26, 2018 10:20 PM
> *To:* Radoslav Gerganov
> *Cc:* Jay Pipes
> *Subject:* VMWare's resource pool / cluster and nested resource providers
>  
> Rado-
>
>     It occurred to me just now that the model you described to me
> [1] isn't
> going to work, unless there's something I really misunderstood.
>
>     The problem is that the placement API will think it can allocate
> resources from anywhere in the tree for a given allocation request
> (unless you always use a single numbered request group [2] in your
> flavors, which doesn't sound like a clean plan).
>
>     So if you have *any* model where multiple compute nodes reside
> in the
> same provider tree, and I come along with a request for say
> VCPU:1,MEMORY_MB:2048,DISK_GB:512, placement will happily g

Re: [openstack-dev] [nova][placement] Re: VMWare's resource pool / cluster and nested resource providers

2018-01-29 Thread Eric Fried
We had some lively discussion in #openstack-nova today, which I'll try
to summarize here.

First of all, the hierarchy:

                controller (n-cond)
               /                   \
      cluster/n-cpu             cluster/n-cpu
        /         \                /      \
   res. pool    res. pool        ...      ...
     /    \        /  \
   host   host   ...  ...
   /  \
 inst  inst

Important points:

(1) Instances do indeed get deployed to individual hosts, BUT vCenter
can and does move them around within a cluster independent of nova-isms
like live migration.

(2) VMWare wants the ability to specify that an instance should be
deployed to a specific resource pool.

(3) VMWare accounts for resources at the level of the resource pool (not
host).

(4) Hosts can move fluidly among resource pools.

(5) Conceptually, VMWare would like you not to see or think about the
'host' layer at all.

(6) It has been suggested that resource pools may be best represented
via aggregates.  But to satisfy (2), this would require support for
doing allocation requests that specify one (e.g. porting the GET
/resource_providers ?member_of= queryparam to GET
/allocation_candidates, and the corresponding flavor enhancements).  And
doing so would mean getting past our reluctance up to this point of
exposing aggregates by name/ID to users.

Here are some possible models:

(A) Today's model, where the cluster/n-cpu is represented as a single
provider owning all resources.  This requires some creative finagling of
inventory fields to ensure that a resource request might actually be
satisfied by a single host under this broad umbrella.  (An example cited
was to set VCPU's max_unit to whatever one host could provide.)  It is
not clear to me if/how resource pools have been represented in this
model thus far, or if/how it is currently possible to (2) target an
instance to a specific one.  I also don't see how anything we've done
with traits or aggregates would help with that aspect in this model.
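
A hedged illustration of that finagling (numbers invented; field names per the
placement inventory schema): a 16-host cluster where each host has 32 VCPUs
might report

  'VCPU': {'total': 512, 'reserved': 0, 'min_unit': 1, 'max_unit': 32,
           'step_size': 1, 'allocation_ratio': 1.0}

so that no single allocation can ask for more VCPU than one host could
actually supply, even though the total spans the whole cluster.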

(B) Representing each host as a root provider, each owning its own
actual inventory, each possessing a CUSTOM_RESOURCE_POOL_X trait
indicating which pool it belongs to at the moment; or representing pools
via aggregates as in (6).  This model breaks because of (1), unless we
give virt drivers some mechanism to modify allocations (e.g. via POST
/allocations) without doing an actual migration.

(C) Representing each resource pool as a root provider which presents
the collective inventory of all its hosts.  Each could possess its own
unique CUSTOM_RESOURCE_POOL_X trait.  Or we could possibly adapt
whatever mechanism Ironic uses when it targets a particular baremetal
node.  Or we could use aggregates as in (6), where each aggregate is
associated with just one provider.  This one breaks down because we
don't currently have a way for nova to know that, when an instance's
resources were allocated from the provider corresponding to resource
pool X, that means we should schedule the instance to (nova, n-cpu) host
Y.  There may be some clever solution for this involving aggregates (NOT
sharing providers!), but it has not been thought through.  It also
entails the same "creative finagling of inventory" described in (A).

(D) Using actual nested resource providers: the "cluster" is the
(inventory-less) root provider, and each resource pool is a child of the
cluster.  This is closest to representing the real logical hierarchy,
and is desirable for that reason.  The drawback is that you then MUST
use some mechanism to ensure allocations are never spread across pools.
If your request *always* targets a specific resource pool, that works.
Otherwise, you would have to use a numbered request group, as described
below.  It also entails the same "creative finagling of inventory"
described in (A).

(E) Take (D) a step further by adding each 'host' as a child of its
respective resource pool.  No "creative finagling", but same "moving
allocations" issue as (B).

I'm sure I've missed/misrepresented things.  Please correct and refine
as necessary.

Thanks,
Eric

On 01/27/2018 12:23 PM, Eric Fried wrote:
> Rado-
> 
>     [+dev ML.  We're getting pretty general here; maybe others will get
> some use out of this.]
> 
>> is there a way to make the scheduler allocate only from one specific RP
> 
>     "...one specific RP" - is that Resource Provider or Resource Pool?
> 
>     And are we talking about scheduling an instance to a specific
> compute node, or are we talking about making sure that all the requested
> resources are pulled from the same compute node (but it could be any one
> of several compute nodes)?  Or justlimiting the scheduler to any node in
> a specific resource pool?
> 
>     To make sure I'm fully grasping the VMWare-specific
> ratios/relationships b

Re: [openstack-dev] [nova] Should we get auth from context for Neutron endpoint?

2018-02-06 Thread Eric Fried
Zheng 先生-

I *think* you're right that 'network' should be included in [2].  I
can't think of any reason it shouldn't be.  Does that fix the problem by
itself?

I believe the Neutron API code is already getting its auth from
context... sometimes [5].  If you want to make sure it's an admin token,
add admin=True here [6] - but that may have further-reaching implications.

[5]
https://github.com/openstack/nova/blob/9519601401ee116a9197fe3b5d571495a96912e9/nova/network/neutronv2/api.py#L155
[6]
https://github.com/openstack/nova/blob/9519601401ee116a9197fe3b5d571495a96912e9/nova/network/neutronv2/api.py#L1190

Good luck.

efried

On 02/06/2018 04:48 AM, Zhenyu Zheng wrote:
> Hi Nova,
> 
> While doing some test with my newly deployed devstack env today, it
> turns out that the default devstack deployment cannot cleanup networks
> after the retry attempt exceeded. This is because in the deployment with
> super-conductor and cell-conductor, the retry and cleanup logic is in
> cell-conductor [1], and by default the devstack didn't put Neutron
> endpoint info in nova_cell1.conf. And as the neutron endpoint is also
> not included in the context [2], so we can't find Neutron endpoint when
> try to cleanup network [3].
> 
> The solution is simple though, ether add Neutron endpoint info in
> nova_cell1.conf in devstack or change Nova code to support get auth from
> context. I think the latter one is better as in real deployment there
> could be many cells and by doing this can ignore config it all the time.
> 
> Any particular consideration that Neutron is not included in [2]?
> 
> Suggestions on how this should be fixed?
> 
> I also registered a devstack bug to fix it in devstack [4].
> 
> [1] 
> https://github.com/openstack/nova/blob/bccf26c93a973d000e4339843ce9256814286d10/nova/conductor/manager.py#L604
> [2] 
> https://github.com/openstack/nova/blob/9519601401ee116a9197fe3b5d571495a96912e9/nova/context.py#L121
> [3] https://bugs.launchpad.net/nova/+bug/1747600
> [4] https://bugs.launchpad.net/devstack/+bug/1747598
> 
> BR,
> 
> Kevin Zheng
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cyborg]Dublin PTG Cyborg Nova Interaction Discussion

2018-02-12 Thread Eric Fried
I'm interested.  No date/time preference so far as long as it sticks to
Monday/Tuesday.

efried

On 02/12/2018 09:13 AM, Zhipeng Huang wrote:
> Hi Nova team,
> 
> Cyborg will have ptg sessions on Mon and Tue from 2:00pm to 6:00pm, and
> we would love to invite any of you guys who is interested in nova-cyborg
> interaction to join the discussion. The discussion will mainly focus on:
> 
> (1) Cyborg team recap on the resource provider features that are
> implemented in Queens.
> (2) Joint discussion on what will be the impact on Nova side and future
> collaboration areas.
> 
> The session is planned for 40 mins long.
> 
> If you are interested plz feedback which date best suit for your
> arrangement so that we could arrange the topic accordingly :)
> 
> Thank you very much.
> 
> 
> 
> -- 
> Zhipeng (Howard) Huang
> 
> Standard Engineer
> IT Standard & Patent/IT Product Line
> Huawei Technologies Co,. Ltd
> Email: huangzhip...@huawei.com 
> Office: Huawei Industrial Base, Longgang, Shenzhen
> 
> (Previous)
> Research Assistant
> Mobile Ad-Hoc Network Lab, Calit2
> University of California, Irvine
> Email: zhipe...@uci.edu 
> Office: Calit2 Building Room 2402
> 
> OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][placement] update_provider_tree design updates

2018-03-15 Thread Eric Fried
One of the takeaways from the Queens retrospective [1] was that we
should be summarizing discussions that happen in person/hangout/IRC/etc.
to the appropriate mailing list for the benefit of those who weren't
present (or paying attention :P ).  This is such a summary.

As originally conceived, ComputeDriver.update_provider_tree was intended
to be the sole source of truth for traits and aggregates on resource
providers under its purview.

Then came the idea of reflecting compute driver capabilities as traits
[2], which would be done outside of update_provider_tree, but still
within the bounds of nova compute.

Then Friday discussions at the PTG [3] brought to light the fact that we
need to honor traits set by outside agents (operators, other services
like neutron, etc.), effectively merging those with whatever the virt
driver sets.  Concerns were raised about how to reconcile overlaps, and
in particular how compute (via update_provider_tree or otherwise) can
know if a trait is safe to *remove*.  At the PTG, we agreed we need to
do this, but deferred the details.

...which we discussed earlier this week in IRC [4][5].  We concluded:

- Compute is the source of truth for any and all traits it could ever
assign, which will be a subset of what's in os-traits, plus whatever
CUSTOM_ traits it stakes a claim to.  If an outside agent sets a trait
that's in that list, compute can legitimately remove it.  If an outside
agent removes a trait that's in that list, compute can reassert it.
- Anything outside of that list of compute-owned traits is fair game for
outside agents to set/unset.  Compute won't mess with those, ever.
- Compute (and update_provider_tree) will therefore need to know what
that list comprises.  Furthermore, it must take care to use merging
logic such that it only sets/unsets traits it "owns".
- To facilitate this on the compute side, ProviderTree will get new
methods to add/remove provider traits.  (Technically, it could all be
done via update_traits [6], which replaces the entire set of traits on a
provider, but then every update_provider_tree implementation would have
to write the same kind of merging logic.)
- For operators, we'll need OSC affordance for setting/unsetting
provider traits.

And finally:
- Everything above *also* applies to provider aggregates.  NB: Here
there be tygers.  Unlike traits, the comprehensive list of which can
conceivably be known a priori (even including CUSTOM_*s), aggregate
UUIDs are by their nature unique and likely generated dynamically.
Knowing that you "own" an aggregate UUID is relatively straightforward
when you need to set it; but to know you can/must unset it, you need to
have kept a record of having set it in the first place.  A record that
persists e.g. across compute service restarts.  Can/should virt drivers
write a file?  If so, we better make sure it works across upgrades.  And
so on.  Ugh.  For the time being, we're kinda punting on this issue
until it actually becomes a problem IRL.

And now for the moment you've all been awaiting with bated breath:
- Delta [7] to the update_provider_tree spec [8].
- Patch for ProviderTree methods to add/remove traits/aggregates [9].
- Patch modifying the update_provider_tree docstring, and adding devref
content for update_provider_tree [10].

Please feel free to email or reach out in #openstack-nova if you have
any questions.

Thanks,
efried

[1] https://etherpad.openstack.org/p/nova-queens-retrospective (L122 as
of this writing)
[2] https://review.openstack.org/#/c/538498/
[3] https://etherpad.openstack.org/p/nova-ptg-rocky (L496-502 aotw)
[4]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-12.log.html#t2018-03-12T16:02:08
[5]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-12.log.html#t2018-03-12T19:20:23
[6]
https://github.com/openstack/nova/blob/5f38500df6a8e1665b968c3e98b804e0fdfefc63/nova/compute/provider_tree.py#L494
[7] https://review.openstack.org/552122
[8]
http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/update-provider-tree.html
[9] https://review.openstack.org/553475
[10] https://review.openstack.org/553476


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] update_provider_tree design updates

2018-03-15 Thread Eric Fried
Excellent and astute questions, both of which came up in the discussion but
which I neglected to mention.  (I had to miss *something*, right?)

See inline.

On 03/15/2018 02:29 PM, Chris Dent wrote:
> On Thu, 15 Mar 2018, Eric Fried wrote:
> 
>> One of the takeaways from the Queens retrospective [1] was that we
>> should be summarizing discussions that happen in person/hangout/IRC/etc.
>> to the appropriate mailing list for the benefit of those who weren't
>> present (or paying attention :P ).  This is such a summary.
> 
> Thank you _very_ much for doing this. I've got two questions within.
> 
>> ...which we discussed earlier this week in IRC [4][5].  We concluded:
>>
>> - Compute is the source of truth for any and all traits it could ever
>> assign, which will be a subset of what's in os-traits, plus whatever
>> CUSTOM_ traits it stakes a claim to.  If an outside agent sets a trait
>> that's in that list, compute can legitimately remove it.  If an outside
>> agent removes a trait that's in that list, compute can reassert it.
> 
> Where does that list come from? Or more directly how does Compute
> stake the claim for "mine"?

One piece of the list should come from the traits associated with the
compute driver capabilities [2].  Likewise anything else in the future
that's within compute but outside of virt.  In other words, we're
declaring that it doesn't make sense for an operator to e.g. set the
"has_imagecache" trait on a compute if the compute doesn't do that
itself.  The message being that you can't turn on a capability by
setting a trait.

Beyond that, each virt driver is going to be responsible for figuring
out its own list.  Thinking this through with my PowerVM hat on, it
won't actually be as hard as it initially sounded - though it will
require more careful accounting.  Essentially, the driver is going to
ask the platform questions and get responses in its own language; then
map those responses to trait names.  So we'll be writing blocks like:

 if sys_caps.can_modify_io:
     provider_tree.add_trait(nodename, "CUSTOM_LIVE_RESIZE_CAPABLE")
 else:
     provider_tree.remove_trait(nodename, "CUSTOM_LIVE_RESIZE_CAPABLE")

And, for some subset of the "owned" traits, we should be able to
maintain a dict such that this works:

 for feature, trait in trait_map.items():
     if feature in sys_features:
         provider_tree.add_trait(nodename, trait)
     else:
         provider_tree.remove_trait(nodename, trait)

BUT what about *dynamic* features?  If I have code like (don't kill me):

 vendor_id_trait = 'CUSTOM_DEV_VENDORID_' + slugify(io_device.vendor_id)
 provider_tree.add_trait(io_dev_rp, vendor_id_trait)

...then there's no way I can know ahead of time what all those might be.
 (In particular, if I want to support new devices without updating my
code.)  I.e. I *can't* write the corresponding
provider_tree.remove_trait(...) condition.  Maybe that never becomes a
real problem because we'll never need to remove a dynamic trait.  Or
maybe we can tolerate "leakage".  Or maybe we do something
clever-but-ugly with namespacing (if
trait.startswith('CUSTOM_DEV_VENDORID_')...).  We're consciously kicking
this can down the road.

And note that this "dynamic" problem is likely to be a much larger
portion (possibly all) of the domain when we're talking about aggregates.

Then there's ironic, which is currently set up to get its traits blindly
from Inspector.  So Inspector not only needs to maintain the "owned
traits" list (with all the same difficulties as above), but it must also
either a) communicate that list to ironic virt so the latter can manage
the add/remove logic; or b) own the add/remove logic and communicate the
individual traits with a +/- on them so virt knows whether to add or
remove them.

> How does an outside agent know what Compute has claimed? Presumably
> they want to know that so they can avoid wastefully doing something
> that's going to get clobbered?

Yup [11].  It was deemed that we don't need an API/CLI to discover those
lists (assuming that would even be possible).  The reasoning was
two-pronged:
- We'll document that there are traits "owned" by nova and attempts to
set/unset them will be frustrated.  You can't find out which ones they
are except when a manually-set/-unset trait magically dis-/re-appears.
- It probably won't be an issue because outside agents will be setting
traits based on some specific thing they want to do, and the
documentation for that thing will specify traits that are known not to
interfere with those in nova's wheelhouse.

> [2] https://review.openstack.org/#/c/538498/
[11]
http://eavesdrop.openstack.org/ir

Re: [openstack-dev] [nova] Rocky spec review day

2018-03-21 Thread Eric Fried
+1 for the-earlier-the-better, for the additional reason that, if we
don't finish, we can do another one in time for spec freeze.

And I, for one, wouldn't be offended if we could "officially start
development" (i.e. focus on patches, start runways, etc.) before the
mystical but arbitrary spec freeze date.

On 03/20/2018 07:29 PM, Matt Riedemann wrote:
> On 3/20/2018 6:47 PM, melanie witt wrote:
>> I was thinking that 2-3 weeks ahead of spec freeze would be
>> appropriate, so that would be March 27 (next week) or April 3 if we do
>> it on a Tuesday.
> 
> It's spring break here on April 3 so I'll be listening to screaming
> kids, I mean on vacation. Not that my schedule matters, just FYI.
> 
> But regardless of that, I think the earlier the better to flush out
> what's already there, since we've already approved quite a few
> blueprints this cycle (32 to so far).
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Review runways this cycle

2018-03-22 Thread Eric Fried

> I think the only concern around moving spec freeze out would be that I
> thought the original purpose of the spec freeze was to set expectations
> early about what was approved and not approved instead of having folks
> potentially in the situation where it's technically "maybe" for a large
> chunk of the cycle. I'm not sure which most people prefer -- would you
> rather know early and definitively whether your blueprint is
> approved/not approved or would you rather have the opportunity to get
> approval during a larger window in the cycle and not know definitively
> early on? Can anyone else chime in here?

This is a fair point.

Putting specs into runways doesn't imply (re)moving spec freeze IMO.
It's just a way to get us using runways RIGHT NOW, so that folks with
ready specs can get reviewed sooner, know whether they're approved
sooner, write their code sooner, and get their *code* into an earlier
runway.

A spec in a runway would be treated like anything else: reviewers focus
on it and the author needs to be available to respond quickly to feedback.

I would expect the ratio of specs:code in runways to start off high and
dwindle rapidly as we approach spec freeze.

It's worth pointing out that there's not an expectation for people to
work more/harder when runways are in play.  Just that it increases the
chances of more people looking at the same things at the same time; and
allows us to bring focus to things that might otherwise languish in
ignoreland.

efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Review runways this cycle

2018-03-22 Thread Eric Fried
WFM.

November Oscar Victor Alpha, you are cleared for takeoff.

On 03/22/2018 03:18 PM, Matt Riedemann wrote:
> On 3/22/2018 2:59 PM, melanie witt wrote:
>> Maybe a good compromise would be to start runways now and move spec
>> freeze out to r-2 (Jun 7). That way we have less pressure on spec
>> review earlier on, more time to review the current queue of approved
>> implementations via runways, and a chance to approve more specs along
>> the way if we find we're flushing the queue down enough.
> 
> This is what I'd prefer to see.
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-23 Thread Eric Fried
Sundar-

First thought is to simplify by NOT keeping inventory information in
the cyborg db at all.  The provider record in the placement service
already knows the device (the provider ID, which you can look up in the
cyborg db), the host (the root_provider_uuid of the provider representing
the device), and the inventory, and (I hope) you'll be augmenting it with
traits indicating what functions it's capable of.  That way, you'll
always get allocation candidates with devices that *can* load the
desired function; now you just have to engage your weigher to prioritize
the ones that already have it loaded so you can prefer those.

Am I missing something?

efried

On 03/22/2018 11:27 PM, Nadathur, Sundar wrote:
> Hi all,
>     There seems to be a possibility of a race condition in the
> Cyborg/Nova flow. Apologies for missing this earlier. (You can refer to
> the proposed Cyborg/Nova spec
> 
> for details.)
> 
> Consider the scenario where the flavor specifies a resource class for a
> device type, and also specifies a function (e.g. encrypt) in the extra
> specs. The Nova scheduler would only track the device type as a
> resource, and Cyborg needs to track the availability of functions.
> Further, to keep it simple, say all the functions exist all the time (no
> reprogramming involved).
> 
> To recap, here is the scheduler flow for this case:
> 
>   * A request spec with a flavor comes to Nova conductor/scheduler. The
> flavor has a device type as a resource class, and a function in the
> extra specs.
>   * Placement API returns the list of RPs (compute nodes) which contain
> the requested device types (but not necessarily the function).
>   * Cyborg will provide a custom filter which queries Cyborg DB. This
> needs to check which hosts contain the needed function, and filter
> out the rest.
>   * The scheduler selects one node from the filtered list, and the
> request goes to the compute node.
> 
> For the filter to work, the Cyborg DB needs to maintain a table with
> triples of (host, function type, #free units). The filter checks if a
> given host has one or more free units of the requested function type.
> But, to keep the # free units up to date, Cyborg on the selected compute
> node needs to notify the Cyborg API to decrement the #free units when an
> instance is spawned, and to increment them when resources are released.
> 
> Therein lies the catch: this loop from the compute node to controller is
> susceptible to race conditions. For example, if two simultaneous
> requests each ask for function A, and there is only one unit of that
> available, the Cyborg filter will approve both, both may land on the
> same host, and one will fail. This is because Cyborg on the controller
> does not decrement resource usage due to one request before processing
> the next request.
> 
> This is similar to this previous Nova scheduling issue
> .
> That was solved by having the scheduler claim a resource in Placement
> for the selected node. I don't see an analog for Cyborg, since it would
> not know which node is selected.
> 
> Thanks in advance for suggestions and solutions.
> 
> Regards,
> Sundar
> 
> 
> 
> 
> 
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][placement] Upgrade placement first!

2018-03-26 Thread Eric Fried
Since forever [0], nova has gently recommended [1] that the placement
service be upgraded first.

However, we've not made any serious effort to test scenarios where this
isn't done.  For example, we don't have grenade tests running placement
at earlier levels.

After a(nother) discussion [2] which touched on the impacts - real and
imagined - of running new nova against old placement, we finally decided
to turn the recommendation into a hard requirement [3].

This gives admins a crystal clear guideline, this lets us simplify our
support statement, and also means we don't have to do 406 fallback code
anymore.  So we can do stuff like [4], and also avoid having to write
(and subsequently remove) code like that in the future.
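
For context, "406 fallback" code is shaped roughly like the following sketch
(illustrative only, not the actual nova code): try the call at the new
placement microversion, and retry the old way if the service is too old to
understand it.

  resp = placement.get(url, version=NEW_MICROVERSION)
  if resp.status_code == 406:
      # Placement predates the microversion we wanted; degrade gracefully
      # to the older, less capable form of the request.
      resp = placement.get(fallback_url, version=OLD_MICROVERSION)

With placement guaranteed to be upgraded first, that second branch goes away.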

Please direct any questions to #openstack-nova

Your Faithful Scribe,
efried

[0] Like, since upgrading placement was a thing.
[1]
https://docs.openstack.org/nova/latest/user/upgrade.html#rolling-upgrade-process
(#2, first bullet)
[2]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-26.log.html#t2018-03-26T17:35:11
[3] https://review.openstack.org/556631
[4] https://review.openstack.org/556633

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-28 Thread Eric Fried
Sundar-

We're running across this issue in several places right now.   One
thing that's definitely not going to get traction is
automatically/implicitly tweaking inventory in one resource class when
an allocation is made on a different resource class (whether in the same
or different RPs).

Slightly less of a nonstarter, but still likely to get significant
push-back, is the idea of tweaking traits on the fly.  For example, your
vGPU case might be modeled as:

PGPU_RP: {
  inventory: {
  CUSTOM_VGPU_TYPE_A: 2,
  CUSTOM_VGPU_TYPE_B: 4,
  }
  traits: [
  CUSTOM_VGPU_TYPE_A_CAPABLE,
  CUSTOM_VGPU_TYPE_B_CAPABLE,
  ]
}

The request would come in for
resources=CUSTOM_VGPU_TYPE_A:1&required=CUSTOM_VGPU_TYPE_A_CAPABLE, resulting
in an allocation of CUSTOM_VGPU_TYPE_A:1.  Now while you're processing
that, you would *remove* CUSTOM_VGPU_TYPE_B_CAPABLE from the PGPU_RP.
So it doesn't matter that there's still inventory of
CUSTOM_VGPU_TYPE_B:4, because a request including
required=CUSTOM_VGPU_TYPE_B_CAPABLE won't be satisfied by this RP.
There's of course a window between when the initial allocation is made
and when you tweak the trait list.  In that case you'll just have to
fail the loser.  This would be like any other failure in e.g. the spawn
process; it would bubble up, the allocation would be removed; retries
might happen or whatever.

Like I said, you're likely to get a lot of resistance to this idea as
well.  (Though TBH, I'm not sure how we can stop you beyond -1'ing your
patches; there's nothing about placement that disallows it.)

The simple-but-inefficient solution is simply that we'd still be able
to make allocations for vGPU type B, but you would have to fail right
away when it came down to cyborg to attach the resource.  Which is code
you pretty much have to write anyway.  It's an improvement if cyborg
gets to be involved in the post-get-allocation-candidates
weighing/filtering step, because you can do that check at that point to
help filter out the candidates that would fail.  Of course there's still
a race condition there, but it's no different than for any other resource.

efried

On 03/28/2018 12:27 PM, Nadathur, Sundar wrote:
> Hi Eric and all,
>     I should have clarified that this race condition happens only for
> the case of devices with multiple functions. There is a prior thread
> <http://lists.openstack.org/pipermail/openstack-dev/2018-March/127882.html>
> about it. I was trying to get a solution within Cyborg, but that faces
> this race condition as well.
> 
> IIUC, this situation is somewhat similar to the issue with vGPU types
> <http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-27.log.html#t2018-03-27T13:41:00>
> (thanks to Alex Xu for pointing this out). In the latter case, we could
> start with an inventory of (vgpu-type-a: 2; vgpu-type-b: 4).  But, after
> consuming a unit of  vGPU-type-a, ideally the inventory should change
> to: (vgpu-type-a: 1; vgpu-type-b: 0). With multi-function accelerators,
> we start with an RP inventory of (region-type-A: 1, function-X: 4). But,
> after consuming a unit of that function, ideally the inventory should
> change to: (region-type-A: 0, function-X: 3).
> 
> I understand that this approach is controversial :) Also, one difference
> from the vGPU case is that the number and count of vGPU types is static,
> whereas with FPGAs, one could reprogram it to result in more or fewer
> functions. That said, we could hopefully keep this analogy in mind for
> future discussions.
> 
> We probably will not support multi-function accelerators in Rocky. This
> discussion is for the longer term.
> 
> Regards,
> Sundar
> 
> On 3/23/2018 12:44 PM, Eric Fried wrote:
>> Sundar-
>>
>>  First thought is to simplify by NOT keeping inventory information in
>> the cyborg db at all.  The provider record in the placement service
>> already knows the device (the provider ID, which you can look up in the
>> cyborg db) the host (the root_provider_uuid of the provider representing
>> the device) and the inventory, and (I hope) you'll be augmenting it with
>> traits indicating what functions it's capable of.  That way, you'll
>> always get allocation candidates with devices that *can* load the
>> desired function; now you just have to engage your weigher to prioritize
>> the ones that already have it loaded so you can prefer those.
>>
>>  Am I missing something?
>>
>>  efried
>>
>> On 03/22/2018 11:27 PM, Nadathur, Sundar wrote:
>>> Hi all,
>>>     There seems to be a possibility of a race condition in the
>>> Cyborg/Nova flow. Apologies for missing this

Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-29 Thread Eric Fried
Sundar-

To be clear, *all* of the solutions will have race conditions.  There's
no getting around the fact that we need to account for situations where
an allocation is made, but then can't be satisfied by cyborg (or
neutron, or nova, or cinder, or whoever).  That failure has to bubble up
and cause retry or failure of the overarching flow.

The objection to "dynamic trait setting" is that traits are intended to
indicate characteristics, not states.

https://www.google.com/search?q=estar+vs+ser

I'll have to let Jay or Dan explain it further.  Because TBH, I don't
see the harm in mucking with traits/inventories dynamically.

The solutions I discussed here are if it's critical that everything be
dynamic and ultimately flexible.  Alex brings up a different option in
another subthread which is more likely how we're going to handle this
for our Nova scenarios in Rocky.  I'll comment further in that subthread.

-efried

On 03/28/2018 06:03 PM, Nadathur, Sundar wrote:
> Thanks, Eric. Looks like there are no good solutions even as candidates,
> but only options with varying levels of unacceptability. It is funny
> that that the option that is considered the least unacceptable is to let
> the problem happen and then fail the request (last one in your list).
> 
> Could I ask what is the objection to the scheme that applies multiple
> traits and removes one as needed, apart from the fact that it has races?
> 
> Regards,
> Sundar
> 
> On 3/28/2018 11:48 AM, Eric Fried wrote:
>> Sundar-
>>
>> We're running across this issue in several places right now.   One
>> thing that's definitely not going to get traction is
>> automatically/implicitly tweaking inventory in one resource class when
>> an allocation is made on a different resource class (whether in the same
>> or different RPs).
>>
>> Slightly less of a nonstarter, but still likely to get significant
>> push-back, is the idea of tweaking traits on the fly.  For example, your
>> vGPU case might be modeled as:
>>
>> PGPU_RP: {
>>    inventory: {
>>    CUSTOM_VGPU_TYPE_A: 2,
>>    CUSTOM_VGPU_TYPE_B: 4,
>>    }
>>    traits: [
>>    CUSTOM_VGPU_TYPE_A_CAPABLE,
>>    CUSTOM_VGPU_TYPE_B_CAPABLE,
>>    ]
>> }
>>
>> The request would come in for
>> resources=CUSTOM_VGPU_TYPE_A:1&required=VGPU_TYPE_A_CAPABLE, resulting
>> in an allocation of CUSTOM_VGPU_TYPE_A:1.  Now while you're processing
>> that, you would *remove* CUSTOM_VGPU_TYPE_B_CAPABLE from the PGPU_RP.
>> So it doesn't matter that there's still inventory of
>> CUSTOM_VGPU_TYPE_B:4, because a request including
>> required=CUSTOM_VGPU_TYPE_B_CAPABLE won't be satisfied by this RP.
>> There's of course a window between when the initial allocation is made
>> and when you tweak the trait list.  In that case you'll just have to
>> fail the loser.  This would be like any other failure in e.g. the spawn
>> process; it would bubble up, the allocation would be removed; retries
>> might happen or whatever.
>>
>> Like I said, you're likely to get a lot of resistance to this idea as
>> well.  (Though TBH, I'm not sure how we can stop you beyond -1'ing your
>> patches; there's nothing about placement that disallows it.)
>>
>> The simple-but-inefficient solution is simply that we'd still be able
>> to make allocations for vGPU type B, but you would have to fail right
>> away when it came down to cyborg to attach the resource.  Which is code
>> you pretty much have to write anyway.  It's an improvement if cyborg
>> gets to be involved in the post-get-allocation-candidates
>> weighing/filtering step, because you can do that check at that point to
>> help filter out the candidates that would fail.  Of course there's still
>> a race condition there, but it's no different than for any other
>> resource.
>>
>> efried
>>
>> On 03/28/2018 12:27 PM, Nadathur, Sundar wrote:
>>> Hi Eric and all,
>>>  I should have clarified that this race condition happens only for
>>> the case of devices with multiple functions. There is a prior thread
>>> <http://lists.openstack.org/pipermail/openstack-dev/2018-March/127882.html>
>>>
>>> about it. I was trying to get a solution within Cyborg, but that faces
>>> this race condition as well.
>>>
>>> IIUC, this situation is somewhat similar to the issue with vGPU types
>>> <http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-27

Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-29 Thread Eric Fried
We discussed this on IRC [1], hangout, and etherpad [2].  Here is the
summary, which we mostly seem to agree on:

There are two different classes of device we're talking about
modeling/managing.  (We don't know the real nomenclature, so forgive
errors in that regard.)

==> Fully dynamic: You can program one region with one function, and
then still program a different region with a different function, etc.

==> Single program: Once you program the card with a function, *all* its
virtual slots are *only* capable of that function until the card is
reprogrammed.  And while any slot is in use, you can't reprogram.  This
is Sundar's FPGA use case.  It is also Sylvain's VGPU use case.

The "fully dynamic" case is straightforward (in the sense of being what
placement was architected to handle).
* Model the PF/region as a resource provider.
* The RP has inventory of some generic resource class (e.g. "VGPU",
"SRIOV_NET_VF", "FPGA_FUNCTION").  Allocations consume that inventory,
plain and simple.
* As a region gets programmed dynamically, it's acceptable for the thing
doing the programming to set a trait indicating that that function is in
play.  (Sundar, this is the thing I originally said would get
resistance; but we've agreed it's okay.  No blood was shed :)
* Requests *may* use preferred traits to help them land on a card that
already has their function flashed on it. (Prerequisite: preferred
traits, which can be implemented in placement.  Candidates with the most
preferred traits get sorted highest.)
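
To sketch the above (the resource class and trait names are invented
for illustration, and the "preferred" syntax is notional since
preferred traits aren't implemented in placement yet):

  REGION_RP: {
      inventory: {
          FPGA_FUNCTION: 4,
      }
      traits: [
          CUSTOM_FUNCTION_GZIP,   # set by whatever did the programming
      ]
  }

  GET /allocation_candidates?resources=FPGA_FUNCTION:1
                            &preferred=CUSTOM_FUNCTION_GZIP   # notional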

The "single program" case needs to be handled more like what Alex
describes below.  TL;DR: We do *not* support dynamic programming,
traiting, or inventorying at instance boot time - it all has to be done
"up front".
* The PFs can be initially modeled as "empty" resource providers.  Or
maybe not at all.  Either way, *they can not be deployed* in this state.
* An operator or admin (via a CLI, config file, agent like blazar or
cyborg, etc.) preprograms the PF to have the specific desired
function/configuration.
  * This may be cyborg/blazar pre-programming devices to maintain an
available set of each function
  * This may be in response to a user requesting some function, which
causes a new image to be laid down on a device so it will be available
for scheduling
  * This may be a human doing it at cloud-build time
* This results in the resource provider being (created and) set up with
the inventory and traits appropriate to that function.
* Now deploys can happen, using required traits representing the desired
function.
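
As an illustrative sketch (mirroring the PGPU_RP notation from earlier
in this thread; all names here are invented), a region pre-programmed
with function X might end up modeled as:

  FPGA_REGION_RP: {
      inventory: {
          CUSTOM_FPGA_FUNCTION_X: 8,   # slots available once X is flashed
      }
      traits: [
          CUSTOM_FUNCTION_X,
      ]
  }

and a deploy wanting that function would then request something like:

  GET /allocation_candidates?resources=CUSTOM_FPGA_FUNCTION_X:1&required=CUSTOM_FUNCTION_X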

-efried

[1]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-29.log.html#t2018-03-29T12:52:56
[2] https://etherpad.openstack.org/p/placement-dynamic-traiting

On 03/29/2018 07:38 AM, Alex Xu wrote:
> Agree with that; whether we tweak inventory or traits, neither approach works.
> 
> Same as vGPU, we can support a pre-programmed mode for multi-function
> regions, where each region can only support one function type.
> 
> There are two reasons why Cyborg has a filter:
> * to record the usage of functions in a region
> * to record which function is programmed.
> 
> For #1, each region provides multiple functions, and each function can be
> assigned to a VM. So we should create a ResourceProvider for the region,
> with the function as the resource class. That is similar to an SR-IOV
> device: the region (the PF) provides functions (VFs).
> 
> For #2, we should use a trait to distinguish the function type.
> 
> Then we no longer keep any inventory info in Cyborg, we don't need any
> filter in Cyborg either, and there is no race condition anymore.
> 
> 2018-03-29 2:48 GMT+08:00 Eric Fried  <mailto:openst...@fried.cc>>:
> 
> Sundar-
> 
>         We're running across this issue in several places right
> now.   One
> thing that's definitely not going to get traction is
> automatically/implicitly tweaking inventory in one resource class when
> an allocation is made on a different resource class (whether in the same
> or different RPs).
> 
>         Slightly less of a nonstarter, but still likely to get
> significant
> push-back, is the idea of tweaking traits on the fly.  For example, your
> vGPU case might be modeled as:
> 
> PGPU_RP: {
>   inventory: {
>       CUSTOM_VGPU_TYPE_A: 2,
>       CUSTOM_VGPU_TYPE_B: 4,
>   }
>   traits: [
>       CUSTOM_VGPU_TYPE_A_CAPABLE,
>       CUSTOM_VGPU_TYPE_B_CAPABLE,
>   ]
> }
> 
>         The request would come in for
> resources=CUSTOM_VGPU_TYPE_A:1&required=CUSTOM_VGPU_TYPE_A_CAPABLE, resulting
> in an allocation of CUSTOM_VGPU_TYPE_A:1.  Now while you're 

Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-29 Thread Eric Fried
> That means that for the (re)-programming scenarios you need to
> dynamically adjust the inventory of a particular FPGA resource provider.

Oh, see, this is something I had *thought* was a non-starter.  This
makes the "single program" case way easier to deal with, and allows it
to be handled on the fly:

* Model your region as a provider with separate resource classes for
each function it supports.  The inventory totals for each would be the
total number of virtual slots (or whatever they're called) of that type
that are possible when the device is flashed with that function.
* An allocation is made for one unit of class X.  This percolates down
to cyborg to do the flashing/attaching.  At this time, cyborg *deletes*
the inventories for all the other resource classes.
* In a race with different resource classes, whoever gets to cyborg
first, wins.  The second one will see that the device is already flashed
with X, and fail.  The failure will bubble up, causing the allocation to
be released.
* Requests for multiple different resource classes at once will have to
filter out allocation candidates that put both on the same device.  Not
completely sure how this happens.  Otherwise they would have to fail at
cyborg, resulting in the same bubble/deallocate as above.
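
A sketch of that flow (names invented, same notation as earlier in the
thread): before any allocation the region might expose

  REGION_RP: {
      inventory: {
          CUSTOM_FUNCTION_X: 8,
          CUSTOM_FUNCTION_Y: 4,
      }
  }

and once cyborg flashes function X for the first allocation, it deletes
the other inventories, leaving

  REGION_RP: {
      inventory: {
          CUSTOM_FUNCTION_X: 8,
      }
  }

so subsequent requests for CUSTOM_FUNCTION_Y simply won't land on this
provider.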

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [placement] Anchor/Relay Providers

2018-03-30 Thread Eric Fried
Folks who care about placement (but especially Jay and Tetsuro)-

I was reviewing [1] and was at first very unsatisfied that we were not
returning the anchor providers in the results.  But as I started digging
into what it would take to fix it, I realized it's going to be
nontrivial.  I wanted to dump my thoughts before the weekend.


It should be legal to have a configuration like:

#CN1 (VCPU, MEMORY_MB)
#/  \
#   /agg1\agg2
#  /  \
# SS1SS2
#  (DISK_GB)  (IPV4_ADDRESS)

And make a request for DISK_GB,IPV4_ADDRESS;
And have it return a candidate including SS1 and SS2.

The CN1 resource provider acts as an "anchor" or "relay": a provider
that doesn't provide any of the requested resource, but connects to one
or more sharing providers that do so.

This scenario doesn't work today (see bug [2]).  Tetsuro has a partial
fix [1].

However, whereas that fix will return you an allocation_request
containing SS1 and SS2, neither the allocation_request nor the
provider_summary mentions CN1.

That's bad.  Consider use cases like Nova's, where we have to land that
allocation_request on a host: we have no good way of figuring out who
that host is.


Starting from the API, the response payload should look like:

{
"allocation_requests": [
{"allocations": {
# This is missing ==>
CN1_UUID: {"resources": {}},
# <==
SS1_UUID: {"resources": {"DISK_GB": 1024}},
SS2_UUID: {"resources": {"IPV4_ADDRESS": 1}}
}}
],
"provider_summaries": {
# This is missing ==>
CN1_UUID: {"resources": {
"VCPU": {"used": 123, "capacity": 456}
}},
# <==
SS1_UUID: {"resources": {
"DISK_GB": {"used": 2048, "capacity": 1048576}
}},
SS2_UUID: {"resources": {
"IPV4_ADDRESS": {"used": 4, "capacity": 32}
}}
},
}

Here's why it's not working currently:

=> CN1_UUID isn't in `summaries` [3]
=> because _build_provider_summaries [4] doesn't return it
=> because it's not in usages because _get_usages_by_provider_and_rc [5]
only finds providers providing resource in that RC
=> and since CN1 isn't providing resource in any requested RC, it ain't
included.

But we have the anchor provider's (internal) ID; it's the ns_rp_id we're
iterating on in this loop [6].  So let's just use that to get the
summary and add it to the mix, right?  Things that make that difficult:

=> We have no convenient helper that builds a summary object without
specifying a resource class (which is a separate problem, because it
means resources we didn't request don't show up in the provider
summaries either - they should).
=> We internally build these gizmos inside out - an AllocationRequest
contains a list of AllocationRequestResource, which contains a provider
UUID, resource class, and amount.  The latter two are required - but
would be n/a for our anchor RP.

I played around with this and came up with something that gets us most
of the way there [7].  It's quick and dirty: there are functional holes
(like returning "N/A" as a resource class; and traits are missing) and
places where things could be made more efficient.  But it's a start.

-efried

[1] https://review.openstack.org/#/c/533437/
[2] https://bugs.launchpad.net/nova/+bug/1732731
[3]
https://review.openstack.org/#/c/533437/6/nova/api/openstack/placement/objects/resource_provider.py@3308
[4]
https://review.openstack.org/#/c/533437/6/nova/api/openstack/placement/objects/resource_provider.py@3062
[5]
https://review.openstack.org/#/c/533437/6/nova/api/openstack/placement/objects/resource_provider.py@2658
[6]
https://review.openstack.org/#/c/533437/6/nova/api/openstack/placement/objects/resource_provider.py@3303
[7] https://review.openstack.org/#/c/558014/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement] Anchor/Relay Providers

2018-03-31 Thread Eric Fried
/me responds to self

Good progress has been made here.

Tetsuro solved the piece where provider summaries were only showing
resources that had been requested - with [8] they show usage information
for *all* their resources.

In order to make use of both [1] and [8], I had to shuffle them into the
same series - I put [8] first - and then balance my (heretofore) WIP [7]
on the top.  So we now have a lovely 5-part series starting at [9].

Regarding the (heretofore) WIP [7], I cleaned it up and made it ready.

QUESTION: Do we need a microversion for [8] and/or [1] and/or [7]?
Each changes the response payload content of GET /allocation_candidates,
so yes; but that content was arguably broken before, so no.  Please
comment on the patches accordingly.

-efried

> [1] https://review.openstack.org/#/c/533437/
> [2] https://bugs.launchpad.net/nova/+bug/1732731
> [3]
https://review.openstack.org/#/c/533437/6/nova/api/openstack/placement/objects/resource_provider.py@3308
> [4]
https://review.openstack.org/#/c/533437/6/nova/api/openstack/placement/objects/resource_provider.py@3062
> [5]
https://review.openstack.org/#/c/533437/6/nova/api/openstack/placement/objects/resource_provider.py@2658
> [6]
https://review.openstack.org/#/c/533437/6/nova/api/openstack/placement/objects/resource_provider.py@3303
> [7] https://review.openstack.org/#/c/558014/
[8] https://review.openstack.org/#/c/558045/
[9] https://review.openstack.org/#/c/558044/

On 03/30/2018 07:34 PM, Eric Fried wrote:
> Folks who care about placement (but especially Jay and Tetsuro)-
> 
> I was reviewing [1] and was at first very unsatisfied that we were not
> returning the anchor providers in the results.  But as I started digging
> into what it would take to fix it, I realized it's going to be
> nontrivial.  I wanted to dump my thoughts before the weekend.
> 
> 
> It should be legal to have a configuration like:
> 
> #CN1 (VCPU, MEMORY_MB)
> #/  \
> #   /agg1\agg2
> #  /  \
> # SS1SS2
> #  (DISK_GB)  (IPV4_ADDRESS)
> 
> And make a request for DISK_GB,IPV4_ADDRESS;
> And have it return a candidate including SS1 and SS2.
> 
> The CN1 resource provider acts as an "anchor" or "relay": a provider
> that doesn't provide any of the requested resource, but connects to one
> or more sharing providers that do so.
> 
> This scenario doesn't work today (see bug [2]).  Tetsuro has a partial
> fix [1].
> 
> However, whereas that fix will return you an allocation_request
> containing SS1 and SS2, neither the allocation_request nor the
> provider_summary mentions CN1.
> 
> That's bad.  Consider use cases like Nova's, where we have to land that
> allocation_request on a host: we have no good way of figuring out who
> that host is.
> 
> 
> Starting from the API, the response payload should look like:
> 
> {
> "allocation_requests": [
> {"allocations": {
> # This is missing ==>
> CN1_UUID: {"resources": {}},
> # <==
> SS1_UUID: {"resources": {"DISK_GB": 1024}},
> SS2_UUID: {"resources": {"IPV4_ADDRESS": 1}}
> }}
> ],
> "provider_summaries": {
> # This is missing ==>
> CN1_UUID: {"resources": {
> "VCPU": {"used": 123, "capacity": 456}
> }},
> # <==
> SS1_UUID: {"resources": {
> "DISK_GB": {"used": 2048, "capacity": 1048576}
> }},
> SS2_UUID: {"resources": {
> "IPV4_ADDRESS": {"used": 4, "capacity": 32}
> }}
> },
> }
> 
> Here's why it's not working currently:
> 
> => CN1_UUID isn't in `summaries` [3]
> => because _build_provider_summaries [4] doesn't return it
> => because it's not in usages because _get_usages_by_provider_and_rc [5]
> only finds providers providing resource in that RC
> => and since CN1 isn't providing resource in any requested RC, it ain't
> included.
> 
> But we have the anchor provider's (internal) ID; it's the ns_rp_id we're
> iterating on in this loop [6].  So let's just use that to get the
> summary and add it to the mix, right?  Things that make that difficult:
> 
> => We have no convenient helper that builds a summary object without
> specifying a resource class (which is a separate problem, because it
> means resources we didn't request don't show up in the provider
> 

Re: [openstack-dev] [nova][oslo] what to do with problematic mocking in nova unit tests

2018-03-31 Thread Eric Fried
Hi Doug, I made this [2] for you.  I tested it locally with oslo.config
master, and whereas I started off with a slightly different set of
errors than you show at [1], they were in the same suites.  Since I
didn't want to tox the world locally, I went ahead and added a
Depends-On from [3].  Let's see how it plays out.

>> [1]
http://logs.openstack.org/12/557012/1/check/cross-nova-py27/37b2a7c/job-output.txt.gz#_2018-03-27_21_41_09_883881
[2] https://review.openstack.org/#/c/558084/
[3] https://review.openstack.org/#/c/557012/

-efried

On 03/30/2018 06:35 AM, Doug Hellmann wrote:
> Anyone?
> 
>> On Mar 28, 2018, at 1:26 PM, Doug Hellmann  wrote:
>>
>> In the course of preparing the next release of oslo.config, Ben noticed
>> that nova's unit tests fail with oslo.config master [1].
>>
>> The underlying issue is that the tests mock things that oslo.config
>> is now calling as part of determining where options are being set
>> in code. This isn't an API change in oslo.config, and it is all
>> transparent for normal uses of the library. But the mocks replace
>> os.path.exists() and open() for the entire duration of a test
>> function (not just for the isolated application code being tested),
>> and so the library behavior change surfaces as a test error.
>>
>> I'm not really in a position to go through and clean up the use of
>> mocks in those (and other?) tests myself, and I would like to not
>> have to revert the feature work in oslo.config, especially since
>> we did it for the placement API stuff for the nova team.
>>
>> I'm looking for ideas about what to do.
>>
>> Doug
>>
>> [1] 
>> http://logs.openstack.org/12/557012/1/check/cross-nova-py27/37b2a7c/job-output.txt.gz#_2018-03-27_21_41_09_883881
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [barbican][nova-powervm][pyghmi][solum][trove] Switching to cryptography from pycrypto

2018-03-31 Thread Eric Fried
Mr. Fire-

> nova-powervm: no open reviews
>   - in test-requirements, but not actually used?
>   - made https://review.openstack.org/558091 for it

Thanks for that.  It passed all our tests; we should merge it early next
week.

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposing Eric Fried for nova-core

2018-04-03 Thread Eric Fried
Thank you Melanie for the complimentary nomination, to the cores for
welcoming me into the fold, and especially to all (cores and non, Nova
and otherwise) who have mentored me along the way thus far.  I hope to
live up to your example and continue to pay it forward.

-efried

On 04/03/2018 02:20 PM, melanie witt wrote:
> On Mon, 26 Mar 2018 19:00:06 -0700, Melanie Witt wrote:
>> Howdy everyone,
>>
>> I'd like to propose that we add Eric Fried to the nova-core team.
>>
>> Eric has been instrumental to the placement effort with his work on
>> nested resource providers and has been actively contributing to many
>> other areas of openstack [0] like project-config, gerritbot,
>> keystoneauth, devstack, os-loganalyze, and so on.
>>
>> He's an active reviewer in nova [1] and elsewhere in openstack and
>> reviews in-depth, asking questions and catching issues in patches and
>> working with authors to help get code into merge-ready state. These are
>> qualities I look for in a potential core reviewer.
>>
>> In addition to all that, Eric is an active participant in the project in
>> general, helping people with questions in the #openstack-nova IRC
>> channel, contributing to design discussions, helping to write up
>> outcomes of discussions, reporting bugs, fixing bugs, and writing tests.
>> His contributions help to maintain and increase the health of our
>> project.
>>
>> To the existing core team members, please respond with your comments,
>> +1s, or objections within one week.
>>
>> Cheers,
>> -melanie
>>
>> [0] https://review.openstack.org/#/q/owner:efried
>> [1] http://stackalytics.com/report/contribution/nova/90
> 
> Thanks to everyone who responded with their feedback. It's been one week
> and we have had more than enough +1s, so I've added Eric to the team.
> 
> Welcome Eric!
> 
> Best,
> -melanie
> 
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] placement update 18-14

2018-04-06 Thread Eric Fried
>> it's really on nested allocation candidates.
> 
> Yup. And that series is deadlocked on a disagreement about whether
> granular request groups should be "separate by default" (meaning: if you
> request multiple groups of resources, the expectation is that they will
> be served by distinct resource providers) or "unrestricted by default"
> (meaning: if you request multiple groups of resources, those resources
> may or may not be serviced by distinct resource providers).

This is really a granular thing, not a nested thing.  I was holding up
the nrp-in-alloc-cands spec [1] for other reasons, but I've stopped
doing that now.  We should be able to proceed with the nrp work.  I'm
working on the granular code, wherein I can hopefully isolate the
separate-vs-unrestricted decision such that we can go either way once
that issue is resolved.

[1] https://review.openstack.org/#/c/556873/

>> Some negotiation happened with regard to when/if the fixes for
>> shared providers is going to happen. I'm not sure how that resolved,
>> if someone can follow up with that, that would be most excellent.

This is the subject of another thread [2] that's still "dangling".  We
discussed it in the sched meeting this week [3] and concluded [4] that
we shouldn't do it in Rocky.  BUT tetsuro later pointed out that part of
the series in question [5] is still needed to satisfy NRP-in-alloc-cands
(return the whole tree's providers in provider_summaries - even the ones
that aren't providing resource to the request).  That patch changes
behavior, so needs a microversion (mostly done already in that patch),
so needs a spec.  We haven't yet resolved whether this is truly needed,
so haven't assigned a body to the spec work.  I believe Jay is still
planning [6] to parse and respond to the ML thread.  After he clones
himself.

[2]
http://lists.openstack.org/pipermail/openstack-dev/2018-March/128944.html
[3]
http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-04-02-14.00.log.html#l-91
[4]
http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-04-02-14.00.log.html#l-137
[5] https://review.openstack.org/#/c/558045/
[6]
http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-04-02-14.00.log.html#l-104

>> * Shared providers status?
>>    (I really think we need to make this go. It was one of the
>>    original value propositions of placement: being able to accurate
>>    manage shared disk.)
> 
> Agreed, but you know NUMA. And CPU pinning. And vGPUs. And FPGAs.
> And physnet network bandwidth scheduling. And... well, you get the idea.

Right.  I will say that Tetsuro has been doing an excellent job slinging
code for this, though.  So the bottleneck is really reviewer bandwidth
(already an issue for the work we *are* trying to fit in Rocky).

If it's still on the table by Stein, we ought to consider making it a
high priority.  (Our Rocky punchlist seems to be favoring "urgent" over
"important" to some extent.)

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Changes toComputeVirtAPI.wait_for_instance_event

2018-04-11 Thread Eric Fried
Jichen was able to use this information immediately, to great benefit
[1].  (If those paying attention could have a quick look at that to make
sure he used it right, it would be appreciated; I'm not an expert here.)

[1]
https://review.openstack.org/#/c/527658/31..32/nova/virt/zvm/guest.py@192
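
For anyone else following along, the usage pattern in the virt drivers
is roughly the following.  This is a from-memory sketch, not a
definitive reference - the helper names and config option here are
assumptions on my part, so check ComputeVirtAPI in
nova/compute/manager.py for the real signature:

    # Wait for neutron to confirm the VIFs are plugged before we
    # consider the spawn complete.
    events = [('network-vif-plugged', vif['id']) for vif in network_info]
    with self.virtapi.wait_for_instance_event(
            instance, events,
            deadline=CONF.vif_plugging_timeout,
            error_callback=self._neutron_failed_callback):
        # Do the actual plug/boot work here.  On exiting the context
        # manager, nova blocks until the events arrive or the deadline
        # expires (at which point error_callback decides what happens).
        self._plug_vifs(instance, network_info)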

On 04/10/2018 09:06 PM, Chen CH Ji wrote:
> Thanks for your info ,really helpful
> 
> Best Regards!
> 
> Kevin (Chen) Ji 纪 晨
> 
> Engineer, zVM Development, CSTL
> Notes: Chen CH Ji/China/IBM@IBMCN Internet: jiche...@cn.ibm.com
> Phone: +86-10-82451493
> Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian
> District, Beijing 100193, PRC
> 
> Inactive hide details for Andreas Scheuring ---04/10/2018 10:19:21
> PM---Yes, that’s how it works! ---Andreas Scheuring ---04/10/2018
> 10:19:21 PM---Yes, that’s how it works! ---
> 
> From: Andreas Scheuring 
> To: "OpenStack Development Mailing List (not for usage questions)"
> 
> Date: 04/10/2018 10:19 PM
> Subject: Re: [openstack-dev] [nova] Changes
> toComputeVirtAPI.wait_for_instance_event
> 
> 
> 
> 
> 
> Yes, that’s how it works!
> 
> ---
> Andreas Scheuring (andreas_s)
> 
> 
> 
> On 10. Apr 2018, at 16:05, Matt Riedemann <_mriedemos@gmail.com_
> > wrote:
> 
> On 4/9/2018 9:57 PM, Chen CH Ji wrote:
> 
> Could you please help to share whether this kind of event is
> sent by neutron-server or neutron agent ? I searched neutron code
> from [1][2] this means the agent itself need tell neutron server
> the device(VIF) is up then neutron server will send notification
> to nova through REST API and in turn consumed by compute node?
> 
> [1] https://github.com/openstack/neutron/tree/master/neutron/notify_port_active_direct
> 
> 
> 
> [2] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/rpc.py#L264
> 
> 
> 
> 
> I believe the neutron agent is the one that is getting (or polling) the
> information from the underlying network backend when VIFs are plugged or
> unplugged from a host, then route that information via RPC to the
> neutron server which then sends an os-server-external-events request to
> the compute REST API, which then routes the event information down to
> the nova-compute host where the instance is currently running.
> 
> -- 
> 
> Thanks,
> 
> Matt
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: _OpenStack-dev-request@lists.openstack.org_
> ?subject:unsubscribe_
> __http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev_
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Deployers] Optional, platform specific, dependancies in requirements.txt

2018-04-12 Thread Eric Fried
+1

This sounds reasonable to me.  I'm glad the issue was raised, but IMO it
shouldn't derail progress on an approved blueprint with ready code.

Jichen, would you please go ahead and file that blueprint template (no
need to write a spec yet) and link it in a review comment on the bottom
zvm patch so we have a paper trail?  I'm thinking something like
"Consistent platform-specific and optional requirements" -- that leaves
us open to decide *how* we're going to "handle" them.

Thanks,
efried

On 04/12/2018 04:13 AM, Chen CH Ji wrote:
> Thanks for Michael for raising this question and detailed information
> from Clark
> 
> As indicated in the mail, xen, vmware etc might already have this kind
> of requirements (and I guess might be more than that) ,
> can we accept z/VM requirements first by following other existing ones
> then next I can create a BP later to indicate this kind
> of change request by referring to Clark's comments and submit patches to
> handle it ? Thanks
> 
> Best Regards!
> 
> Kevin (Chen) Ji 纪 晨
> 
> Engineer, zVM Development, CSTL
> Notes: Chen CH Ji/China/IBM@IBMCN Internet: jiche...@cn.ibm.com
> Phone: +86-10-82451493
> Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian
> District, Beijing 100193, PRC
> 
> Inactive hide details for Matt Riedemann ---04/12/2018 08:46:25 AM---On
> 4/11/2018 5:09 PM, Michael Still wrote: >Matt Riedemann ---04/12/2018
> 08:46:25 AM---On 4/11/2018 5:09 PM, Michael Still wrote: >
> 
> From: Matt Riedemann 
> To: openstack-dev@lists.openstack.org
> Date: 04/12/2018 08:46 AM
> Subject: Re: [openstack-dev] [Nova][Deployers] Optional, platform
> specific, dependancies in requirements.txt
> 
> 
> 
> 
> 
> On 4/11/2018 5:09 PM, Michael Still wrote:
>>
>>
>> https://review.openstack.org/#/c/523387 proposes adding a z/VM specific
>> dependancy to nova's requirements.txt. When I objected the counter
>> argument is that we have examples of windows specific dependancies
>> (os-win) and powervm specific dependancies in that file already.
>>
>> I think perhaps all three are a mistake and should be removed.
>>
>> My recollection is that for drivers like ironic which may not be
>> deployed by everyone, we have the dependancy documented, and then loaded
>> at runtime by the driver itself instead of adding it to
>> requirements.txt. This is to stop pip for auto-installing the dependancy
>> for anyone who wants to run nova. I had assumed this was at the request
>> of the deployer community.
>>
>> So what do we do with z/VM? Do we clean this up? Or do we now allow
>> dependancies that are only useful to a very small number of deployments
>> into requirements.txt?
> 
> As Eric pointed out in the review, this came up when pypowervm was added:
> 
> https://review.openstack.org/#/c/438119/5/requirements.txt
> 
> And you're asking the same questions I did in there, which was, should
> it go into test-requirements.txt like oslo.vmware and
> python-ironicclient, or should it go under [extras], or go into
> requirements.txt like os-win (we also have the xenapi library now too).
> 
> I don't really think all of these optional packages should be in
> requirements.txt, but we should just be consistent with whatever we do,
> be that test-requirements.txt or [extras]. I remember caring more about
> this back in my rpm packaging days when we actually tracked what was in
> requirements.txt to base what needed to go into the rpm spec, unlike
> Fedora rpm specs which just zero out requirements.txt and depend on
> their own knowledge of what needs to be installed (which is sometimes
> lacking or lagging master).
> 
> I also seem to remember that [extras] was less than user-friendly for
> some reason, but maybe that was just because of how our CI jobs are
> setup? Or I'm just making that up. I know it's pretty simple to install
> the stuff from extras for tox runs, it's just an extra set of
> dependencies to list in the tox.ini.
> 
> Having said all this, I don't have the energy to help push for
> consistency myself, but will happily watch you from the sidelines.
> 
> -- 
> 
> Thanks,
> 
> Matt
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Nova][Deployers] Optional, platform specific, dependancies in requirements.txt

2018-04-12 Thread Eric Fried
> Is avoiding three lines of code really worth making future cleanup
> harder? Is a three line change really blocking "an approved blueprint
> with ready code"?

Nope.  What's blocking is deciding that that's the right thing to do.
Which we clearly don't have consensus on, based on what's happening in
this thread.

> global ironic
> if ironic is None:
>     ironic = importutils.import_module('ironicclient')

I have a pretty strong dislike for this mechanism.  For one thing, I'm
frustrated when I can't use hotkeys to jump to an ironicclient method
because my IDE doesn't recognize that dynamic import.  I have to go look
up the symbol some other way (and hope I'm getting the right one).  To
me (with my bias as a dev rather than a deployer) that's way worse than
having the 704KB python-ironicclient installed on my machine even though
I've never spawned an ironic VM in my life.

It should also be noted that python-ironicclient is in
test-requirements.txt.

Not that my personal preference ought to dictate or even influence what
we decide to do here.  But dynamic import is not the obviously correct
choice.
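
For comparison, the [extras] approach Matt mentions elsewhere in this
thread would look roughly like the following in nova's setup.cfg (pbr
supports an [extras] section; the package names and version pins below
are illustrative only):

    [extras]
    zvm =
      zVMCloudConnector>=1.1.1
    powervm =
      pypowervm>=1.1.1

Deployers who actually want a given driver opt in with e.g.
`pip install nova[zvm]`; everybody else never pulls the dependency.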

-efried

On 04/12/2018 03:28 PM, Michael Still wrote:
> I don't understand why you think the alternative is so hard. Here's how
> ironic does it:
> 
>         global ironic
>         if ironic is None:
>             ironic = importutils.import_module('ironicclient')
> 
> 
> Is avoiding three lines of code really worth making future cleanup
> harder? Is a three line change really blocking "an approved blueprint
> with ready code"?
> 
> Michael
> 
> 
> 
> On Thu, Apr 12, 2018 at 10:42 PM, Eric Fried  <mailto:openst...@fried.cc>> wrote:
> 
> +1
> 
> This sounds reasonable to me.  I'm glad the issue was raised, but IMO it
> shouldn't derail progress on an approved blueprint with ready code.
> 
> Jichen, would you please go ahead and file that blueprint template (no
> need to write a spec yet) and link it in a review comment on the bottom
> zvm patch so we have a paper trail?  I'm thinking something like
> "Consistent platform-specific and optional requirements" -- that leaves
> us open to decide *how* we're going to "handle" them.
> 
> Thanks,
> efried
> 
> On 04/12/2018 04:13 AM, Chen CH Ji wrote:
> > Thanks for Michael for raising this question and detailed information
> > from Clark
> >
> > As indicated in the mail, xen, vmware etc might already have this kind
> > of requirements (and I guess might be more than that) ,
> > can we accept z/VM requirements first by following other existing ones
> > then next I can create a BP later to indicate this kind
> > of change request by referring to Clark's comments and submit patches to
> > handle it ? Thanks
> >
> > Best Regards!
> >
> > Kevin (Chen) Ji 纪 晨
> >
> > Engineer, zVM Development, CSTL
> > Notes: Chen CH Ji/China/IBM@IBMCN Internet: jiche...@cn.ibm.com 
> <mailto:jiche...@cn.ibm.com>
> > Phone: +86-10-82451493
> > Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian
> > District, Beijing 100193, PRC
> >
> > Inactive hide details for Matt Riedemann ---04/12/2018 08:46:25 AM---On
> > 4/11/2018 5:09 PM, Michael Still wrote: >Matt Riedemann ---04/12/2018
> > 08:46:25 AM---On 4/11/2018 5:09 PM, Michael Still wrote: >
> >
> > From: Matt Riedemann mailto:mriede...@gmail.com>>
> > To: openstack-dev@lists.openstack.org
> <mailto:openstack-dev@lists.openstack.org>
> > Date: 04/12/2018 08:46 AM
> > Subject: Re: [openstack-dev] [Nova][Deployers] Optional, platform
> > specific, dependancies in requirements.txt
> >
> >
> 
> >
> >
> >
> > On 4/11/2018 5:09 PM, Michael Still wrote:
> >>
> >>
> >
> 
> https://review.openstack.org/#/c/523387

Re: [openstack-dev] [placement] Anchor/Relay Providers

2018-04-16 Thread Eric Fried
I was presenting an example using VM-ish resource classes, because I can
write them down and everybody knows what I'm talking about without me
having to explain what they are.  But remember we want placement to be
usable outside of Nova as well.

But also, I thought we had situations where the VCPU and MEMORY_MB were
themselves provided by sharing providers, associated with a compute host
RP that may be itself devoid of inventory.  (This may even be a viable
way to model VMWare's clustery things today.)

-efried

On 04/16/2018 01:58 PM, Jay Pipes wrote:
> Sorry it took so long to respond. Comments inline.
> 
> On 03/30/2018 08:34 PM, Eric Fried wrote:
>> Folks who care about placement (but especially Jay and Tetsuro)-
>>
>> I was reviewing [1] and was at first very unsatisfied that we were not
>> returning the anchor providers in the results.  But as I started digging
>> into what it would take to fix it, I realized it's going to be
>> nontrivial.  I wanted to dump my thoughts before the weekend.
>>
>> 
>> It should be legal to have a configuration like:
>>
>>  #    CN1 (VCPU, MEMORY_MB)
>>  #    /  \
>>  #   /agg1    \agg2
>>  #  /  \
>>  # SS1    SS2
>>  #  (DISK_GB)  (IPV4_ADDRESS)
>>
>> And make a request for DISK_GB,IPV4_ADDRESS;
>> And have it return a candidate including SS1 and SS2.
>>
>> The CN1 resource provider acts as an "anchor" or "relay": a provider
>> that doesn't provide any of the requested resource, but connects to one
>> or more sharing providers that do so.
> 
> To be honest, such a request just doesn't make much sense to me.
> 
> Think about what that is requesting. I want some DISK_GB resources and
> an IP address. For what? What is going to be *using* those resources?
> 
> Ah... a virtual machine. In other words, something that would *also* be
> requesting some CPU and memory resources as well.
> 
> So, the request is just fatally flawed, IMHO. It doesn't represent a use
> case from the real world.
> 
> I don't believe we should be changing placement (either the REST API or
> the implementation of allocation candidate retrieval) for use cases that
> don't represent real-world requests.
> 
> Best,
> -jay
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement] Anchor/Relay Providers

2018-04-16 Thread Eric Fried
> I still don't see a use in returning the root providers in the
> allocation requests -- since there is nothing consuming resources from
> those providers.
> 
> And we already return the root_provider_uuid for all providers involved
> in allocation requests within the provider_summaries section.
> 
> So, I can kind of see where we might want to change *this* line of the
> nova scheduler:
> 
> https://github.com/openstack/nova/blob/stable/pike/nova/scheduler/filter_scheduler.py#L349
> 
> 
> from this:
> 
>  compute_uuids = list(provider_summaries.keys())
> 
> to this:
> 
>  compute_uuids = set([
>  ps['root_provider_uuid'] for ps in provider_summaries
>  ])

If we're granting that it's possible to get all your resources from
sharing providers, the above doesn't help you to know which of your
compute_uuids belongs to which of those sharing-only allocation requests.

I'm fine deferring this part until we have a use case for sharing-only
allocation requests that aren't prompted by an "attach-*" case where we
already know the target host/consumer.  But I'd like to point out that
there's nothing in the API that prevents us from getting such results.

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-18 Thread Eric Fried
Thanks for describing the proposals clearly and concisely, Jay.

My preamble would have been that we need to support two use cases:

- "explicit anti-affinity": make sure certain parts of my request land
on *different* providers;
- "any fit": make sure my instance lands *somewhere*.

Both proposals address both use cases, but in different ways.

> "By default, should resources/traits submitted in different numbered
> request groups be supplied by separate resource providers?"

I agree this question needs to be answered, but that won't necessarily
inform which path we choose.  Viewpoint B [3] is set up to go either
way: either we're unrestricted by default and use a queryparam to force
separation; or we're split by default and use a queryparam to allow the
unrestricted behavior.

Otherwise I agree with everything Jay said.

-efried

On 04/18/2018 09:06 AM, Jay Pipes wrote:
> Stackers,
> 
> Eric Fried and I are currently at an impasse regarding a decision that
> will have far-reaching (and end-user facing) impacts to the placement
> API and how nova interacts with the placement service from the nova
> scheduler.
> 
> We need to make a decision regarding the following question:
> 
> 
> There are two competing proposals right now (both being amendments to
> the original granular request groups spec [1]) which outline two
> different viewpoints.
> 
> Viewpoint A [2], from me, is that like resources listed in different
> granular request groups should mean that those resources will be sourced
> from *different* resource providers.
> 
> In other words, if I issue the following request:
> 
> GET /allocation_candidates?resources1=VCPU:1&resources2=VCPU:1
> 
> Then I am assured of getting allocation candidates that contain 2
> distinct resource providers consuming 1 VCPU from each provider.
> 
> Viewpoint B [3], from Eric, is that like resources listed in different
> granular request groups should not necessarily mean that those resources
> will be sourced from different resource providers. They *could* be
> sourced from different providers, or they could be sourced from the same
> provider.
> 
> Both proposals include ways to specify whether certain resources or
> whole request groups can be forced to be sources from either a single
> provider or from different providers.
> 
> In Viewpoint A, the proposal is to have a can_split=RESOURCE1,RESOURCE2
> query parameter that would indicate which resource classes in the
> unnumbered request group that may be split across multiple providers
> (remember that viewpoint A considers different request groups to
> explicitly mean different providers, so it doesn't make sense to have a
> can_split query parameter for numbered request groups).
> 
> In Viewpoint B, the proposal is to have a separate_providers=1,2 query
> parameter that would indicate that the identified request groups should
> be sourced from separate providers. Request groups that are not listed
> in the separate_providers query parameter are not guaranteed to be
> sourced from different providers.
> 
> I know this is a complex subject, but I thought it was worthwhile trying
> to explain the two proposals in as clear terms as I could muster.
> 
> I'm, quite frankly, a bit on the fence about the whole thing and would
> just like to have a clear path forward so that we can start landing the
> 12+ patches that are queued up waiting for a decision on this.
> 
> Thoughts and opinions welcome.
> 
> Thanks,
> -jay
> 
> 
> [1]
> http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/granular-resource-requests.html
> 
> 
> [2] https://review.openstack.org/#/c/560974/
> 
> [3] https://review.openstack.org/#/c/561717/
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-18 Thread Eric Fried
I can't tell if you're being facetious, but this seems sane, albeit
complex.  It's also extensible as we come up with new and wacky affinity
semantics we want to support.

I can't say I'm sold on requiring `proximity` qparams that cover every
granular group - that seems like a pretty onerous burden to put on the
user right out of the gate.  That said, the idea of not having a default
is quite appealing.  Perhaps as a first pass we can require a single
?proximity={isolate|any} and build on it to support group numbers (etc.)
in the future.

One other thing inline below, not related to the immediate subject.

On 04/18/2018 12:40 PM, Jay Pipes wrote:
> On 04/18/2018 11:58 AM, Matt Riedemann wrote:
>> On 4/18/2018 9:06 AM, Jay Pipes wrote:
>>> "By default, should resources/traits submitted in different numbered
>>> request groups be supplied by separate resource providers?"
>>
>> Without knowing all of the hairy use cases, I'm trying to channel my
>> inner sdague and some of the similar types of discussions we've had to
>> changes in the compute API, and a lot of the time we've agreed that we
>> shouldn't assume a default in certain cases.
>>
>> So for this case, if I'm requesting numbered request groups, why
>> doesn't the API just require that I pass a query parameter telling it
>> how I'd like those requests to be handled, either via affinity or
>> anti-affinity
> So, you're thinking maybe something like this?
> 
> 1) Get me two dedicated CPUs. One of those dedicated CPUs must have AVX2
> capabilities. They must be on different child providers (different NUMA
> cells that are providing those dedicated CPUs).
> 
> GET /allocation_candidates?
> 
>  resources1=PCPU:1&required1=HW_CPU_X86_AVX2
> &resources2=PCPU:1
> &proximity=isolate:1,2
> 
> 2) Get me four dedicated CPUs. Two of those dedicated CPUs must have
> AVX2 capabilities. Two of the dedicated CPUs must have the SSE 4.2
> capability. They may come from the same provider (NUMA cell) or
> different providers.
> 
> GET /allocation_candidates?
> 
>  resources1=PCPU:2&required1=HW_CPU_X86_AVX2
> &resources2=PCPU:2&required2=HW_CPU_X86_SSE42
> &proximity=any:1,2
> 
> 3) Get me 2 dedicated CPUs and 2 SR-IOV VFs. The VFs must be provided by
> separate physical function providers which have different traits marking
> separate physical networks. The dedicated CPUs must come from the same
> provider tree in which the physical function providers reside.
> 
> GET /allocation_candidates?
> 
>  resources1=PCPU:2
> &resources2=SRIOV_NET_VF:1&required2=CUSTOM_PHYSNET_A
> &resources3=SRIOV_NET_VF:1&required3=CUSTOM_PHYSNET_B
> &proximity=isolate:2,3
> &proximity=same_tree:1,2,3
> 
> 3) Get me 2 dedicated CPUs and 2 SR-IOV VFs. The VFs must be provided by
> separate physical function providers which have different traits marking
> separate physical networks. The dedicated CPUs must come from the same
> provider *subtree* in which the second group of VF resources are sourced.
> 
> GET /allocation_candidates?
> 
>  resources1=PCPU:2
> &resources2=SRIOV_NET_VF:1&required2=CUSTOM_PHYSNET_A
> &resources3=SRIOV_NET_VF:1&required3=CUSTOM_PHYSNET_B
> &proximity=isolate:2,3
> &proximity=same_subtree:1,3

The 'same_subtree' concept requires a way to identify how far up the
common ancestor can be.  Otherwise, *everything* is in the same subtree.
 You could arbitrarily say "one step down from the root", but that's not
very flexible.  Allowing the user to specify a *number* of steps down
from the root is getting closer, but it requires the user to have an
understanding of the provider tree's exact structure, which is not ideal.

The idea I've been toying with here is "common ancestor by trait".  For
example, you would tag your NUMA node providers with trait NUMA_ROOT,
and then your request would include:

  ...
  &proximity=common_ancestor_by_trait:NUMA_ROOT:1,3

> 
> 4) Get me 4 SR-IOV VFs. 2 VFs should be sourced from a provider that is
> decorated with the CUSTOM_PHYSNET_A trait. 2 VFs should be sourced from
> a provider that is decorated with the CUSTOM_PHYSNET_B trait. For HA
> purposes, none of the VFs should be sourced from the same provider.
> However, the VFs for each physical network should be within the same
> subtree (NUMA cell) as each other.
> 
> GET /allocation_candidates?
> 
>  resources1=SRIOV_NET_VF:1&required1=CUSTOM_PHYSNET_A
> &resources2=SRIOV_NET_VF:1&required2=CUSTOM_PHYSNET_A
> &resources3=SRIOV_NET_VF:1&required3=CUSTOM_PHYSNET_B
> &resources4=SRIOV_NET_VF:1&required4=CUSTOM_PHYSNET_B
> &proximity=isolate:1,2,3,4
> &proximity=same_subtree:1,2
> &proximity=same_subtree:3,4
> 
> We can go even deeper if you'd like, since NFV means "never-ending
> feature velocity". Just let me know.
> 
> -jay
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo

Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-18 Thread Eric Fried
Chris-

Going to accumulate a couple of your emails and answer them.  I could
have answered them separately (anti-affinity).  But in this case I felt
it appropriate to provide responses in a single note (best fit).

> I'm a bit conflicted.  On the one hand...

> On the other hand,

Right; we're in agreement that we need to handle both.

> I'm half tempted to side with mriedem and say that there is no default
> and it must be explicit, but I'm concerned that this would make the
> requests a lot larger if you have to specify it for every resource. 
and
> The request might get unwieldy if we have to specify affinity/anti-
> affinity for each resource.  Maybe you could specify the default for
> the request and then optionally override it for each resource?

Yes, good call.  I'm favoring this as a first pass.  See my other response.

> In either viewpoint, is there a way to represent "I want two resource
> groups, with resource X in each group coming from different resource
> providers (anti-affinity) and resource Y from the same resource provider
> (affinity)?

As proposed, yes.  Though if we go with the above (one flag to specify
request-wide behavior) then there wouldn't be that ability beyond
putting things in the un-numbered vs. numbered groups.  So I guess my
question is: do we have a use case *right now* that requires supporting
"isolate for some, unrestricted for others"?

> I'm not current on the placement implementation details, but would
> this level of flexibility cause complexity problems in the code?

Oh, implementing this is complex af.  Here's what it takes *just* to
satisfy the "any fit" version:

https://review.openstack.org/#/c/517757/10/nova/api/openstack/placement/objects/resource_provider.py@3599

I've made some progress implementing "proximity=isolate:X,Y,..." in my
sandbox, and that's even hairier.  Doing "proximity=isolate"
(request-wide policy) would be a little easier.

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-18 Thread Eric Fried
> Cool. So let's not use a GET for this and instead change it to a POST
> with a request body that can more cleanly describe what the user is
> requesting, which is something we talked about a long time ago.

I kinda doubt we could agree on a format for this in the Rocky
timeframe.  But for the sake of curiosity, I'd like to see some strawman
proposals for what that request body would look like.  Here's a couple
off the top:

{
  "anti-affinity": [
  {
  "resources": { $RESOURCE_CLASS: amount, ... },
  "required": [ $TRAIT, ... ],
  "forbidden": [ $TRAIT, ... ],
  },
  ...
  ],
  "affinity": [
  ...
  ],
  "any fit": [
  ...
  ],
}

Or maybe:

{
  $ARBITRARY_USER_SPECIFIED_KEY_DESCRIBING_THE_GROUP: {
  "resources": { $RESOURCE_CLASS: amount, ... },
  "required": [ $TRAIT, ... ],
  "forbidden": [ $TRAIT, ... ],
  },
  ...
  "affinity_spec": {
  "isolate": [ $ARBITRARY_KEY, ... ],
  "any": [ $ARBITRARY_KEY, ... ],
  "common_subtree_by_trait": {
  "groups": [ $ARBITRARY_KEY, ... ],
  "traits": [ $TRAIT, ... ],
  },
  }
}

(I think we also now need to fold multiple `member_of` in there somehow.
 And `limit` - does that stay in the querystring?  Etc.)

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-18 Thread Eric Fried
Sorry, addressing gaffe, bringing this back on-list...

On 04/18/2018 04:36 PM, Ed Leafe wrote:
> On Apr 18, 2018, at 4:11 PM, Eric Fried  wrote:
>>> That makes a lot of sense. Since we are already suffixing the query param 
>>> “resources” to indicate granular, why not add a clarifying term to that 
>>> suffix? E.g., “resources1=“ -> “resources1d” (for ‘different’). The exact 
>>> string we use can be bike shedded, but requiring it be specified sounds 
>>> pretty sane to me.
>>  I'm not understanding what you mean here.  The issue at hand is how
>> numbered groups interact with *each other*.  If I said
>> resources1s=...&resources2d=..., what am I saying about whether the
>> resources in group 1 can or can't land in the same RP as those of group 2?
> OK, sorry. What I meant by the ‘d’ was that that group’s resources must be 
> from a different provider than any other group’s resources (anti-affinity). 
> So in your example, you don’t care if group1 is from the same provider, but 
> you do with group2, so that’s kind of a contradictory set-up (unless you had 
> other groups).
>
> Instead, if the example were changed to 
> resources1s=...&resources2d=..&resources3s=…, then groups 1 and 3 could be 
> allocated from the same provider.
>
> -- Ed Leafe

This is a cool idea.  It doesn't allow the same level of granularity as
being able to list explicit group numbers to be [anti-]affinitized with
specific other groups - but I'm not sure we need that.  I would have to
think through the use cases with this in mind.

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-18 Thread Eric Fried
> I have a feeling we're just going to go back and forth on this, as we
> have for weeks now, and not reach any conclusion that is satisfactory to
> everyone. And we'll delay, yet again, getting functionality into this
> release that serves 90% of use cases because we are obsessing over the
> 0.01% of use cases that may pop up later.

So I vote that, for the Rocky iteration of the granular spec, we add a
single `proximity={isolate|any}` qparam, required when any numbered
request groups are specified.  I believe this allows us to satisfy the
two NUMA use cases we care most about: "forced sharding" and "any fit".
And as you demonstrated, it leaves the way open for finer-grained and
more powerful semantics to be added in the future.

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-19 Thread Eric Fried
gibi-

> Can the proximity param specify relationship between the un-numbered and
> the numbered groups as well or only between numbered groups?
> Besides that I'm +1 about proxyimity={isolate|any}

Remembering that the resources in the un-numbered group can be spread
around the tree and sharing providers...

If applying "isolate" to the un-numbered group means that each resource
you specify therein must be satisfied by a different provider, then you
should have just put those resources into numbered groups.

If "isolate" means that *none* of the numbered groups will land on *any*
of the providers satisfying the un-numbered group... that could be hard
to reason about, and I don't know if it's useful.

So thus far I've been thinking about all of these semantics only in
terms of the numbered groups (although Jay's `can_split` was
specifically aimed at the un-numbered group).

That being the case (is that a bikeshed on the horizon?) perhaps
`granular_policy={isolate|any}` is a more appropriate name than `proximity`.

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-19 Thread Eric Fried
Chris-

Thanks for this perspective.  I totally agree.

> * the common behavior should require the least syntax.

To that point, I had been assuming "any fit" was going to be more common
than "explicit anti-affinity".  But I think this is where we are having
trouble agreeing.  So since, as you point out, we're in the weeds to
begin with when talking about nested, IMO mriedem's suggestion (no
default, require behavior to be specified) is a reasonable compromise.

> it'll be okay. Let's not maintain this painful illusion that we're
> writing stone tablets.

This.  I, for one, was being totally guilty of that.

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-19 Thread Eric Fried
Sylvain-

> What's the default behaviour if we aren't providing the proximity qparam
> ? Isolate or any ?

What we've been talking about, per mriedem's suggestion, is that the
qparam is required when you specify any numbered request groups.  There
is no default.  If you don't provide the qparam, 400.

(Edge case: the qparam is meaningless if you only provide *one* numbered
request group - assuming it has no bearing on the un-numbered group.  In
that case omitting it might be acceptable... or 400 for consistency.)

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-19 Thread Eric Fried
Thanks to everyone who contributed to this discussion.  With just a
teeny bit more bikeshedding on the exact syntax [1], we landed on:

group_policy={none|isolate}

I have proposed this delta to the granular spec [2].
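
For example (a sketch of the resulting syntax, assuming the delta merges
as proposed):

  GET /allocation_candidates
      ?resources=VCPU:1,MEMORY_MB:512
      &resources1=VGPU:1
      &resources2=VGPU:1
      &group_policy=isolate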

-efried

[1]
http://p.anticdent.org/logs/openstack-placement?dated=2018-04-19%2013:48:39.213790#a1c
[2] https://review.openstack.org/#/c/562687/

On 04/19/2018 07:38 AM, Balázs Gibizer wrote:
> 
> 
> On Thu, Apr 19, 2018 at 2:27 PM, Eric Fried  wrote:
>> gibi-
>>
>>>  Can the proximity param specify relationship between the un-numbered
>>> and
>>>  the numbered groups as well or only between numbered groups?
>>>  Besides that I'm +1 about proximity={isolate|any}
>>
>> Remembering that the resources in the un-numbered group can be spread
>> around the tree and sharing providers...
>>
>> If applying "isolate" to the un-numbered group means that each resource
>> you specify therein must be satisfied by a different provider, then you
>> should have just put those resources into numbered groups.
>>
>> If "isolate" means that *none* of the numbered groups will land on *any*
>> of the providers satisfying the un-numbered group... that could be hard
>> to reason about, and I don't know if it's useful.
>>
>> So thus far I've been thinking about all of these semantics only in
>> terms of the numbered groups (although Jay's `can_split` was
>> specifically aimed at the un-numbered group).
> 
> Thanks for the explanation. Now it makes sense to me to limit the
> proximity param to the numbered groups.
> 
>>
>> That being the case (is that a bikeshed on the horizon?) perhaps
>> `granular_policy={isolate|any}` is a more appropriate name than
>> `proximity`.
> 
> The policy term is more general than proximity therefore the
> granular_policy=any query fragment isn't descriptive enough any more.
> 
> 
> gibi
> 
>>
>> -efried
>>
>> __
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-04-23 Thread Eric Fried
Semantically, GET /allocation_candidates where we don't actually want to
allocate anything (i.e. we don't want to use the returned candidates) is
goofy, and talking about what the result would look like when there's no
`resources` is going to spider into some weird questions.

Like what does the response payload look like?  In the "good" scenario,
you would be expecting an allocation_request like:

"allocations": {
$rp_uuid: {
"resources": {
# Nada
}
},
}

...which is something we discussed recently [1] in relation to "anchor"
providers, and killed.

No, the question you're really asking in this case is, "Do the resource
providers in this tree contain (or not contain) these traits?"  Which to
me, translates directly to:

 GET /resource_providers?in_tree=$rp_uuid&required={$TRAIT|!$TRAIT, ...}

...which we already support.  The answer is a list of providers. Compare
that to the providers from which resources are already allocated, and
Bob's your uncle.

(I do find it messy/weird that the required/forbidden traits in the
image meta are supposed to apply *anywhere* in the provider tree.  But I
get that that's probably going to make the most sense.)

[1]
http://lists.openstack.org/pipermail/openstack-dev/2018-April/129408.html

On 04/23/2018 02:48 PM, Matt Riedemann wrote:
> We seem to be at a bit of an impasse in this spec amendment [1] so I
> want to try and summarize the alternative solutions as I see them.
> 
> The overall goal of the blueprint is to allow defining traits via image
> properties, like flavor extra specs. Those image-defined traits are used
> to filter hosts during scheduling of the instance. During server create,
> that filtering happens during the normal "GET /allocation_candidates"
> call to placement.
> 
> The problem is during rebuild with a new image that specifies new
> required traits. A rebuild is not a move operation, but we run through
> the scheduler filters to make sure the new image (if one is specified),
> is valid for the host on which the instance is currently running.
> 
> We don't currently call "GET /allocation_candidates" during rebuild
> because that could inadvertently filter out the host we know we need
> [2]. Also, since flavors don't change for rebuild, we haven't had a need
> for getting allocation candidates during rebuild since we're not
> allocating new resources (pretend bug 1763766 [3] does not exist for now).
> 
> Now that we know the problem, here are some of the solutions that have
> been discussed in the spec amendment, again, only for rebuild with a new
> image that has new traits:
> 
> 1. Fail in the API saying you can't rebuild with a new image with new
> required traits.
> 
> Pros:
> 
> - Simple way to keep the new image off a host that doesn't support it.
> - Similar solution to volume-backed rebuild with a new image.
> 
> Cons:
> 
> - Confusing user experience since they might be able to rebuild with
> some new images but not others with no clear explanation about the
> difference.
> 
> 2. Have the ImagePropertiesFilter call "GET
> /resource_providers/{rp_uuid}/traits" and compare the compute node root
> provider traits against the new image's required traits.
> 
> Pros:
> 
> - Avoids having to call "GET /allocation_candidates" during rebuild.
> - Simple way to compare the required image traits against the compute
> node provider traits.
> 
> Cons:
> 
> - Does not account for nested providers so the scheduler could reject
> the image due to its required traits which actually apply to a nested
> provider in the tree. This is somewhat related to bug 1763766.
> 
> 3. Slight variation on #2 except build a set of all traits from all
> providers in the same tree.
> 
> Pros:
> 
> - Handles the nested provider traits issue from #2.
> 
> Cons:
> 
> - Duplicates filtering in ImagePropertiesFilter that could otherwise
> happen in "GET /allocation_candidates".
> 
> 4. Add a microversion to change "GET /allocation_candidates" to make two
> changes:
> 
> a) Add an "in_tree" filter like in "GET /resource_providers". This would
> be needed to limit the scope of what gets returned since we know we only
> want to check against one specific host (the current host for the
> instance).
> 
> b) Make "resources" optional since on a rebuild we don't want to
> allocate new resources (again, notwithstanding bug 1763766).
> 
> Pros:
> 
> - We can call "GET /allocation_candidates?in_tree=<current compute node UUID>&required=<new image required traits>" and if nothing is returned,
> we know the new image's required traits don't work with the current node.
> - The filtering is baked into "GET /allocation_candidates" and not
> client-side in ImagePropertiesFilter.
> 
> Cons:
> 
> - Changes to the "GET /allocation_candidates" API which is going to be
> more complicated and more up-front work, but I don't have a good idea of
> how hard this would be to add since we already have the same "in_tree"
> logic in "GET

Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-04-23 Thread Eric Fried
Following the discussion on IRC, here's what I think you need to do:

- Assuming the set of traits from your new image is called image_traits...
- Use GET /allocations/{instance_uuid} and pull out the set of all RP
UUIDs.  Let's call this instance_rp_uuids.
- Use the SchedulerReportClient.get_provider_tree_and_ensure_root method
[1] to populate and return the ProviderTree for the host.  (If we're
uncomfortable about the `ensure_root` bit, we can factor that away.)
Call this ptree.
- Collect all the traits in the RPs you've got allocated to your instance:

 traits_in_instance_rps = set()
 for rp_uuid in instance_rp_uuids:
     traits_in_instance_rps.update(ptree.data(rp_uuid).traits)

- See if any of your image traits are *not* in those RPs.

 missing_traits = image_traits - traits_in_instance_rps

- If there were any, it's a no go.

 if missing_traits:
     FAIL(_("The following traits were in the image but not in the "
            "instance's RPs: %s") % ', '.join(missing_traits))

[1]
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L986
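
For the first step, the only part of the GET /allocations/{instance_uuid}
response body you need is the keys of its "allocations" mapping.  A
minimal sketch, assuming `alloc_body` is the parsed JSON response:

 # body shape: {"allocations": {<rp_uuid>: {"resources": {...}}, ...}}
 instance_rp_uuids = set(alloc_body['allocations'])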

On 04/23/2018 03:47 PM, Matt Riedemann wrote:
> On 4/23/2018 3:26 PM, Eric Fried wrote:
>> No, the question you're really asking in this case is, "Do the resource
>> providers in this tree contain (or not contain) these traits?"  Which to
>> me, translates directly to:
>>
>>   GET /resource_providers?in_tree=$rp_uuid&required={$TRAIT|!$TRAIT, ...}
>>
>> ...which we already support.  The answer is a list of providers. Compare
>> that to the providers from which resources are already allocated, and
>> Bob's your uncle.
> 
> OK and that will include filtering the required traits on nested
> providers in that tree rather than just against the root provider? If
> so, then yeah that sounds like an improvement on option 2 or 3 in my
> original email and resolves the issue without having to call (or change)
> "GET /allocation_candidates". I still think it should happen from within
> ImagePropertiesFilter, but that's an implementation detail.
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-04-23 Thread Eric Fried
> for the GET
> /resource_providers?in_tree=<rp_uuid>&required=<image traits>, nested
> resource providers and allocations pose a problem; see #3 above.

This *would* work as a quick up-front check as Jay described (if you get
no results from this, you know that at least one of your image traits
doesn't exist anywhere in the tree) except that it doesn't take sharing
providers into account :(

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-04-24 Thread Eric Fried
> The problem isn't just checking the traits in the nested resource
> provider. We also need to ensure the trait in the exactly same child
> resource provider.

No, we can't get "granular" with image traits.  We accepted this as a
limitation for the spawn aspect of this spec [1], for all the same
reasons [2].  And by the time we've spawned the instance, we've lost the
information about which granular request groups (from the flavor) were
satisfied by which resources - retrofitting that information from a new
image would be even harder.  So we need to accept the same limitation
for rebuild.

[1] "Due to the difficulty of attempting to reconcile granular request
groups between an image and a flavor, only the (un-numbered) trait group
is supported. The traits listed there are merged with those of the
un-numbered request group from the flavor."
(http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/glance-image-traits.html#proposed-change)
[2]
https://review.openstack.org/#/c/554305/2/specs/rocky/approved/glance-image-traits.rst@86

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-04-24 Thread Eric Fried
Alex-

On 04/24/2018 09:21 AM, Alex Xu wrote:
> 
> 
> 2018-04-24 20:53 GMT+08:00 Eric Fried <openst...@fried.cc>:
> 
> > The problem isn't just checking the traits in the nested resource
> > provider. We also need to ensure the trait in the exactly same child
> > resource provider.
> 
> No, we can't get "granular" with image traits.  We accepted this as a
> limitation for the spawn aspect of this spec [1], for all the same
> reasons [2].  And by the time we've spawned the instance, we've lost the
> information about which granular request groups (from the flavor) were
> satisfied by which resources - retrofitting that information from a new
> image would be even harder.  So we need to accept the same limitation
> for rebuild.
> 
> [1] "Due to the difficulty of attempting to reconcile granular request
> groups between an image and a flavor, only the (un-numbered) trait group
> is supported. The traits listed there are merged with those of the
> un-numbered request group from the flavor."
> 
> (http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/glance-image-traits.html#proposed-change
> 
> <http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/glance-image-traits.html#proposed-change>)
> [2]
> 
> https://review.openstack.org/#/c/554305/2/specs/rocky/approved/glance-image-traits.rst@86
> 
> <https://review.openstack.org/#/c/554305/2/specs/rocky/approved/glance-image-traits.rst@86>
> 
> 
> Why would we return an RP which has a specific trait but from which we
> won't consume any resources?
> If the case is that we request two VFs, and these two VFs have different
> required traits, then that should be a granular request.

We don't care about RPs we're not consuming resources from.  Forget
rebuild - if the image used for the original spawn request has traits
pertaining to VFs, we folded those traits into the un-numbered request
group, which means the VF resources would have needed to be in the
un-numbered request group in the flavor as well.  That was the
limitation discussed at [2]: trying to correlate granular groups from an
image to granular groups in a flavor would require nontrivial invention
beyond what we're willing to do at this point.  So we're limited at
spawn time to VFs (or whatever) where we can't tell which trait belongs
to which.  The best we can do is ensure that the end result of the
un-numbered request group will collectively satisfy all the traits from
the image.  And this same limitation exists, for the same reasons, on
rebuild.  It even goes a bit further, because if there are *other* VFs
(or whatever) that came from numbered groups in the original request, we
have no way to know that; so if *those* guys have traits required by the
new image, we'll still pass.  Which is almost certainly okay.

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-05-03 Thread Eric Fried
>> verify with placement
>> whether the image traits requested are 1) supported by the compute
>> host the instance is residing on and 2) coincide with the
>> already-existing allocations.

Note that #2 is a subset of #1.  The only potential advantage of
including #1 is efficiency: We can do #1 in one API call and bail early
if it fails; but if it passes, we have to do #2 anyway, which is
multiple steps.  So would we rather save one step in the "good path" or
potentially N-1 steps in the failure case?  IMO the cost of the
additional dev/test to implement #1 is higher than that of the potential
extra API calls.  (TL;DR: just implement #2.)

-efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Extra feature of vCPU allocation on demands

2018-05-07 Thread Eric Fried
I will be interested to watch this develop.  In PowerVM we already have
shared vs. dedicated processors [1] along with concepts like capped vs.
uncapped, min/max proc units, weights, etc.  But obviously it's all
heavily customized to be PowerVM-specific.  If these concepts made their
way into mainstream Nova, we could hopefully adapt to use them and
remove some tech debt.

[1]
https://github.com/openstack/nova/blob/master/nova/virt/powervm/vm.py#L372-L401

On 05/07/2018 04:55 AM, 倪蔚辰 wrote:
> Hi, all
> 
> I would like to propose a blueprint (not yet filed) related to OpenStack
> nova. I hope to gather some comments by explaining my idea in this
> e-mail. Please contact me if you have any comments.
> 
>  
> 
> Background
> 
> Under current OpenStack, vCPUs assigned to each VM can be configured as
> dedicated or shared. In some scenarios, such as deploying a Radio Access
> Network VNF, the VM is required to have dedicated vCPUs to ensure
> performance. However, in that case, each VM also has a vCPU that does
> Guest OS housekeeping. Usually, this vCPU does not need high performance
> and does not use much of a dedicated vCPU's capacity, so some dedicated
> vCPU resources are wasted.
> 
>  
> 
> Proposed feature
> 
> I hope to add an extra feature to flavor extra specs specifying how many
> dedicated vCPUs and how many shared vCPUs the VM needs, so that OpenStack
> allocates vCPUs accordingly. In the background scenario above, this idea
> can save many of the dedicated vCPUs that would otherwise only do Guest
> OS housekeeping. That scenario is just one use case; the feature
> potentially allows users to design VMs more flexibly and save CPU
> resources.
> 
>  
> 
> Thanks.
> 
>  
> 
> Weichen
> 
> e-mail: niweic...@chinamobile.com
> 
>  
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Cyborg] [Nova] Cyborg traits

2018-05-30 Thread Eric Fried
This all sounds fully reasonable to me.  One thing, though...

>>   * There is a resource class per device category e.g.
>> CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.

Let's propose standard resource classes for these ASAP.

https://github.com/openstack/nova/blob/d741f624c81baf89fc8b6b94a2bc20eb5355a818/nova/rc_fields.py

-efried
.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Cyborg] [Nova] Cyborg traits

2018-05-31 Thread Eric Fried
Yup.  I'm sure reviewers will bikeshed the names, but the review is the
appropriate place for that to happen.

A couple of test changes will also be required.  You can have a look at
[1] as an example to follow.
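
For concreteness, such a patch would mostly just add the new constants
alongside the existing ones -- a hedged sketch only; the final names are
up to review:

 # nova/rc_fields.py (excerpt)
 ACCELERATOR_GPU = 'ACCELERATOR_GPU'
 ACCELERATOR_FPGA = 'ACCELERATOR_FPGA'
 # ...plus adding both wherever the standard classes are enumerated there.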

-efried

[1] https://review.openstack.org/#/c/511180/

On 05/31/2018 01:02 AM, Nadathur, Sundar wrote:
> On 5/30/2018 1:18 PM, Eric Fried wrote:
>> This all sounds fully reasonable to me.  One thing, though...
>>
>>>>    * There is a resource class per device category e.g.
>>>>  CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
>> Let's propose standard resource classes for these ASAP.
>>
>> https://github.com/openstack/nova/blob/d741f624c81baf89fc8b6b94a2bc20eb5355a818/nova/rc_fields.py
>>
>>
>> -efried
> Makes sense, Eric. The obvious names would be ACCELERATOR_GPU and
> ACCELERATOR_FPGA. Do we just submit a patch to rc_fields.py?
> 
> Thanks,
> Sundar
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Eric Fried
This seems reasonable, but...

On 05/31/2018 04:34 AM, Balázs Gibizer wrote:
> 
> 
> On Thu, May 31, 2018 at 11:10 AM, Sylvain Bauza  wrote:
>>>
>>
>> After considering the whole approach, discussing with a couple of
>> folks over IRC, here is what I feel the best approach for a seamless
>> upgrade :
>>  - VGPU inventory will be kept on root RP (for the first type) in
>> Queens so that a compute service upgrade won't impact the DB
>>  - during Queens, operators can run a DB online migration script (like
-^^
Did you mean Rocky?

>> the ones we currently have in
>> https://github.com/openstack/nova/blob/c2f42b0/nova/cmd/manage.py#L375) that
>> will create a new resource provider for the first type and move the
>> inventory and allocations to it.
>>  - it's the responsibility of the virt driver code to check whether a
>> child RP with its name being the first type name already exists to
>> know whether to update the inventory against the root RP or the child RP.
>>
>> Does it work for folks ?
> 
> +1 works for me
> gibi
> 
>> PS : we already have the plumbing in place in nova-manage and we're
>> still managing full Nova resources. I know we plan to move Placement
>> out of the nova tree, but for the Rocky timeframe, I feel we can
>> consider nova-manage as the best and quickest approach for the data
>> upgrade.
>>
>> -Sylvain
>>
>>
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Eric Fried
> 1. Make everything perform the pivot on compute node start (which can be
>re-used by a CLI tool for the offline case)
> 2. Make everything default to non-nested inventory at first, and provide
>a way to migrate a compute node and its instances one at a time (in
>place) to roll through.

I agree that it sure would be nice to do ^ rather than requiring the
"slide puzzle" thing.

But how would this be accomplished, in light of the current "separation
of responsibilities" drawn at the virt driver interface, whereby the
virt driver isn't supposed to talk to placement directly, or know
anything about allocations?  Here's a first pass:

The virt driver, via the return value from update_provider_tree, tells
the resource tracker that "inventory of resource class A on provider B
have moved to provider C" for all applicable AxBxC.  E.g.

[ { 'from_resource_provider': <cn_rp_uuid>,
    'moved_resources': [VGPU: 4],
    'to_resource_provider': <gpu_rp1_uuid>
  },
  { 'from_resource_provider': <cn_rp_uuid>,
    'moved_resources': [VGPU: 4],
    'to_resource_provider': <gpu_rp2_uuid>
  },
  { 'from_resource_provider': <cn_rp_uuid>,
    'moved_resources': [
        SRIOV_NET_VF: 2,
        NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 1000,
        NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 1000,
    ],
    'to_resource_provider': <gpu_rp2_uuid>
  }
]

As today, the resource tracker takes the updated provider tree and
invokes [1] the report client method update_from_provider_tree [2] to
flush the changes to placement.  But now update_from_provider_tree also
accepts the return value from update_provider_tree and, for each "move":

- Creates provider C (as described in the provider_tree) if it doesn't
already exist.
- Creates/updates provider C's inventory as described in the
provider_tree (without yet updating provider B's inventory).  This ought
to create the inventory of resource class A on provider C.
- Discovers allocations of rc A on rp B and POSTs to move them to rp C*.
- Updates provider B's inventory.

(*There's a hole here: if we're splitting a glommed-together inventory
across multiple new child providers, as the VGPUs in the example, we
don't know which allocations to put where.  The virt driver should know
which instances own which specific inventory units, and would be able to
report that info within the data structure.  That's getting kinda close
to the virt driver mucking with allocations, but maybe it fits well
enough into this model to be acceptable?)
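
To make the allocation-move step concrete, here's a minimal sketch of
moving one consumer's allocations, expressed against the raw placement
API (assume `client` is a thin helper that adds auth and a dict-style
PUT microversion, i.e. >= 1.12 but predating the not-yet-landed consumer
generations; conflict retries are also ignored):

 def move_consumer_allocations(client, consumer_uuid, src_rp, dst_rp, rcs):
     body = client.get('/allocations/%s' % consumer_uuid)
     # Keep only the 'resources' piece of each allocation record.
     allocs = {rp: {'resources': dict(rec['resources'])}
               for rp, rec in body['allocations'].items()}
     src_resources = allocs.get(src_rp, {'resources': {}})['resources']
     moved = {rc: amt for rc, amt in src_resources.items() if rc in rcs}
     if not moved:
         return
     # Shrink (or drop) the allocation against the old provider...
     for rc in moved:
         del src_resources[rc]
     if not src_resources:
         del allocs[src_rp]
     # ...and add the same amounts against the new provider.
     dst_resources = allocs.setdefault(dst_rp, {'resources': {}})['resources']
     for rc, amt in moved.items():
         dst_resources[rc] = dst_resources.get(rc, 0) + amt
     # project_id/user_id are required on the dict-style PUT and are
     # echoed back by the GET at these microversions.
     client.put('/allocations/%s' % consumer_uuid,
                {'allocations': allocs,
                 'project_id': body['project_id'],
                 'user_id': body['user_id']})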

Note that the return value from update_provider_tree is optional, and
only used when the virt driver is indicating a "move" of this ilk.  If
it's None/[] then the RT/update_from_provider_tree flow is the same as
it is today.

If we can do it this way, we don't need a migration tool.  In fact, we
don't even need to restrict provider tree "reshaping" to release
boundaries.  As long as the virt driver understands its own data model
migrations and reports them properly via update_provider_tree, it can
shuffle its tree around whenever it wants.

Thoughts?

-efried

[1]
https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/compute/resource_tracker.py#L890
[2]
https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/scheduler/client/report.py#L1341

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Eric Fried
Rats, typo correction below.

On 05/31/2018 01:26 PM, Eric Fried wrote:
>> 1. Make everything perform the pivot on compute node start (which can be
>>re-used by a CLI tool for the offline case)
>> 2. Make everything default to non-nested inventory at first, and provide
>>a way to migrate a compute node and its instances one at a time (in
>>place) to roll through.
> 
> I agree that it sure would be nice to do ^ rather than requiring the
> "slide puzzle" thing.
> 
> But how would this be accomplished, in light of the current "separation
> of responsibilities" drawn at the virt driver interface, whereby the
> virt driver isn't supposed to talk to placement directly, or know
> anything about allocations?  Here's a first pass:
> 
> The virt driver, via the return value from update_provider_tree, tells
> the resource tracker that "inventory of resource class A on provider B
> have moved to provider C" for all applicable AxBxC.  E.g.
> 
> [ { 'from_resource_provider': <cn_rp_uuid>,
>     'moved_resources': [VGPU: 4],
>     'to_resource_provider': <gpu_rp1_uuid>
>   },
>   { 'from_resource_provider': <cn_rp_uuid>,
>     'moved_resources': [VGPU: 4],
>     'to_resource_provider': <gpu_rp2_uuid>
>   },
>   { 'from_resource_provider': <cn_rp_uuid>,
>     'moved_resources': [
>         SRIOV_NET_VF: 2,
>         NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 1000,
>         NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 1000,
>     ],
>     'to_resource_provider': <gpu_rp2_uuid>
---
s/gpu_rp2_uuid/sriovnic_rp_uuid/ or similar.

>   }
> ]
> 
> As today, the resource tracker takes the updated provider tree and
> invokes [1] the report client method update_from_provider_tree [2] to
> flush the changes to placement.  But now update_from_provider_tree also
> accepts the return value from update_provider_tree and, for each "move":
> 
> - Creates provider C (as described in the provider_tree) if it doesn't
> already exist.
> - Creates/updates provider C's inventory as described in the
> provider_tree (without yet updating provider B's inventory).  This ought
> to create the inventory of resource class A on provider C.
> - Discovers allocations of rc A on rp B and POSTs to move them to rp C*.
> - Updates provider B's inventory.
> 
> (*There's a hole here: if we're splitting a glommed-together inventory
> across multiple new child providers, as the VGPUs in the example, we
> don't know which allocations to put where.  The virt driver should know
> which instances own which specific inventory units, and would be able to
> report that info within the data structure.  That's getting kinda close
> to the virt driver mucking with allocations, but maybe it fits well
> enough into this model to be acceptable?)
> 
> Note that the return value from update_provider_tree is optional, and
> only used when the virt driver is indicating a "move" of this ilk.  If
> it's None/[] then the RT/update_from_provider_tree flow is the same as
> it is today.
> 
> If we can do it this way, we don't need a migration tool.  In fact, we
> don't even need to restrict provider tree "reshaping" to release
> boundaries.  As long as the virt driver understands its own data model
> migrations and reports them properly via update_provider_tree, it can
> shuffle its tree around whenever it wants.
> 
> Thoughts?
> 
> -efried
> 
> [1]
> https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/compute/resource_tracker.py#L890
> [2]
> https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/scheduler/client/report.py#L1341
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Eric Fried
Chris-

>> virt driver isn't supposed to talk to placement directly, or know
>> anything about allocations?
> 
> For sake of discussion, how much (if any) easier would it be if we
> got rid of this restriction?

At this point, having implemented the update_[from_]provider_tree flow
as we have, it would probably make things harder.  We still have to do
the same steps, but any bits we wanted to let the virt driver handle
would need some kind of weird callback dance.

But even if we scrapped update_[from_]provider_tree and redesigned from
first principles, virt drivers would have a lot of duplication of the
logic that currently resides in update_from_provider_tree.

So even though the restriction seems to make things awkward, having been
embroiled in this code as I have, I'm actually seeing how it keeps
things as clean and easy to reason about as can be expected for
something that's inherently as complicated as this.

>> the resource tracker that "inventory of resource class A on provider B
>> have moved to provider C" for all applicable AxBxC.  E.g.
> 
> traits too?

The traits are part of the updated provider tree itself.  The existing
logic in update_from_provider_tree handles shuffling those around.  I
don't think the RT needs to be told about any specific trait movement in
order to reason about moving allocations.  Do you see something I'm
missing there?

> The fact that we are using what amounts to a DSL to pass
> some additional instruction back from the virt driver feels squiffy

Yeah, I don't disagree.  The provider_tree object, and updating it via
update_provider_tree, is kind of a DSL already.  The list-of-dicts
format is just a strawman; we could make it an object or whatever (not
that that would make it less DSL-ish).

Perhaps an OVO :P

-efried
.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-06-01 Thread Eric Fried
Sylvain-

On 05/31/2018 02:41 PM, Sylvain Bauza wrote:
> 
> 
> On Thu, May 31, 2018 at 8:26 PM, Eric Fried <openst...@fried.cc> wrote:
> 
> > 1. Make everything perform the pivot on compute node start (which can be
> >    re-used by a CLI tool for the offline case)
> > 2. Make everything default to non-nested inventory at first, and provide
> >    a way to migrate a compute node and its instances one at a time (in
> >    place) to roll through.
> 
> I agree that it sure would be nice to do ^ rather than requiring the
> "slide puzzle" thing.
> 
> But how would this be accomplished, in light of the current "separation
> of responsibilities" drawn at the virt driver interface, whereby the
> virt driver isn't supposed to talk to placement directly, or know
> anything about allocations?  Here's a first pass:
> 
> 
> 
> What we usually do is to implement either at the compute service level
> or at the virt driver level some init_host() method that will reconcile
> what you want.
> For example, we could just imagine a non-virt specific method (and I
> like that because it's non-virt specific) - ie. called by compute's
> init_host() that would look up the compute root RP inventories, see
> whether one or more inventories tied to specific resource classes have
> to be moved from the root RP and be attached to a child RP.
> The only subtlety that would require a virt-specific update would be
> the name of the child RP (as both Xen and libvirt plan to use the child
> RP name as the vGPU type identifier), but that's an implementation detail
> that a possible virt driver update via the resource tracker would
> reconcile.

The question was rhetorical; my suggestion (below) was an attempt at
designing exactly what you've described.  Let me know if I can
explain/clarify it further.  I'm looking for feedback as to whether it's
a viable approach.

> The virt driver, via the return value from update_provider_tree, tells
> the resource tracker that "inventory of resource class A on provider B
> have moved to provider C" for all applicable AxBxC.  E.g.
> 
> [ { 'from_resource_provider': <cn_rp_uuid>,
>     'moved_resources': [VGPU: 4],
>     'to_resource_provider': <gpu_rp1_uuid>
>   },
>   { 'from_resource_provider': <cn_rp_uuid>,
>     'moved_resources': [VGPU: 4],
>     'to_resource_provider': <gpu_rp2_uuid>
>   },
>   { 'from_resource_provider': <cn_rp_uuid>,
>     'moved_resources': [
>         SRIOV_NET_VF: 2,
>         NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 1000,
>         NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 1000,
>     ],
>     'to_resource_provider': <gpu_rp2_uuid>
>   }
> ]
> 
> As today, the resource tracker takes the updated provider tree and
> invokes [1] the report client method update_from_provider_tree [2] to
> flush the changes to placement.  But now update_from_provider_tree also
> accepts the return value from update_provider_tree and, for each "move":
> 
> - Creates provider C (as described in the provider_tree) if it doesn't
> already exist.
> - Creates/updates provider C's inventory as described in the
> provider_tree (without yet updating provider B's inventory).  This ought
> to create the inventory of resource class A on provider C.
> - Discovers allocations of rc A on rp B and POSTs to move them to rp C*.
> - Updates provider B's inventory.
> 
> (*There's a hole here: if we're splitting a glommed-together inventory
> across multiple new child providers, as the VGPUs in the example, we
> don't know which allocations to put where.  The virt driver should know
> which instances own which specific inventory units, and would be able to
> report that info within the data structure.  That's getting kinda close
> to the virt driver mucking with allocations, but maybe it fits well
> enough into this model to be acceptable?)
> 
> Note that the return value from update_provider_tree is optional, and
> only used when the virt driver is indicating a "move" of this ilk.  If
> it's None/[] then the RT/update_from_provider_tree flow is the same as
> it is today.
> 
> If we can do it this way, we don't need a migration tool.  In fact, we
> don't even need to restrict provider tree "reshaping" to release
> boundaries.  As long as the virt driver understands its own data model
> migrations and reports them properly via update_prov

Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-04 Thread Eric Fried
Sundar-

We've been discussing the upgrade path on another thread [1] and are
working toward a solution [2][3] that would not require downtime or
special scripts (other than whatever's normally required for an upgrade).

We still hope to have all of that ready for Rocky, but if you're
concerned about timing, this work should make it a viable option for you
to start out modeling everything in the compute RP as you say, and then
move it over later.

Thanks,
Eric

[1] http://lists.openstack.org/pipermail/openstack-dev/2018-May/130783.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131045.html
[3] https://etherpad.openstack.org/p/placement-migrate-operations

On 06/04/2018 12:49 PM, Nadathur, Sundar wrote:
> Hi,
>  Cyborg needs to create RCs and traits for accelerators. The
> original plan was to do that with nested RPs. To avoid rushing the Nova
> developers, I had proposed that Cyborg could start by applying the
> traits to the compute node RP, and accept the resulting caveats for
> Rocky, till we get nested RP support. That proposal did not find many
> takers, and Cyborg has essentially been in waiting mode.
> 
> Since it is June already, and there is a risk of not delivering anything
> meaningful in Rocky, I am reviving my older proposal, which is
> summarized as below:
> 
>   * Cyborg shall create the RCs and traits as per spec
> (https://review.openstack.org/#/c/554717/), both in Rocky and
> beyond. Only the RPs will change post-Rocky.
>   * In Rocky:
>   o Cyborg will not create nested RPs. It shall apply the device
> traits to the compute node RP.
>   o Cyborg will document the resulting caveat, i.e., all devices in
> the same host should have the same traits. In particular, we
> cannot have a GPU and a FPGA, or 2 FPGAs of different types, in
> the same host.
>   o Cyborg will document that upgrades to post-Rocky releases will
> require operator intervention (as described below).
>   *  For upgrade to post-Rocky world with nested RPs:
>   o The operator needs to stop all running instances that use an
> accelerator.
>   o The operator needs to run a script that removes the Cyborg
> traits and the inventory for Cyborg RCs from compute node RPs.
>   o The operator can then perform the upgrade. The new Cyborg
> agent/driver(s) shall create nested RPs and publish
> inventory/traits as specified.
> 
> IMHO, it is acceptable for Cyborg to do this because it is new and we
> can set expectations for the (lack of) upgrade plan. The alternative is
> that potentially no meaningful use cases get addressed in Rocky for Cyborg.
> 
> Please LMK what you think.
> 
> Regards,
> Sundar
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-06-04 Thread Eric Fried
There has been much discussion.  We've gotten to a point of an initial
proposal and are ready for more (hopefully smaller, hopefully
conclusive) discussion.

To that end, there will be a HANGOUT tomorrow (TUESDAY, JUNE 5TH) at
1500 UTC.  Be in #openstack-placement to get the link to join.

The strawpeople outlined below and discussed in the referenced etherpad
have been consolidated/distilled into a new etherpad [1] around which
the hangout discussion will be centered.

[1] https://etherpad.openstack.org/p/placement-making-the-(up)grade

Thanks,
efried

On 06/01/2018 01:12 PM, Jay Pipes wrote:
> On 05/31/2018 02:26 PM, Eric Fried wrote:
>>> 1. Make everything perform the pivot on compute node start (which can be
>>>     re-used by a CLI tool for the offline case)
>>> 2. Make everything default to non-nested inventory at first, and provide
>>>     a way to migrate a compute node and its instances one at a time (in
>>>     place) to roll through.
>>
>> I agree that it sure would be nice to do ^ rather than requiring the
>> "slide puzzle" thing.
>>
>> But how would this be accomplished, in light of the current "separation
>> of responsibilities" drawn at the virt driver interface, whereby the
>> virt driver isn't supposed to talk to placement directly, or know
>> anything about allocations?
> FWIW, I don't have a problem with the virt driver "knowing about
> allocations". What I have a problem with is the virt driver *claiming
> resources for an instance*.
> 
> That's what the whole placement claims resources things was all about,
> and I'm not interested in stepping back to the days of long racy claim
> operations by having the compute nodes be responsible for claiming
> resources.
> 
> That said, once the consumer generation microversion lands [1], it
> should be possible to *safely* modify an allocation set for a consumer
> (instance) and move allocation records for an instance from one provider
> to another.
> 
> [1] https://review.openstack.org/#/c/565604/
> 
>> Here's a first pass:
>>
>> The virt driver, via the return value from update_provider_tree, tells
>> the resource tracker that "inventory of resource class A on provider B
>> have moved to provider C" for all applicable AxBxC.  E.g.
>>
>> [ { 'from_resource_provider': <cn_rp_uuid>,
>>     'moved_resources': [VGPU: 4],
>>     'to_resource_provider': <gpu_rp1_uuid>
>>   },
>>   { 'from_resource_provider': <cn_rp_uuid>,
>>     'moved_resources': [VGPU: 4],
>>     'to_resource_provider': <gpu_rp2_uuid>
>>   },
>>   { 'from_resource_provider': <cn_rp_uuid>,
>>     'moved_resources': [
>>         SRIOV_NET_VF: 2,
>>         NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 1000,
>>         NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 1000,
>>     ],
>>     'to_resource_provider': <gpu_rp2_uuid>
>>   }
>> ]
>>
>> As today, the resource tracker takes the updated provider tree and
>> invokes [1] the report client method update_from_provider_tree [2] to
>> flush the changes to placement.  But now update_from_provider_tree also
>> accepts the return value from update_provider_tree and, for each "move":
>>
>> - Creates provider C (as described in the provider_tree) if it doesn't
>> already exist.
>> - Creates/updates provider C's inventory as described in the
>> provider_tree (without yet updating provider B's inventory).  This ought
>> to create the inventory of resource class A on provider C.
> 
> Unfortunately, right here you'll introduce a race condition. As soon as
> this operation completes, the scheduler will have the ability to throw
> new instances on provider C and consume the inventory from it that you
> intend to give to the existing instance that is consuming from provider B.
> 
>> - Discovers allocations of rc A on rp B and POSTs to move them to rp C*.
> 
> For each consumer of resources on rp B, right?
> 
>> - Updates provider B's inventory.
> 
> Again, this is problematic because the scheduler will have already begun
> to place new instances on B's inventory, which could very well result in
> incorrect resource accounting on the node.
> 
> We basically need to have one giant new REST API call that accepts the
> list of "move instructions" and performs all of the instructions in a
> single transaction. :(
> 
>> (*There's a hole here: if we're splitting a glommed-together inventory
>> across multiple new child providers, as the VGPUs in the example, we
>> don't know w

Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-05 Thread Eric Fried
To summarize: cyborg could model things nested-wise, but there would be
no way to schedule them yet.

Couple of clarifications inline.

On 06/05/2018 08:29 AM, Jay Pipes wrote:
> On 06/05/2018 08:50 AM, Stephen Finucane wrote:
>> I thought nested resource providers were already supported by
>> placement? To the best of my knowledge, what is /not/ supported is
>> virt drivers using these to report NUMA topologies but I doubt that
>> affects you. The placement guys will need to weigh in on this as I
>> could be missing something but it sounds like you can start using this
>> functionality right now.
> 
> To be clear, this is what placement and nova *currently* support with
> regards to nested resource providers:
> 
> 1) When creating a resource provider in placement, you can specify a
> parent_provider_uuid and thus create trees of providers. This was
> placement API microversion 1.14. Also included in this microversion was
> support for displaying the parent and root provider UUID for resource
> providers.
> 
> 2) The nova "scheduler report client" (terrible name, it's mostly just
> the placement client at this point) understands how to call placement
> API 1.14 and create resource providers with a parent provider.
> 
> 3) The nova scheduler report client uses a ProviderTree object [1] to
> cache information about the hierarchy of providers that it knows about.
> For nova-compute workers managing hypervisors, that means the
> ProviderTree object contained in the report client is rooted in a
> resource provider that represents the compute node itself (the
> hypervisor). For nova-compute workers managing baremetal, that means the
> ProviderTree object contains many root providers, each representing an
> Ironic baremetal node.
> 
> 4) The placement API's GET /allocation_candidates endpoint now
> understands the concept of granular request groups [2]. Granular request
> groups are only relevant when a user wants to specify that child
> providers in a provider tree should be used to satisfy part of an
> overall scheduling request. However, this support is yet incomplete --
> see #5 below.

Granular request groups are also usable/useful when sharing providers
are in play. That functionality is complete on both the placement side
and the report client side (see below).

> The following parts of the nested resource providers modeling are *NOT*
> yet complete, however:
> 
> 5) GET /allocation_candidates does not currently return *results* when
> granular request groups are specified. So, while the placement service
> understands the *request* for granular groups, it doesn't yet have the
> ability to constrain the returned candidates appropriately. Tetsuro is
> actively working on this functionality in this patch series:
> 
> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nested-resource-providers-allocation-candidates
> 
> 
> 6) The virt drivers need to implement the update_provider_tree()
> interface [3] and construct the tree of resource providers along with
> appropriate inventory records for each child provider in the tree. Both
> libvirt and XenAPI virt drivers have patch series up that begin to take
> advantage of the nested provider modeling. However, a number of concerns
> [4] about in-place nova-compute upgrades when moving from a single
> resource provider to a nested provider tree model were raised, and we
> have begun brainstorming how to handle the migration of existing data in
> the single-provider model to the nested provider model. [5] We are
> blocking any reviews on patch series that modify the local provider
> modeling until these migration concerns are fully resolved.
> 
> 7) The scheduler does not currently pass granular request groups to
> placement.

The code is in place to do this [6] - so the scheduler *will* pass
granular request groups to placement if your flavor specifies them.  As
noted above, such flavors will be limited to exploiting sharing
providers until Tetsuro's series merges.  But no further code work is
required on the scheduler side.

[6] https://review.openstack.org/#/c/515811/
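
For reference, a flavor like the following already produces two granular
groups in the request to placement (trait names here are purely
illustrative):

  resources1:SRIOV_NET_VF=1
  trait1:CUSTOM_PHYSNET_PUBLIC=required
  resources2:SRIOV_NET_VF=1
  trait2:CUSTOM_PHYSNET_PRIVATE=required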

> Once #5 and #6 are resolved, and once the migration/upgrade
> path is resolved, clearly we will need to have the scheduler start
> making requests to placement that represent the granular request groups
> and have the scheduler pass the resulting allocation candidates to its
> filters and weighers.
> 
> Hope this helps highlight where we currently are and the work still left
> to do (in Rocky) on nested resource providers.
> 
> Best,
> -jay
> 
> 
> [1]
> https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py
> 
> [2]
> https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/granular-resource-requests.html
> 
> 
> [3]
> https://github.com/openstack/nova/blob/f902e0d5d87fb05207e4a7aca73d185775d43df2/nova/virt/driver.py#L833
> 
> 
> [4] http://lists.openstack.org/pipermail/openstack-dev/2018-May/130783.html
> 
> [5] https://

Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-05 Thread Eric Fried
Alex-

Allocations for an instance are pulled down by the compute manager and
passed into the virt driver's spawn method since [1].  An allocation
comprises a consumer, provider, resource class, and amount.  Once we can
schedule to trees, the allocations pulled down by the compute manager
will span the tree as appropriate.  So in that sense, yes, nova-compute
knows which amounts of which resource classes come from which providers.
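
For illustration, once scheduling spans the tree, the instance's
allocations (as placement reports them via GET /allocations/{instance_uuid};
provider UUIDs elided here) will look something like:

 {"allocations": {
      "<compute node rp uuid>": {"resources": {"VCPU": 2, "MEMORY_MB": 4096}},
      "<pf rp uuid>": {"resources": {"SRIOV_NET_VF": 1}}}}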

However, if you're asking about the situation where we have two
different allocations of the same resource class coming from two
separate providers: Yes, we can still tell which RCxAMOUNT is associated
with which provider; but No, we still have no inherent way to correlate
a specific one of those allocations with the part of the *request* it
came from.  If just the provider UUID isn't enough for the virt driver
to figure out what to do, it may have to figure it out by looking at the
flavor (and/or image metadata), inspecting the traits on the providers
associated with the allocations, etc.  (The theory here is that, if the
virt driver can't tell the difference at that point, then it actually
doesn't matter.)

[1] https://review.openstack.org/#/c/511879/

On 06/05/2018 09:05 AM, Alex Xu wrote:
> Maybe I missed something. Is there any way for nova-compute to know which
> child resource provider the resources are allocated from? For example,
> the host has two PFs and the request asks for one VF; then nova-compute
> needs to know which PF (resource provider) the VF is allocated from. As I
> understand it, currently we only return a list of alternative resource
> providers to nova-compute, and those alternatives are root resource
> providers.
> 
> 2018-06-05 21:29 GMT+08:00 Jay Pipes:
> 
> On 06/05/2018 08:50 AM, Stephen Finucane wrote:
> 
> I thought nested resource providers were already supported by
> placement? To the best of my knowledge, what is /not/ supported
> is virt drivers using these to report NUMA topologies but I
> doubt that affects you. The placement guys will need to weigh in
> on this as I could be missing something but it sounds like you
> can start using this functionality right now.
> 
> 
> To be clear, this is what placement and nova *currently* support
> with regards to nested resource providers:
> 
> 1) When creating a resource provider in placement, you can specify a
> parent_provider_uuid and thus create trees of providers. This was
> placement API microversion 1.14. Also included in this microversion
> was support for displaying the parent and root provider UUID for
> resource providers.
> 
> 2) The nova "scheduler report client" (terrible name, it's mostly
> just the placement client at this point) understands how to call
> placement API 1.14 and create resource providers with a parent provider.
> 
> 3) The nova scheduler report client uses a ProviderTree object [1]
> to cache information about the hierarchy of providers that it knows
> about. For nova-compute workers managing hypervisors, that means the
> ProviderTree object contained in the report client is rooted in a
> resource provider that represents the compute node itself (the
> hypervisor). For nova-compute workers managing baremetal, that means
> the ProviderTree object contains many root providers, each
> representing an Ironic baremetal node.
> 
> 4) The placement API's GET /allocation_candidates endpoint now
> understands the concept of granular request groups [2]. Granular
> request groups are only relevant when a user wants to specify that
> child providers in a provider tree should be used to satisfy part of
> an overall scheduling request. However, this support is yet
> incomplete -- see #5 below.
> 
> The following parts of the nested resource providers modeling are
> *NOT* yet complete, however:
> 
> 5) GET /allocation_candidates does not currently return *results*
> when granular request groups are specified. So, while the placement
> service understands the *request* for granular groups, it doesn't
> yet have the ability to constrain the returned candidates
> appropriately. Tetsuro is actively working on this functionality in
> this patch series:
> 
> 
> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nested-resource-providers-allocation-candidates
> 
> 
> 
> 6) The virt drivers need to implement the update_provider_tree()
> interface [3] and construct the tree of resource providers along
> with appropriate inventory records for each child provider in the
> tree. Both libvirt and XenAPI virt drivers have patch series up that
> begin to take advantage of the n

[openstack-dev] [nova][placement] self links include /placement?

2017-08-11 Thread Eric Fried
I finally got around to fiddling with the placement API today, and
noticed something... disturbing.  To me, anyway.

When I GET a URI, such as '/resource_classes', the response includes e.g.

  {u'links': [{u'href': u'/placement/resource_classes/MEMORY_MB',
 u'rel': u'self'}],
   u'name': u'MEMORY_MB'},

If I try to GET that 'self' link, it fails (404).  I have to strip the
'/placement' prefix to make it work.

That doesn't seem right.  Can anyone comment?

(This is devstack, nova master with
https://review.openstack.org/#/c/492247/5 loaded up.)

Thanks,
efried
.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] self links include /placement?

2017-08-14 Thread Eric Fried
Thanks for the answer, cdent, and the discussion on IRC [1].  In summary:

- Those links are the full `path` component of the resource, to which
one would prepend the protocol://server:port to get its fully-qualified
location.  The '/placement' prefix is determined and included by the web
server, not hardcoded by the placement service (phew).

- Consumers (at least within nova) are hardcoding their request URLs
based on well-known patterns, not using the links at all.  That's kind
of icky, but it's because ksa manipulates the URLs we give it.

- A hypothetical consumer using a HATEOAS-compliant request client might
be able to use the links, which is why we bother to include them at all.

[1]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2017-08-14.log.html#t2017-08-14T13:04:59

On 08/12/2017 04:03 AM, Chris Dent wrote:
> On Fri, 11 Aug 2017, Eric Fried wrote:
> 
>> I finally got around to fiddling with the placement API today, and
>> noticed something... disturbing.  To me, anyway.
>>
>> When I GET a URI, such as '/resource_classes', the response includes e.g.
> 
> I assume you're using ksa/requests somewhere in your stack and the
> session is "mounted" on the service endpoint provided by the service
> catalog?
> 
> If so, that means the session is mounted on /placement and is
> prepending '/placement' to the '/resource_classes' URL you are
> providing.
> 
> If not, I'd need more info and pretty much the rest of this message
> is not related to your problem :)
> 
>>  {u'links': [{u'href': u'/placement/resource_classes/MEMORY_MB',
>> u'rel': u'self'}],
>>   u'name': u'MEMORY_MB'},
> 
> Imagine this was HTML instead of JSON and you were using a browser,
> not ksa. That's an absolute URL, the browser knows that when it sees
> an absolute URL it makes the request back to the same host the
> current page came from. That's standard href behavior.
> 
> It would be incorrect to have a URL of /resource_classes/MEMORY_MB
> there as that would mean (using standard semantics)
> host://foo.bar/resource_classes/MEMORY_MB . It could be correct to
> make the href be host://foo.bar/placement/resource_classes/MEMORY_MB
> but that wasn't done in the placement service so we could avoid
> making any assumptions anywhere in the stack about the host or
> protocol in the thing that is hosting the service (and not require
> any of the middlewares that attempt to adjust the WSGI environment
> based on headers passed along from a proxy). Also it makes for
> very noisy response bodies.
> 
>> If I try to GET that 'self' link, it fails (404).  I have to strip the
>> '/placement' prefix to make it work.
> 
> Assuming the ksa thing above is what's happening, that's because the
> URL that you are sending is
> /placement/placement/resource_classes/MEMORY_MB
> 
>> That doesn't seem right.  Can anyone comment?
> 
> I've always found requests' mounting behavior very weird. So to me,
> that you are getting 404s when trying to traverse links is expected:
> you're sending requests to bad URLs. The concept of a "mount" with
> an http request is pretty antithetical to link traversing,
> hypertext, etc. On the other hand, none of the so-called REST APIs
> in OpenStack (including placement) really expect, demand or even
> work with HATEOAS, so ... ?
> 
> I'm not sure if it is something we need to account for when ksa
> constructs URLs or not. It's a problem that I've also
> encountered with some of the tricks that gabbi does [1]. The
> proposed solution there is to sort of merge urls where a prefix is
> known to be present (but see the bug for a corner case on why that's
> not great).
> 
> [1] https://github.com/cdent/gabbi/issues/165
> 
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ptg] PTG general info & upcoming registration price hike

2017-08-30 Thread Eric Fried
Thierry-

> If you're still wondering what will happen at this PTG (or you went to
> the first one Atlanta and wonder what changed since), I wrote a blogpost
> you might be interested to read at: https://ttx.re/queens-ptg.html

Thanks for this, super helpful!

efried

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra][docs] Why Manila api-ref doc isn't published?

2017-09-05 Thread Eric Fried
Agree with everything fungi has said here.

Per
https://git.openstack.org/cgit/openstack/service-types-authority/tree/README.rst#n97
we want the official service type to be singular rather than plural.

And per the doc migration movement, we want the API references to live
at standard URLs based on their official service type.

So the official API reference should indeed be at [1], which it seems to
be, as you pointed out.

However, I also agree with your point that there are obviously stale
links in the world pointing to the plural version, and adding a redirect
would be a good idea while those get cleaned up.  I have proposed [2]
for this.

Please also note that, per my comment at [3], I feel we should be moving
toward a place where sources linking to API references should be
gleaning the URLs dynamically from the service-types-authority rather
than hardcoding them.

[1] https://developer.openstack.org/api-ref/shared-file-system/
[2] https://review.openstack.org/#/c/500792/
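
To illustrate what I mean by gleaning dynamically, a minimal sketch -- the
mapping below is a hardcoded stand-in for the published
service-types-authority data, and it assumes the api-ref URL pattern stays
as it is today:

  # Hypothetical helper; the alias map would really come from the
  # service-types-authority data rather than being hardcoded here.
  OFFICIAL_TYPES = {'shared-file-systems': 'shared-file-system'}

  def api_ref_url(service_type):
      official = OFFICIAL_TYPES.get(service_type, service_type)
      return 'https://developer.openstack.org/api-ref/%s/' % official

  print(api_ref_url('shared-file-systems'))
  # https://developer.openstack.org/api-ref/shared-file-system/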

Thanks,
efried

On 09/04/2017 01:24 PM, Jeremy Stanley wrote:
> On 2017-09-04 12:45:59 -0500 (-0500), Anne Gentle wrote:
>> I want to say there are a couple of in-progress patches to clear
>> this up.
>>
>> https://review.openstack.org/#/c/495326/
>> and
>> https://review.openstack.org/#/c/495887/
> [...]
> 
> Only insofar as the service-types-authority change is switching to
> match the URL where the document is now being published, but this
> doesn't actually address all the places where the old URL is still
> being used. At least that confirms for me that the new URL really is
> the one we want, so maybe the old "shared-file-systems" name should
> be added as an alias for the "shared-file-system" service (giving us
> a redirect as I understand it) and then cleanup of various uses for
> the old URL can happen at everyone's convenience?
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra][docs] Why Manila api-ref doc isn't published?

2017-09-05 Thread Eric Fried
Sigh.

[3] https://review.openstack.org/#/c/495326/1/www/.htaccess@52

On 09/05/2017 07:30 AM, Eric Fried wrote:
> Agree with everything fungi has said here.
> 
> Per
> https://git.openstack.org/cgit/openstack/service-types-authority/tree/README.rst#n97
> we want the official service type to be singular rather than plural.
> 
> And per the doc migration movement, we want the API references to live
> at standard URLs based on their official service type.
> 
> So the official API reference should indeed be at [1], which it seems to
> be, as you pointed out.
> 
> However, I also agree with your point that there are obviously stale
> links in the world pointing to the plural version, and adding a redirect
> would be a good idea while those get cleaned up.  I have proposed [2]
> for this.
> 
> Please also note that, per my comment at [3], I feel we should be moving
> toward a place where sources linking to API references should be
> gleaning the URLs dynamically from the service-types-authority rather
> than hardcoding them.
> 
> [1] https://developer.openstack.org/api-ref/shared-file-system/
> [2] https://review.openstack.org/#/c/500792/
> 
> Thanks,
> efried
> 
> On 09/04/2017 01:24 PM, Jeremy Stanley wrote:
>> On 2017-09-04 12:45:59 -0500 (-0500), Anne Gentle wrote:
>>> I want to say there are a couple of in-progress patches to clear
>>> this up.
>>>
>>> https://review.openstack.org/#/c/495326/
>>> and
>>> https://review.openstack.org/#/c/495887/
>> [...]
>>
>> Only insofar as the service-types-authority change is switching to
>> match the URL where the document is now being published, but this
>> doesn't actually address all the places where the old URL is still
>> being used. At least that confirms for me that the new URL really is
>> the one we want, so maybe the old "shared-file-systems" name should
>> be added as an alias for the "shared-file-system" service (giving us
>> a redirect as I understand it) and then cleanup of various uses for
>> the old URL can happen at everyone's convenience?
>>
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] removing screen from devstack - RSN

2017-09-07 Thread Eric Fried
John-

You're not the only one for whom the transition to systemd has been
painful.

However...

It *is* possible (some would argue just as easy) to do all things with
systemd that were done with screen.

For starters, have you seen [1] ?

Though looking at that again, I realize it could use a section on how
to do pdb - I'll propose something for that.  In the meantime, feel free
to find me in #openstack-dev and I can talk you through it.

[1] https://docs.openstack.org/devstack/latest/systemd.html

Thanks,
    Eric Fried (efried)

On 09/07/2017 12:34 PM, John Griffith wrote:
> 
> 
> On Thu, Sep 7, 2017 at 11:29 AM, John Griffith  <mailto:john.griffi...@gmail.com>> wrote:
> 
> Please don't, some of us have no issues with screen and use it
> extensively for debugging.  Unless there's a viable option using
> systemd I fail to understand why this is such a big deal.  I've been
> using devstack in screen for a long time without issue, and I still
> use rejoin (which supposedly didn't work) without issue.
> 
> I completely get the "run like customers" but in theory I'm not sure
> how screen makes it much different than what customers do, it's
> executing the same binary at the end of the day.  I'd also ask then
> is devstack no longer "dev" stack, but now a preferred method of
> install for running production clouds?  Anyway, I'd just ask to
> leave it as an option, unless there's equivalent options for things
> like using pdb etc.  It's annoying enough that we lost that
> capability for the API services; is there a possibility we can
> reconsider and keep this as an option?
> 
> Thanks,
> John
> 
> On Thu, Sep 7, 2017 at 7:31 AM, Davanum Srinivas  <mailto:dava...@gmail.com>> wrote:
> 
> w00t!
> 
> On Thu, Sep 7, 2017 at 8:45 AM, Sean Dague  <mailto:s...@dague.net>> wrote:
> > On 08/31/2017 06:27 AM, Sean Dague wrote:
> >> The work that started last cycle to make devstack only have a
> single
> >> execution mode, that was the same between automated QA and
> local, is
> >> nearing its completion.
> >>
> >> https://review.openstack.org/#/c/499186/
> <https://review.openstack.org/#/c/499186/> is the patch that
> will remove
> >> screen from devstack (which was only left as a fall back for
> things like
> >> grenade during Pike). Tests are currently passing on all the
> gating jobs
> >> for it. And experimental looks mostly useful.
> >>
> >> The intent is to merge this in about a week (right before
> PTG). So, if
> >> you have a complicated devstack plugin you think might be
> affected by
> >> this (and were previously making jobs pretend to be grenade
> to keep
> >> screen running), now is the time to run tests against this
> patch and see
> >> where things stand.
> >
> > This patch is in the gate and now merging, and with it
> devstack now has
> > a single run mode, using systemd units, which is the same
> between test
> > and development.
> >
> > Thanks to everyone helping with the transition!
> >
> > -Sean
> >
> > --
> > Sean Dague
> > http://dague.net
> >
> >
> 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> <http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe>
> >
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 
> <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>
> 
> 
> 
> --
> Davanum Srinivas :: https://twitter.com/dims
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> <http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe>
>

Re: [openstack-dev] removing screen from devstack - RSN

2017-09-07 Thread Eric Fried
All-

The plain pdb doc patch [1] is merging.

On clarkb's suggestion, I took a look at remote-pdb [2], and it turned
out to be easy-peasy to use.  I submitted a follow-on doc patch for that [3].

Thanks, John, for speaking up and getting this rolling.

Eric

[1] https://review.openstack.org/#/c/501834/
[2] https://pypi.python.org/pypi/remote-pdb
[3] https://review.openstack.org/#/c/501870/
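
For anyone who hasn't tried it, remote-pdb boils down to roughly this (a
sketch; 'pip install remote-pdb' first, and the port number is arbitrary):

  # Drop this where you would normally put pdb.set_trace():
  from remote_pdb import RemotePdb
  RemotePdb('127.0.0.1', 4444).set_trace()

  # Then, when the service hits that line, attach from another shell with:
  #   telnet 127.0.0.1 4444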

On 09/07/2017 02:30 PM, John Griffith wrote:
> 
> 
> On Thu, Sep 7, 2017 at 1:28 PM, Sean Dague  <mailto:s...@dague.net>> wrote:
> 
> On 09/07/2017 01:52 PM, Eric Fried wrote:
> 
> John-
> 
> You're not the only one for whom the transition to
> systemd has been
> painful.
> 
> However...
> 
> It *is* possible (some would argue just as easy) to do
> all things with
> systemd that were done with screen.
> 
> For starters, have you seen [1] ?
> 
> Though looking at that again, I realize it could use a
> section on how
> to do pdb - I'll propose something for that.  In the meantime,
> feel free
> to find me in #openstack-dev and I can talk you through it.
> 
> [1] https://docs.openstack.org/devstack/latest/systemd.html
> <https://docs.openstack.org/devstack/latest/systemd.html>
> 
> Thanks,
> Eric Fried (efried)
> 
> 
> Thank you Eric. Would love to get a recommended pdb path into the
> docs. Ping me as soon as it's up for review, and I'll get it merged
> quickly.
> 
> Thanks for stepping up here, it's highly appreciated.
> 
> -Sean
> 
> 
> -- 
> Sean Dague
> http://dague.net
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> <http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>
> 
> ​Patch is here [1] for those that are interested:
> 
> [1]: https://review.openstack.org/#/c/501834/1​
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] Adding neutron VIF NUMA locality support

2017-09-07 Thread Eric Fried
Stephen-

FYI, we're teeing up (anti-)affinity and neutron-nova interaction
topics for the "generic device management" discussions at the PTG.
Please jump on the etherpad [1] and expand/expound in the relevant
spots, which, as of this writing, you can find by searching for
"aggregate", "vf selection policy", and "interplay".

Thanks,
Eric

[1]
https://etherpad.openstack.org/p/nova-ptg-queens-generic-device-management

On 09/07/2017 01:07 PM, Mooney, Sean K wrote:
> 
> 
>> -Original Message-
>> From: Stephen Finucane [mailto:sfinu...@redhat.com]
>> Sent: Thursday, September 7, 2017 5:42 PM
>> To: OpenStack Development Mailing List (not for usage questions)
>> 
>> Cc: Jakub Libosvar ; Karthik Sundaravel
>> ; Mooney, Sean K 
>> Subject: [nova] [neutron] Adding neutron VIF NUMA locality support
>>
>> Hey,
>>
>> NUMA locality matters as much for NICs used e.g for Open vSwitch as for
>> SR-IOV devices. At the moment, nova support NUMA affinity for PCI
>> passthrough devices and SR-IOV devices, but it makes no attempt to do
>> the same for other NICs. In the name of NFV enablement, we should
>> probably close this gap.
> [Mooney, Sean K] I like this idea in general, that said in ovs-dpdk we 
> modified
> ovs to schedule the vhost-user port to be processed on a pmd that is on the 
> same
> NUMA node as the vm and reallocate the vhost-user port memory where possible
> To also have the same affinity. 
>>
>> I have some ideas around how this could work, but they're fuzzy enough
>> and involve exchanging os-vif objects between nova and neutron. This is
>> probably the most difficult path as we've been trying to get os-vif
>> objects over the nova-neutron wire for a while now, to no success.
> [Mooney, Sean K] actually we have some PoC code you should probably review
> on this topic.
> https://blueprints.launchpad.net/os-vif/+spec/vif-port-profile
> https://review.openstack.org/#/c/490829/ 
> https://review.openstack.org/#/c/490819/ 
> https://review.openstack.org/#/c/441590/
> the first patch of the neutron side poc should be up before the ptg.
> 
>>
>> Anyone else keen on such a feature? Given that there are a significant
>> amount of people from nova, neutron, and general NFV backgrounds at the
>> PTG next week, we have a very good opportunity to talk about this
>> (either in the nova- neutron sync, if that's not already full, or in
>> some hallway somewhere).
> [Mooney, Sean K] in terms of basic NUMA affinity this is not as important
> with ovs-dpdk, because we make a best effort to fix it in ovs, so this is
> less pressing than it used to be. It is still important for other backends,
> but we also need a mechanism to control NUMA affinity policy like
> https://review.openstack.org/#/c/361140/ to not break existing deployments.
> 
> I have some thoughts about modeling network backends
> in placement and also passing trait requests for neutron that this would
> dovetail with, so I would love to talk to anyone who is interested in this.
> By modeling ovs and other network backends in placement and combining that
> with traits and the nova-neutron negotiation protocol we can support several
> advanced use cases.
> 
> By the way, ovs-dpdk allows you to specify the vhost-port rx/tx queue mapping
> to a pmd, which could give a nice performance boost if done correctly. It
> might be worth extending os-vif to do that in the future, though this could
> equally be handled by the neutron ovs l2 agent.
>>
>> At this point in the day, this is probably very much a Rocky feature,
>> but we could definitely put in whatever groundwork is necessary this
>> cycle to make the work in Rocky as easy possible.
> [Mooney, Sean K] I'm hoping we can get the nova neutron negotiation done in 
> queens.
>>
>> Cheers,
>> Stephen
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] removing screen from devstack - RSN

2017-09-08 Thread Eric Fried
Oh, are we talking about the logs produced by CI jobs?  I thought we
were talking about on your local devstack itself.  Because there, I
don't think you should be seeing log files like this anymore.
Logging is done via systemd and can be viewed via journalctl [1].

The exceptions are things that run under apache, like horizon, keystone,
and placement - their log files can be found wherever apache is set up
to send 'em.  E.g. [2].

As far as the names go, I *think* we've done away with 'q' as the
neutron prefixy thing at this point.  On my (pike-ish) setup, the
devstack neutron API service is quite appropriately called
devstack@neutron-api.service.

[1] https://docs.openstack.org/devstack/latest/systemd#querying-logs
[2] http://paste.openstack.org/raw/620754/
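
For reference, the kind of journalctl invocations this boils down to (unit
names may differ depending on your local.conf):

  # Follow the nova-compute log:
  sudo journalctl -f --unit devstack@n-cpu.service

  # Everything since boot for the neutron API service:
  sudo journalctl -b --unit devstack@neutron-api.service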


On 09/08/2017 03:49 PM, Sean Dague wrote:
> I would love to. Those were mostly left because devstack-gate (and
> related tooling like elasticsearch) is not branch aware, so things get
> ugly on the conditionals for changing expected output files.
> 
> That might be a good popup infra topic at PTG.
> 
> On 09/08/2017 04:17 PM, John Villalovos wrote:
>> Does this mean we can now get more user friendly names for the log files?
>>
>> Currently I see names like:
>> screen-dstat.txt.gz
>> screen-etcd.txt.gz 
>> screen-g-api.txt.gz
>> screen-g-reg.txt.gz
>> screen-ir-api.txt.gz   
>> screen-ir-cond.txt.gz  
>> screen-keystone.txt.gz 
>> screen-n-api-meta.txt.gz   
>> screen-n-api.txt.gz
>> screen-n-cauth.txt.gz  
>> screen-n-cond.txt.gz   
>> screen-n-cpu.txt.gz
>> screen-n-novnc.txt.gz  
>> screen-n-sch.txt.gz
>> screen-peakmem_tracker.txt.gz  
>> screen-placement-api.txt.gz
>> screen-q-agt.txt.gz
>> screen-q-dhcp.txt.gz   
>> screen-q-l3.txt.gz 
>> screen-q-meta.txt.gz   
>> screen-q-metering.txt.gz   
>> screen-q-svc.txt.gz
>> screen-s-account.txt.gz
>> screen-s-container.txt.gz  
>> screen-s-object.txt.gz 
>> screen-s-proxy.txt.gz  
>>
>> People new to OpenStack don't really know that 'q' means neutron.
>>
>>
>>
>> On Thu, Sep 7, 2017 at 5:45 AM, Sean Dague > > wrote:
>>
>> On 08/31/2017 06:27 AM, Sean Dague wrote:
>> > The work that started last cycle to make devstack only have a single
>> > execution mode, that was the same between automated QA and local, is
>> > nearing its completion.
>> >
>> > https://review.openstack.org/#/c/499186/
>>  is the patch that will remove
>> > screen from devstack (which was only left as a fall back for things 
>> like
>> > grenade during Pike). Tests are currently passing on all the gating 
>> jobs
>> > for it. And experimental looks mostly useful.
>> >
>> > The intent is to merge this in about a week (right before PTG). So, if
>> > you have a complicated devstack plugin you think might be affected by
>> > this (and were previously making jobs pretend to be grenade to keep
>> > screen running), now is the time to run tests against this patch and 
>> see
>> > where things stand.
>>
>> This patch is in the gate and now merging, and with it devstack now has
>> a single run mode, using systemd units, which is the same between test
>> and development.
>>
>> Thanks to everyone helping with the transition!
>>
>> -Sean
>>
>> --
>> Sean Dague
>> http://dague.net
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> 
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 
>>
>>
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> 
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] removing screen from devstack - RSN

2017-09-11 Thread Eric Fried
Miguel-

Sean was in favor of that, and I started looking into it for plain pdb,
but it barely seemed worth it - it would wind up being like three LOC.
I definitely wouldn't oppose it, though.  Please add me to that review
if you go for it.

What seemed more interesting to me was a utility to do the remote-pdb
thing, but that was more ambitious than I wanted to take on right now.
It would have to watch the relevant log for the message declaring the
telnet port, and then open a telnet session to that port.  To be useful
in a multithreaded service, I imagine it would want to pop open the
telnet session in a new window, which means you gotta have X set up
right, etc.  (Say, maybe it could use screen ;-)

A bonus (for either of the above) would be setting up bash autocomplete
for the service names...
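
For the plain-pdb case, the three-ish lines I had in mind would be roughly
the following sketch (run_locally.sh is Miguel's hypothetical name, the
ExecStart extraction is just one way to do it, and the command should be run
as the stack user):

  #!/bin/bash
  # Usage: run_locally.sh <unit short name>, e.g. ./run_locally.sh n-sch
  UNIT="devstack@$1.service"
  CMD=$(systemctl cat "$UNIT" | sed -n 's/^ExecStart *= *//p')
  sudo systemctl stop "$UNIT"
  $CMD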

Thanks,
Eric (efried)

On 09/11/2017 10:56 AM, Miguel Angel Ajo Pelayo wrote:
> I wonder if it makes sense to provide a helper script to do what it's
> explained on the document.
> 
> So we could run ~/devstack/tools/run_locally.sh n-sch.
> 
> If yes, I'll send the patch.
> 
> On Fri, Sep 8, 2017 at 3:00 PM, Eric Fried  <mailto:openst...@fried.cc>> wrote:
> 
> Oh, are we talking about the logs produced by CI jobs?  I thought we
> were talking about on your local devstack itself.  Because there, I
> don't think you should be seeing log files like this anymore.
> Logging is done via systemd and can be viewed via journalctl [1].
> 
> The exceptions are things that run under apache, like horizon, keystone,
> and placement - their log files can be found wherever apache is set up
> to send 'em.  E.g. [2].
> 
> As far as the names go, I *think* we've done away with 'q' as the
> neutron prefixy thing at this point.  On my (pike-ish) setup, the
> devstack neutron API service is quite appropriately called
> devstack@neutron-api.service.
> 
> [1] https://docs.openstack.org/devstack/latest/systemd#querying-logs
> <https://docs.openstack.org/devstack/latest/systemd#querying-logs>
> [2] http://paste.openstack.org/raw/620754/
> <http://paste.openstack.org/raw/620754/>
> 
> 
> On 09/08/2017 03:49 PM, Sean Dague wrote:
> > I would love to. Those were mostly left because devstack-gate (and
> > related tooling like elasticsearch) is not branch aware, so things get
> > ugly on the conditionals for changing expected output files.
> >
> > That might be a good popup infra topic at PTG.
> >
> > On 09/08/2017 04:17 PM, John Villalovos wrote:
> >> Does this mean we can now get more user friendly names for the
> log files?
> >>
> >> Currently I see names like:
> >> screen-dstat.txt.gz
> >> screen-etcd.txt.gz
> >> screen-g-api.txt.gz
> >> screen-g-reg.txt.gz
> >> screen-ir-api.txt.gz
> >> screen-ir-cond.txt.gz
> >> screen-keystone.txt.gz
> >> screen-n-api-meta.txt.gz
> >> screen-n-api.txt.gz
> >> screen-n-cauth.txt.gz
> >> screen-n-cond.txt.gz
> >> screen-n-cpu.txt.gz
> >> screen-n-novnc.txt.gz
> >> screen-n-sch.txt.gz
> >> screen-peakmem_tracker.txt.gz
> >> screen-placement-api.txt.gz
> >> screen-q-agt.txt.gz
> >> screen-q-dhcp.txt.gz
> >> screen-q-l3.txt.gz
> >> screen-q-meta.txt.gz
> >> screen-q-metering.txt.gz
> >> screen-q-svc.txt.gz
> >> screen-s-account.txt.gz
> >> screen-s-container.txt.gz
> >> screen-s-object.txt.gz
> >> screen-s-proxy.txt.gz
> >>
> >> People new to OpenStack don't really know that 'q' means neutron.
> >>
> >>
> >>
> >> On Thu, Sep 7, 2017 at 5:45 AM, Sean Dague  <mailto:s...@dague.net>
> >> <mailto:s...@dague.net <mailto:s...@dague.net>>> wrote:
> >>
> >> On 08/31/2017 06:27 AM, Sean Dague wrote:
> >> > The work that started last cycle to make devstack only have
> a single
> >> > execution mode, that was the same between automated QA and
> local, is
> >> > nearing its completion.
> >> >
> >> > https://review.openstack.org/#/c/499186/
> <https://review.openstack.org/#/c/499186/>
> >> <https://review.openstack.org/#/c/499186/
> <https://review.openstack.org/#/

Re: [openstack-dev] [ptg][nova][neutron] modelling network capabilities and capacity in placement and nova neutron port binding negociation.

2017-09-11 Thread Eric Fried
Yup, I definitely want to be involved in this too.  Please keep me posted.

efried

On 09/11/2017 11:12 AM, Jay Pipes wrote:
> I'm interested in this. I get in to Denver this evening so if we can do
> this session tomorrow or later, that would be super.
> 
> Best,
> -jay
> 
> On 09/11/2017 01:11 PM, Mooney, Sean K wrote:
>> Hi everyone,
>>
>> I’m interested in setting up a whiteboarding session at the PTG to discuss
>>
>> how to model network backends in placement and use that info as part of
>> scheduling.
>>
>> This work would also intersect with the nova-neutron port binding
>> negotiation work that is also in flight, so I think there is merit in
>> combining both topics into one session.
>>
>> For several releases we have been discussing a negotiation protocol that
>> would allow nova/compute services to tell neutron what virtual and physical
>> interfaces a hypervisor can support, and then allow neutron to select from
>> that set the most appropriate vif type based on the capabilities of the
>> network backend deployed by the host.
>>
>> Extending that concept with the capabilities provided by placement and
>> traits will enable us to model the network capabilities of a specific
>> network backend in a scheduler-friendly way without nova needing to
>> understand networking.
>>
>> To that end, if people are interested in having a whiteboarding session
>> to dig into this, let me know.
>>
>> Regards
>>
>> Seán
>>
>> --
>> Intel Shannon Limited
>> Registered in Ireland
>> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
>> Registered Number: 308263
>> Business address: Dromore House, East Park, Shannon, Co. Clare
>>
>>
>>
>> __
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ptg][nova][neutron] modelling network capabilities and capacity in placement and nova neutron port binding negociation.

2017-09-11 Thread Eric Fried
Folks-

I tentatively snagged Tuesday 13:30-14:30 in the compute/vm/bm room for
this, and started an etherpad [1].

o Please add your nick to the interest list if you want to be pinged for
updates (e.g. in case we move rooms/times).  (Miguel, what's your IRC nick?)
o Feel free to flesh out the schedule/scope/topics.
o Let me know if this time/location doesn't work for you.
o It would be nice to have a rep from Neutron handy :)

[1] https://etherpad.openstack.org/p/placement-nova-neutron-queens-ptg

Thanks,
    Eric Fried (efried)

On 09/11/2017 01:05 PM, Sławek Kapłoński wrote:
> Hello,
> 
> I’m also interested in this as it can help to provide guarantee minimum 
> bandwidth for instances.
> 
> —
> Pozdrawiam
> Sławek Kapłoński
> sla...@kaplonski.pl
> 
> 
> 
> 
>> Wiadomość napisana przez Mooney, Sean K  w dniu 
>> 11.09.2017, o godz. 11:11:
>>
>> Hi everyone,
>>
>> I’m interested in setting up a whiteboarding session at the PTG to discuss
>> how to model network backends in placement and use that info as part of
>> scheduling.
>>
>> This work would also intersect with the nova-neutron port binding
>> negotiation work that is also in flight, so I think there is merit in
>> combining both topics into one session.
>>
>> For several releases we have been discussing a negotiation protocol that
>> would allow nova/compute services to tell neutron what virtual and physical
>> interfaces a hypervisor can support, and then allow neutron to select from
>> that set the most appropriate vif type based on the capabilities of the
>> network backend deployed by the host.
>>
>> Extending that concept with the capabilities provided by placement and
>> traits will enable us to model the network capabilities of a specific
>> network backend in a scheduler-friendly way without nova needing to
>> understand networking.
>>
>> To that end, if people are interested in having a whiteboarding session
>> to dig into this, let me know.
>>
>> Regards
>> Seán
>> --
>> Intel Shannon Limited
>> Registered in Ireland
>> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
>> Registered Number: 308263
>> Business address: Dromore House, East Park, Shannon, Co. Clare
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ptg][nova][neutron] modelling network capabilities and capacity in placement and nova neutron port binding negociation.

2017-09-12 Thread Eric Fried
Okay, folks, it's set.

We're using this etherpad [1].

We're meeting in the Compute stack/VM & BM WG room: Ballroom B, Banquet
Level

We're meeting at 13:30 TODAY (Tuesday).  I've blocked out two hours.
(There's currently nothing scheduled in the room afterwards, so we may
be able to bleed over if necessary.)

See y'all there.

    Thanks,
Eric Fried (efried)

[1] https://etherpad.openstack.org/p/placement-nova-neutron-queens-ptg

On 09/12/2017 09:30 AM, Chris Dent wrote:
> On Mon, 11 Sep 2017, Eric Fried wrote:
> 
>> Folks-
>>
>> I tentatively snagged Tuesday 13:30-14:30 in the compute/vm/bm
>> room for
>> this, and started an etherpad [1].
>>
>> o Please add your nick to the interest list if you want to be pinged for
>> updates (e.g. in case we move rooms/times).  (Miguel, what's your IRC
>> nick?)
>> o Feel free to flesh out the schedule/scope/topics.
>> o Let me know if this time/location doesn't work for you.
>> o It would be nice to have a rep from Neutron handy :)
>>
>> [1] https://etherpad.openstack.org/p/placement-nova-neutron-queens-ptg
> 
> I've added links to pics from yesterday's spontaneous whiteboarding
> to this etherpad.
> 
> What (if anything) is the difference between this session and
> etherpad and the one that Sean has created a bit later in the thread?
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [zun][unit test] Any python utils can collect pci info?

2017-09-18 Thread Eric Fried
You may get a little help from the methods in nova.pci.utils.

If you're calling out to lspci or accessing sysfs, be aware of this
series [1] and do it via the new privsep mechanisms.

[1]
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:hurrah-for-privsep
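
For what it's worth, a minimal sysfs-based sketch of the PF/VF distinction
(assumes the standard Linux sysfs layout; no libvirt involved):

  import os

  def pci_device_type(address):
      """Return 'VF', 'PF' or 'PCI' for an address like '0000:05:00.1'."""
      dev = '/sys/bus/pci/devices/%s' % address
      if os.path.exists(os.path.join(dev, 'physfn')):
          return 'VF'   # a VF carries a 'physfn' link back to its parent PF
      if os.path.exists(os.path.join(dev, 'sriov_totalvfs')):
          return 'PF'   # a PF exposes SR-IOV capability files
      return 'PCI'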

On 09/17/2017 09:41 PM, Hongbin Lu wrote:
> Hi Shunli,
> 
>  
> 
> I am not aware of any prevailing python utils for this. An alternative
> is to shell out Linux commands to collect the information. After a quick
> search, it looks xenapi [1] uses “lspci -vmmnk” to collect PCI device
> detail info and “ls /sys/bus/pci/devices//” to detect the
> PCI device type (PF or VF). FWIW, you might find it helpful to refer the
> implementation of Nova’s xenapi driver for gettiing PCI resources [2].
> Hope it helps.
> 
>  
> 
> [1]
> https://github.com/openstack/os-xenapi/blob/master/os_xenapi/dom0/etc/xapi.d/plugins/xenhost.py#L593
> 
> [2]
> https://github.com/openstack/nova/blob/master/nova/virt/xenapi/host.py#L154
> 
>  
> 
> Best regards,
> 
> Hongbin
> 
>  
> 
> *From:*Shunli Zhou [mailto:shunli6...@gmail.com]
> *Sent:* September-17-17 9:35 PM
> *To:* openstack-dev@lists.openstack.org
> *Subject:* [openstack-dev] [zun][unit test] Any python utils can collect
> pci info?
> 
>  
> 
> Hi all,
> 
>  
> 
> For https://blueprints.launchpad.net/zun/+spec/support-pcipassthroughfilter
> this BP, Nova uses libvirt to collect the PCI device info. But for
> zun, libvirt seems to be a heavy dependency. Is there a python utility that
> can be used to collect detailed PCI device info, such as whether
> it's a PF or VF of a network PCI device, the device capabilities, etc.?
> 
>  
> 
> Note: For 'lspci -D -nnmm', there is some info it cannot get.
> 
>  
> 
>  
> 
> Thanks
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [docs][ptls][install] Install guide vs. tutorial

2017-09-19 Thread Eric Fried
Alex-

    The non-list reply was deliberate.  I honestly haven't looked at the
docs in question, so really my answer was just me being pedantic about
how I interpret those two words in general when I see them in technical
literature.  Jay's concerns about consistency are probably more important.

    However, posting back to the list, as requested.

Thanks,
Eric

On 09/19/2017 03:49 PM, Alexandra Settle wrote:
> Hi Eric,
>
> I’m not entirely too sure if you meant to just reply to me regarding this 
> thread? :)
>
> However, it would be helpful if you could bring it back to the mailing list 
> to highlight your concerns and continue this discussion :)
>
> Thanks for your reply,
>
> Alex
>
> On 9/19/17, 2:43 PM, "Eric Fried"  wrote:
>
> Alex-
> 
>   Regardless of what the dictionary might say, people associate the word
> "Tutorial" with a set of step-by-step instructions to do a thing.
> "Guide" would be a more general term.
> 
>   I think of a "Tutorial" as being a *single* representative path through
> a process.  A "Guide" could supply different alternatives.
> 
>   I expect a "Tutorial" to get me from start to finish.  A "Guide" might
> help me along the way, but could be sparser.
> 
>   In summary, I believe the word "Tutorial" implies a very specific
> thing, so we should use it if and only if the doc is exactly that.
> 
>   Eric
> 
> On 09/19/2017 07:23 AM, Alexandra Settle wrote:
> > Hi everyone,
> > 
> >  
> > 
> > I hope everyone had a safe trip home after the PTG!
> > 
> >  
> > 
> > Since the doc-migration, quite a number or individuals have had
> > questions regarding the usage of “Install Tutorial” vs. “Install Guide”
> > in our documentation in the openstack-manuals repo and in the
> > project-specific repos. We (the doc team) agree there should be
> > consistency across all repos and would like to bring this to the table
> > to discuss.
> > 
> >  
> > 
> > Previously, we have used the word ‘tutorial’ because the literal
> > definition of a tutorial is that of a ‘paper, book, film, or computer
> > program that provides practical information about a specific subject.’
> > 
> > Thoughts?
> > 
> >  
> > 
> > Alex
> > 
> > 
> > 
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: 
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > 
> 
>


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] ironic and traits

2017-10-16 Thread Eric Fried
* Adding references to the specs: ironic side [1]; nova side [2] (which
just merged).

* Since Jay is on vacation, I'll tentatively note his vote by proxy [3]
that ironic should be the source of truth - i.e. option (a).  I think
the upshot is that it's easier for Ironic to track and resolve conflicts
than for the virt driver to do so.

> The downside is obvious - with a lot of deploy templates
> available it can be a lot of manual work.

* How does option (b) help with this?

* I suggested a way to maintain the "source" of a trait (operator,
inspector, etc.) [4] which would help with resolving conflicts.
However, I agree it would be better to avoid this extra complexity if
possible.

* This is slightly off topic, but it's related and will eventually need
to be considered: How are you going to know whether a
UEFI-capable-but-not-enabled node should have its UEFI mode turned on?
Are you going to parse the traits specified in the flavor?  (This might
work for Ironic, but will be tough in the general case.)

[1] https://review.openstack.org/504531
[2] https://review.openstack.org/507052
[3]
https://review.openstack.org/#/c/507052/4/specs/queens/approved/ironic-traits.rst@88
[4]
https://review.openstack.org/#/c/504531/4/specs/approved/node-traits.rst@196

On 10/16/2017 11:24 AM, Dmitry Tantsur wrote:
> Hi all,
> 
> I promised John to dump my thoughts on traits to the ML, so here we go :)
> 
> I see two roles of traits (or kinds of traits) for bare metal:
> 1. traits that say what the node can do already (e.g. "the node is
> doing UEFI boot")
> 2. traits that say what the node can be *configured* to do (e.g. "the node can
> boot in UEFI mode")
> 
> This seems confusing, but it's actually very useful. Say, I have a flavor that
> requests UEFI boot via a trait. It will match both the nodes that are already 
> in
> UEFI mode, as well as nodes that can be put in UEFI mode.
> 
> This idea goes further with deploy templates (new concept we've been thinking
> about). A flavor can request something like CUSTOM_RAID_5, and it will match 
> the
> nodes that already have RAID 5, or, more interestingly, the nodes on which we
> can build RAID 5 before deployment. The UEFI example above can be treated in a
> similar way.
> 
> This ends up with two sources of knowledge about traits in ironic:
> 1. Operators setting something they know about hardware ("this node is in UEFI
> mode"),
> 2. Ironic drivers reporting something they
>   2.1. know about hardware ("this node is in UEFI mode" - again)
>   2.2. can do about hardware ("I can put this node in UEFI mode")
> 
> For case #1 we are planning on a new CRUD API to set/unset traits for a node.
> Case #2 is more interesting. We have two options, I think:
> 
> a) Operators still set traits on nodes, drivers are simply validating them. 
> E.g.
> an operators sets CUSTOM_RAID_5, and the node's RAID interface checks if it is
> possible to do. The downside is obvious - with a lot of deploy templates
> available it can be a lot of manual work.
> 
> b) Drivers report the traits, and they get somehow added to the traits 
> provided
> by an operator. Technically, there are sub-cases again:
>   b.1) The new traits API returns a union of operator-provided and
> driver-provided traits
>   b.2) The new traits API returns only operator-provided traits; 
> driver-provided
> traits are returned e.g. via a new field (node.driver_traits). Then nova will
> have to merge the lists itself.
> 
> My personal favorite is the last option: I'd like a clear distinction between
> different "sources" of traits, but I'd also like to reduce manual work for
> operators.
> 
> A valid counter-argument is: what if an operator wants to override a
> driver-provided trait? E.g. a node can do RAID 5, but I don't want this
> particular node to do it for any reason. I'm not sure if it's a valid case, 
> and
> what to do about it.
> 
> Let me know what you think.
> 
> Dmitry
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][out-of-tree drivers] Breaking change to ComputeDriver.spawn and friends

2017-10-17 Thread Eric Fried
Out-of-tree virt driver maintainers, please keep an eye on [1], which
will force you to update the signature of your spawn and rebuild
overrides.  See the commit message for the whys and wherefores, and let
me know if you have any questions.

[1] https://review.openstack.org/511879

Thanks,
Eric Fried (efried)
.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][out-of-tree drivers] Breaking change to ComputeDriver.spawn and friends

2017-10-18 Thread Eric Fried
Correct - and you don't *need* to use the param yet if you don't want
to.  Here's what we're doing in nova-powervm [1].  (It won't pass our CI
until the Nova change is merged - we don't have Depends-On working.)

[1] https://review.openstack.org/#/c/512814/

On 10/17/2017 08:18 PM, Chen CH Ji wrote:
> Thanks for sharing this info; I already keep an eye on this change and
> understand the reason.
> So if I understand this correctly, out-of-tree drivers only need
> 'allocations' to be added to the spawn and rebuild functions, and it is up
> to the driver to use it, correct?
> 
> Best Regards!
> 
> Kevin (Chen) Ji 纪 晨
> 
> Engineer, zVM Development, CSTL
> Notes: Chen CH Ji/China/IBM@IBMCN Internet: jiche...@cn.ibm.com
> Phone: +86-10-82451493
> Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian
> District, Beijing 100193, PRC
> 
> 
> From: Eric Fried 
> To: "OpenStack Development Mailing List (not for usage questions)"
> 
> Date: 10/18/2017 04:07 AM
> Subject: [openstack-dev] [nova][out-of-tree drivers] Breaking change to
> ComputeDriver.spawn and friends
> 
> 
> 
> 
> 
> Out-of-tree virt driver maintainers, please keep an eye on [1], which
> will force you to update the signature of your spawn and rebuild
> overrides.  See the commit message for the whys and wherefores, and let
> me know if you have any questions.
> 
> [1] https://review.openstack.org/511879
> 
> Thanks,
> Eric Fried (efried)
> .
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] PCI pass through settings on a flavor without aliases on the API nodes

2017-10-18 Thread Eric Fried
Robert-

No.

Some day, once generic device management is baked, there will be.
Depending how your favorite virt driver decides to model things, one
could envision a flavor with extra specs like:

resources:SRIOV_NET_PF:1
trait:CUSTOM_PCI_VENDORID_8086=required
trait:CUSTOM_PCI_PRODUCTID_154D=required
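
In CLI terms, that hypothetical flavor might look something like this
(purely speculative syntax -- none of it exists today):

  openstack flavor set m1.large \
    --property resources:SRIOV_NET_PF=1 \
    --property trait:CUSTOM_PCI_VENDORID_8086=required \
    --property trait:CUSTOM_PCI_PRODUCTID_154D=required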

In the meantime, it's especially useful to get this kind of feedback
from ops so we can ensure we're meeting the right requirements as we
design things.  Please reach out if you want to discuss further.

Thanks,
Eric Fried (efried)

On 10/18/2017 09:56 AM, Van Leeuwen, Robert wrote:
> Hi,
> 
>  
> 
> Does anyone know if it is possible to set PCI pass through on a flavor
> without also needing to set the alias on the nova API nodes as mentioned
> here:
> https://docs.openstack.org/nova/pike/admin/pci-passthrough.html
> 
>  
> 
> E.G you need to set in nova.conf:
> 
> [pci]
> 
> alias = { "vendor_id":"8086", "product_id":"154d",
> "device_type":"type-PF", "name":"a1" }
> 
>  
> 
> Then you can set the flavor:
> 
> openstack flavor set m1.large --property "pci_passthrough:alias"="a1:2"
> 
>  
> 
>  
> 
> E.g. I would be fine with just setting the PCI vendor/product on the
> flavor instead of also needing to set this at the api node
> 
> So something like:
> 
> openstack flavor set m1.large --property "pci_passthrough:vendor"="8086"
>  --property "pci_passthrough:device"="154d:1"
> 
>  
> 
> Thx,
> 
> Robert  van Leeuwen
> 
> 
> 
> ___
> OpenStack-operators mailing list
> openstack-operat...@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] ironic and traits

2017-10-23 Thread Eric Fried
I agree with Sean.  In general terms:

* A resource provider should be marked with a trait if that feature
  * Can be turned on or off (whether it's currently on or not); or
  * Is always on and can't ever be turned off.
* A consumer wanting that feature present (doesn't matter whether it's
on or off) should specify it as a required *trait*.
* A consumer wanting that feature present and turned on should
  * Specify it as a required trait; AND
  * Indicate that it be turned on via some other mechanism (e.g. a
separate extra_spec).

I believe this satisfies Dmitry's (Ironic's) needs, but also Jay's drive
for placement purity.

Please invite me to the hangout or whatever.

Thanks,
Eric

On 10/23/2017 07:22 AM, Mooney, Sean K wrote:
>  
> 
>  
> 
> *From:*Jay Pipes [mailto:jaypi...@gmail.com]
> *Sent:* Monday, October 23, 2017 12:20 PM
> *To:* OpenStack Development Mailing List 
> *Subject:* Re: [openstack-dev] [ironic] ironic and traits
> 
>  
> 
> Writing from my phone... May I ask that before you proceed with any plan
> that uses traits for state information that we have a hangout or
> videoconference to discuss this? Unfortunately today and tomorrow I'm
> not able to do a hangout but I can do one on Wednesday any time of the day.
> 
>  
> 
> [Mooney, Sean K] On the UEFI boot topic, I did bring up at the PTG that
> we wanted to standardize traits for "verified boot"; that included a trait
> for UEFI secure boot enabled and one to indicate a hardware root of trust,
> e.g. Intel Boot Guard or similar.
> 
> We distinctly wanted to be able to tag nova compute hosts with those new
> traits so we could require that vms that request a host with UEFI secure
> boot enabled and a hardware root of trust are scheduled only to those nodes.
> 
> There are many other examples that affect both vms and bare metal, such as
> ecc/interleaved memory, cluster on die, l3 cache code and data
> prioritization, vt-d/vt-c, HPET, hyper-threading, power states ... all of
> these features may be present on the platform, but I also need to know if
> they are turned on. Ruling out state in traits means all of this logic will
> eventually get pushed to scheduler filters, which will be suboptimal long
> term as more state is tracked. Software-defined infrastructure may be the
> future, but hardware-defined software is sadly the present...
> 
> I do however think there should be a separation between asking for a host
> that provides x with a trait and asking for x to be configured via a trait.
> The trait secure_boot_enabled should never result in the feature being
> enabled; it should just find a host with it on. If you want to request it
> to be turned on, you would request a host with secure_boot_capable as a
> trait and have a flavor extra spec or image property to request ironic to
> enable it. These are two very different requests and should not be treated
> the same.
> 
>  
> 
>  
> 
> Lemme know!
> 
> -jay
> 
>  
> 
> On Oct 23, 2017 5:01 AM, "Dmitry Tantsur"  > wrote:
> 
> Hi Jay!
> 
> I appreciate your comments, but I think you're approaching the
> problem from a purely VM point of view. Things simply don't work the
> same way in bare metal, at least not if we want to provide the same
> user experience.
> 
>  
> 
> On Sun, Oct 22, 2017 at 2:25 PM, Jay Pipes  > wrote:
> 
> Sorry for delay, took a week off before starting a new job.
> Comments inline.
> 
> On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:
> 
> Hi all,
> 
> I promised John to dump my thoughts on traits to the ML, so
> here we go :)
> 
> I see two roles of traits (or kinds of traits) for bare metal:
> 1. traits that say what the node can do already (e.g. "the
> node is
> doing UEFI boot")
> 2. traits that say what the node can be *configured* to do
> (e.g. "the node can
> boot in UEFI mode")
> 
> 
> There's only one role for traits. #2 above. #1 is state
> information. Traits are not for state information. Traits are
> only for communicating capabilities of a resource provider
> (baremetal node).
> 
>  
> 
> These are not different, that's what I'm talking about here. No
> users care about the difference between "this node was put in UEFI
> mode by an operator in advance", "this node was put in UEFI mode by
> an ironic driver on demand" and "this node is always in UEFI mode,
> because it's AARCH64 and it does not have a BIOS". These situations
> produce the same result (the node is booted in UEFI mode), and thus
> it's up to ironic to hide this difference.
> 
>  
> 
> My suggestion with traits is one way to do it, I'm not sure what you
> suggest though.
> 
>  
> 
> 
> 
