On 06/09/2017 06:31 PM, Ed Leafe wrote:
On Jun 9, 2017, at 4:35 PM, Jay Pipes <jaypi...@gmail.com> wrote:

We can declare that allocating for shared disk is fairly deterministic
if we assume that any given compute node is only associated with one
shared disk provider.

a) We can't assume that.
b) A compute node could very well have both local disk and shared disk. How
would the placement API know which one to pick? This is a sorting/weighing
decision and thus is something the scheduler is responsible for.

I remember having this discussion, and we concluded that a compute node could
have either local or shared resources, but not both. There would be a trait to
indicate shared disk. Has this changed?

I'm not sure it's changed per se :) It's just that there's nothing preventing this from happening. A compute node can theoretically have local disk and also be associated with a shared storage pool.
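
To make the ambiguity concrete, here's a toy sketch in plain Python (made-up UUIDs and field names, not the actual placement data model) of a compute node that has local DISK_GB inventory and is also associated through an aggregate with a shared storage pool:

    # Toy sketch only; not the real placement data model.
    compute_node = {
        "uuid": "11111111-1111-1111-1111-111111111111",
        "inventories": {"VCPU": 8, "MEMORY_MB": 32768, "DISK_GB": 500},
        "aggregates": ["agg-A"],
    }

    # Shared storage pool in the same aggregate, marked with a trait
    # indicating its DISK_GB is shared with the compute nodes there.
    shared_storage = {
        "uuid": "22222222-2222-2222-2222-222222222222",
        "inventories": {"DISK_GB": 10000},
        "aggregates": ["agg-A"],
        "traits": ["MISC_SHARES_VIA_AGGREGATE"],
    }

    def disk_candidates(request_gb):
        """Every provider that could satisfy a DISK_GB request."""
        return [p["uuid"] for p in (compute_node, shared_storage)
                if p["inventories"].get("DISK_GB", 0) >= request_gb]

    # Both UUIDs come back; placement alone can't pick one:
    print(disk_candidates(100))

Placement can report that both providers satisfy the request; choosing between them is exactly the weighing step I'm talking about.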

* We already have the information the filter scheduler needs now by
  some other means, right?  What are the reasons we don't want to
  use that anymore?

The filter scheduler has most of the information, yes. What it doesn't have is the 
*identifier* (UUID) for things like SRIOV PFs or NUMA cells that the Placement API will 
use to distinguish between things. In other words, the filter scheduler currently does 
things like unpack a NUMATopology object into memory and determine a NUMA cell on which
to place an instance. However, it has no concept that that NUMA cell is (or will soon be once
nested-resource-providers is done) a resource provider in the placement API. Same for 
SRIOV PFs. Same for VGPUs. Same for FPGAs, etc. That's why we need to return information 
to the scheduler from the placement API that will allow the scheduler to understand 
"hey, this NUMA cell on compute node X is resource provider $UUID".

I guess that this was the point that confused me. The RP uuid comes along with
the provider itself: the compute node's uuid, and (after
https://review.openstack.org/#/c/469147/ merges) the PCI device's uuid. So in
the code that passes the PCI device information to the scheduler, we could add 
that new uuid field, and then the scheduler would have the information to a) 
select the best fit and then b) claim it with the specific uuid. Same for all 
the other nested/shared devices.
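
If I'm reading this right, the proposal is roughly the following; note that the rp_uuid field is hypothetical, since the PCI device info the scheduler sees today doesn't carry it:

    # Rough sketch of the proposal; the rp_uuid field is hypothetical.
    pci_devices = [
        {"address": "0000:04:00.0", "dev_type": "type-PF",
         "rp_uuid": "33333333-3333-3333-3333-333333333333"},
        {"address": "0000:05:00.0", "dev_type": "type-PF",
         "rp_uuid": "44444444-4444-4444-4444-444444444444"},
    ]

    def pick_pf(devices):
        # a) select the best fit (weighing elided), then b) return the
        # provider UUID so the claim can target that exact device.
        best = devices[0]
        return best["rp_uuid"]

    print(pick_pf(pci_devices))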

How would the scheduler know that a particular SRIOV PF resource provider UUID is on a particular compute node unless the placement API returns information indicating that SRIOV PF is a child of a particular compute node resource provider?
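
In other words, placement would need to hand back something like a provider tree, so the scheduler can walk any child provider up to its compute node. An illustrative sketch (field names made up until nested-resource-providers merges):

    # Illustrative provider tree; field names are hypothetical.
    providers = {
        "cn-uuid":    {"name": "compute-node-X",  "parent": None},
        "numa0-uuid": {"name": "NUMA cell 0",     "parent": "cn-uuid"},
        "pf0-uuid":   {"name": "SRIOV PF enp4s0", "parent": "numa0-uuid"},
    }

    def root_provider(uuid):
        # Walk parent links until we hit the compute node (the root).
        while providers[uuid]["parent"] is not None:
            uuid = providers[uuid]["parent"]
        return uuid

    # Now the scheduler can tell which compute node hosts the PF:
    assert root_provider("pf0-uuid") == "cn-uuid"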

I don't mean to belabor this, but to my mind this seems a lot less disruptive 
to the existing code.

Belabor away :) I don't mind talking through the details. It's important to do.

Best,
-jay

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
