Re: [openstack-dev] vGPUs support for Nova - Implementation

Jay Pipes Fri, 29 Sep 2017 08:18:23 -0700

Hi Sahid, comments inline. :)

On 09/29/2017 04:53 AM, Sahid Orentino Ferdjaoui wrote:

On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote:

On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote:

Please consider the MDEV support for the /pci framework, which
provides support for vGPUs [0].


As discussed in [1], this first implementation could be used as a
skeleton for implementing PCI Devices in Resource Tracker.


I'm not entirely sure what you're referring to above as "implementing PCI
devices in Resource Tracker". Could you elaborate? The resource tracker
already embeds a PciManager object that manages PCI devices, as you know.
Perhaps you meant "implement PCI devices as Resource Providers"?


A PciManager? I know that we have a PCI_DEVICE field :) - I guess a
virt driver can return an inventory with the total number of PCI
devices. As for a manager, I'm not sure.


I'm referring to this:

https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L33

The PciDevTracker class is instantiated in the resource tracker when the first ComputeNode object managed by the resource tracker is init'd:


https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L578

On initialization, the PciDevTracker inventories the compute node's collection of PCI devices by grabbing a list of records from the pci_devices table in the cell database:


https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L69

and then comparing those DB records with information the hypervisor returns about PCI devices:


https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L160
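
The comparison step can be pictured as a reconciliation keyed by PCI address. This is a minimal sketch, not the actual PciDevTracker code: it assumes each record is a dict with an "address" key, and it ignores allocation state, which the real tracker must handle (for example, a device that disappears while assigned to an instance). The function name reconcile_pci_devices is hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile_pci_devices(
    db_records: List[dict], hv_devices: List[dict]
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Compare DB records with hypervisor-reported devices by PCI address.

    Returns (added, removed, kept), each sorted by address:
      added   - reported by the hypervisor but missing from the DB
      removed - present in the DB but no longer reported
      kept    - present in both (hypervisor data wins, since it is fresher)
    """
    db_by_addr: Dict[str, dict] = {r["address"]: r for r in db_records}
    hv_by_addr: Dict[str, dict] = {d["address"]: d for d in hv_devices}

    # dict key views support set operations, so the diff is straightforward.
    added = [hv_by_addr[a] for a in sorted(hv_by_addr.keys() - db_by_addr.keys())]
    removed = [db_by_addr[a] for a in sorted(db_by_addr.keys() - hv_by_addr.keys())]
    kept = [hv_by_addr[a] for a in sorted(hv_by_addr.keys() & db_by_addr.keys())]
    return added, removed, kept
```
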

Each hypervisor returns something different for the list of PCI devices, as you know. For libvirt, the call that returns PCI device information is here:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L842

The results of that are jammed into a "pci_passthrough_devices" key in the returned result of the virt driver's get_available_resource() call. For libvirt, that's here:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L5809
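
To give an idea of the shape of that data, here is a sketch that decodes the key and counts devices per (vendor_id, product_id) pair. It assumes the value is a JSON-encoded list of device dicts, as the libvirt driver produced at the time; the real PCI device pools group on more keys than these two, and the helper name and sample IDs in any usage are made up for illustration.

```python
import json
from collections import Counter


def count_devices(resources: dict) -> Counter:
    """Count reported PCI devices per (vendor_id, product_id).

    `resources` stands in for the dict returned by get_available_resource();
    only the "pci_passthrough_devices" key is read, and it is assumed to hold
    a JSON string encoding a list of device dicts.
    """
    devices = json.loads(resources.get("pci_passthrough_devices", "[]"))
    return Counter((dev["vendor_id"], dev["product_id"]) for dev in devices)
```
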

It is that piece that Eric and myself have been talking about standardizing into a "generic device management" interface that would have an update_inventory() method that accepts a ProviderTree object [1]

[1] https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py

and would add resource providers corresponding to devices that are made available to guests for use.
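
A rough sketch of what such an update_inventory() call could do follows. ToyProviderTree and HypotheticalDeviceManager are invented stand-ins for illustration only; they do not reproduce the real ProviderTree API linked above. The sketch adds one child provider per device under the compute node, each with one unit of the PCI_DEVICE resource class, and skips devices already modeled so repeated calls are idempotent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Provider:
    name: str
    parent: Optional[str] = None
    inventory: Dict[str, int] = field(default_factory=dict)
    traits: Set[str] = field(default_factory=set)


class ToyProviderTree:
    """Hypothetical stand-in for nova's ProviderTree (not its real API)."""

    def __init__(self, root_name: str) -> None:
        self.providers: Dict[str, Provider] = {root_name: Provider(root_name)}

    def new_child(self, name: str, parent: str) -> Provider:
        if parent not in self.providers:
            raise KeyError(f"unknown parent provider {parent!r}")
        child = Provider(name, parent=parent)
        self.providers[name] = child
        return child

    def children_of(self, parent: str) -> List[Provider]:
        return [p for p in self.providers.values() if p.parent == parent]


class HypotheticalDeviceManager:
    """Illustrative "generic device manager" fed with hypervisor device dicts."""

    def __init__(self, devices: List[dict]) -> None:
        self.devices = devices

    def update_inventory(self, tree: ToyProviderTree, compute_node: str) -> None:
        for dev in self.devices:
            name = f"{compute_node}_pci_{dev['address']}"
            if name in tree.providers:
                continue  # already modeled; keeps repeated calls idempotent
            child = tree.new_child(name, parent=compute_node)
            # One provider per device, exposing a single PCI_DEVICE unit.
            child.inventory["PCI_DEVICE"] = 1
```
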

You still have to define "traits": basically, for physical network
devices, users want to select a device according to the physical
network, according to its placement on the host (NUMA), or according
to its bandwidth capability... For GPUs it's the same story. *And I
haven't even mentioned devices which support virtual functions.*

Yes, the generic device manager would be responsible for associating traits to the resource providers it adds to the ProviderTree provided to it in the update_inventory() call.
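
Trait association could look roughly like the sketch below: derive traits from what the hypervisor reports about each device, then keep only devices whose traits include everything requested. The CUSTOM_PHYSNET_* and CUSTOM_NUMA_NODE_* names and the device dict keys are hypothetical; the one real constraint assumed is that placement custom traits are uppercase and prefixed with CUSTOM_.

```python
import re
from typing import List, Set


def _normalize(value: str) -> str:
    # Custom trait names may only contain A-Z, 0-9 and "_".
    return re.sub(r"[^A-Z0-9_]", "_", value.upper())


def traits_for_device(dev: dict) -> Set[str]:
    """Derive hypothetical custom traits from a device dict."""
    traits: Set[str] = set()
    if dev.get("physical_network"):
        traits.add("CUSTOM_PHYSNET_" + _normalize(dev["physical_network"]))
    if dev.get("numa_node") is not None:  # NUMA node 0 is valid, so test for None
        traits.add(f"CUSTOM_NUMA_NODE_{dev['numa_node']}")
    return traits


def select_devices(devices: List[dict], required: Set[str]) -> List[dict]:
    """Keep devices whose traits are a superset of the required traits."""
    return [d for d in devices if required <= traits_for_device(d)]
```
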

So that is what you plan to do for this release :) - Realistically, I
don't think we are close to having something ready for production.

I don't disagree with you that this is a huge amount of refactoring to undertake over the next couple releases. :)

Jay, I have a question: why don't you start by exposing NUMA?

I believe you're asking here why we don't start by modeling NUMA nodes as child resource providers of the compute node? Instead of starting by modeling PCI devices as child providers of the compute node? If that's not what you're asking, please do clarify...

We're starting with modeling PCI devices as child providers of the compute node because they are easier to deal with as a whole than NUMA nodes and we have the potential of being able to remove the PciPassthroughFilter from the scheduler in Queens.
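
To illustrate why modeling devices as child providers could make the PciPassthroughFilter unnecessary, here is a simplified sketch of the capacity question placement would answer instead: which compute nodes have child providers with enough free units of a resource class. It ignores reserved amounts and allocation ratios, and the data layout (host name mapped to a list of child dicts with "inventory" and "usage") is assumed purely for illustration.

```python
from typing import Dict, List


def hosts_with_capacity(
    trees: Dict[str, List[dict]], resource_class: str, amount: int
) -> List[str]:
    """Return compute nodes whose children have `amount` free units in total."""
    result = []
    for host, children in sorted(trees.items()):
        free = sum(
            child["inventory"].get(resource_class, 0)
            - child["usage"].get(resource_class, 0)
            for child in children
        )
        if free >= amount:
            result.append(host)
    return result
```
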

I don't see us being able to remove the NUMATopologyFilter from the scheduler in Queens because of the complexity involved in how coupled the NUMA topology resource handling is to CPU pinning, huge page support, and IO emulation thread pinning.

Hope that answers that question; again, lemme know if that's not the question you were asking! :)

For the record, I have zero confidence in any existing "functional" tests
for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately,
that is because these features often require hardware that the upstream
community CI lacks, or depend on libraries, drivers and kernel versions
that really aren't available to non-bleeding-edge users (or users with
very deep pockets).


It's a good point. If you are not confident, don't you think it's
premature to move forward on implementing new things without having
well-trusted functional tests?

Completely agree with you. I would rather see functional integration tests that are proven to actually test these complex hardware devices *gating* Nova patches before adding any new functionality to Nova.

We're adding lots of functional tests of the placement and resource providers modeling. I could definitely use some assistance from folks with access to this specialized hardware to set up and maintain the CI systems that can prove they are actually exercising these code paths.

* The Usage

There is no difference between SR-IOV and MDEV from the operator's
point of view: operators who know how to expose SR-IOV devices in Nova
already know how to expose MDEV devices (vGPUs).

Operators will be able to expose MDEV devices in the same manner as
they expose SR-IOV:

   1/ Configure whitelist devices

   ['{"vendor_id":"10de"}']

   2/ Create aliases

   [{"vendor_id":"10de", "name":"vGPU"}]

   3/ Configure the flavor

   openstack flavor set --property "pci_passthrough:alias"="vGPU:1" <flavor>
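
The flavor extra spec in step 3 pairs an alias name with a count. Below is a minimal sketch of how such a value could be parsed and resolved against the aliases from step 2; it assumes a comma-separated list of <alias>:<count> items and is not Nova's own parser. The function names are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def parse_alias_spec(spec: str) -> List[Tuple[str, int]]:
    """Parse e.g. "vGPU:1" or "vGPU:1,nic:2" into (alias, count) pairs."""
    requests = []
    for item in spec.split(","):
        name, sep, count = item.strip().partition(":")
        if not name or not sep or not count.isdigit():
            raise ValueError(f"expected '<alias>:<count>', got {item!r}")
        requests.append((name, int(count)))
    return requests


def resolve_requests(
    spec: str, alias_config: List[str]
) -> List[Tuple[Dict[str, str], int]]:
    """Map each requested alias to its match criteria (all keys but "name")."""
    by_name = {a["name"]: a for a in (json.loads(raw) for raw in alias_config)}
    resolved = []
    for name, count in parse_alias_spec(spec):
        if name not in by_name:
            raise KeyError(f"unknown PCI alias {name!r}")
        criteria = {k: v for k, v in by_name[name].items() if k != "name"}
        resolved.append((criteria, count))
    return resolved
```
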

* Limitations

The mdev does not provide 'product_id' but 'mdev_type', which should be
used to identify exactly which resource users can request, e.g.
nvidia-10. To provide that support we have to add a new field,
'mdev_type', so aliases could be something like:

   {"vendor_id":"10de", "mdev_type":"nvidia-10", "name":"alias-nvidia-10"}
   {"vendor_id":"10de", "mdev_type":"nvidia-11", "name":"alias-nvidia-11"}
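
The limitation can be shown with a small matching sketch: an alias that only sets vendor_id matches every NVIDIA mdev on the host, while adding the proposed mdev_type key narrows the match to a single type. The match_devices helper and the device dicts are illustrative assumptions, and mdev_type is the new field proposed here, not an existing alias key.

```python
from typing import List


def match_devices(devices: List[dict], alias: dict) -> List[dict]:
    """Return devices matching every alias key except "name"."""
    criteria = {k: v for k, v in alias.items() if k != "name"}
    return [d for d in devices if all(d.get(k) == v for k, v in criteria.items())]
```
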

I do plan to add it, but first I need support from upstream to
continue that work.


As mentioned in IRC and the previous ML discussion, my focus is on the
nested resource providers work and reviews, along with the other two
top-priority scheduler items (move operations and alternate hosts).

I'll do my best to look at your patch series, but please note it's lower
priority than a number of other items.


No worries, the code is here, tested, fully functional and
production-ready; I made an effort to make it available at the very
beginning of the release. With some goodwill we could fix any bugs
and have support for vGPUs in Queens.

You cannot say it's tested, fully functional and production-ready until we see functional integration tests proving that :)

One thing that would be very useful, Sahid, is if you could get together
with Eric Fried (efried) on IRC and discuss with him the "generic device
management" system that was discussed at the PTG. It's likely that the
/pci module is going to be overhauled in Rocky, and it would be good to
have the mdev device management API requirements included in that
discussion.

Perhaps you missed the above part of my response. I'd like to repeat that it would be great to get your input on the generic device management ideas we've been throwing around.


All the best,
-jay

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
