Re: [openstack-dev] vGPUs support for Nova - Implementation
> -----Original Message-----
> From: Dan Smith [mailto:d...@danplanet.com]
> Sent: Monday, October 2, 2017 3:53 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] vGPUs support for Nova - Implementation

> >> I also think there is value in exposing vGPU in a generic way, irrespective of the underlying implementation (whether it is DEMU, mdev, SR-IOV or whatever approach Hyper-V/VMWare use).
> >
> > That is a big ask. To start with, all GPUs are not created equal, and various vGPU functionality as designed by the GPU vendors is not consistent, never mind the quirks added between different hypervisor implementations. So I feel like trying to expose this in a generic manner is, at least, asking for problems, and more likely bound for failure.
>
> I feel the opposite. IMHO, Nova's role in life is not to expose all the quirks of the underlying platform, but rather to provide a useful abstraction on top of those things. In spite of them.

[Mooney, Sean K] I have to agree with Dan here. vGPUs are a great example of where Nova can add value by abstracting the hypervisor specifics and providing an abstract API that allows requesting vGPUs without encoding the semantics of the API provided by the hypervisor or hardware vendor in what we expose to the tenant.

> > Nova already exposes plenty of hypervisor-specific functionality (or functionality only implemented for one hypervisor), and that's fine.
>
> And those bits of functionality are some of the most problematic we have. Among other reasons, they make it difficult for us to expose Thing 2.0 when we've encoded Thing 1.0 into our API so rigidly. This happens even within one virt driver where Thing 2.0 is significantly different than Thing 1.0.
> The vGPU stuff seems well-suited for the generic modeling work that we've spent the last few years working on, and is a perfect example of an area where we can avoid piling on more debt to a not-abstract-enough "model" and move forward with the new one. That's certainly my preference, and I think it's actually less work than the debt-ridden way.
>
> --Dan

[Mooney, Sean K] I also agree that it's likely less work to start fresh with the correct generic solution now than to try to adapt the PCI passthrough code we have today to support vGPUs without breaking the current SR-IOV and passthrough support. How vGPUs are virtualized is GPU-vendor specific, so even within a single host you may need to support multiple methods (SR-IOV/mdev/...) in a single virt driver. For example, a cloud/host with both AMD and NVIDIA GPUs which uses libvirt would have to support generating the correct XML for both solutions.

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
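Sean's point about one libvirt driver needing to emit different XML per vendor can be made concrete with a small sketch. The helper and the `device` dict below are hypothetical (Nova does not have this function); the two `<hostdev>` shapes, however, are the ones libvirt defines for mediated devices and for PCI passthrough:

```python
# Hypothetical sketch, not Nova code: choose the libvirt <hostdev>
# form based on how the vendor exposes the vGPU.

def vgpu_hostdev_xml(device):
    """Return a libvirt <hostdev> snippet for an illustrative vGPU dict."""
    if device["api"] == "mdev":
        # mdev-based vGPUs (the KVM/vfio path) are referenced by the UUID
        # assigned when the mediated device was created.
        return ("<hostdev mode='subsystem' type='mdev' model='vfio-pci'>"
                "<source><address uuid='{uuid}'/></source>"
                "</hostdev>").format(uuid=device["uuid"])
    if device["api"] == "sriov":
        # SR-IOV VFs are ordinary PCI functions, passed through by address.
        a = device["address"]
        return ("<hostdev mode='subsystem' type='pci' managed='yes'>"
                "<source><address domain='0x{domain}' bus='0x{bus}' "
                "slot='0x{slot}' function='0x{function}'/></source>"
                "</hostdev>").format(**a)
    raise ValueError("unknown vGPU virtualization method: %s" % device["api"])
```

The dispatch itself is the point: a single virt driver ends up carrying one branch per exposure mechanism, which is exactly the kind of detail the tenant-facing API should not leak.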
Re: [openstack-dev] vGPUs support for Nova - Implementation
>> I also think there is value in exposing vGPU in a generic way, irrespective >> of the underlying implementation (whether it is DEMU, mdev, SR-IOV or >> whatever approach Hyper-V/VMWare use). > > That is a big ask. To start with, all GPUs are not created equal, and > various vGPU functionality as designed by the GPU vendors is not > consistent, never mind the quirks added between different hypervisor > implementations. So I feel like trying to expose this in a generic > manner is, at least asking for problems, and more likely bound for > failure. I feel the opposite. IMHO, Nova's role in life is not to expose all the quirks of the underlying platform, but rather to provide a useful abstraction on top of those things. In spite of them. > Nova already exposes plenty of hypervisor-specific functionality (or > functionality only implemented for one hypervisor), and that's fine. And those bits of functionality are some of the most problematic we have. Among other reasons, they make it difficult for us to expose Thing 2.0, when we've encoded Thing 1.0 into our API so rigidly. This happens even within one virt driver where Thing 2.0 is significantly different than Thing 1.0. The vGPU stuff seems well-suited for the generic modeling work that we've spent the last few years working on, and is a perfect example of an area where we can avoid piling on more debt to a not-abstract-enough "model" and move forward with the new one. That's certainly my preference, and I think it's actually less work than the debt-ridden way. --Dan
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Fri, Sep 29, 2017 at 04:51:10PM +, Bob Ball wrote: > Hi Sahid, > > > > a second device emulator along-side QEMU. There is no mdev > > > integration. I'm concerned about how much mdev-specific functionality > > > would have to be faked up in the XenServer-specific driver for vGPU to > > > be used in this way. > > > > What you are refering with your DEMU it's what QEMU/KVM have with its > > vfio-pci. XenServer is > > reading through MDEV since the vendors provide drivers on *Linux* using the > > MDEV framework. > > MDEV is a kernel layer, used to expose hardwares, it's not hypervisor > > specific. > > It is possible that the vendor's userspace libraries use mdev, > however DEMU has no concept of mdev at all. If the vendor's > userspace libraries do use mdev then this is entirely abstracted > from XenServer's integration. While I don't have access to the > vendors source for the userspace libraries or the kernel module my > understanding was that the kernel module in XenServer's integration > is for the userspace libraries to talk to the kernel module and for > IOCTLS. My reading of mdev implies that /sys/class/mdev_bus should > exist for it to be used? It does not exist in XenServer, which to > me implies that the vendor's driver for XenServer do not use mdev?

I shared our discussion with Alex Williamson; his response:

> Hi Sahid, > > XenServer does not use mdev for vGPU support. The mdev/vfio > infrastructure was developed in response to DEMU used on XenServer, > which we felt was not an upstream acceptable solution. There has > been cursory interest in porting vfio to Xen, so it's possible that > they might use the same mechanism some day, but for now they are > different solutions, the vfio/mdev solution being the only one > accepted upstream so far. Thanks, > > Alex

My mistake. It seems clear now that XenServer can't benefit from the mdev support I have added in the /pci module.
The support of vGPUs for Xen will have to wait for the generic device management, I guess.

> Bob
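Bob's observation above (no `/sys/class/mdev_bus` on XenServer) is easy to test on any host. The path is the one the kernel's vfio-mdev framework registers; the helper itself is a sketch of ours, not Nova code:

```python
# Minimal sketch: the mdev framework registers an "mdev_bus" class in
# sysfs, so its absence is a strong hint the platform is not using mdev.
import os

def host_supports_mdev(sysfs_root="/sys"):
    """Return True if the mdev bus class is registered in sysfs."""
    return os.path.isdir(os.path.join(sysfs_root, "class", "mdev_bus"))
```

On a KVM host with the vendor's GRID-style driver loaded this returns True; on the XenServer/DEMU integration discussed here it would return False, matching Alex's explanation.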
Re: [openstack-dev] vGPUs support for Nova - Implementation
On 29 September 2017 at 22:26, Bob Ball wrote: > The concepts of PCI and SR-IOV are, of course, generic, but I think out of > principle we should avoid a hypervisor-specific integration for vGPU (indeed > Citrix has been clear from the beginning that the vGPU integration we are > proposing is intentionally hypervisor agnostic) To be fair, what this proposal is doing is piggy-backing on Nova's existing PCI functionality to expose Linux/KVM VFIO mdev; it just so happens mdev was created for vGPU, but it was designed to extend to other devices/things too. > I also think there is value in exposing vGPU in a generic way, irrespective > of the underlying implementation (whether it is DEMU, mdev, SR-IOV or > whatever approach Hyper-V/VMWare use). That is a big ask. To start with, all GPUs are not created equal, and various vGPU functionality as designed by the GPU vendors is not consistent, never mind the quirks added between different hypervisor implementations. So I feel like trying to expose this in a generic manner is, at least, asking for problems, and more likely bound for failure. Nova already exposes plenty of hypervisor-specific functionality (or functionality only implemented for one hypervisor), and that's fine. Maybe there should be something in OpenStack that would generically manage vGPU-graphics and/or vGPU-compute etc., but I'm pretty sure it would never be allowed into Nova :-). Anyway, take all that with a grain of salt, because frankly I would love to see this in sooner rather than later - even if it did have a big "this might change in non-upgradeable ways" sticker on it. -- Cheers, ~Blairo
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Fri, Sep 29, 2017 at 11:16:43AM -0400, Jay Pipes wrote: > Hi Sahid, comments inline. :) > > On 09/29/2017 04:53 AM, Sahid Orentino Ferdjaoui wrote: > > On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote: > > > On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote: > > > > Please consider the support of MDEV for the /pci framework which > > > > provides support for vGPUs [0]. > > > > > > > > Accordingly to the discussion [1] > > > > > > > > With this first implementation which could be used as a skeleton for > > > > implementing PCI Devices in Resource Tracker > > > > > > I'm not entirely sure what you're referring to above as "implementing PCI > > > devices in Resource Tracker". Could you elaborate? The resource tracker > > > already embeds a PciManager object that manages PCI devices, as you know. > > > Perhaps you meant "implement PCI devices as Resource Providers"? > > > > A PciManager? I know that we have a field PCI_DEVICE :) - I guess a > > virt driver can return inventory with total of PCI devices. Talking > > about manager, not sure. > > I'm referring to this: > > https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L33 > > [SNIP] > > It is that piece that Eric and myself have been talking about standardizing > into a "generic device management" interface that would have an > update_inventory() method that accepts a ProviderTree object [1] Jay, all of that looks perfectly sane to me, even if it's not clear what you want to make so generic. That part of the code is for the virt layers, and you can't treat GPU or NET as just a generic piece; they have characteristics which are requirements for the virt layers. In that method 'update_inventory(provider_tree)' which you are going to introduce for /pci/PciManager, would a first step be to convert the objects to an understandable dict for the whole logic, or do you have another plan? In any case, from my POV I don't see any blocker; both works can co-exist without any pain.
And adding features in the current /pci module is not going to add heavy work, but it is going to give us a clear view of what is needed. > [1] > https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py > > and would add resource providers corresponding to devices that are made > available to guests for use. > > > You still have to define "traits", basically for physical network > > devices, the users want to select device according physical network, > > to select device according the placement on host (NUMA), to select the > > device according the bandwidth capability... For GPU it's same > > story. *And I do not have mentioned devices which support virtual > > functions.* > > Yes, the generic device manager would be responsible for associating traits > to the resource providers it adds to the ProviderTree provided to it in the > update_inventory() call. > > > So that is what you plan to do for this release :) - Reasonably I > > don't think we are close to have something ready for production. > > I don't disagree with you that this is a huge amount of refactoring to > undertake over the next couple releases. :) Yes, and that is the point. We are going to block work on the /pci module during a period where we can see large interest in such support. > > Jay, I have question, Why you don't start by exposing NUMA ? > > I believe you're asking here why we don't start by modeling NUMA nodes as > child resource providers of the compute node? Instead of starting by > modeling PCI devices as child providers of the compute node? If that's not > what you're asking, please do clarify... > > We're starting with modeling PCI devices as child providers of the compute > node because they are easier to deal with as a whole than NUMA nodes and we > have the potential of being able to remove the PciPassthroughFilter from the > scheduler in Queens.
> > I don't see us being able to remove the NUMATopologyFilter from the > scheduler in Queens because of the complexity involved in how coupled the > NUMA topology resource handling is to CPU pinning, huge page support, and IO > emulation thread pinning. > > Hope that answers that question; again, lemme know if that's not the > question you were asking! :) Yes, that was the question, and you answered it perfectly, thanks. I will try to be clearer in the future :) As you have noticed, NUMA support will be quite difficult and is not in the TODO right now, which makes me think that we are going to block development on the pci module and, on top of that, end up providing less support (no NUMA awareness). Is that reasonable? > > > For the record, I have zero confidence in any existing "functional" tests > > > for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately, > > > due > > > to the fact that these features often require hardware that either the > > > upstream community CI lacks or that depends on libraries, drivers and > > > kernel > > > versions that really aren't
Re: [openstack-dev] vGPUs support for Nova - Implementation
Hi Sahid, > > a second device emulator along-side QEMU. There is no mdev > > integration. I'm concerned about how much mdev-specific functionality > > would have to be faked up in the XenServer-specific driver for vGPU to > > be used in this way. > > What you are refering with your DEMU it's what QEMU/KVM have with its > vfio-pci. XenServer is > reading through MDEV since the vendors provide drivers on *Linux* using the > MDEV framework. > MDEV is a kernel layer, used to expose hardwares, it's not hypervisor > specific. It is possible that the vendor's userspace libraries use mdev, however DEMU has no concept of mdev at all. If the vendor's userspace libraries do use mdev then this is entirely abstracted from XenServer's integration. While I don't have access to the vendors source for the userspace libraries or the kernel module my understanding was that the kernel module in XenServer's integration is for the userspace libraries to talk to the kernel module and for IOCTLS. My reading of mdev implies that /sys/class/mdev_bus should exist for it to be used? It does not exist in XenServer, which to me implies that the vendor's driver for XenServer do not use mdev? Bob
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Fri, Sep 29, 2017 at 12:26:07PM +, Bob Ball wrote: > Hi Sahid, > > > Please consider the support of MDEV for the /pci framework which provides > > support for vGPUs [0]. > > From my understanding, this MDEV implementation for vGPU would be > entirely specific to libvirt, is that correct? No, but Linux-specific, yes. Windows supports SR-IOV. > XenServer's implementation for vGPU is based on a pooled device > model (as described in > http://lists.openstack.org/pipermail/openstack-dev/2017-September/122702.html) That thread is referring to something which I guess everyone understands now - it's basically why I have added support for MDEV in /pci: to make it work however the virtual devices are exposed, SR-IOV or MDEV. > a second device emulator along-side QEMU. There is no mdev > integration. I'm concerned about how much mdev-specific > functionality would have to be faked up in the XenServer-specific > driver for vGPU to be used in this way. What you are referring to with your DEMU is what QEMU/KVM has with its vfio-pci. XenServer is reading through MDEV since the vendors provide drivers on *Linux* using the MDEV framework. MDEV is a kernel layer, used to expose hardware; it's not hypervisor-specific. > I'm not familiar with mdev, but it looks Linux specific, so would not be > usable by Hyper-V? > I've also not been able to find suggestions that VMWare can make use of mdev, > although I don't know the architecture of VMWare's integration. > > The concepts of PCI and SR-IOV are, of course, generic, but I think out of > principle we should avoid a hypervisor-specific integration for vGPU (indeed > Citrix has been clear from the beginning that the vGPU integration we are > proposing is intentionally hypervisor agnostic) > I also think there is value in exposing vGPU in a generic way, irrespective > of the underlying implementation (whether it is DEMU, mdev, SR-IOV or > whatever approach Hyper-V/VMWare use).
> > It's quite difficult for me to see how this will work for other > hypervisors. Do you also have a draft alternate spec where more > details can be discussed? I would expect that XenServer provides the MDEV UUID; then it's easy to ask sysfs if you need the NUMA node of the physical device or the mdev_type. > Bob
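The sysfs lookup Sahid describes can be sketched in a few lines. The paths follow the kernel's vfio-mdev layout (an mdev device appears under `/sys/bus/mdev/devices/<uuid>`, nested under its parent PCI device); the function itself and its return shape are illustrative, not Nova code:

```python
# Illustrative sketch: given an mdev UUID, read the mdev_type and the
# parent physical device's NUMA node from sysfs.
import os

def mdev_info(uuid, sysfs_root="/sys"):
    """Return {'mdev_type': ..., 'numa_node': ...} or None if not found."""
    dev = os.path.join(sysfs_root, "bus", "mdev", "devices", uuid)
    if not os.path.exists(dev):
        return None
    # mdev_type is a symlink to the supported-type directory, e.g. "nvidia-35".
    mdev_type = os.path.basename(
        os.path.realpath(os.path.join(dev, "mdev_type")))
    # The mdev device sits under its parent PCI device in sysfs, and the
    # parent carries the numa_node attribute (-1 means no NUMA affinity).
    parent = os.path.dirname(os.path.realpath(dev))
    numa_node = -1
    numa_path = os.path.join(parent, "numa_node")
    if os.path.exists(numa_path):
        with open(numa_path) as f:
            numa_node = int(f.read().strip())
    return {"mdev_type": mdev_type, "numa_node": numa_node}
```

As Bob notes elsewhere in the thread, none of these paths exist on XenServer's DEMU-based integration, which is exactly the portability problem being debated.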
Re: [openstack-dev] vGPUs support for Nova - Implementation
Hi Sahid, comments inline. :) On 09/29/2017 04:53 AM, Sahid Orentino Ferdjaoui wrote: On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote: On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote: Please consider the support of MDEV for the /pci framework which provides support for vGPUs [0]. Accordingly to the discussion [1] With this first implementation which could be used as a skeleton for implementing PCI Devices in Resource Tracker I'm not entirely sure what you're referring to above as "implementing PCI devices in Resource Tracker". Could you elaborate? The resource tracker already embeds a PciManager object that manages PCI devices, as you know. Perhaps you meant "implement PCI devices as Resource Providers"? A PciManager? I know that we have a field PCI_DEVICE :) - I guess a virt driver can return inventory with total of PCI devices. Talking about manager, not sure. I'm referring to this: https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L33 The PciDevTracker class is instantiated in the resource tracker when the first ComputeNode object managed by the resource tracker is init'd: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L578 On initialization, the PciDevTracker inventories the compute node's collection of PCI devices by grabbing a list of records from the pci_devices table in the cell database: https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L69 and then comparing those DB records with information the hypervisor returns about PCI devices: https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L160 Each hypervisor returns something different for the list of pci devices, as you know. For libvirt, the call that returns PCI device information is here: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L842 The results of that are jammed into a "pci_passthrough_devices" key in the returned result of the virt driver's get_available_resource() call. 
For libvirt, that's here: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L5809 It is that piece that Eric and myself have been talking about standardizing into a "generic device management" interface that would have an update_inventory() method that accepts a ProviderTree object [1] [1] https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py and would add resource providers corresponding to devices that are made available to guests for use. You still have to define "traits", basically for physical network devices, the users want to select device according physical network, to select device according the placement on host (NUMA), to select the device according the bandwidth capability... For GPU it's same story. *And I do not have mentioned devices which support virtual functions.* Yes, the generic device manager would be responsible for associating traits to the resource providers it adds to the ProviderTree provided to it in the update_inventory() call. So that is what you plan to do for this release :) - Reasonably I don't think we are close to have something ready for production. I don't disagree with you that this is a huge amount of refactoring to undertake over the next couple releases. :) Jay, I have question, Why you don't start by exposing NUMA ? I believe you're asking here why we don't start by modeling NUMA nodes as child resource providers of the compute node? Instead of starting by modeling PCI devices as child providers of the compute node? If that's not what you're asking, please do clarify... We're starting with modeling PCI devices as child providers of the compute node because they are easier to deal with as a whole than NUMA nodes and we have the potential of being able to remove the PciPassthroughFilter from the scheduler in Queens. 
I don't see us being able to remove the NUMATopologyFilter from the scheduler in Queens because of the complexity involved in how coupled the NUMA topology resource handling is to CPU pinning, huge page support, and IO emulation thread pinning. Hope that answers that question; again, lemme know if that's not the question you were asking! :) For the record, I have zero confidence in any existing "functional" tests for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately, due to the fact that these features often require hardware that either the upstream community CI lacks or that depends on libraries, drivers and kernel versions that really aren't available to non-bleeding edge users (or users with very deep pockets). It's a good point; if you are not confident, don't you think it's premature to move forward on implementing new things without well-trusted functional tests? Completely agree with you. I would rather see functional integration tests that are proven to actually test these complex hardware devices *gating* Nova patches before adding any new
Re: [openstack-dev] vGPUs support for Nova - Implementation
The concepts of PCI and SR-IOV are, of course, generic They are, although the PowerVM guys have already pointed out that they don't even refer to virtual devices by PCI address and thus anything based on that subsystem isn't going to help them. but I think out of principle we should avoid a hypervisor-specific integration for vGPU (indeed Citrix has been clear from the beginning that the vGPU integration we are proposing is intentionally hypervisor agnostic) I also think there is value in exposing vGPU in a generic way, irrespective of the underlying implementation (whether it is DEMU, mdev, SR-IOV or whatever approach Hyper-V/VMWare use). I very much agree, of course. --Dan
Re: [openstack-dev] vGPUs support for Nova - Implementation
Hi Sahid, > Please consider the support of MDEV for the /pci framework which provides > support for vGPUs [0]. From my understanding, this MDEV implementation for vGPU would be entirely specific to libvirt, is that correct? XenServer's implementation for vGPU is based on a pooled device model (as described in http://lists.openstack.org/pipermail/openstack-dev/2017-September/122702.html) and directly interfaces with the card using DEMU ("Discrete EMU") as a second device emulator along-side QEMU. There is no mdev integration. I'm concerned about how much mdev-specific functionality would have to be faked up in the XenServer-specific driver for vGPU to be used in this way. I'm not familiar with mdev, but it looks Linux specific, so would not be usable by Hyper-V? I've also not been able to find suggestions that VMWare can make use of mdev, although I don't know the architecture of VMWare's integration. The concepts of PCI and SR-IOV are, of course, generic, but I think out of principle we should avoid a hypervisor-specific integration for vGPU (indeed Citrix has been clear from the beginning that the vGPU integration we are proposing is intentionally hypervisor agnostic) I also think there is value in exposing vGPU in a generic way, irrespective of the underlying implementation (whether it is DEMU, mdev, SR-IOV or whatever approach Hyper-V/VMWare use). It's quite difficult for me to see how this will work for other hypervisors. Do you also have a draft alternate spec where more details can be discussed? Bob
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Fri, Sep 29, 2017 at 2:32 AM, Dan Smith wrote: > In this series of patches we are generalizing the PCI framework to >>> handle MDEV devices. We argue it's a lot of patches, but most of them >>> are small and the logic behind is basically to make it understand two >>> new fields MDEV_PF and MDEV_VF. >>> >> >> That's not really "generalizing the PCI framework to handle MDEV devices" >> :) More like it's just changing the /pci module to understand a different >> device management API, but ok. >> > > Yeah, the series is adding more fields to our PCI structure to allow for > more variations in the kinds of things we lump into those tables. This is > my primary complaint with this approach, and has been since the topic first > came up. I really want to avoid building any more dependency on the > existing pci-passthrough mechanisms and focus any new effort on using > resource providers for this. The existing pci-passthrough code is almost > universally hated, poorly understood and tested, and something we should > not be further building upon. > > In this series of patches we make the libvirt driver, as usual, >>> return resources and attach devices returned by the pci manager. This >>> part can be reused for Resource Provider. >>> >> >> Perhaps, but the idea behind the resource providers framework is to treat >> devices as generic things. Placement doesn't need to know about the >> particular device attachment status. >> > > I quickly went through the patches and left a few comments. The base work > of pulling some of this out of libvirt is there, but it's all focused on > the act of populating pci structures from the vgpu information we get from > libvirt. That code could be made to instead populate a resource inventory, > but that's about the most of the set that looks applicable to the > placement-based approach. > > I'll review them too.
As mentioned in IRC and the previous ML discussion, my focus is on the >> nested resource providers work and reviews, along with the other two >> top-priority scheduler items (move operations and alternate hosts). >> >> I'll do my best to look at your patch series, but please note it's lower >> priority than a number of other items. >> > > FWIW, I'm not really planning to spend any time reviewing it until/unless > it is retooled to generate an inventory from the virt driver. > > With the two patches that report vgpus and then create guests with them > when asked converted to resource providers, I think that would be enough to > have basic vgpu support immediately. No DB migrations, model changes, etc > required. After that, helping to get the nested-rps and traits work landed > gets us the ability to expose attributes of different types of those vgpus > and opens up a lot of possibilities. IMHO, that's work I'm interested in > reviewing. > That's exactly what I would like to provide for Queens, so operators would have the possibility of flavors asking for vGPU resources in Queens, even if they couldn't yet ask for a specific vGPU type (or ask to be in the same NUMA cell as the CPU). The latter definitely needs nested resource providers, but the former (just having vGPU resource classes provided by the virt driver) is possible for Queens. > One thing that would be very useful, Sahid, if you could get with Eric >> Fried (efried) on IRC and discuss with him the "generic device management" >> system that was discussed at the PTG. It's likely that the /pci module is >> going to be overhauled in Rocky and it would be good to have the mdev >> device management API requirements included in that discussion. >> > > Definitely this.
> ++ > --Dan
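The "basic vGPU support" Dan and Sylvain converge on above amounts to the virt driver reporting an inventory for the VGPU resource class to placement. A minimal sketch of what such a report could look like (the surrounding function is an illustrative stand-in, not the actual Nova interface; the field names follow the placement inventory schema):

```python
# Illustrative sketch: a virt driver reporting a placement-style
# inventory for the standard VGPU resource class.

def get_vgpu_inventory(total_vgpus):
    """Build a placement-style inventory entry for VGPU resources."""
    return {
        "VGPU": {
            "total": total_vgpus,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": total_vgpus,   # one guest cannot exceed the host total
            "step_size": 1,
            "allocation_ratio": 1.0,   # vGPUs are not oversubscribable
        }
    }
```

A flavor would then simply request `resources:VGPU=1`, with per-type selection and NUMA affinity deferred to the nested-resource-providers and traits work discussed above.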
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote: > On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote: > > Please consider the support of MDEV for the /pci framework which > > provides support for vGPUs [0]. > > > > According to the discussion [1] > > > > With this first implementation which could be used as a skeleton for > > implementing PCI Devices in Resource Tracker > > I'm not entirely sure what you're referring to above as "implementing PCI > devices in Resource Tracker". Could you elaborate? The resource tracker > already embeds a PciManager object that manages PCI devices, as you know. > Perhaps you meant "implement PCI devices as Resource Providers"? A PciManager? I know that we have a field PCI_DEVICE :) - I guess a virt driver can return inventory with total of PCI devices. Talking about manager, not sure. You still have to define "traits": basically, for physical network devices, the users want to select a device according to the physical network, to select a device according to its placement on the host (NUMA), to select a device according to its bandwidth capability... For GPUs it's the same story. *And I have not mentioned devices which support virtual functions.* So that is what you plan to do for this release :) - reasonably, I don't think we are close to having something ready for production. Jay, I have a question: why don't you start by exposing NUMA? > > we provide support for > > attaching vGPUs to guests. And also to provide affinity per NUMA > > nodes. Another important point is that this implementation can take > > advantage of the ongoing specs like PCI NUMA policies.
> > * The Implementation [0]
> >
> > [PATCH 01/13] pci: update PciDevice object field 'address' to accept
> > [PATCH 02/13] pci: add for PciDevice object new field mdev
> > [PATCH 03/13] pci: generalize object unit-tests for different
> > [PATCH 04/13] pci: add support for mdev device type request
> > [PATCH 05/13] pci: generalize stats unit-tests for different
> > [PATCH 06/13] pci: add support for mdev devices type devspec
> > [PATCH 07/13] pci: add support for resource pool stats of mdev
> > [PATCH 08/13] pci: make manager to accept handling mdev devices
> >
> > In this series of patches we are generalizing the PCI framework to
> > handle MDEV devices. We argue it's a lot of patches, but most of
> > them are small, and the logic behind them is basically to make the
> > framework understand two new fields, MDEV_PF and MDEV_VF.
>
> That's not really "generalizing the PCI framework to handle MDEV
> devices" :) More like it's just changing the /pci module to understand
> a different device management API, but ok.

If you prefer to call it that :) - The point is that /pci manages
physical devices; it can pass through the whole device or its virtual
functions, exposed through SR-IOV or MDEV.

> > [PATCH 09/13] libvirt: update PCI node device to report mdev devices
> > [PATCH 10/13] libvirt: report mdev resources
> > [PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)
> >
> > In this series of patches we make the libvirt driver, as usual,
> > return resources and attach devices returned by the pci manager.
> > This part can be reused for Resource Providers.
>
> Perhaps, but the idea behind the resource providers framework is to
> treat devices as generic things. Placement doesn't need to know about
> the particular device attachment status.
>
> > [PATCH 12/13] functional: rework fakelibvirt host pci devices
> > [PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices
> >
> > Here we reuse 100/100 of the functional tests used for SR-IOV
> > devices.
> > Again here, this part can be reused for Resource Providers.
>
> Probably not, but I'll take a look :)
>
> For the record, I have zero confidence in any existing "functional"
> tests for NUMA, SR-IOV, CPU pinning, huge pages, and the like.
> Unfortunately, these features often require hardware that the upstream
> community CI lacks, or depend on libraries, drivers and kernel
> versions that really aren't available to non-bleeding-edge users (or
> users with very deep pockets).

It's a good point. But if you are not confident in them, don't you
think it's premature to move forward on implementing something new
without having well-trusted functional tests?

> > * The Usage
> >
> > There is no difference between SR-IOV and MDEV from the operator's
> > point of view: operators who know how to expose SR-IOV devices in
> > Nova already know how to expose MDEV devices (vGPUs).
> >
> > Operators will be able to expose MDEV devices in the same manner as
> > they expose SR-IOV:
> >
> > 1/ Configure whitelist devices
> >
> >    ['{"vendor_id":"10de"}']
> >
> > 2/ Create aliases
> >
> >    [{"vendor_id":"10de", "name":"vGPU"}]
> >
> > 3/ Configure the flavor
> >
> >    openstack flavor set --property "pci_passthrough:alias"="vGPU:1"
> >
> > * Limitations
> >
> > The mdev does not provide a 'product_id' but an 'mdev_type', which
> > should be considered to exactly identify which resource users can
> > request, e.g. nvidia-10.
Re: [openstack-dev] vGPUs support for Nova - Implementation
> > In this series of patches we are generalizing the PCI framework to
> > handle MDEV devices. We argue it's a lot of patches, but most of
> > them are small, and the logic behind them is basically to make the
> > framework understand two new fields, MDEV_PF and MDEV_VF.
>
> That's not really "generalizing the PCI framework to handle MDEV
> devices" :) More like it's just changing the /pci module to understand
> a different device management API, but ok.

Yeah, the series is adding more fields to our PCI structure to allow
for more variations in the kinds of things we lump into those tables.
This is my primary complaint with this approach, and has been since the
topic first came up. I really want to avoid building any more
dependency on the existing pci-passthrough mechanisms and focus any new
effort on using resource providers for this. The existing
pci-passthrough code is almost universally hated, poorly understood and
tested, and something we should not be further building upon.

> > In this series of patches we make the libvirt driver, as usual,
> > return resources and attach devices returned by the pci manager.
> > This part can be reused for Resource Providers.
>
> Perhaps, but the idea behind the resource providers framework is to
> treat devices as generic things. Placement doesn't need to know about
> the particular device attachment status.

I quickly went through the patches and left a few comments. The base
work of pulling some of this out of libvirt is there, but it's all
focused on the act of populating pci structures from the vgpu
information we get from libvirt. That code could be made to instead
populate a resource inventory, but that's about the extent of the set
that looks applicable to the placement-based approach.

> As mentioned in IRC and the previous ML discussion, my focus is on the
> nested resource providers work and reviews, along with the other two
> top-priority scheduler items (move operations and alternate hosts).
> I'll do my best to look at your patch series, but please note it's
> lower priority than a number of other items.

FWIW, I'm not really planning to spend any time reviewing it
until/unless it is retooled to generate an inventory from the virt
driver.

With the two patches that report vgpus, and that create guests with
them when asked, converted to resource providers, I think that would be
enough to have basic vgpu support immediately. No DB migrations, model
changes, etc. required. After that, helping to get the nested-rps and
traits work landed gets us the ability to expose attributes of the
different types of those vgpus, and opens up a lot of possibilities.
IMHO, that's work I'm interested in reviewing.

> One thing that would be very useful, Sahid, is if you could get with
> Eric Fried (efried) on IRC and discuss with him the "generic device
> management" system that was discussed at the PTG. It's likely that the
> /pci module is going to be overhauled in Rocky, and it would be good
> to have the mdev device management API requirements included in that
> discussion.

Definitely this.

--Dan
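To make the suggested retooling concrete, here is a minimal sketch of
what "generating an inventory from the virt driver" could look like.
The names (`discover_mdev_devices`, `get_inventory`, the `VGPU`
resource class) and the device counts are illustrative assumptions, not
Nova's actual driver interface: the driver counts available
mdev-capable vGPUs and reports them as a single resource class, leaving
placement unaware of per-device attachment details.

```python
# Hypothetical sketch: a virt driver reporting vGPUs as a placement
# inventory instead of populating the legacy pci_devices structures.
# All names here are illustrative, not Nova's real API.

def discover_mdev_devices():
    # A real driver would query libvirt's node device API; here we
    # fake two physical GPUs exposing 8 vGPU instances each.
    return [{"parent": "pci_0000_84_00_0", "available_instances": 8},
            {"parent": "pci_0000_85_00_0", "available_instances": 8}]

def get_inventory():
    """Report the total vGPU count as one generic resource class."""
    total = sum(d["available_instances"] for d in discover_mdev_devices())
    return {
        "VGPU": {
            "total": total,
            "min_unit": 1,
            "max_unit": total,
            "step_size": 1,
            "allocation_ratio": 1.0,
            "reserved": 0,
        }
    }

print(get_inventory()["VGPU"]["total"])  # 16
```

The point of the sketch is that nothing GPU-specific leaks into the
inventory: the scheduler only sees a consumable count, and
vendor-specific distinctions would later come in via traits on nested
providers.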
Re: [openstack-dev] vGPUs support for Nova - Implementation
On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote:
> Please consider the support of MDEV for the /pci framework which
> provides support for vGPUs [0].
>
> According to the discussion [1], with this first implementation,
> which could be used as a skeleton for implementing PCI devices in the
> Resource Tracker,

I'm not entirely sure what you're referring to above as "implementing
PCI devices in Resource Tracker". Could you elaborate? The resource
tracker already embeds a PciManager object that manages PCI devices, as
you know. Perhaps you meant "implement PCI devices as Resource
Providers"?

> we provide support for attaching vGPUs to guests, and also for
> providing affinity per NUMA node. Another important point is that
> this implementation can take advantage of ongoing specs like PCI NUMA
> policies.
>
> * The Implementation [0]
>
> [PATCH 01/13] pci: update PciDevice object field 'address' to accept
> [PATCH 02/13] pci: add for PciDevice object new field mdev
> [PATCH 03/13] pci: generalize object unit-tests for different
> [PATCH 04/13] pci: add support for mdev device type request
> [PATCH 05/13] pci: generalize stats unit-tests for different
> [PATCH 06/13] pci: add support for mdev devices type devspec
> [PATCH 07/13] pci: add support for resource pool stats of mdev
> [PATCH 08/13] pci: make manager to accept handling mdev devices
>
> In this series of patches we are generalizing the PCI framework to
> handle MDEV devices. We argue it's a lot of patches, but most of them
> are small, and the logic behind them is basically to make the
> framework understand two new fields, MDEV_PF and MDEV_VF.

That's not really "generalizing the PCI framework to handle MDEV
devices" :) More like it's just changing the /pci module to understand
a different device management API, but ok.
> [PATCH 09/13] libvirt: update PCI node device to report mdev devices
> [PATCH 10/13] libvirt: report mdev resources
> [PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)
>
> In this series of patches we make the libvirt driver, as usual,
> return resources and attach devices returned by the pci manager. This
> part can be reused for Resource Providers.

Perhaps, but the idea behind the resource providers framework is to
treat devices as generic things. Placement doesn't need to know about
the particular device attachment status.

> [PATCH 12/13] functional: rework fakelibvirt host pci devices
> [PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices
>
> Here we reuse 100/100 of the functional tests used for SR-IOV
> devices. Again here, this part can be reused for Resource Providers.

Probably not, but I'll take a look :)

For the record, I have zero confidence in any existing "functional"
tests for NUMA, SR-IOV, CPU pinning, huge pages, and the like.
Unfortunately, these features often require hardware that the upstream
community CI lacks, or depend on libraries, drivers and kernel versions
that really aren't available to non-bleeding-edge users (or users with
very deep pockets).

> * The Usage
>
> There is no difference between SR-IOV and MDEV from the operator's
> point of view: operators who know how to expose SR-IOV devices in
> Nova already know how to expose MDEV devices (vGPUs).
>
> Operators will be able to expose MDEV devices in the same manner as
> they expose SR-IOV:
>
> 1/ Configure whitelist devices
>
>    ['{"vendor_id":"10de"}']
>
> 2/ Create aliases
>
>    [{"vendor_id":"10de", "name":"vGPU"}]
>
> 3/ Configure the flavor
>
>    openstack flavor set --property "pci_passthrough:alias"="vGPU:1"
>
> * Limitations
>
> The mdev does not provide a 'product_id' but an 'mdev_type', which
> should be considered to exactly identify which resource users can
> request, e.g. nvidia-10.
> To provide that support we have to add a new field 'mdev_type', so
> aliases could be something like:
>
>   {"vendor_id":"10de", "mdev_type":"nvidia-10", "name":"alias-nvidia-10"}
>   {"vendor_id":"10de", "mdev_type":"nvidia-11", "name":"alias-nvidia-11"}
>
> I do plan to add that, but first I need support from upstream to
> continue this work.

As mentioned in IRC and the previous ML discussion, my focus is on the
nested resource providers work and reviews, along with the other two
top-priority scheduler items (move operations and alternate hosts).
I'll do my best to look at your patch series, but please note it's
lower priority than a number of other items.

One thing that would be very useful, Sahid, is if you could get with
Eric Fried (efried) on IRC and discuss with him the "generic device
management" system that was discussed at the PTG. It's likely that the
/pci module is going to be overhauled in Rocky, and it would be good to
have the mdev device management API requirements included in that
discussion.

Best,
-jay

[0] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:pci-mdev-support
[1] http://lists.openstack.org/pipermail/openstack-dev/2017-September/122591.html
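The proposed 'mdev_type' alias field can be illustrated with a small
matching sketch. This is a hypothetical illustration, not Nova's actual
/pci request-matching code: an alias matches a device only when every
property the alias specifies (other than its name) agrees with the
device.

```python
# Illustrative sketch of alias matching with the proposed 'mdev_type'
# field; not the actual Nova /pci implementation.

def alias_matches(alias, device):
    """A device matches when every key the alias specifies agrees."""
    return all(device.get(key) == value
               for key, value in alias.items()
               if key != "name")

alias = {"vendor_id": "10de", "mdev_type": "nvidia-10",
         "name": "alias-nvidia-10"}

dev_ok = {"vendor_id": "10de", "mdev_type": "nvidia-10"}
dev_other = {"vendor_id": "10de", "mdev_type": "nvidia-11"}

print(alias_matches(alias, dev_ok))     # True
print(alias_matches(alias, dev_other))  # False
```

Under this scheme, the existing vendor-only alias ({"vendor_id":"10de",
"name":"vGPU"}) keeps working unchanged, since it simply specifies
fewer properties to match.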
[openstack-dev] vGPUs support for Nova - Implementation
Please consider the support of MDEV for the /pci framework which
provides support for vGPUs [0].

According to the discussion [1], with this first implementation, which
could be used as a skeleton for implementing PCI devices in the
Resource Tracker, we provide support for attaching vGPUs to guests, and
also for providing affinity per NUMA node. Another important point is
that this implementation can take advantage of ongoing specs like PCI
NUMA policies.

* The Implementation [0]

[PATCH 01/13] pci: update PciDevice object field 'address' to accept
[PATCH 02/13] pci: add for PciDevice object new field mdev
[PATCH 03/13] pci: generalize object unit-tests for different
[PATCH 04/13] pci: add support for mdev device type request
[PATCH 05/13] pci: generalize stats unit-tests for different
[PATCH 06/13] pci: add support for mdev devices type devspec
[PATCH 07/13] pci: add support for resource pool stats of mdev
[PATCH 08/13] pci: make manager to accept handling mdev devices

In this series of patches we are generalizing the PCI framework to
handle MDEV devices. We argue it's a lot of patches, but most of them
are small, and the logic behind them is basically to make the framework
understand two new fields, MDEV_PF and MDEV_VF.

[PATCH 09/13] libvirt: update PCI node device to report mdev devices
[PATCH 10/13] libvirt: report mdev resources
[PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)

In this series of patches we make the libvirt driver, as usual, return
resources and attach devices returned by the pci manager. This part can
be reused for Resource Providers.

[PATCH 12/13] functional: rework fakelibvirt host pci devices
[PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices

Here we reuse 100/100 of the functional tests used for SR-IOV devices.
Again here, this part can be reused for Resource Providers.
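For readers unfamiliar with the mechanism patch 11 drives, libvirt
attaches an mdev-backed vGPU to a guest with a hostdev element of
roughly the following shape (the uuid value below is an illustrative
placeholder for the mdev instance created under sysfs, not a real
device):

```xml
<!-- Sketch of a libvirt domain XML fragment for an mdev vGPU; the
     uuid is an illustrative placeholder. -->
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='4b20d080-1b54-4048-85b3-a6a62d165c01'/>
  </source>
</hostdev>
```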
* The Usage

There is no difference between SR-IOV and MDEV from the operator's
point of view: operators who know how to expose SR-IOV devices in Nova
already know how to expose MDEV devices (vGPUs).

Operators will be able to expose MDEV devices in the same manner as
they expose SR-IOV:

1/ Configure whitelist devices

   ['{"vendor_id":"10de"}']

2/ Create aliases

   [{"vendor_id":"10de", "name":"vGPU"}]

3/ Configure the flavor

   openstack flavor set --property "pci_passthrough:alias"="vGPU:1"

* Limitations

The mdev does not provide a 'product_id' but an 'mdev_type', which
should be considered to exactly identify which resource users can
request, e.g. nvidia-10. To provide that support we have to add a new
field 'mdev_type', so aliases could be something like:

  {"vendor_id":"10de", "mdev_type":"nvidia-10", "name":"alias-nvidia-10"}
  {"vendor_id":"10de", "mdev_type":"nvidia-11", "name":"alias-nvidia-11"}

I do plan to add that, but first I need support from upstream to
continue this work.

[0] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:pci-mdev-support
[1] http://lists.openstack.org/pipermail/openstack-dev/2017-September/122591.html
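Assembled into configuration-file form, the three steps above might
look like the following nova.conf fragment. This is a sketch assuming
the Pike-era option names in the [pci] group; verify the exact option
names against the release actually deployed:

```ini
# Sketch of the operator-facing configuration (assumed [pci] option
# names; check against your Nova release).
[pci]
# 1/ Whitelist every device with vendor_id 10de (NVIDIA) for passthrough
passthrough_whitelist = {"vendor_id": "10de"}

# 2/ Alias so flavors can request a matching device by name
alias = {"vendor_id": "10de", "name": "vGPU"}
```

Step 3 then ties the request to a flavor via the
"pci_passthrough:alias" property shown above, where "vGPU:1" asks for
one device matching the alias.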