Re: [openstack-dev] vGPUs support for Nova - Implementation
> -----Original Message-----
> From: Dan Smith [mailto:d...@danplanet.com]
> Sent: Monday, October 2, 2017 3:53 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] vGPUs support for Nova - Implementation

> >> I also think there is value in exposing vGPU in a generic way, irrespective of the underlying implementation (whether it is DEMU, mdev, SR-IOV or whatever approach Hyper-V/VMWare use).
> >
> > That is a big ask. To start with, all GPUs are not created equal, and various vGPU functionality as designed by the GPU vendors is not consistent, never mind the quirks added between different hypervisor implementations. So I feel like trying to expose this in a generic manner is, at least, asking for problems, and more likely bound for failure.
>
> I feel the opposite. IMHO, Nova's role in life is not to expose all the quirks of the underlying platform, but rather to provide a useful abstraction on top of those things. In spite of them.

[Mooney, Sean K] I have to agree with Dan here. vGPUs are a great example of where Nova can add value by abstracting the hypervisor specifics and providing an abstract API that allows requesting vGPUs without encoding the semantics of the API provided by the hypervisor or hardware vendor in what we expose to the tenant.

> > Nova already exposes plenty of hypervisor-specific functionality (or functionality only implemented for one hypervisor), and that's fine.
>
> And those bits of functionality are some of the most problematic we have. Among other reasons, they make it difficult for us to expose Thing 2.0 when we've encoded Thing 1.0 into our API so rigidly. This happens even within one virt driver where Thing 2.0 is significantly different than Thing 1.0.
> The vGPU stuff seems well-suited for the generic modeling work that we've spent the last few years working on, and is a perfect example of an area where we can avoid piling on more debt to a not-abstract-enough "model" and move forward with the new one. That's certainly my preference, and I think it's actually less work than the debt-ridden way.
>
> --Dan

[Mooney, Sean K] I also agree that it's likely less work to start fresh with the correct generic solution now than to try to adapt the PCI passthrough code we have today to support vGPUs without breaking the current SR-IOV and passthrough support. How vGPUs are virtualized is GPU-vendor specific, so even within a single host you may need to support multiple methods (SR-IOV/mdev/...) in a single virt driver. For example, a cloud/host with both AMD and NVIDIA GPUs which uses libvirt would have to support generating the correct XML for both solutions.

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
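Sean's point about one libvirt driver needing to emit different XML per vendor can be made concrete with a small sketch. The helper and the `device` dict below are hypothetical (Nova does not have this function); the two `<hostdev>` shapes, however, are the ones libvirt defines for mediated devices and for PCI passthrough:

```python
# Hypothetical sketch, not Nova code: choose the libvirt <hostdev>
# form based on how the vendor exposes the vGPU.

def vgpu_hostdev_xml(device):
    """Return a libvirt <hostdev> snippet for an illustrative vGPU dict."""
    if device["api"] == "mdev":
        # mdev-based vGPUs (the KVM/vfio path) are referenced by the UUID
        # assigned when the mediated device was created.
        return ("<hostdev mode='subsystem' type='mdev' model='vfio-pci'>"
                "<source><address uuid='{uuid}'/></source>"
                "</hostdev>").format(uuid=device["uuid"])
    if device["api"] == "sriov":
        # SR-IOV VFs are ordinary PCI functions, passed through by address.
        a = device["address"]
        return ("<hostdev mode='subsystem' type='pci' managed='yes'>"
                "<source><address domain='0x{domain}' bus='0x{bus}' "
                "slot='0x{slot}' function='0x{function}'/></source>"
                "</hostdev>").format(**a)
    raise ValueError("unknown vGPU virtualization method: %s" % device["api"])
```

The dispatch itself is the point: a single virt driver ends up carrying one branch per exposure mechanism, which is exactly the kind of detail the tenant-facing API should not leak.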
Re: [openstack-dev] vGPUs support for Nova - Implementation
>> I also think there is value in exposing vGPU in a generic way, irrespective >> of the underlying implementation (whether it is DEMU, mdev, SR-IOV or >> whatever approach Hyper-V/VMWare use). > > That is a big ask. To start with, all GPUs are not created equal, and > various vGPU functionality as designed by the GPU vendors is not > consistent, never mind the quirks added between different hypervisor > implementations. So I feel like trying to expose this in a generic > manner is, at least asking for problems, and more likely bound for > failure. I feel the opposite. IMHO, Nova's role in life is not to expose all the quirks of the underlying platform, but rather to provide a useful abstraction on top of those things. In spite of them. > Nova already exposes plenty of hypervisor-specific functionality (or > functionality only implemented for one hypervisor), and that's fine. And those bits of functionality are some of the most problematic we have. Among other reasons, they make it difficult for us to expose Thing 2.0, when we've encoded Thing 1.0 into our API so rigidly. This happens even within one virt driver where Thing 2.0 is significantly different than Thing 1.0. The vGPU stuff seems well-suited for the generic modeling work that we've spent the last few years working on, and is a perfect example of an area where we can avoid piling on more debt to a not-abstract-enough "model" and move forward with the new one. That's certainly my preference, and I think it's actually less work than the debt-ridden way. --Dan
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Fri, Sep 29, 2017 at 04:51:10PM +, Bob Ball wrote: > Hi Sahid, > > > > a second device emulator along-side QEMU. There is no mdev > > > integration. I'm concerned about how much mdev-specific functionality > > > would have to be faked up in the XenServer-specific driver for vGPU to > > > be used in this way. > > > > What you are refering with your DEMU it's what QEMU/KVM have with its > > vfio-pci. XenServer is > > reading through MDEV since the vendors provide drivers on *Linux* using the > > MDEV framework. > > MDEV is a kernel layer, used to expose hardwares, it's not hypervisor > > specific. > > It is possible that the vendor's userspace libraries use mdev, > however DEMU has no concept of mdev at all. If the vendor's > userspace libraries do use mdev then this is entirely abstracted > from XenServer's integration. While I don't have access to the > vendors source for the userspace libraries or the kernel module my > understanding was that the kernel module in XenServer's integration > is for the userspace libraries to talk to the kernel module and for > IOCTLS. My reading of mdev implies that /sys/class/mdev_bus should > exist for it to be used? It does not exist in XenServer, which to > me implies that the vendor's driver for XenServer do not use mdev?

I shared our discussion with Alex Williamson; his response:

> Hi Sahid, > > XenServer does not use mdev for vGPU support. The mdev/vfio > infrastructure was developed in response to DEMU used on XenServer, > which we felt was not an upstream acceptable solution. There has > been cursory interest in porting vfio to Xen, so it's possible that > they might use the same mechanism some day, but for now they are > different solutions, the vfio/mdev solution being the only one > accepted upstream so far. Thanks, > > Alex

My mistake. It seems clear now that XenServer can't benefit from the mdev support I have added in the /pci module.
The support of vGPUs for Xen will have to wait for the generic device management, I guess.

> Bob
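Bob's observation above (no `/sys/class/mdev_bus` on XenServer) is easy to test on any host. The path is the one the kernel's vfio-mdev framework registers; the helper itself is a sketch of ours, not Nova code:

```python
# Minimal sketch: the mdev framework registers an "mdev_bus" class in
# sysfs, so its absence is a strong hint the platform is not using mdev.
import os

def host_supports_mdev(sysfs_root="/sys"):
    """Return True if the mdev bus class is registered in sysfs."""
    return os.path.isdir(os.path.join(sysfs_root, "class", "mdev_bus"))
```

On a KVM host with the vendor's GRID-style driver loaded this returns True; on the XenServer/DEMU integration discussed here it would return False, matching Alex's explanation.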
Re: [openstack-dev] vGPUs support for Nova - Implementation
On 29 September 2017 at 22:26, Bob Ball wrote: > The concepts of PCI and SR-IOV are, of course, generic, but I think out of > principle we should avoid a hypervisor-specific integration for vGPU (indeed > Citrix has been clear from the beginning that the vGPU integration we are > proposing is intentionally hypervisor agnostic) To be fair, what this proposal is doing is piggy-backing on Nova's existing PCI functionality to expose Linux/KVM VFIO mdev; it just so happens mdev was created for vGPU, but it was designed to extend to other devices/things too. > I also think there is value in exposing vGPU in a generic way, irrespective > of the underlying implementation (whether it is DEMU, mdev, SR-IOV or > whatever approach Hyper-V/VMWare use). That is a big ask. To start with, all GPUs are not created equal, and various vGPU functionality as designed by the GPU vendors is not consistent, never mind the quirks added between different hypervisor implementations. So I feel like trying to expose this in a generic manner is, at least, asking for problems, and more likely bound for failure. Nova already exposes plenty of hypervisor-specific functionality (or functionality only implemented for one hypervisor), and that's fine. Maybe there should be something in OpenStack that would generically manage vGPU-graphics and/or vGPU-compute etc., but I'm pretty sure it would never be allowed into Nova :-). Anyway, take all that with a grain of salt, because frankly I would love to see this in sooner rather than later - even if it did have a big "this might change in non-upgradeable ways" sticker on it. -- Cheers, ~Blairo
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Fri, Sep 29, 2017 at 11:16:43AM -0400, Jay Pipes wrote: > Hi Sahid, comments inline. :) > > On 09/29/2017 04:53 AM, Sahid Orentino Ferdjaoui wrote: > > On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote: > > > On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote: > > > > Please consider the support of MDEV for the /pci framework which > > > > provides support for vGPUs [0]. > > > > > > > > Accordingly to the discussion [1] > > > > > > > > With this first implementation which could be used as a skeleton for > > > > implementing PCI Devices in Resource Tracker > > > > > > I'm not entirely sure what you're referring to above as "implementing PCI > > > devices in Resource Tracker". Could you elaborate? The resource tracker > > > already embeds a PciManager object that manages PCI devices, as you know. > > > Perhaps you meant "implement PCI devices as Resource Providers"? > > > > A PciManager? I know that we have a field PCI_DEVICE :) - I guess a > > virt driver can return inventory with total of PCI devices. Talking > > about manager, not sure. > > I'm referring to this: > > https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L33 > > [SNIP] > > It is that piece that Eric and myself have been talking about standardizing > into a "generic device management" interface that would have an > update_inventory() method that accepts a ProviderTree object [1] Jay, all of that looks perfectly sane to me, even if it's not clear what you want to make so generic. That part of the code is for the virt layers, and you can't treat GPU or NET as just a generic piece; they have characteristics which are requirements for the virt layers. In that method 'update_inventory(provider_tree)' which you are going to introduce for /pci/PciManager, would a first step be to convert the objects to an understandable dict for the whole logic, or do you have another plan? In any case, from my POV I don't see any blocker; both works can co-exist without any pain.
And adding features in the current /pci module is not going to add heavy work, but it is going to give us a clear view of what is needed. > [1] > https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py > > and would add resource providers corresponding to devices that are made > available to guests for use. > > > You still have to define "traits", basically for physical network > > devices, the users want to select device according physical network, > > to select device according the placement on host (NUMA), to select the > > device according the bandwidth capability... For GPU it's same > > story. *And I do not have mentioned devices which support virtual > > functions.* > > Yes, the generic device manager would be responsible for associating traits > to the resource providers it adds to the ProviderTree provided to it in the > update_inventory() call. > > > So that is what you plan to do for this release :) - Reasonably I > > don't think we are close to have something ready for production. > > I don't disagree with you that this is a huge amount of refactoring to > undertake over the next couple releases. :) Yes, and that is the point. We are going to block work on the /pci module during a period where we can see large interest in such support. > > Jay, I have question, Why you don't start by exposing NUMA ? > > I believe you're asking here why we don't start by modeling NUMA nodes as > child resource providers of the compute node? Instead of starting by > modeling PCI devices as child providers of the compute node? If that's not > what you're asking, please do clarify... > > We're starting with modeling PCI devices as child providers of the compute > node because they are easier to deal with as a whole than NUMA nodes and we > have the potential of being able to remove the PciPassthroughFilter from the > scheduler in Queens.
> > I don't see us being able to remove the NUMATopologyFilter from the > scheduler in Queens because of the complexity involved in how coupled the > NUMA topology resource handling is to CPU pinning, huge page support, and IO > emulation thread pinning. > > Hope that answers that question; again, lemme know if that's not the > question you were asking! :) Yes, that was the question, and you answered it perfectly, thanks. I will try to be clearer in the future :) As you have noticed, NUMA support will be quite difficult and is not in the TODO right now, which makes me think that we are going to block development on the pci module and, on top of that, end up providing less support (no NUMA awareness). Is that reasonable? > > > For the record, I have zero confidence in any existing "functional" tests > > > for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately, > > > due > > > to the fact that these features often require hardware that either the > > > upstream community CI lacks or that depends on libraries, drivers and > > > kernel > > > versions that really aren't
Re: [openstack-dev] vGPUs support for Nova - Implementation
Hi Sahid, > > a second device emulator along-side QEMU. There is no mdev > > integration. I'm concerned about how much mdev-specific functionality > > would have to be faked up in the XenServer-specific driver for vGPU to > > be used in this way. > > What you are refering with your DEMU it's what QEMU/KVM have with its > vfio-pci. XenServer is > reading through MDEV since the vendors provide drivers on *Linux* using the > MDEV framework. > MDEV is a kernel layer, used to expose hardwares, it's not hypervisor > specific. It is possible that the vendor's userspace libraries use mdev, however DEMU has no concept of mdev at all. If the vendor's userspace libraries do use mdev then this is entirely abstracted from XenServer's integration. While I don't have access to the vendors source for the userspace libraries or the kernel module my understanding was that the kernel module in XenServer's integration is for the userspace libraries to talk to the kernel module and for IOCTLS. My reading of mdev implies that /sys/class/mdev_bus should exist for it to be used? It does not exist in XenServer, which to me implies that the vendor's driver for XenServer do not use mdev? Bob
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Fri, Sep 29, 2017 at 12:26:07PM +, Bob Ball wrote: > Hi Sahid, > > > Please consider the support of MDEV for the /pci framework which provides > > support for vGPUs [0]. > > From my understanding, this MDEV implementation for vGPU would be > entirely specific to libvirt, is that correct? No, but Linux-specific, yes. Windows supports SR-IOV. > XenServer's implementation for vGPU is based on a pooled device > model (as described in > http://lists.openstack.org/pipermail/openstack-dev/2017-September/122702.html) That thread is referring to something which I guess everyone understands now - it's basically why I have added support for MDEV in /pci: to make it work however the virtual devices are exposed, SR-IOV or MDEV. > a second device emulator along-side QEMU. There is no mdev > integration. I'm concerned about how much mdev-specific > functionality would have to be faked up in the XenServer-specific > driver for vGPU to be used in this way. What you are referring to with your DEMU is what QEMU/KVM has with its vfio-pci. XenServer is reading through MDEV since the vendors provide drivers on *Linux* using the MDEV framework. MDEV is a kernel layer, used to expose hardware; it's not hypervisor-specific. > I'm not familiar with mdev, but it looks Linux specific, so would not be > usable by Hyper-V? > I've also not been able to find suggestions that VMWare can make use of mdev, > although I don't know the architecture of VMWare's integration. > > The concepts of PCI and SR-IOV are, of course, generic, but I think out of > principle we should avoid a hypervisor-specific integration for vGPU (indeed > Citrix has been clear from the beginning that the vGPU integration we are > proposing is intentionally hypervisor agnostic) > I also think there is value in exposing vGPU in a generic way, irrespective > of the underlying implementation (whether it is DEMU, mdev, SR-IOV or > whatever approach Hyper-V/VMWare use).
> > It's quite difficult for me to see how this will work for other > hypervisors. Do you also have a draft alternate spec where more > details can be discussed? I would expect that XenServer provides the MDEV UUID; then it's easy to ask sysfs if you need the NUMA node of the physical device or the mdev_type. > Bob
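The sysfs lookup Sahid describes can be sketched in a few lines. The paths follow the kernel's vfio-mdev layout (an mdev device appears under `/sys/bus/mdev/devices/<uuid>`, nested under its parent PCI device); the function itself and its return shape are illustrative, not Nova code:

```python
# Illustrative sketch: given an mdev UUID, read the mdev_type and the
# parent physical device's NUMA node from sysfs.
import os

def mdev_info(uuid, sysfs_root="/sys"):
    """Return {'mdev_type': ..., 'numa_node': ...} or None if not found."""
    dev = os.path.join(sysfs_root, "bus", "mdev", "devices", uuid)
    if not os.path.exists(dev):
        return None
    # mdev_type is a symlink to the supported-type directory, e.g. "nvidia-35".
    mdev_type = os.path.basename(
        os.path.realpath(os.path.join(dev, "mdev_type")))
    # The mdev device sits under its parent PCI device in sysfs, and the
    # parent carries the numa_node attribute (-1 means no NUMA affinity).
    parent = os.path.dirname(os.path.realpath(dev))
    numa_node = -1
    numa_path = os.path.join(parent, "numa_node")
    if os.path.exists(numa_path):
        with open(numa_path) as f:
            numa_node = int(f.read().strip())
    return {"mdev_type": mdev_type, "numa_node": numa_node}
```

As Bob notes elsewhere in the thread, none of these paths exist on XenServer's DEMU-based integration, which is exactly the portability problem being debated.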
Re: [openstack-dev] vGPUs support for Nova - Implementation
Hi Sahid, comments inline. :) On 09/29/2017 04:53 AM, Sahid Orentino Ferdjaoui wrote: On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote: On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote: Please consider the support of MDEV for the /pci framework which provides support for vGPUs [0]. Accordingly to the discussion [1] With this first implementation which could be used as a skeleton for implementing PCI Devices in Resource Tracker I'm not entirely sure what you're referring to above as "implementing PCI devices in Resource Tracker". Could you elaborate? The resource tracker already embeds a PciManager object that manages PCI devices, as you know. Perhaps you meant "implement PCI devices as Resource Providers"? A PciManager? I know that we have a field PCI_DEVICE :) - I guess a virt driver can return inventory with total of PCI devices. Talking about manager, not sure. I'm referring to this: https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L33 The PciDevTracker class is instantiated in the resource tracker when the first ComputeNode object managed by the resource tracker is init'd: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L578 On initialization, the PciDevTracker inventories the compute node's collection of PCI devices by grabbing a list of records from the pci_devices table in the cell database: https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L69 and then comparing those DB records with information the hypervisor returns about PCI devices: https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L160 Each hypervisor returns something different for the list of pci devices, as you know. For libvirt, the call that returns PCI device information is here: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L842 The results of that are jammed into a "pci_passthrough_devices" key in the returned result of the virt driver's get_available_resource() call. 
For libvirt, that's here: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L5809 It is that piece that Eric and myself have been talking about standardizing into a "generic device management" interface that would have an update_inventory() method that accepts a ProviderTree object [1] [1] https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py and would add resource providers corresponding to devices that are made available to guests for use. You still have to define "traits", basically for physical network devices, the users want to select device according physical network, to select device according the placement on host (NUMA), to select the device according the bandwidth capability... For GPU it's same story. *And I do not have mentioned devices which support virtual functions.* Yes, the generic device manager would be responsible for associating traits to the resource providers it adds to the ProviderTree provided to it in the update_inventory() call. So that is what you plan to do for this release :) - Reasonably I don't think we are close to have something ready for production. I don't disagree with you that this is a huge amount of refactoring to undertake over the next couple releases. :) Jay, I have question, Why you don't start by exposing NUMA ? I believe you're asking here why we don't start by modeling NUMA nodes as child resource providers of the compute node? Instead of starting by modeling PCI devices as child providers of the compute node? If that's not what you're asking, please do clarify... We're starting with modeling PCI devices as child providers of the compute node because they are easier to deal with as a whole than NUMA nodes and we have the potential of being able to remove the PciPassthroughFilter from the scheduler in Queens. 
I don't see us being able to remove the NUMATopologyFilter from the scheduler in Queens because of the complexity involved in how coupled the NUMA topology resource handling is to CPU pinning, huge page support, and IO emulation thread pinning. Hope that answers that question; again, lemme know if that's not the question you were asking! :) For the record, I have zero confidence in any existing "functional" tests for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately, due to the fact that these features often require hardware that either the upstream community CI lacks or that depends on libraries, drivers and kernel versions that really aren't available to non-bleeding edge users (or users with very deep pockets). It's a good point; if you are not confident, don't you think it's premature to move forward on implementing new things without well-trusted functional tests? Completely agree with you. I would rather see functional integration tests that are proven to actually test these complex hardware devices *gating* Nova patches before adding any new
Re: [openstack-dev] vGPUs support for Nova - Implementation
The concepts of PCI and SR-IOV are, of course, generic They are, although the PowerVM guys have already pointed out that they don't even refer to virtual devices by PCI address and thus anything based on that subsystem isn't going to help them. but I think out of principle we should avoid a hypervisor-specific integration for vGPU (indeed Citrix has been clear from the beginning that the vGPU integration we are proposing is intentionally hypervisor agnostic) I also think there is value in exposing vGPU in a generic way, irrespective of the underlying implementation (whether it is DEMU, mdev, SR-IOV or whatever approach Hyper-V/VMWare use). I very much agree, of course. --Dan
Re: [openstack-dev] vGPUs support for Nova - Implementation
Hi Sahid, > Please consider the support of MDEV for the /pci framework which provides > support for vGPUs [0]. From my understanding, this MDEV implementation for vGPU would be entirely specific to libvirt, is that correct? XenServer's implementation for vGPU is based on a pooled device model (as described in http://lists.openstack.org/pipermail/openstack-dev/2017-September/122702.html) and directly interfaces with the card using DEMU ("Discrete EMU") as a second device emulator along-side QEMU. There is no mdev integration. I'm concerned about how much mdev-specific functionality would have to be faked up in the XenServer-specific driver for vGPU to be used in this way. I'm not familiar with mdev, but it looks Linux specific, so would not be usable by Hyper-V? I've also not been able to find suggestions that VMWare can make use of mdev, although I don't know the architecture of VMWare's integration. The concepts of PCI and SR-IOV are, of course, generic, but I think out of principle we should avoid a hypervisor-specific integration for vGPU (indeed Citrix has been clear from the beginning that the vGPU integration we are proposing is intentionally hypervisor agnostic) I also think there is value in exposing vGPU in a generic way, irrespective of the underlying implementation (whether it is DEMU, mdev, SR-IOV or whatever approach Hyper-V/VMWare use). It's quite difficult for me to see how this will work for other hypervisors. Do you also have a draft alternate spec where more details can be discussed? Bob
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Fri, Sep 29, 2017 at 2:32 AM, Dan Smith wrote: > In this series of patches we are generalizing the PCI framework to >>> handle MDEV devices. We argue it's a lot of patches, but most of them >>> are small and the logic behind is basically to make it understand two >>> new fields MDEV_PF and MDEV_VF. >>> >> >> That's not really "generalizing the PCI framework to handle MDEV devices" >> :) More like it's just changing the /pci module to understand a different >> device management API, but ok. >> > > Yeah, the series is adding more fields to our PCI structure to allow for > more variations in the kinds of things we lump into those tables. This is > my primary complaint with this approach, and has been since the topic first > came up. I really want to avoid building any more dependency on the > existing pci-passthrough mechanisms and focus any new effort on using > resource providers for this. The existing pci-passthrough code is almost > universally hated, poorly understood and tested, and something we should > not be further building upon. > > In this series of patches we make the libvirt driver, as usual, >>> return resources and attach devices returned by the pci manager. This >>> part can be reused for Resource Provider. >>> >> >> Perhaps, but the idea behind the resource providers framework is to treat >> devices as generic things. Placement doesn't need to know about the >> particular device attachment status. >> > > I quickly went through the patches and left a few comments. The base work > of pulling some of this out of libvirt is there, but it's all focused on > the act of populating pci structures from the vgpu information we get from > libvirt. That code could be made to instead populate a resource inventory, > but that's about the most of the set that looks applicable to the > placement-based approach. > > I'll review them too.
As mentioned in IRC and the previous ML discussion, my focus is on the >> nested resource providers work and reviews, along with the other two >> top-priority scheduler items (move operations and alternate hosts). >> >> I'll do my best to look at your patch series, but please note it's lower >> priority than a number of other items. >> > > FWIW, I'm not really planning to spend any time reviewing it until/unless > it is retooled to generate an inventory from the virt driver. > > With the two patches that report vgpus and then create guests with them > when asked converted to resource providers, I think that would be enough to > have basic vgpu support immediately. No DB migrations, model changes, etc > required. After that, helping to get the nested-rps and traits work landed > gets us the ability to expose attributes of different types of those vgpus > and opens up a lot of possibilities. IMHO, that's work I'm interested in > reviewing. > That's exactly what I would like to provide for Queens, so operators would have the possibility of flavors asking for vGPU resources in Queens, even if they couldn't yet ask for a specific vGPU type (or ask to be in the same NUMA cell as the CPU). The latter definitely needs nested resource providers, but the former (just having vGPU resource classes provided by the virt driver) is possible for Queens. > One thing that would be very useful, Sahid, if you could get with Eric >> Fried (efried) on IRC and discuss with him the "generic device management" >> system that was discussed at the PTG. It's likely that the /pci module is >> going to be overhauled in Rocky and it would be good to have the mdev >> device management API requirements included in that discussion. >> > > Definitely this.
> ++ > --Dan
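The "basic vGPU support" Dan and Sylvain converge on above amounts to the virt driver reporting an inventory for the VGPU resource class to placement. A minimal sketch of what such a report could look like (the surrounding function is an illustrative stand-in, not the actual Nova interface; the field names follow the placement inventory schema):

```python
# Illustrative sketch: a virt driver reporting a placement-style
# inventory for the standard VGPU resource class.

def get_vgpu_inventory(total_vgpus):
    """Build a placement-style inventory entry for VGPU resources."""
    return {
        "VGPU": {
            "total": total_vgpus,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": total_vgpus,   # one guest cannot exceed the host total
            "step_size": 1,
            "allocation_ratio": 1.0,   # vGPUs are not oversubscribable
        }
    }
```

A flavor would then simply request `resources:VGPU=1`, with per-type selection and NUMA affinity deferred to the nested-resource-providers and traits work discussed above.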
Re: [openstack-dev] vGPUs support for Nova - Implementation
On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote: > On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote: > > Please consider the support of MDEV for the /pci framework which > > provides support for vGPUs [0]. > > > > According to the discussion [1] > > > > With this first implementation which could be used as a skeleton for > > implementing PCI Devices in Resource Tracker > > I'm not entirely sure what you're referring to above as "implementing PCI > devices in Resource Tracker". Could you elaborate? The resource tracker > already embeds a PciManager object that manages PCI devices, as you know. > Perhaps you meant "implement PCI devices as Resource Providers"? A PciManager? I know that we have a field PCI_DEVICE :) - I guess a virt driver can return inventory with total of PCI devices. Talking about manager, not sure. You still have to define "traits": basically, for physical network devices, the users want to select a device according to the physical network, to select a device according to its placement on the host (NUMA), to select a device according to its bandwidth capability... For GPUs it's the same story. *And I have not mentioned devices which support virtual functions.* So that is what you plan to do for this release :) - reasonably, I don't think we are close to having something ready for production. Jay, I have a question: why don't you start by exposing NUMA? > > we provide support for > > attaching vGPUs to guests. And also to provide affinity per NUMA > > nodes. Another important point is that this implementation can take > > advantage of the ongoing specs like PCI NUMA policies.
> > * The Implementation [0]
> >
> > [PATCH 01/13] pci: update PciDevice object field 'address' to accept
> > [PATCH 02/13] pci: add for PciDevice object new field mdev
> > [PATCH 03/13] pci: generalize object unit-tests for different
> > [PATCH 04/13] pci: add support for mdev device type request
> > [PATCH 05/13] pci: generalize stats unit-tests for different
> > [PATCH 06/13] pci: add support for mdev devices type devspec
> > [PATCH 07/13] pci: add support for resource pool stats of mdev
> > [PATCH 08/13] pci: make manager to accept handling mdev devices
> >
> > In this series of patches we are generalizing the PCI framework to
> > handle MDEV devices. We argue it's a lot of patches, but most of
> > them are small, and the logic behind them is basically to make the
> > framework understand two new fields, MDEV_PF and MDEV_VF.
>
> That's not really "generalizing the PCI framework to handle MDEV
> devices" :) More like it's just changing the /pci module to understand
> a different device management API, but ok.

If you prefer to call it that :) - The point is that /pci manages
physical devices; it can pass through the whole device or its virtual
functions, exposed through SR-IOV or MDEV.

> > [PATCH 09/13] libvirt: update PCI node device to report mdev devices
> > [PATCH 10/13] libvirt: report mdev resources
> > [PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)
> >
> > In this series of patches we make the libvirt driver, as usual,
> > return resources and attach devices returned by the pci manager.
> > This part can be reused for Resource Providers.
>
> Perhaps, but the idea behind the resource providers framework is to
> treat devices as generic things. Placement doesn't need to know about
> the particular device attachment status.
>
> > [PATCH 12/13] functional: rework fakelibvirt host pci devices
> > [PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices
> >
> > Here we reuse 100/100 of the functional tests used for SR-IOV
> > devices.
> > Again here, this part can be reused for Resource Providers.
>
> Probably not, but I'll take a look :)
>
> For the record, I have zero confidence in any existing "functional"
> tests for NUMA, SR-IOV, CPU pinning, huge pages, and the like.
> Unfortunately, these features often require hardware that the upstream
> community CI lacks, or depend on libraries, drivers and kernel
> versions that really aren't available to non-bleeding-edge users (or
> users with very deep pockets).

It's a good point. But if you are not confident in them, don't you
think it's premature to move forward on implementing something new
without having well-trusted functional tests?

> > * The Usage
> >
> > There is no difference between SR-IOV and MDEV from the operator's
> > point of view: operators who know how to expose SR-IOV devices in
> > Nova already know how to expose MDEV devices (vGPUs).
> >
> > Operators will be able to expose MDEV devices in the same manner as
> > they expose SR-IOV:
> >
> > 1/ Configure whitelist devices
> >
> >    ['{"vendor_id":"10de"}']
> >
> > 2/ Create aliases
> >
> >    [{"vendor_id":"10de", "name":"vGPU"}]
> >
> > 3/ Configure the flavor
> >
> >    openstack flavor set --property "pci_passthrough:alias"="vGPU:1"
> >
> > * Limitations
> >
> > The mdev does not provide a 'product_id' but an 'mdev_type', which
> > should be considered to exactly identify which resource users can
> > request, e.g. nvidia-10.
Re: [openstack-dev] vGPUs support for Nova - Implementation
> > In this series of patches we are generalizing the PCI framework to
> > handle MDEV devices. We argue it's a lot of patches, but most of
> > them are small, and the logic behind them is basically to make the
> > framework understand two new fields, MDEV_PF and MDEV_VF.
>
> That's not really "generalizing the PCI framework to handle MDEV
> devices" :) More like it's just changing the /pci module to understand
> a different device management API, but ok.

Yeah, the series is adding more fields to our PCI structure to allow
for more variations in the kinds of things we lump into those tables.
This is my primary complaint with this approach, and has been since the
topic first came up. I really want to avoid building any more
dependency on the existing pci-passthrough mechanisms and focus any new
effort on using resource providers for this. The existing
pci-passthrough code is almost universally hated, poorly understood and
tested, and something we should not be further building upon.

> > In this series of patches we make the libvirt driver, as usual,
> > return resources and attach devices returned by the pci manager.
> > This part can be reused for Resource Providers.
>
> Perhaps, but the idea behind the resource providers framework is to
> treat devices as generic things. Placement doesn't need to know about
> the particular device attachment status.

I quickly went through the patches and left a few comments. The base
work of pulling some of this out of libvirt is there, but it's all
focused on the act of populating pci structures from the vgpu
information we get from libvirt. That code could be made to instead
populate a resource inventory, but that's about the extent of the set
that looks applicable to the placement-based approach.

> As mentioned in IRC and the previous ML discussion, my focus is on the
> nested resource providers work and reviews, along with the other two
> top-priority scheduler items (move operations and alternate hosts).
> I'll do my best to look at your patch series, but please note it's
> lower priority than a number of other items.

FWIW, I'm not really planning to spend any time reviewing it
until/unless it is retooled to generate an inventory from the virt
driver.

With the two patches that report vgpus, and that create guests with
them when asked, converted to resource providers, I think that would be
enough to have basic vgpu support immediately. No DB migrations, model
changes, etc. required. After that, helping to get the nested-rps and
traits work landed gets us the ability to expose attributes of the
different types of those vgpus, and opens up a lot of possibilities.
IMHO, that's work I'm interested in reviewing.

> One thing that would be very useful, Sahid, is if you could get with
> Eric Fried (efried) on IRC and discuss with him the "generic device
> management" system that was discussed at the PTG. It's likely that the
> /pci module is going to be overhauled in Rocky, and it would be good
> to have the mdev device management API requirements included in that
> discussion.

Definitely this.

--Dan
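To make the suggested retooling concrete, here is a minimal sketch of
what "generating an inventory from the virt driver" could look like.
The names (`discover_mdev_devices`, `get_inventory`, the `VGPU`
resource class) and the device counts are illustrative assumptions, not
Nova's actual driver interface: the driver counts available
mdev-capable vGPUs and reports them as a single resource class, leaving
placement unaware of per-device attachment details.

```python
# Hypothetical sketch: a virt driver reporting vGPUs as a placement
# inventory instead of populating the legacy pci_devices structures.
# All names here are illustrative, not Nova's real API.

def discover_mdev_devices():
    # A real driver would query libvirt's node device API; here we
    # fake two physical GPUs exposing 8 vGPU instances each.
    return [{"parent": "pci_0000_84_00_0", "available_instances": 8},
            {"parent": "pci_0000_85_00_0", "available_instances": 8}]

def get_inventory():
    """Report the total vGPU count as one generic resource class."""
    total = sum(d["available_instances"] for d in discover_mdev_devices())
    return {
        "VGPU": {
            "total": total,
            "min_unit": 1,
            "max_unit": total,
            "step_size": 1,
            "allocation_ratio": 1.0,
            "reserved": 0,
        }
    }

print(get_inventory()["VGPU"]["total"])  # 16
```

The point of the sketch is that nothing GPU-specific leaks into the
inventory: the scheduler only sees a consumable count, and
vendor-specific distinctions would later come in via traits on nested
providers.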
Re: [openstack-dev] vGPUs support for Nova - Implementation
On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote:
> Please consider the support of MDEV for the /pci framework which
> provides support for vGPUs [0].
>
> According to the discussion [1], with this first implementation,
> which could be used as a skeleton for implementing PCI devices in the
> Resource Tracker,

I'm not entirely sure what you're referring to above as "implementing
PCI devices in Resource Tracker". Could you elaborate? The resource
tracker already embeds a PciManager object that manages PCI devices, as
you know. Perhaps you meant "implement PCI devices as Resource
Providers"?

> we provide support for attaching vGPUs to guests, and also for
> providing affinity per NUMA node. Another important point is that
> this implementation can take advantage of ongoing specs like PCI NUMA
> policies.
>
> * The Implementation [0]
>
> [PATCH 01/13] pci: update PciDevice object field 'address' to accept
> [PATCH 02/13] pci: add for PciDevice object new field mdev
> [PATCH 03/13] pci: generalize object unit-tests for different
> [PATCH 04/13] pci: add support for mdev device type request
> [PATCH 05/13] pci: generalize stats unit-tests for different
> [PATCH 06/13] pci: add support for mdev devices type devspec
> [PATCH 07/13] pci: add support for resource pool stats of mdev
> [PATCH 08/13] pci: make manager to accept handling mdev devices
>
> In this series of patches we are generalizing the PCI framework to
> handle MDEV devices. We argue it's a lot of patches, but most of them
> are small, and the logic behind them is basically to make the
> framework understand two new fields, MDEV_PF and MDEV_VF.

That's not really "generalizing the PCI framework to handle MDEV
devices" :) More like it's just changing the /pci module to understand
a different device management API, but ok.
> [PATCH 09/13] libvirt: update PCI node device to report mdev devices
> [PATCH 10/13] libvirt: report mdev resources
> [PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)
>
> In this series of patches we make the libvirt driver, as usual,
> return resources and attach devices returned by the pci manager. This
> part can be reused for Resource Providers.

Perhaps, but the idea behind the resource providers framework is to
treat devices as generic things. Placement doesn't need to know about
the particular device attachment status.

> [PATCH 12/13] functional: rework fakelibvirt host pci devices
> [PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices
>
> Here we reuse 100/100 of the functional tests used for SR-IOV
> devices. Again here, this part can be reused for Resource Providers.

Probably not, but I'll take a look :)

For the record, I have zero confidence in any existing "functional"
tests for NUMA, SR-IOV, CPU pinning, huge pages, and the like.
Unfortunately, these features often require hardware that the upstream
community CI lacks, or depend on libraries, drivers and kernel versions
that really aren't available to non-bleeding-edge users (or users with
very deep pockets).

> * The Usage
>
> There is no difference between SR-IOV and MDEV from the operator's
> point of view: operators who know how to expose SR-IOV devices in
> Nova already know how to expose MDEV devices (vGPUs).
>
> Operators will be able to expose MDEV devices in the same manner as
> they expose SR-IOV:
>
> 1/ Configure whitelist devices
>
>    ['{"vendor_id":"10de"}']
>
> 2/ Create aliases
>
>    [{"vendor_id":"10de", "name":"vGPU"}]
>
> 3/ Configure the flavor
>
>    openstack flavor set --property "pci_passthrough:alias"="vGPU:1"
>
> * Limitations
>
> The mdev does not provide a 'product_id' but an 'mdev_type', which
> should be considered to exactly identify which resource users can
> request, e.g. nvidia-10.
> To provide that support we have to add a new field 'mdev_type', so
> aliases could be something like:
>
>   {"vendor_id":"10de", "mdev_type":"nvidia-10", "name":"alias-nvidia-10"}
>   {"vendor_id":"10de", "mdev_type":"nvidia-11", "name":"alias-nvidia-11"}
>
> I do plan to add that, but first I need support from upstream to
> continue this work.

As mentioned in IRC and the previous ML discussion, my focus is on the
nested resource providers work and reviews, along with the other two
top-priority scheduler items (move operations and alternate hosts).
I'll do my best to look at your patch series, but please note it's
lower priority than a number of other items.

One thing that would be very useful, Sahid, is if you could get with
Eric Fried (efried) on IRC and discuss with him the "generic device
management" system that was discussed at the PTG. It's likely that the
/pci module is going to be overhauled in Rocky, and it would be good to
have the mdev device management API requirements included in that
discussion.

Best,
-jay

[0] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:pci-mdev-support
[1] http://lists.openstack.org/pipermail/openstack-dev/2017-September/122591.html
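The proposed 'mdev_type' alias field can be illustrated with a small
matching sketch. This is a hypothetical illustration, not Nova's actual
/pci request-matching code: an alias matches a device only when every
property the alias specifies (other than its name) agrees with the
device.

```python
# Illustrative sketch of alias matching with the proposed 'mdev_type'
# field; not the actual Nova /pci implementation.

def alias_matches(alias, device):
    """A device matches when every key the alias specifies agrees."""
    return all(device.get(key) == value
               for key, value in alias.items()
               if key != "name")

alias = {"vendor_id": "10de", "mdev_type": "nvidia-10",
         "name": "alias-nvidia-10"}

dev_ok = {"vendor_id": "10de", "mdev_type": "nvidia-10"}
dev_other = {"vendor_id": "10de", "mdev_type": "nvidia-11"}

print(alias_matches(alias, dev_ok))     # True
print(alias_matches(alias, dev_other))  # False
```

Under this scheme, the existing vendor-only alias ({"vendor_id":"10de",
"name":"vGPU"}) keeps working unchanged, since it simply specifies
fewer properties to match.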
[openstack-dev] vGPUs support for Nova - Implementation
Please consider the support of MDEV for the /pci framework which
provides support for vGPUs [0].

According to the discussion [1], with this first implementation, which
could be used as a skeleton for implementing PCI devices in the
Resource Tracker, we provide support for attaching vGPUs to guests, and
also for providing affinity per NUMA node. Another important point is
that this implementation can take advantage of ongoing specs like PCI
NUMA policies.

* The Implementation [0]

[PATCH 01/13] pci: update PciDevice object field 'address' to accept
[PATCH 02/13] pci: add for PciDevice object new field mdev
[PATCH 03/13] pci: generalize object unit-tests for different
[PATCH 04/13] pci: add support for mdev device type request
[PATCH 05/13] pci: generalize stats unit-tests for different
[PATCH 06/13] pci: add support for mdev devices type devspec
[PATCH 07/13] pci: add support for resource pool stats of mdev
[PATCH 08/13] pci: make manager to accept handling mdev devices

In this series of patches we are generalizing the PCI framework to
handle MDEV devices. We argue it's a lot of patches, but most of them
are small, and the logic behind them is basically to make the framework
understand two new fields, MDEV_PF and MDEV_VF.

[PATCH 09/13] libvirt: update PCI node device to report mdev devices
[PATCH 10/13] libvirt: report mdev resources
[PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)

In this series of patches we make the libvirt driver, as usual, return
resources and attach devices returned by the pci manager. This part can
be reused for Resource Providers.

[PATCH 12/13] functional: rework fakelibvirt host pci devices
[PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices

Here we reuse 100/100 of the functional tests used for SR-IOV devices.
Again here, this part can be reused for Resource Providers.
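For readers unfamiliar with the mechanism patch 11 drives, libvirt
attaches an mdev-backed vGPU to a guest with a hostdev element of
roughly the following shape (the uuid value below is an illustrative
placeholder for the mdev instance created under sysfs, not a real
device):

```xml
<!-- Sketch of a libvirt domain XML fragment for an mdev vGPU; the
     uuid is an illustrative placeholder. -->
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='4b20d080-1b54-4048-85b3-a6a62d165c01'/>
  </source>
</hostdev>
```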
* The Usage

There is no difference between SR-IOV and MDEV from the operator's
point of view: operators who know how to expose SR-IOV devices in Nova
already know how to expose MDEV devices (vGPUs).

Operators will be able to expose MDEV devices in the same manner as
they expose SR-IOV:

1/ Configure whitelist devices

   ['{"vendor_id":"10de"}']

2/ Create aliases

   [{"vendor_id":"10de", "name":"vGPU"}]

3/ Configure the flavor

   openstack flavor set --property "pci_passthrough:alias"="vGPU:1"

* Limitations

The mdev does not provide a 'product_id' but an 'mdev_type', which
should be considered to exactly identify which resource users can
request, e.g. nvidia-10. To provide that support we have to add a new
field 'mdev_type', so aliases could be something like:

  {"vendor_id":"10de", "mdev_type":"nvidia-10", "name":"alias-nvidia-10"}
  {"vendor_id":"10de", "mdev_type":"nvidia-11", "name":"alias-nvidia-11"}

I do plan to add that, but first I need support from upstream to
continue this work.

[0] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:pci-mdev-support
[1] http://lists.openstack.org/pipermail/openstack-dev/2017-September/122591.html
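Assembled into configuration-file form, the three steps above might
look like the following nova.conf fragment. This is a sketch assuming
the Pike-era option names in the [pci] group; verify the exact option
names against the release actually deployed:

```ini
# Sketch of the operator-facing configuration (assumed [pci] option
# names; check against your Nova release).
[pci]
# 1/ Whitelist every device with vendor_id 10de (NVIDIA) for passthrough
passthrough_whitelist = {"vendor_id": "10de"}

# 2/ Alias so flavors can request a matching device by name
alias = {"vendor_id": "10de", "name": "vGPU"}
```

Step 3 then ties the request to a flavor via the
"pci_passthrough:alias" property shown above, where "vGPU:1" asks for
one device matching the alias.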