Re: [openstack-dev] FPGA as a dynamic nested resources
On Thu, 28 Jul 2016 10:50:08 -0400
Jay Pipes wrote:

> Roman, great thread, thanks for posting! Comment inline :)

Thanks!

> > It can identified 3 levels of FPGA resources, which can be nested one
> > on the others:
> >
> > 1. Whole FPGA. If used discrete FPGA, than even today it might be pass
> >    through to the VM.
> >
> > 2. Region in FPGA. Some of the FPGA models can be divided into regions
> >    or slots. Also, for some model it is possible to (re)program such
> >    region individually - in this case there is a possibility to pass
> >    entire slot to the VM, so that it might be possible to reprogram
> >    such slot, and utilize the algorithm within the VM.
> >
> > 3. Accelerator in region/FPGA. If there is an accelerator programmed
> >    in the slot, it is possible, that such accelerator provides us with
> >    Virtual Functions (similar to the SR-IOV), than every available VF
> >    can be treated as a resource.
> >
> > 4. It might be also necessary to track every VF individually, although
> >    I didn't assumed it will be needed, nevertheless with nested
> >    resources it should be easy to handle it.
> >
> > Correlation between such resources are a bit different from NUMA -
> > while in NUMA case there is a possibility to either schedule a VM with
> > some memory specified, or request memory within NUMA cell, in FPGA if
> > there is slot taken, or accelerator already programmed and used, there
> > is no way to offer FPGA as a whole to the tenant, until all
> > accelerators and slots are free.
> >
> > I've followed Jay idea about nested resources and having in mind
> > blueprint[2] regarding dynamic resources I've prepared how it fit in.
> >
> > To get id of resource of type acceleratorX to allocate 8 VF:
> >
> > SELECT rp.id
> > FROM resource_providers rp
> > LEFT JOIN allocations al ON al.resource_provider_id = rp.id
> > LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> > WHERE al.resource_class_id = 1668
> > AND (iv.total - COALESCE(al.used, 0)) >= 8;
>
> Right idea, yes, but you would need to INNER JOIN inventories and LEFT
> JOIN from the winnowed set of inventory records to a grouped projection
> of allocations. :)
>
> The SQL would be this:
>
> SELECT rp.id
> FROM resource_providers rp
> INNER JOIN inventories iv
>   ON rp.id = iv.resource_provider_id
>   AND iv.resource_class_id = 1688
> LEFT JOIN (
>   SELECT resource_provider_id, SUM(used) as used
>   FROM allocations
>   WHERE resource_class_id = 1688
>   GROUP BY resource_provider_id
> ) AS al
>   ON iv.resource_provider_id = al.id
> WHERE (iv.total - COALESCE(al.used, 0)) >= 8;

Hm. I'm getting the same results using both queries. Clearly I'm missing
something obvious here, and for sure I'm no SQL expert :)

> The other SQL queries you listed had a couple errors, but the ideas were
> mostly sound. I'll include the FPGA use cases when I write up the nested
> resource providers spec proposal.

Great, thank you!

> The only thing I'd say is that I was envisioning the dynamic resource
> classes for FPGAs to be the resource context to an already-flashed
> algorithm, not to the FPGA root device (or a region even). But, who
> knows, perhaps we can work something out. More discussion on the spec...

For sure, we can start by defining the basic case, and expand it if
needed.

-- 
Cheers,
Roman Dobosz

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
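One possible reason the two queries appeared to agree: on data where every provider has exactly one allocation row, they return the same ids. The sketch below runs both against SQLite from Python to show where they part ways. It is only an illustration under assumptions: the tables are cut down to the columns the queries touch, the rows are hypothetical, and the grouped query joins on al.resource_provider_id with class id 1668 (the subquery as posted exposes no id column and used 1688).

```python
import sqlite3

# Query as first posted: joins raw allocation rows, one result row per allocation.
FLAT_QUERY = """
SELECT rp.id
FROM resource_providers rp
LEFT JOIN allocations al ON al.resource_provider_id = rp.id
LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
WHERE al.resource_class_id = 1668
AND (iv.total - COALESCE(al.used, 0)) >= 8
"""

# Grouped variant. Assumption: join column and class id adjusted as described above.
GROUPED_QUERY = """
SELECT rp.id
FROM resource_providers rp
INNER JOIN inventories iv
    ON rp.id = iv.resource_provider_id
    AND iv.resource_class_id = 1668
LEFT JOIN (
    SELECT resource_provider_id, SUM(used) AS used
    FROM allocations
    WHERE resource_class_id = 1668
    GROUP BY resource_provider_id
) AS al
    ON iv.resource_provider_id = al.resource_provider_id
WHERE (iv.total - COALESCE(al.used, 0)) >= 8
"""


def make_db(inventories, allocations):
    """Build an in-memory DB from (provider_id, total) and (provider_id, used) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resource_providers (id INTEGER PRIMARY KEY);
        CREATE TABLE inventories (resource_provider_id INT, resource_class_id INT, total INT);
        CREATE TABLE allocations (resource_provider_id INT, resource_class_id INT, used INT);
    """)
    for provider_id, total in inventories:
        conn.execute("INSERT INTO resource_providers VALUES (?)", (provider_id,))
        conn.execute("INSERT INTO inventories VALUES (?, 1668, ?)", (provider_id, total))
    conn.executemany("INSERT INTO allocations VALUES (?, 1668, ?)", allocations)
    return conn


def provider_ids(conn, query):
    """Return the distinct provider ids a query yields, sorted."""
    return sorted({row[0] for row in conn.execute(query)})


# One provider, one allocation row: both queries agree.
simple = make_db([(1, 32)], [(1, 4)])

# Provider 2 has no allocations at all; provider 3 has two rows of 2 (6 of 10 free).
mixed = make_db([(1, 32), (2, 10), (3, 10)], [(1, 4), (3, 2), (3, 2)])

print(provider_ids(simple, FLAT_QUERY), provider_ids(simple, GROUPED_QUERY))
print(provider_ids(mixed, FLAT_QUERY), provider_ids(mixed, GROUPED_QUERY))
```

In the mixed case the flat query drops the provider that has no allocations (the WHERE clause on al.resource_class_id rejects the NULL row) and accepts the provider whose usage is split across rows, because each row is compared on its own rather than summed.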
Re: [openstack-dev] FPGA as a dynamic nested resources
On 07/20/2016 05:07 AM, Daniel P. Berrange wrote:
> For FPGA, I'd like to see an initial proposal that assumed the FPGA
> is pre-programmed & pre-divided into a fixed number of slots and simply
> deal with this.

For the record, this is precisely what is described in the first version
of the dynamic-resource-classes use cases section:

https://review.openstack.org/#/c/312696/1/specs/newton/approved/resource-providers-dynamic-resource-classes.rst

See starting at line 193.

This level of detail was removed in the second revision of the spec,
which simply focuses on the CRUD operations to add to the placement REST
API for these user-defined resource classes.

All the best,
-jay
Re: [openstack-dev] FPGA as a dynamic nested resources
> On Jul 28, 2016, at 7:57 AM, Jay Pipes wrote:
>
> On 07/19/2016 06:51 PM, Ed Leafe wrote:
>> On Jul 19, 2016, at 2:58 PM, Chris Friesen wrote:
>>>> Why would a VM program the slot? Wouldn’t it usually be at the host
>>>> level?
>>>
>>> Are there no cases where a VM might want to download a proprietary
>>> program into an FPGA?
>>
>> That doesn’t sound right to me, but maybe I’m just not that familiar
>> with FPGA specifics. In general, VMs don’t control their hosts.
>
> Oh, but in NFV-land they most certainly do. :/
>
> It's commonplace now to see NFV use cases where VMs are provided passthrough
> access to an SR-IOV physical function on the host and the VMs application
> code then controls and allocates at will virtual functions from that physical
> function. Once that happens, yes, it's true that Nova no longer has any clue
> about the resource usage of VFs on that host device -- it's essentially at
> that point totally up to the VNF software to properly manage and maintain
> access to those VFs and allocate/free resources as needed on the host device.
>

Agreed as a statement of today. Once the “VM” application has what looks
like FPGA resources dedicated to it, it typically does both management
and, optionally, the actual application workload. That typically
includes loading the bitstream on the device and then executing API
calls to the service the device then provides. This can all be done now
with PCIe/SR-IOV, which is great.

But the generic boards are getting bigger, and we often want greater
utilization of them, and want to virtualize and manage them separately
from the VM-based application code that may use them. In other words,
these “funky” devices are becoming hosts for dynamically loaded
services. While a key first step is to enable allocating the virtual
region of the device to a VM when it is provisioned, we may want to
enable separating management from the data plane (aka workload) and to
support dynamic service consumption through more than network
connections.

VNFs are a use case for sure, and a dominant one, but now that we have
NICs on these large boards and also want to support service chaining, we
have the opportunity to do that without consuming many CPU cycles. When
I can push firewall, IPsec or compression to the “NIC” and not use CPU
cycles, why not ;-), and why not share it with other nearby VMs? Then
take it past VNFs to other workloads that can exploit FPGA...

> Same goes for FPGAs. VNF vendors want access to the physical host device and
> want to be able to do with that host device whatever they please.
>
> As I wrote on Twitter recently, NFV is changing software-defined
> infrastructure to instead be hardware-defined software.
>
> It's a funky new* world we live in, Ed :)
>
> -jay
>
> * new == old == new again.
Re: [openstack-dev] FPGA as a dynamic nested resources
On 07/19/2016 06:51 PM, Ed Leafe wrote: On Jul 19, 2016, at 2:58 PM, Chris Friesen wrote: Why would a VM program the slot? Wouldn’t it usually be at the host level? Are there no cases where a VM might want to download a proprietary program into an FPGA? That doesn’t sound right to me, but maybe I’m just not that familiar with FPGA specifics. In general, VMs don’t control their hosts. Oh, but in NFV-land they most certainly do. :/ It's commonplace now to see NFV use cases where VMs are provided passthrough access to an SR-IOV physical function on the host and the VMs application code then controls and allocates at will virtual functions from that physical function. Once that happens, yes, it's true that Nova no longer has any clue about the resource usage of VFs on that host device -- it's essentially at that point totally up to the VNF software to properly manage and maintain access to those VFs and allocate/free resources as needed on the host device. Same goes for FPGAs. VNF vendors want access to the physical host device and want to be able to do with that host device whatever they please. As I wrote on Twitter recently, NFV is changing software-defined infrastructure to instead be hardware-defined software. It's a funky new* world we live in, Ed :) -jay * new == old == new again. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] FPGA as a dynamic nested resources
Roman, great thread, thanks for posting! Comment inline :)

On 07/19/2016 02:03 PM, Roman Dobosz wrote:
> It can identified 3 levels of FPGA resources, which can be nested one
> on the others:
>
> 1. Whole FPGA. If used discrete FPGA, than even today it might be pass
>    through to the VM.
>
> 2. Region in FPGA. Some of the FPGA models can be divided into regions
>    or slots. Also, for some model it is possible to (re)program such
>    region individually - in this case there is a possibility to pass
>    entire slot to the VM, so that it might be possible to reprogram
>    such slot, and utilize the algorithm within the VM.
>
> 3. Accelerator in region/FPGA. If there is an accelerator programmed
>    in the slot, it is possible, that such accelerator provides us with
>    Virtual Functions (similar to the SR-IOV), than every available VF
>    can be treated as a resource.
>
> 4. It might be also necessary to track every VF individually, although
>    I didn't assumed it will be needed, nevertheless with nested
>    resources it should be easy to handle it.
>
> Correlation between such resources are a bit different from NUMA -
> while in NUMA case there is a possibility to either schedule a VM with
> some memory specified, or request memory within NUMA cell, in FPGA if
> there is slot taken, or accelerator already programmed and used, there
> is no way to offer FPGA as a whole to the tenant, until all
> accelerators and slots are free.
>
> I've followed Jay idea about nested resources and having in mind
> blueprint[2] regarding dynamic resources I've prepared how it fit in.
>
> To get id of resource of type acceleratorX to allocate 8 VF:
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN allocations al ON al.resource_provider_id = rp.id
> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> WHERE al.resource_class_id = 1668
> AND (iv.total - COALESCE(al.used, 0)) >= 8;

Right idea, yes, but you would need to INNER JOIN inventories and LEFT
JOIN from the winnowed set of inventory records to a grouped projection
of allocations. :)

The SQL would be this:

SELECT rp.id
FROM resource_providers rp
INNER JOIN inventories iv
  ON rp.id = iv.resource_provider_id
  AND iv.resource_class_id = 1688
LEFT JOIN (
  SELECT resource_provider_id, SUM(used) as used
  FROM allocations
  WHERE resource_class_id = 1688
  GROUP BY resource_provider_id
) AS al
  ON iv.resource_provider_id = al.id
WHERE (iv.total - COALESCE(al.used, 0)) >= 8;

The other SQL queries you listed had a couple errors, but the ideas were
mostly sound. I'll include the FPGA use cases when I write up the nested
resource providers spec proposal.

The only thing I'd say is that I was envisioning the dynamic resource
classes for FPGAs to be the resource context to an already-flashed
algorithm, not to the FPGA root device (or a region even). But, who
knows, perhaps we can work something out. More discussion on the spec...

Best,
-jay
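For anyone wanting to try the grouped query, here is a minimal runnable sketch using Python's sqlite3 module. It is an illustration, not the placement implementation. Assumptions: the derived table is joined on al.resource_provider_id (the subquery as posted selects no id column), the class id is 1668 ('FPGA accelerator' as defined earlier in the thread), the tables are reduced to the columns the query needs, and the second provider and all consumer names are hypothetical.

```python
import sqlite3

FPGA_ACCELERATOR = 1668  # dynamic resource class id used earlier in the thread

SCHEMA = """
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE inventories (
    resource_provider_id INTEGER NOT NULL,
    resource_class_id INTEGER NOT NULL,
    total INTEGER NOT NULL
);
CREATE TABLE allocations (
    resource_provider_id INTEGER NOT NULL,
    consumer_uuid TEXT NOT NULL,
    resource_class_id INTEGER NOT NULL,
    used INTEGER NOT NULL
);
"""

# Inventory rows are filtered by class first, then joined to per-provider usage sums.
QUERY = """
SELECT rp.id
FROM resource_providers rp
INNER JOIN inventories iv
    ON rp.id = iv.resource_provider_id
    AND iv.resource_class_id = :rc
LEFT JOIN (
    SELECT resource_provider_id, SUM(used) AS used
    FROM allocations
    WHERE resource_class_id = :rc
    GROUP BY resource_provider_id
) AS al
    ON iv.resource_provider_id = al.resource_provider_id
WHERE (iv.total - COALESCE(al.used, 0)) >= :wanted
ORDER BY rp.id
"""


def providers_with_capacity(conn, resource_class_id, wanted):
    """Return ids of providers that still have `wanted` units of the class free."""
    rows = conn.execute(QUERY, {"rc": resource_class_id, "wanted": wanted})
    return [row[0] for row in rows]


def build_demo_db():
    """Load a small data set: acceleratorX with 32 VFs (4 taken) plus a made-up one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO resource_providers VALUES (?, ?)", [
        (5, "FPGA-1 slot 2 acceleratorX"),
        (8, "hypothetical acceleratorY"),
    ])
    conn.executemany("INSERT INTO inventories VALUES (?, ?, ?)", [
        (5, FPGA_ACCELERATOR, 32),
        (8, FPGA_ACCELERATOR, 8),
    ])
    conn.executemany("INSERT INTO allocations VALUES (?, ?, ?, ?)", [
        (5, "vm-1", FPGA_ACCELERATOR, 1),
        (5, "vm-2", FPGA_ACCELERATOR, 1),
        (5, "vm-3", FPGA_ACCELERATOR, 1),
        (5, "vm-4", FPGA_ACCELERATOR, 1),
        (8, "vm-5", FPGA_ACCELERATOR, 2),
    ])
    return conn


if __name__ == "__main__":
    print(providers_with_capacity(build_demo_db(), FPGA_ACCELERATOR, 8))
```

Asking for 8 VFs matches only provider 5 here (28 free), while asking for 6 also matches the hypothetical provider 8 (6 free).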
Re: [openstack-dev] FPGA as a dynamic nested resources
On Jul 21, 2016 5:12 AM, "Daniel P. Berrange" wrote: > > On Thu, Jul 21, 2016 at 07:54:48AM +0200, Roman Dobosz wrote: > > On Wed, 20 Jul 2016 10:07:12 +0100 > > "Daniel P. Berrange" wrote: > > > > Hey Daniel, thanks for the feedback. > > > > > > Thoughts? > > > > > > I'd suggest you'll increase your chances of success with nova design > > > approval if you focus on implementing a really simple usage scheme for > > > FPGA as the first step in Nova. > > > > This. Maybe I'm wrong, but for me the minimal use case for FPGA would > > be ability to schedule VM which need certain accelerator from multiple > > potential ones on available FPGA/fixed slot. How insane does it sound? > > > > Providing fixed, prepared earlier by DC administrator accelerator > > resource, doesn't bring much value, beyond what we already have in > > Nova, since PCI/SR-IOV passthrough might be used for accelerators, > > which expose their functionality via VF. > > IIUC, there's plenty of FPGAs which are not SRIOV based, so there's > still scope for Nova enhancement in this area. > > The fact that some FPGAs are SRIOV & some are not though, is is also > why I'm suggesting that any work related to FPGA should be based around > refactoring of the existing PCI device assignment model to form a more > generic "Hardware device assignment" model. If we end up having a > completely distinct data model for FPGAs that is a failure. We need to > have a generalized hardware assignment model that can be used for generic > PCI devices, NICs, FPGAs, TPMs, GPUs, etc regardless of whether they > are backed by SRIOV, or their own non-PCI virtual functions. Personally > I'll reject any spec proposal that ignores existing PCI framework and > introduces a separate model for FPGA. > > > > All the threads I've see go well off into the weeds about trying to > > > solve everybody's niche/edge cases perfectly and as a result get > > > very complicated. 
> >
> > The topic is complicated :)
>
> Which is why i'm advising to not try to solve the perfect case and instead
> focus on getting something simple & good enough for common case.
>

I think the simple use cases can be covered easily today with a PCIe
SR-IOV configuration in which some number of VFs are assigned to regions
of a pre-initialized board. I know of successful deployments that do the
initialization with Ironic and use Nova to allocate the PCIe SR-IOV
access using existing extension points. Once allocated, the actual
function bitstream gets pushed in by the owning VM. The application
owners manage concurrency. As a first step, this level of support could
be made mainstream rather than a custom extension; support for
alternatives to PCIe-based connections could then be added.

That said, many use cases in play today manage the loading of the
bitstream that implements a specific function outside of OpenStack,
unfortunately. The desire is to load those bitstreams and manage their
life cycle just like we manage a VM and its image today. In effect, the
static region of the FPGA has the role of a very simple hypervisor.

FPGA boards are getting denser and more common, and they are getting
their own peripherals such as on-board NICs, serial ports, storage, etc.
I don't believe we need to expose complicated physical structure to
management, but a device that can be virtualized and dynamically
programmed, and that is connected to other infrastructure in the
environment, needs to be managed with the things it connects to.

I suggest the following:

First, standardize how to describe and allocate a real or virtualized
FPGA. Specify the metadata and related filter rules.

Second, mirror the Glance/Nova process of loading an image on a
hypervisor for loading a bitstream into a reprogrammable region (PR).

Third, keep the functions of the actual bitstream separate from the
above management, just like we do with VM or container functional
capabilities.

When the lifecycle of the PR is tied to a VM, just like ephemeral
storage, driving allocation from Nova seems to make the most sense.

Am I way out of line?

> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
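To make the first suggestion a little more concrete, here is a rough sketch of what "metadata plus a filter rule" could look like. Everything in it is hypothetical: the FpgaRegion fields, the hosts_for function and the host names are invented for illustration and do not correspond to any existing Nova filter or data model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FpgaRegion:
    """Hypothetical description of one real or virtualized FPGA region."""
    host: str
    reprogrammable: bool            # can a new bitstream be loaded into it?
    loaded_function: Optional[str]  # function of the bitstream loaded now, if any
    in_use: bool                    # already allocated to some VM


def hosts_for(regions: Iterable[FpgaRegion], function: str,
              allow_reprogram: bool = True) -> List[str]:
    """Return hosts with a free region that offers, or could be loaded with, `function`."""
    hosts = set()
    for region in regions:
        if region.in_use:
            continue  # allocated regions are never candidates
        if region.loaded_function == function:
            hosts.add(region.host)
        elif allow_reprogram and region.reprogrammable:
            hosts.add(region.host)
    return sorted(hosts)


REGIONS = [
    FpgaRegion("node-1", reprogrammable=False, loaded_function="ipsec", in_use=False),
    FpgaRegion("node-2", reprogrammable=True, loaded_function=None, in_use=False),
    FpgaRegion("node-3", reprogrammable=True, loaded_function="compression", in_use=True),
]

if __name__ == "__main__":
    print(hosts_for(REGIONS, "ipsec"))
    print(hosts_for(REGIONS, "ipsec", allow_reprogram=False))
```

The allow_reprogram switch separates the pre-programmed case (only regions already carrying the function qualify) from the case where a bitstream may be loaded at allocation time.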
Re: [openstack-dev] FPGA as a dynamic nested resources
On Thu, Jul 21, 2016 at 07:54:48AM +0200, Roman Dobosz wrote: > On Wed, 20 Jul 2016 10:07:12 +0100 > "Daniel P. Berrange" wrote: > > Hey Daniel, thanks for the feedback. > > > > Thoughts? > > > > I'd suggest you'll increase your chances of success with nova design > > approval if you focus on implementing a really simple usage scheme for > > FPGA as the first step in Nova. > > This. Maybe I'm wrong, but for me the minimal use case for FPGA would > be ability to schedule VM which need certain accelerator from multiple > potential ones on available FPGA/fixed slot. How insane does it sound? > > Providing fixed, prepared earlier by DC administrator accelerator > resource, doesn't bring much value, beyond what we already have in > Nova, since PCI/SR-IOV passthrough might be used for accelerators, > which expose their functionality via VF. IIUC, there's plenty of FPGAs which are not SRIOV based, so there's still scope for Nova enhancement in this area. The fact that some FPGAs are SRIOV & some are not though, is is also why I'm suggesting that any work related to FPGA should be based around refactoring of the existing PCI device assignment model to form a more generic "Hardware device assignment" model. If we end up having a completely distinct data model for FPGAs that is a failure. We need to have a generalized hardware assignment model that can be used for generic PCI devices, NICs, FPGAs, TPMs, GPUs, etc regardless of whether they are backed by SRIOV, or their own non-PCI virtual functions. Personally I'll reject any spec proposal that ignores existing PCI framework and introduces a separate model for FPGA. > > All the threads I've see go well off into the weeds about trying to > > solve everybody's niche/edge cases perfectly and as a result get > > very complicated. > > The topic is complicated :) Which is why i'm advising to not try to solve the perfect case and instead focus on getting something simple & good enough for common case. 
Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] FPGA as a dynamic nested resources
On Thu, 21 Jul 2016 08:56:07 +0800
Fei K Chen wrote:

> > Unless you have one FPGA with 8 slots, which can become FPGA with 4
> > slots. From scheduling perspective you have to know, which FPGA
> > resources can be reconfigured, and which not, isn't it? Also, AFAIRC
> > to provide VM with VF, there is a need for providing libvirt with
> > address of such VF, right? That's why I've putted this last point.
> >
> > The whole idea of getting FPGA as resource is its ability to swap
> > resources on demand. So it can be thought of as several available
> > hardware (means - accelerators, consumable by VMs) which most of the
> > time are not programmed in certain moment.
> >
> Let's have more thought about the resource swapping. The number of
> run-time accelerators is not limited by the number of region/slot.
> Inside FPGA, there can be some self-scheduling logic to schedule
> accelerators on regions by using the fast partial reconfiguration.
> It is not new, there are lots of such design in FPGA academic.

Right, but not all devices have such functionality. And we are trying to
make this solution common for most FPGAs, right?

-- 
Cheers,
Roman Dobosz
Re: [openstack-dev] FPGA as a dynamic nested resources
On Thu, 21 Jul 2016 08:44:21 +0800
Fei K Chen wrote:

> > 4. It might be also necessary to track every VF individually, although
> >    I didn't assumed it will be needed, nevertheless with nested
> >    resources it should be easy to handle it.
> You need. For example you have 4 region and 8 VF. Some region is
> configured with an accelerator so it can be shared to multi-VM (each
> consume a VF). But some other region is configured with private
> exclusive accelerator so it can only be bind to one VF. That's why
> we need to track both region and VF.

Well, it depends. If there is no difference between the VFs (all provide
the same functionality) and we don't really care about the placement (an
external entity would take care of this), then we don't need this level.
All the information will be held by the resource inventory and
allocations.

OTOH, if we need to store the information about which VF is passed to
which VM, then we probably need this level, or we need to store VF
addresses in inventory/allocation in some new field.

-- 
Cheers,
Roman Dobosz
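The two options above can be put side by side in a short sketch. This is only an illustration of the trade-off, not a proposed data model: the class names and the PCI-style addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CountedVFs:
    """Interchangeable VFs: only totals are kept, as inventory/allocation would."""
    total: int
    used: int = 0

    def allocate(self, count: int) -> None:
        if count > self.total - self.used:
            raise ValueError("not enough free VFs")
        self.used += count


@dataclass
class AddressedVFs:
    """Individually tracked VFs: records which address went to which consumer."""
    free: List[str]
    assigned: Dict[str, str] = field(default_factory=dict)  # address -> consumer uuid

    def allocate(self, consumer_uuid: str) -> str:
        if not self.free:
            raise ValueError("no free VF")
        address = self.free.pop(0)
        self.assigned[address] = consumer_uuid
        return address


if __name__ == "__main__":
    counted = CountedVFs(total=32, used=4)
    counted.allocate(8)
    print(counted.used)

    addressed = AddressedVFs(free=["0000:81:10.0", "0000:81:10.1"])  # made-up addresses
    print(addressed.allocate("vm-a"), addressed.assigned)
```

The counted form is enough for scheduling decisions; the addressed form is what would be needed if something downstream has to be told exactly which VF to attach.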
Re: [openstack-dev] FPGA as a dynamic nested resources
On Wed, 20 Jul 2016 10:07:12 +0100
"Daniel P. Berrange" wrote:

Hey Daniel, thanks for the feedback.

> > Thoughts?
>
> I'd suggest you'll increase your chances of success with nova design
> approval if you focus on implementing a really simple usage scheme for
> FPGA as the first step in Nova.

This. Maybe I'm wrong, but for me the minimal use case for FPGA would be
the ability to schedule a VM which needs a certain accelerator, chosen
from multiple potential ones, on an available FPGA/fixed slot. How
insane does that sound?

Providing a fixed accelerator resource, prepared earlier by the DC
administrator, doesn't bring much value beyond what we already have in
Nova, since PCI/SR-IOV passthrough can already be used for accelerators
which expose their functionality via VFs.

> All the threads I've see go well off into the weeds about trying to
> solve everybody's niche/edge cases perfectly and as a result get
> very complicated.

The topic is complicated :)

> For both NUMA and PCI dev assignment we got initial success by cutting
> back scope and focusing on the doing the minimum possible to satisfy
> the 90% common use cases, and ignoring the less common 10% initially.
> Yes this is not optimal, but it is good enough to keep most people
> happy without introducing massive complexity into the designs & impl.
>
> For FPGA, I'd like to see an initial proposal that assumed the FPGA
> is pre-programmed & pre-divided into a fixed number of slots and simply
> deal with this. This is similar to how we dealt with PCI SR-IOV initially
> where we assumed the dev is in VF-mode only. Only later did we start to
> add cleverness around switching VF vs PF mode. For FPGA I think any kind
> of dynamic re-allocation/re-configuration is better done as a stage 2

Okay. That sounds reasonable.

-- 
Cheers,
Roman Dobosz
Re: [openstack-dev] FPGA as a dynamic nested resources
Roman Dobosz wrote on 2016/07/20 15:25:28: > From: Roman Dobosz > To: "OpenStack Development Mailing List (not for usage questions)" > > Cc: Ed Leafe > Date: 2016/07/20 15:30 > Subject: Re: [openstack-dev] FPGA as a dynamic nested resources > > > > > 4. It might be also necessary to track every VF individually, although > > > I didn't assumed it will be needed, nevertheless with nested > > > resources it should be easy to handle it. > > > > I’m still not seeing the need for nesting. If you have a single FPGA > > with 8 slots, when you program the slots with accelerators, you now > > have 8 consumable resources. The fact that they came from a > > particular FPGA unit doesn’t seem relevant from a scheduling > > perspective. > > Unless you have one FPGA with 8 slots, which can become FPGA with 4 > slots. From scheduling perspective you have to know, which FPGA > resources can be reconfigured, and which not, isn't it? Also, AFAIRC > to provide VM with VF, there is a need for providing libvirt with > address of such VF, right? That's why I've putted this last point. > > The whole idea of getting FPGA as resource is its ability to swap > resources on demand. So it can be thought of as several available > hardware (means - accelerators, consumable by VMs) which most of the > time are not programmed in certain moment. > Let's have more thought about the resource swapping. The number of run-time accelerators is not limited by the number of region/slot. Inside FPGA, there can be some self-scheduling logic to schedule accelerators on regions by using the fast partial reconfiguration. It is not new, there are lots of such design in FPGA academic. -- Fei Chen __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] FPGA as a dynamic nested resources
Roman Dobosz wrote on 2016/07/20 02:03:28: > From: Roman Dobosz > To: openstack-dev > Date: 2016/07/20 02:07 > Subject: [openstack-dev] FPGA as a dynamic nested resources > > Hi all, > > Some time ago Jay Pipes published etherpad[1] with ideas around > modelling nested resources, taking NUMA as an example. I was also > encouraged ;) to start this thread, on last Nova scheduler meeting. > > I was read mentioned etherpad and what hits me was that described > scenario with NUMA cells resembles the way how FPGA can be managed. In > some extent. > > NUMA cell can be treated as a vessel for memory cells, and it is > expressed as number of MB. So it is possible to extract the > information from existing data and add another level of aggregation > using only clever prepared SQL query. > > I think, that problem might be broader, than using existing, tweaked a > bit model. If we take a look into resources, which FPGA may expose, > than it can be couple of levels, and each of them can be treated as > resource. > > It can identified 3 levels of FPGA resources, which can be nested one > on the others: > > 1. Whole FPGA. If used discrete FPGA, than even today it might be pass >through to the VM. > > 2. Region in FPGA. Some of the FPGA models can be divided into regions >or slots. Also, for some model it is possible to (re)program such >region individually - in this case there is a possibility to pass >entire slot to the VM, so that it might be possible to reprogram >such slot, and utilize the algorithm within the VM. > > 3. Accelerator in region/FPGA. If there is an accelerator programmed >in the slot, it is possible, that such accelerator provides us with >Virtual Functions (similar to the SR-IOV), than every available VF >can be treated as a resource. > > 4. It might be also necessary to track every VF individually, although >I didn't assumed it will be needed, nevertheless with nested >resources it should be easy to handle it. You need. For example you have 4 region and 8 VF. 
Some region is configured with an accelerator so it can be shared to multi-VM (each consume a VF). But some other region is configured with private exclusive accelerator so it can only be bind to one VF. That's why we need to track both region and VF. > > Correlation between such resources are a bit different from NUMA - > while in NUMA case there is a possibility to either schedule a VM with > some memory specified, or request memory within NUMA cell, in FPGA if > there is slot taken, or accelerator already programmed and used, there > is no way to offer FPGA as a whole to the tenant, until all > accelerators and slots are free. > > I've followed Jay idea about nested resources and having in mind > blueprint[2] regarding dynamic resources I've prepared how it fit in. > > Tables are unchanged - it is a copy-paste from the etherpad[1]: > > > CREATE TABLE resource_providers ( > id INT NOT NULL AUTOINCREMENT PRIMARY KEY, > uuid CHAR(36) NOT NULL, > name VARCHAR(100) NULL, > root_provider_id INT NULL, > parent_provider_id INT NULL > ); > > CREATE TABLE inventories ( > id INT NOT NULL AUTOINCREMENT PRIMARY KEY, > resource_provider_id INT NOT NULL, > resource_class_id INT NOT NULL, > total INT NOT NULL, > reserved INT NOT NULL, > min_unit INT NOT NULL, > max_unit INT NOT NULL, > step_size INT NOT NULL, > allocation_ratio INT NOT NULL > ); > > CREATE TABLE allocations ( > id INT NOT NULL AUTOINCREMENT PRIMARY KEY, > resource_provider_id INT NOT NULL, > consumer_uuid CHAR(36) NOT NULL, > resource_class_id INT NOT NULL, > used INT NOT NULL > ); > > > Than lets fill the tables with data of following structure: > > -- FPGA-1 > -- +- FPGA-1 slot1 (taken), resource_provider_id: > -- +- FPGA-1 slot2 > -- +- FPGA-1 slot2 acceleratorX > -- +- FPGA-1 slot2 acceleratorX VF1 (taken) > -- +- FPGA-1 slot2 acceleratorX VF2 (taken) > -- +- FPGA-1 slot2 acceleratorX VF3 (taken) > -- +- FPGA-1 slot2 acceleratorX VF4 (taken) > -- +- FPGA-1 slot2 acceleratorX VF5 > -- +- .. 
> -- +- FPGA-1 slot2 acceleratorX VF32 > -- +- FPGA-1 slot3 > -- FPGA-2 > -- +- FPGA-2 slot1 > > where FPGA-1 and FPGA-2 are hosts with FPGA on board. There is also > assumed, that new dynamic resources are created: id 1666 means 'FPGA' > (although it might be simply standard class, which will be hardcoded > ENUM), 1667 means 'FPGA slot' and 1668 'FPGA accelerator'. > > > INSERT INTO resource_providers VALUES > (1, '', 'FPGA-1', 1, NULL), > (2, '', 'FPGA-1 slot 1', 1, 1), > (3, '', 'FPGA-1 slot 2', 1, 1), > (4, '', 'FPGA-1 slot 3', 1, 1), > (5, '', 'FPGA-1 slot 2 acceleratorX', 1, 3), > (6, '', 'FPGA-2', 6, NULL), > (7, '', 'FPGA-2 slot', 6, 6); > > > INSERT INTO inventories VALUES > (1, 1, 1666, 1, 0, 1, 1, 1, 1.0), > (2, 2, 1667, 1, 0, 1, 1, 1, 1.0), > (3, 3, 1667, 1, 0, 1, 1, 1, 1.0), > (4, 4, 16
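The constraint described above (a whole FPGA can be offered only while none of its slots or accelerators is in use) can be checked with root_provider_id alone, without walking the tree. The sketch below is a hedged illustration run on SQLite from Python. The provider rows are the ones listed in the thread; the uuid column is left out, the allocation rows are hypothetical, and the key column is declared as INTEGER PRIMARY KEY because the AUTOINCREMENT spelling in the posted DDL is not accepted as written by SQLite.

```python
import sqlite3

# (id, name, root_provider_id, parent_provider_id), as listed in the thread.
PROVIDERS = [
    (1, "FPGA-1", 1, None),
    (2, "FPGA-1 slot 1", 1, 1),
    (3, "FPGA-1 slot 2", 1, 1),
    (4, "FPGA-1 slot 3", 1, 1),
    (5, "FPGA-1 slot 2 acceleratorX", 1, 3),
    (6, "FPGA-2", 6, None),
    (7, "FPGA-2 slot", 6, 6),
]

# A root provider is free when no provider sharing its root has any allocation.
FREE_ROOTS = """
SELECT rp.id
FROM resource_providers rp
WHERE rp.parent_provider_id IS NULL
  AND NOT EXISTS (
      SELECT 1
      FROM allocations al
      JOIN resource_providers sub ON sub.id = al.resource_provider_id
      WHERE sub.root_provider_id = rp.id
  )
ORDER BY rp.id
"""


def make_db(allocations):
    """Create the provider tree plus (provider_id, consumer, class_id, used) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resource_providers (
            id INTEGER PRIMARY KEY,
            name TEXT,
            root_provider_id INTEGER,
            parent_provider_id INTEGER
        );
        CREATE TABLE allocations (
            resource_provider_id INTEGER NOT NULL,
            consumer_uuid TEXT NOT NULL,
            resource_class_id INTEGER NOT NULL,
            used INTEGER NOT NULL
        );
    """)
    conn.executemany("INSERT INTO resource_providers VALUES (?, ?, ?, ?)", PROVIDERS)
    conn.executemany("INSERT INTO allocations VALUES (?, ?, ?, ?)", allocations)
    return conn


def free_whole_fpgas(conn):
    """Return ids of root providers that could be handed out as a whole device."""
    return [row[0] for row in conn.execute(FREE_ROOTS)]


if __name__ == "__main__":
    # Hypothetical usage: slot 1 taken, 4 VFs of acceleratorX taken.
    busy = make_db([(2, "vm-a", 1667, 1), (5, "vm-b", 1668, 4)])
    print(free_whole_fpgas(busy))
    print(free_whole_fpgas(make_db([])))
```

With slot 1 and some acceleratorX VFs allocated, only FPGA-2 (id 6) qualifies; with no allocations both roots do.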
Re: [openstack-dev] FPGA as a dynamic nested resources
On Jul 20, 2016, at 2:07 AM, Daniel P. Berrange wrote:

> For FPGA, I'd like to see an initial proposal that assumed the FPGA
> is pre-programmed & pre-divided into a fixed number of slots and simply
> deal with this. This is similar to how we dealt with PCI SR-IOV initially
> where we assumed the dev is in VF-mode only. Only later did we start to
> add cleverness around switching VF vs PF mode. For FPGA I think any kind
> of dynamic re-allocation/re-configuration is better done as a stage 2

+1 to this approach. I’m not convinced yet that Nova should be in the
business of FPGA management, but once we get the basic functionality
supporting FPGA working well, seeing what would be needed to add it would
be much easier, and we could make a clearer determination as to whether
this is feasible or not.

-- Ed Leafe
Re: [openstack-dev] FPGA as a dynamic nested resources
On Tue, Jul 19, 2016 at 08:03:28PM +0200, Roman Dobosz wrote:
> Hi all,
>
> Some time ago Jay Pipes published an etherpad[1] with ideas around
> modelling nested resources, taking NUMA as an example. I was also
> encouraged ;) to start this thread at the last Nova scheduler meeting.
>
> I've read the mentioned etherpad, and what struck me was that the
> described scenario with NUMA cells resembles, to some extent, the way
> an FPGA can be managed.
>
> A NUMA cell can be treated as a container for memory, expressed as a
> number of MB. So it is possible to extract the information from
> existing data and add another level of aggregation using only a
> cleverly prepared SQL query.
>
> I think the problem might be broader than what the existing model,
> slightly tweaked, can cover. If we look at the resources an FPGA may
> expose, there can be a couple of levels, and each of them can be
> treated as a resource.
>
> We can identify 3 levels of FPGA resources, which can be nested one
> inside another:
>
> 1. Whole FPGA. If a discrete FPGA is used, then even today it might be
>    passed through to the VM.
>
> 2. Region in FPGA. Some FPGA models can be divided into regions or
>    slots. Also, for some models it is possible to (re)program such a
>    region individually - in this case it is possible to pass the
>    entire slot to the VM, so that the slot can be reprogrammed and the
>    algorithm used from within the VM.
>
> 3. Accelerator in region/FPGA. If there is an accelerator programmed
>    in the slot, it is possible that the accelerator provides Virtual
>    Functions (similar to SR-IOV); then every available VF can be
>    treated as a resource.
>
> 4. It might also be necessary to track every VF individually. I
>    didn't assume it would be needed, but with nested resources it
>    should be easy to handle.
>
> Correlation between such resources is a bit different from NUMA -
> while in the NUMA case it is possible either to schedule a VM with
> some memory specified, or to request memory within a NUMA cell, with
> an FPGA, if a slot is taken or an accelerator is already programmed
> and in use, there is no way to offer the FPGA as a whole to the
> tenant until all accelerators and slots are free.
>
> I've followed Jay's idea about nested resources and, having in mind
> blueprint[2] regarding dynamic resources, I've worked out how it fits
> in.

[snip lots of complicated modelling]

> Thoughts?

I'd suggest you'll increase your chances of success with nova design
approval if you focus on implementing a really simple usage scheme for
FPGA as the first step in Nova. All the threads I've seen go well off
into the weeds trying to solve everybody's niche/edge cases perfectly
and as a result get very complicated.

For both NUMA and PCI dev assignment we got initial success by cutting
back scope and focusing on doing the minimum possible to satisfy the
90% common use cases, and ignoring the less common 10% initially. Yes,
this is not optimal, but it is good enough to keep most people happy
without introducing massive complexity into the designs & impl.

For FPGA, I'd like to see an initial proposal that assumes the FPGA
is pre-programmed & pre-divided into a fixed number of slots and simply
deals with this. This is similar to how we dealt with PCI SR-IOV
initially, where we assumed the dev is in VF-mode only. Only later did
we start to add cleverness around switching VF vs PF mode. For FPGA I
think any kind of dynamic re-allocation/re-configuration is better done
as a stage 2.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
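The stage-1 model suggested above can be sketched with a flat inventory. This is only an illustration, assuming each host exposes a fixed number of pre-programmed slots. It reuses the `inventories` and `allocations` tables quoted in this thread (trimmed to the columns the query needs) in an in-memory SQLite database, and the query follows the shape Jay Pipes proposed: inventories joined to a grouped projection of allocations. The class id 1667 comes from Roman's example; the host ids, slot counts and consumer names are made up.

```python
import sqlite3

FPGA_SLOT = 1667  # 'FPGA slot' class id, taken from Roman's example

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventories (
    resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL,
    total INT NOT NULL
);
CREATE TABLE allocations (
    resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL,
    resource_class_id INT NOT NULL,
    used INT NOT NULL
);
""")

# Hypothetical data: host 1 has 4 pre-programmed slots, host 2 has 2.
conn.executemany("INSERT INTO inventories VALUES (?, ?, ?)",
                 [(1, FPGA_SLOT, 4), (2, FPGA_SLOT, 2)])
# Three slots on host 1 are already consumed by two made-up instances.
conn.executemany("INSERT INTO allocations VALUES (?, ?, ?, ?)",
                 [(1, "vm-a", FPGA_SLOT, 2), (1, "vm-b", FPGA_SLOT, 1)])


def providers_with_free(conn, class_id, amount):
    """Return ids of providers with at least `amount` unallocated units."""
    rows = conn.execute("""
        SELECT iv.resource_provider_id
        FROM inventories iv
        LEFT JOIN (
            SELECT resource_provider_id, SUM(used) AS used
            FROM allocations
            WHERE resource_class_id = ?
            GROUP BY resource_provider_id
        ) AS al ON al.resource_provider_id = iv.resource_provider_id
        WHERE iv.resource_class_id = ?
          AND (iv.total - COALESCE(al.used, 0)) >= ?
        ORDER BY iv.resource_provider_id
    """, (class_id, class_id, amount)).fetchall()
    return [row[0] for row in rows]


print(providers_with_free(conn, FPGA_SLOT, 1))  # [1, 2]
print(providers_with_free(conn, FPGA_SLOT, 2))  # [2]
```

With fixed slots there is nothing to nest: scheduling reduces to a capacity check per host, which is the simplification being argued for here.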
Re: [openstack-dev] FPGA as a dynamic nested resources
On Tue, 19 Jul 2016 15:51:26 -0700 Ed Leafe wrote:

> >> Why would a VM program the slot? Wouldn’t it usually be at the
> >> host level?
> >
> > Are there no cases where a VM might want to download a proprietary
> > program into an FPGA?
>
> That doesn’t sound right to me, but maybe I’m just not that familiar
> with FPGA specifics. In general, VMs don’t control their hosts. It
> would also bring up some complications, such as what should happen
> when you delete that VM: does the FPGA have to be reset to its
> original state?

Technically, it is not necessary to "erase" the FPGA. It can be left
untouched and shown as free in the resource tracker, so that it can then
be programmed with another accelerator or passed to another VM if
needed. It may also be zeroed (programmed with an empty IP) by an
external entity, which might be the preferred option.

> >> I’m still not seeing the need for nesting. If you have a single
> >> FPGA with 8 slots, when you program the slots with accelerators,
> >> you now have 8 consumable resources. The fact that they came from
> >> a particular FPGA unit doesn’t seem relevant from a scheduling
> >> perspective.
> >
> > If you want to be able to provide an FPGA as either a whole
> > un-programmed FPGA or as pre-programmed resources, you'd
> > presumably need to know which whole FPGAs are available and which
> > have been fractionally allocated, no?
>
> An unprogrammed FPGA is a particular resource class. When you
> program it, you are removing one of that class and creating one or
> more of a new resource class (e.g., an encryption accelerator
> program). There isn’t a need to nest anything.

But you still have to track *where* you can schedule a potential
accelerator, don't you? Certain types of IP will need a suitable slot,
so that also has to be tracked. Nesting isn't necessary, but it might be
helpful for managing the state of your resources.

> > I agree that if you are only going to have the host program the
> > FPGA and then make the resources available then the scheduler
> > doesn't need to know about whole FPGAs.
>
> That was where we left the discussion in Austin, so that was my
> assumption.

… as the first step, wasn't it? No one is pushing to have this in
Newton. Even the Ocata time frame seems unrealistic.

--
Cheers,
Roman Dobosz
Re: [openstack-dev] FPGA as a dynamic nested resources
On Tue, 19 Jul 2016 12:40:50 -0700 Ed Leafe wrote:

> > We can identify 3 levels of FPGA resources, which can be nested one
> > inside another:
> >
> > 1. Whole FPGA. If a discrete FPGA is used, then even today it might
> >    be passed through to the VM.
>
> Can you explain why this would ever be useful? IOW, what can a VM do
> with an entire FPGA?

Private clouds, development purposes. It could be treated as FPGA-aaS.
The same goes for unoccupied slots. Of course, because reprogramming
affects real, not virtualized, hardware, there are security concerns
about allowing users to do that, for example in public clouds.

The other, much more significant reason is that some IPs may be so big
that they take up most of the FPGA - so reconfiguration is needed
(although reconfiguration is out of scope for Nova), and we don't want
to wipe out another accelerator which might currently be in use. That's
why slots are also treated as dynamic resources here.

> > 3. Accelerator in region/FPGA. If there is an accelerator programmed
> >    in the slot, it is possible that the accelerator provides Virtual
> >    Functions (similar to SR-IOV); then every available VF can be
> >    treated as a resource.
>
> This is my understanding of what would be consumable: the slot / VF,
> which the VM could take advantage of.

Yes. That's the obvious scenario.

> > 4. It might also be necessary to track every VF individually. I
> >    didn't assume it would be needed, but with nested resources it
> >    should be easy to handle.
>
> I’m still not seeing the need for nesting. If you have a single FPGA
> with 8 slots, when you program the slots with accelerators, you now
> have 8 consumable resources. The fact that they came from a
> particular FPGA unit doesn’t seem relevant from a scheduling
> perspective.

Unless you have one FPGA with 8 slots which can become an FPGA with 4
slots. From a scheduling perspective you have to know which FPGA
resources can be reconfigured and which cannot, don't you?

Also, AFAIR, to provide a VM with a VF, libvirt needs to be given the
address of that VF, right? That's why I added this last point.

The whole point of treating an FPGA as a resource is its ability to swap
resources on demand. It can be thought of as several available pieces of
hardware (meaning accelerators, consumable by VMs), most of which are
not programmed at any given moment.

So, let's assume that we have two hosts, HostA and HostB, each with an
FPGA capable of providing 2 accelerators which exclusively use the
entire chip (let's call them AX1 and AX2), and one other which can use
one of the 2 possible slots (AY). So we have 3 possible accelerators to
use and, in the worst case, only two places where they can be placed.

Initially, there is no accelerator in use, and the cloud administrator
defines all the IPs he has available (somehow - this part isn't defined
yet - but let's assume it is in place).

Now, a user requests a VM with a certain flavor/image with AX1. The
scheduler knows that it will fit into either HostA or HostB, so HostA is
chosen, the FPGA is magically™ prepared to hold the AX1 accelerator and
the VM is started. Now we have the resource tree HostA FPGA->slot->AX1
and HostB FPGA.

Next, the user requests another VM with the AY accelerator. The
scheduler should now know that the only available option is HostB, so
again the magic happens, and the resource tree is:

HostA FPGA
 +- slot1
     +- AX1
HostB FPGA
 +- slot1
     +- AY
 +- slot2

Now, what should happen if the user removes the VM with the AY
accelerator and requests another VM with AX2?

--
Cheers,
Roman Dobosz
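The scenario above can be replayed with a small model. This is a sketch only, not a proposal for Nova code: the `Fpga` class and `schedule` function are hypothetical, and the sketch assumes one possible answer to the closing question, namely that an FPGA with no accelerator in use may be reconfigured freely. The thread itself leaves that question open.

```python
from dataclasses import dataclass, field

WHOLE_CHIP = {"AX1", "AX2"}  # accelerators that need the entire chip
SLOTS_WHEN_SPLIT = 2         # AY uses one of 2 possible slots


@dataclass
class Fpga:
    host: str
    # One entry per slot: the accelerator in it, or None if the slot is free.
    # An empty list means the FPGA has not been configured yet.
    slots: list = field(default_factory=list)

    def idle(self):
        return all(acc is None for acc in self.slots)

    def can_host(self, acc):
        if self.idle():
            return True  # assumption: an idle FPGA may be reconfigured
        if acc in WHOLE_CHIP:
            return False  # something is in use, the chip cannot be wiped
        # A slot-sized accelerator needs a free slot on a split FPGA.
        return len(self.slots) == SLOTS_WHEN_SPLIT and None in self.slots

    def place(self, acc):
        if self.idle():
            layout = 1 if acc in WHOLE_CHIP else SLOTS_WHEN_SPLIT
            self.slots = [None] * layout
        self.slots[self.slots.index(None)] = acc

    def release(self, acc):
        self.slots[self.slots.index(acc)] = None


def schedule(fpgas, acc):
    """Place `acc` on the first FPGA that can take it; return the host name."""
    for fpga in fpgas:
        if fpga.can_host(acc):
            fpga.place(acc)
            return fpga.host
    return None


hosts = [Fpga("HostA"), Fpga("HostB")]
print(schedule(hosts, "AX1"))  # HostA
print(schedule(hosts, "AY"))   # HostB
print(schedule(hosts, "AX2"))  # None
hosts[1].release("AY")         # the VM using AY is removed
print(schedule(hosts, "AX2"))  # HostB
```

Even in this toy version the scheduler has to look at the slot layout of each individual FPGA, not just at per-class totals, to decide whether AX2 fits.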
Re: [openstack-dev] FPGA as a dynamic nested resources
On Jul 19, 2016, at 2:58 PM, Chris Friesen wrote:

>> Why would a VM program the slot? Wouldn’t it usually be at the host
>> level?
>
> Are there no cases where a VM might want to download a proprietary
> program into an FPGA?

That doesn’t sound right to me, but maybe I’m just not that familiar
with FPGA specifics. In general, VMs don’t control their hosts. It would
also bring up some complications, such as what should happen when you
delete that VM: does the FPGA have to be reset to its original state?

>> I’m still not seeing the need for nesting. If you have a single FPGA
>> with 8 slots, when you program the slots with accelerators, you now
>> have 8 consumable resources. The fact that they came from a particular
>> FPGA unit doesn’t seem relevant from a scheduling perspective.
>
> If you want to be able to provide an FPGA as either a whole
> un-programmed FPGA or as pre-programmed resources, you'd presumably
> need to know which whole FPGAs are available and which have been
> fractionally allocated, no?

An unprogrammed FPGA is a particular resource class. When you program
it, you are removing one of that class and creating one or more of a new
resource class (e.g., an encryption accelerator program). There isn’t a
need to nest anything.

> I agree that if you are only going to have the host program the FPGA
> and then make the resources available then the scheduler doesn't need
> to know about whole FPGAs.

That was where we left the discussion in Austin, so that was my
assumption.

--
Ed Leafe
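The flat accounting described in this message could look roughly like the following. It is a minimal sketch under assumptions: the class names `FPGA_UNPROGRAMMED` and `ENCRYPTION_ACCEL` and the `program` helper are invented for illustration and are not existing Nova resource classes or APIs.

```python
from collections import Counter


def program(inventory, accelerator, count=1):
    """Consume one unprogrammed FPGA and add `count` units of `accelerator`."""
    if inventory["FPGA_UNPROGRAMMED"] < 1:
        raise ValueError("no unprogrammed FPGA left")
    inventory["FPGA_UNPROGRAMMED"] -= 1
    inventory[accelerator] += count
    return inventory


# Hypothetical host with two unprogrammed FPGAs.
inv = Counter({"FPGA_UNPROGRAMMED": 2})
program(inv, "ENCRYPTION_ACCEL", count=8)
print(dict(inv))  # {'FPGA_UNPROGRAMMED': 1, 'ENCRYPTION_ACCEL': 8}
```

The counters do not record which FPGA the eight accelerator units came from. That is sufficient if the host programs the FPGA up front, and it is the piece of information the nesting proposal would add.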
Re: [openstack-dev] FPGA as a dynamic nested resources
On 07/19/2016 01:40 PM, Ed Leafe wrote:
> On Jul 19, 2016, at 11:03 AM, Roman Dobosz wrote:
>
>> We can identify 3 levels of FPGA resources, which can be nested one
>> inside another:
>>
>> 1. Whole FPGA. If a discrete FPGA is used, then even today it might
>>    be passed through to the VM.
>
> Can you explain why this would ever be useful? IOW, what can a VM do
> with an entire FPGA?
>
>> 2. Region in FPGA. Some FPGA models can be divided into regions or
>>    slots. Also, for some models it is possible to (re)program such a
>>    region individually - in this case it is possible to pass the
>>    entire slot to the VM, so that the slot can be reprogrammed and
>>    the algorithm used from within the VM.
>
> Why would a VM program the slot? Wouldn’t it usually be at the host
> level?

Are there no cases where a VM might want to download a proprietary
program into an FPGA?

>> 3. Accelerator in region/FPGA. If there is an accelerator programmed
>>    in the slot, it is possible that the accelerator provides Virtual
>>    Functions (similar to SR-IOV); then every available VF can be
>>    treated as a resource.
>
> This is my understanding of what would be consumable: the slot / VF,
> which the VM could take advantage of.
>
>> 4. It might also be necessary to track every VF individually. I
>>    didn't assume it would be needed, but with nested resources it
>>    should be easy to handle.
>
> I’m still not seeing the need for nesting. If you have a single FPGA
> with 8 slots, when you program the slots with accelerators, you now
> have 8 consumable resources. The fact that they came from a
> particular FPGA unit doesn’t seem relevant from a scheduling
> perspective.

If you want to be able to provide an FPGA as either a whole
un-programmed FPGA or as pre-programmed resources, you'd presumably need
to know which whole FPGAs are available and which have been fractionally
allocated, no?

I agree that if you are only going to have the host program the FPGA and
then make the resources available then the scheduler doesn't need to
know about whole FPGAs.

Chris
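The point about knowing which whole FPGAs are still available can be illustrated with a parent-pointer tree. This is a sketch, assuming the provider rows from Roman's example (id, name, parent id) and a made-up `USED` mapping of allocated units per provider; the function names are hypothetical.

```python
# (id, name, parent_id), mirroring the resource_providers rows in Roman's example
PROVIDERS = [
    (1, "FPGA-1", None),
    (2, "FPGA-1 slot 1", 1),
    (3, "FPGA-1 slot 2", 1),
    (4, "FPGA-1 slot 3", 1),
    (5, "FPGA-1 slot 2 acceleratorX", 3),
    (6, "FPGA-2", None),
    (7, "FPGA-2 slot", 6),
]

# Assumed allocations: slot 1 is taken and 4 VFs of acceleratorX are in use.
USED = {2: 1, 5: 4}


def descendants(root_id):
    """Collect the ids of all providers nested under `root_id`."""
    found, frontier = [], [root_id]
    while frontier:
        current = frontier.pop()
        children = [pid for pid, _, parent in PROVIDERS if parent == current]
        found.extend(children)
        frontier.extend(children)
    return found


def whole_fpga_available(root_id):
    """A whole FPGA can be offered only if nothing inside it is allocated."""
    subtree = [root_id] + descendants(root_id)
    return all(USED.get(pid, 0) == 0 for pid in subtree)


def free_whole_fpgas():
    return [name for pid, name, parent in PROVIDERS
            if parent is None and whole_fpga_available(pid)]


print(free_whole_fpgas())  # ['FPGA-2']
```

FPGA-1 is excluded because one of its slots and some of its VFs are in use, which is the "fractionally allocated" case mentioned above.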
Re: [openstack-dev] FPGA as a dynamic nested resources
On Jul 19, 2016, at 11:03 AM, Roman Dobosz wrote:

> We can identify 3 levels of FPGA resources, which can be nested one
> inside another:
>
> 1. Whole FPGA. If a discrete FPGA is used, then even today it might be
>    passed through to the VM.

Can you explain why this would ever be useful? IOW, what can a VM do
with an entire FPGA?

> 2. Region in FPGA. Some FPGA models can be divided into regions or
>    slots. Also, for some models it is possible to (re)program such a
>    region individually - in this case it is possible to pass the
>    entire slot to the VM, so that the slot can be reprogrammed and the
>    algorithm used from within the VM.

Why would a VM program the slot? Wouldn’t it usually be at the host
level?

> 3. Accelerator in region/FPGA. If there is an accelerator programmed
>    in the slot, it is possible that the accelerator provides Virtual
>    Functions (similar to SR-IOV); then every available VF can be
>    treated as a resource.

This is my understanding of what would be consumable: the slot / VF,
which the VM could take advantage of.

> 4. It might also be necessary to track every VF individually. I
>    didn't assume it would be needed, but with nested resources it
>    should be easy to handle.

I’m still not seeing the need for nesting. If you have a single FPGA
with 8 slots, when you program the slots with accelerators, you now have
8 consumable resources. The fact that they came from a particular FPGA
unit doesn’t seem relevant from a scheduling perspective.

--
Ed Leafe