Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-29 Thread Roman Dobosz
On Thu, 28 Jul 2016 10:50:08 -0400
Jay Pipes  wrote:

> Roman, great thread, thanks for posting! Comment inline :)

Thanks!

> 
> > Three levels of FPGA resources can be identified, each nested within
> > the previous one:
> >
> > 1. Whole FPGA. A discrete FPGA can be passed through to the VM even
> >    today.
> >
> > 2. Region in FPGA. Some FPGA models can be divided into regions or
> >    slots. Some models also allow such a region to be (re)programmed
> >    individually; in that case an entire slot can be passed to the VM,
> >    so that the VM can reprogram the slot and use the algorithm from
> >    within the VM.
> >
> > 3. Accelerator in region/FPGA. An accelerator programmed into a slot
> >    may provide Virtual Functions (similar to SR-IOV); every available
> >    VF can then be treated as a resource.
> >
> > 4. It might also be necessary to track every VF individually. I
> >    didn't assume it would be needed, but with nested resources it
> >    should be easy to handle.
> >
> > The relationship between these resources is a bit different from
> > NUMA. In the NUMA case it is possible either to schedule a VM with
> > some amount of memory, or to request memory within a NUMA cell. With
> > an FPGA, if a slot is taken, or an accelerator is already programmed
> > and in use, there is no way to offer the FPGA as a whole to the
> > tenant until all accelerators and slots are free.
> >
> > I've followed Jay's idea about nested resources and, with the
> > blueprint[2] on dynamic resources in mind, worked out how FPGAs fit
> > in.
> >
> 
> >
> > To get the id of a resource of type acceleratorX that can allocate 8 VFs:
> >
> >
> > SELECT rp.id
> > FROM resource_providers rp
> > LEFT JOIN allocations al ON al.resource_provider_id = rp.id
> > LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> > WHERE al.resource_class_id = 1668
> > AND (iv.total - COALESCE(al.used, 0)) >= 8;
> 
> Right idea, yes, but you would need to INNER JOIN inventories and LEFT 
> JOIN from the winnowed set of inventory records to a grouped projection 
> of allocations. :)
> 
> The SQL would be this:
> 
> SELECT rp.id
> FROM resource_providers rp
> INNER JOIN inventories iv
> ON rp.id = iv.resource_provider_id
> AND iv.resource_class_id = 1668
> LEFT JOIN (
>   SELECT resource_provider_id, SUM(used) AS used
>   FROM allocations
>   WHERE resource_class_id = 1668
>   GROUP BY resource_provider_id
> ) AS al
> ON iv.resource_provider_id = al.resource_provider_id
> WHERE (iv.total - COALESCE(al.used, 0)) >= 8;

Hm. I'm getting the same results with both queries. I'm surely missing
something obvious here, and I'm certainly no SQL expert :)
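
One way to see where the two queries diverge is to run both against a small data set. The following is a minimal sketch using Python's built-in sqlite3 module. It assumes a reduced version of the tables (only the columns the queries touch), class id 1668 in both queries, and the grouped join written against al.resource_provider_id, since the subquery exposes no id column. The three providers and their numbers are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE inventories (resource_provider_id INT, resource_class_id INT, total INT);
CREATE TABLE allocations (resource_provider_id INT, resource_class_id INT, used INT);

-- Three hypothetical accelerators, each exposing 32 VFs (class 1668).
INSERT INTO resource_providers VALUES (1, 'acc-unused'), (2, 'acc-busy'), (3, 'acc-light');
INSERT INTO inventories VALUES (1, 1668, 32), (2, 1668, 32), (3, 1668, 32);
-- Provider 1 has no allocations, provider 2 has three allocations of
-- 10 VFs each (2 free), provider 3 has one allocation of 4 VFs.
INSERT INTO allocations VALUES (2, 1668, 10), (2, 1668, 10), (2, 1668, 10), (3, 1668, 4);
""")

# First query: two LEFT JOINs, class filter on the allocations side.
FLAT = """
SELECT rp.id
FROM resource_providers rp
LEFT JOIN allocations al ON al.resource_provider_id = rp.id
LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
WHERE al.resource_class_id = 1668
AND (iv.total - COALESCE(al.used, 0)) >= 8
"""

# Second query: INNER JOIN inventories, LEFT JOIN a grouped projection
# of allocations so usage is summed per provider.
GROUPED = """
SELECT rp.id
FROM resource_providers rp
INNER JOIN inventories iv
ON rp.id = iv.resource_provider_id
AND iv.resource_class_id = 1668
LEFT JOIN (
  SELECT resource_provider_id, SUM(used) AS used
  FROM allocations
  WHERE resource_class_id = 1668
  GROUP BY resource_provider_id
) AS al
ON iv.resource_provider_id = al.resource_provider_id
WHERE (iv.total - COALESCE(al.used, 0)) >= 8
"""


def provider_ids(query):
    """Return the distinct provider ids a query yields, sorted."""
    return sorted({row[0] for row in db.execute(query)})


flat_ids = provider_ids(FLAT)        # [2, 3]
grouped_ids = provider_ids(GROUPED)  # [1, 3]
print(flat_ids, grouped_ids)
```

On this data the first query drops the provider that has no allocations at all (the WHERE clause on al turns the LEFT JOIN into an inner one) and keeps the provider whose usage is spread over several rows, because each row is compared on its own. When every provider has exactly one allocation row, the two queries return the same ids, which may be why a small test set shows no difference.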

> The other SQL queries you listed had a couple errors, but the ideas were 
> mostly sound. I'll include the FPGA use cases when I write up the nested 
> resource providers spec proposal.

Great, thank you!

> The only thing I'd say is that I was envisioning the dynamic resource 
> classes for FPGAs to be the resource context to an already-flashed 
> algorithm, not to the FPGA root device (or a region even). But, who 
> knows, perhaps we can work something out. More discussion on the spec...

Sure, we can start by defining the basic case and expand it if
needed.

-- 
Cheers,
Roman Dobosz

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-28 Thread Jay Pipes

On 07/20/2016 05:07 AM, Daniel P. Berrange wrote:

> For FPGA, I'd like to see an initial proposal that assumed the FPGA
> is pre-programmed & pre-divided into a fixed number of slots and simply
> deal with this.


For the record, this is precisely what is described in the first version 
of the dynamic-resource-classes use cases section:


https://review.openstack.org/#/c/312696/1/specs/newton/approved/resource-providers-dynamic-resource-classes.rst

See starting at line 193.

This level of detail was removed in the second revision of the spec, 
which simply focuses on the CRUD operations to add to the placement REST 
API for these user-defined resource classes.


All the best,
-jay



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-28 Thread Harm Sluiman

> On Jul 28, 2016, at 7:57 AM, Jay Pipes  wrote:
> 
> On 07/19/2016 06:51 PM, Ed Leafe wrote:
>> On Jul 19, 2016, at 2:58 PM, Chris Friesen wrote:
>>>> Why would a VM program the slot? Wouldn’t it usually be at the
>>>> host level?
>>> 
>>> Are there no cases where a VM might want to download a proprietary
>>> program into an FPGA?
>> 
>> That doesn’t sound right to me, but maybe I’m just not that familiar
>> with FPGA specifics. In general, VMs don’t control their hosts.
> 
> Oh, but in NFV-land they most certainly do. :/
> 
> It's commonplace now to see NFV use cases where VMs are provided passthrough 
> access to an SR-IOV physical function on the host and the VM's application 
> code then controls and allocates at will virtual functions from that physical 
> function. Once that happens, yes, it's true that Nova no longer has any clue 
> about the resource usage of VFs on that host device -- it's essentially at 
> that point totally up to the VNF software to properly manage and maintain 
> access to those VFs and allocate/free resources as needed on the host device.
> 
Agreed as a statement of today.
Once the “VM” application has what look like FPGA resources dedicated to it,
it typically does both the management and, optionally, the actual application
workload. That typically includes loading the bitstream onto the device and
then executing API calls to the service the bitstream provides. This can all
be done now with PCIe/SR-IOV, which is great.

But the generic boards are getting bigger, and we often want greater
utilization of them, and to virtualize and manage them separately from the
VM-based application code that may use them. In other words, these “funky”
devices are becoming hosts for dynamically loaded services. While a key first
step is to enable allocating a virtual region of the device to a VM when it is
provisioned, we may also want to separate management from the data plane (aka
workload) and support dynamic service consumption through more than network
connections.

VNFs are a use case for sure, and a dominant one, but now that we have NICs on
these large boards and also want to support service chaining, we have the
opportunity to do that without consuming many CPU cycles. When I can push
firewall, IPsec or compression to the “NIC” and not use CPU cycles, why not
;-), and why not share it with other nearby VMs?

Then take it past VNFs to other workloads that can exploit FPGA...


> Same goes for FPGAs. VNF vendors want access to the physical host device and 
> want to be able to do with that host device whatever they please.
> 
> As I wrote on Twitter recently, NFV is changing software-defined 
> infrastructure to instead be hardware-defined software.
> 
> It's a funky new* world we live in, Ed :)
> 
> -jay
> 
> * new == old == new again.
> 


Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-28 Thread Jay Pipes

On 07/19/2016 06:51 PM, Ed Leafe wrote:
> On Jul 19, 2016, at 2:58 PM, Chris Friesen wrote:
>>> Why would a VM program the slot? Wouldn’t it usually be at the
>>> host level?
>>
>> Are there no cases where a VM might want to download a proprietary
>> program into an FPGA?
>
> That doesn’t sound right to me, but maybe I’m just not that familiar
> with FPGA specifics. In general, VMs don’t control their hosts.


Oh, but in NFV-land they most certainly do. :/

It's commonplace now to see NFV use cases where VMs are provided 
passthrough access to an SR-IOV physical function on the host and the 
VM's application code then controls and allocates at will virtual 
functions from that physical function. Once that happens, yes, it's true 
that Nova no longer has any clue about the resource usage of VFs on that 
host device -- it's essentially at that point totally up to the VNF 
software to properly manage and maintain access to those VFs and 
allocate/free resources as needed on the host device.


Same goes for FPGAs. VNF vendors want access to the physical host device 
and want to be able to do with that host device whatever they please.


As I wrote on Twitter recently, NFV is changing software-defined 
infrastructure to instead be hardware-defined software.


It's a funky new* world we live in, Ed :)

-jay

* new == old == new again.



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-28 Thread Jay Pipes

Roman, great thread, thanks for posting! Comment inline :)

On 07/19/2016 02:03 PM, Roman Dobosz wrote:


> Three levels of FPGA resources can be identified, each nested within
> the previous one:
>
> 1. Whole FPGA. A discrete FPGA can be passed through to the VM even
>    today.
>
> 2. Region in FPGA. Some FPGA models can be divided into regions or
>    slots. Some models also allow such a region to be (re)programmed
>    individually; in that case an entire slot can be passed to the VM,
>    so that the VM can reprogram the slot and use the algorithm from
>    within the VM.
>
> 3. Accelerator in region/FPGA. An accelerator programmed into a slot
>    may provide Virtual Functions (similar to SR-IOV); every available
>    VF can then be treated as a resource.
>
> 4. It might also be necessary to track every VF individually. I
>    didn't assume it would be needed, but with nested resources it
>    should be easy to handle.
>
> The relationship between these resources is a bit different from
> NUMA. In the NUMA case it is possible either to schedule a VM with
> some amount of memory, or to request memory within a NUMA cell. With
> an FPGA, if a slot is taken, or an accelerator is already programmed
> and in use, there is no way to offer the FPGA as a whole to the
> tenant until all accelerators and slots are free.
>
> I've followed Jay's idea about nested resources and, with the
> blueprint[2] on dynamic resources in mind, worked out how FPGAs fit
> in.
>
> To get the id of a resource of type acceleratorX that can allocate 8 VFs:
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN allocations al ON al.resource_provider_id = rp.id
> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> WHERE al.resource_class_id = 1668
> AND (iv.total - COALESCE(al.used, 0)) >= 8;


Right idea, yes, but you would need to INNER JOIN inventories and LEFT 
JOIN from the winnowed set of inventory records to a grouped projection 
of allocations. :)


The SQL would be this:

SELECT rp.id
FROM resource_providers rp
INNER JOIN inventories iv
ON rp.id = iv.resource_provider_id
AND iv.resource_class_id = 1668
LEFT JOIN (
  SELECT resource_provider_id, SUM(used) AS used
  FROM allocations
  WHERE resource_class_id = 1668
  GROUP BY resource_provider_id
) AS al
ON iv.resource_provider_id = al.resource_provider_id
WHERE (iv.total - COALESCE(al.used, 0)) >= 8;

The other SQL queries you listed had a couple errors, but the ideas were 
mostly sound. I'll include the FPGA use cases when I write up the nested 
resource providers spec proposal.


The only thing I'd say is that I was envisioning the dynamic resource 
classes for FPGAs to be the resource context to an already-flashed 
algorithm, not to the FPGA root device (or a region even). But, who 
knows, perhaps we can work something out. More discussion on the spec...


Best,
-jay



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-21 Thread Harm Sluiman
On Jul 21, 2016 5:12 AM, "Daniel P. Berrange"  wrote:
>
> On Thu, Jul 21, 2016 at 07:54:48AM +0200, Roman Dobosz wrote:
> > On Wed, 20 Jul 2016 10:07:12 +0100
> > "Daniel P. Berrange"  wrote:
> >
> > Hey Daniel, thanks for the feedback.
> >
> > > > Thoughts?
> > >
> > > I'd suggest you'll increase your chances of success with nova design
> > > approval if you focus on implementing a really simple usage scheme for
> > > FPGA as the first step in Nova.
> >
> > This. Maybe I'm wrong, but for me the minimal use case for FPGA would
> > be ability to schedule VM which need certain accelerator from multiple
> > potential ones on available FPGA/fixed slot. How insane does it sound?
> >
> > Providing fixed, prepared earlier by DC administrator accelerator
> > resource, doesn't bring much value, beyond what we already have in
> > Nova, since PCI/SR-IOV passthrough might be used for accelerators,
> > which expose their functionality via VF.
>
> IIUC, there's plenty of FPGAs which are not SRIOV based, so there's
> still scope for Nova enhancement in this area.
>
> The fact that some FPGAs are SRIOV & some are not, though, is also
> why I'm suggesting that any work related to FPGA should be based around
> refactoring of the existing PCI device assignment model to form a more
> generic "Hardware device assignment" model.  If we end up having a
> completely distinct data model for FPGAs that is a failure. We need to
> have a generalized hardware assignment model that can be used for generic
> PCI devices, NICs, FPGAs, TPMs, GPUs, etc regardless of whether they
> are backed by SRIOV, or their own non-PCI virtual functions. Personally
> I'll reject any spec proposal that ignores existing PCI framework and
> introduces a separate model for FPGA.
>
> > > All the threads I've see go well off into the weeds about trying to
> > > solve everybody's niche/edge cases  perfectly and as a result get
> > > very complicated.
> >
> > The topic is complicated :)
>
> Which is why i'm advising to not try to solve the perfect case and instead
> focus on getting something simple & good enough for common case.
>
I think the simple use cases can be covered easily today with a PCIe SR-IOV
configuration in which some number of VFs are mapped to regions of a
pre-initialized board. I know of successful deployments that do the
initialization with ironic and use nova to allocate the PCIe SR-IOV access
using existing extension points. Once allocated, the actual function
bitstream gets pushed in by the owning VM. The application owners manage
concurrency. As a first step, this level of support could be made mainstream
rather than a custom extension; support for alternatives to PCIe-based
connections could then be added.

That said, there are many use cases in play today, unfortunately outside of
OpenStack, that manage the loading of the bitstream that implements a
specific function. The desire is to load those bitstreams and manage their
life cycle just like we manage a VM and its image today. In effect, the
static region of the FPGA has the role of a very simple hypervisor.

FPGA boards are getting denser and more common, and they are getting their
own peripherals such as on-board NICs, serial ports, storage, etc.

I don't believe we need to expose complicated physical structure to
management, but a device that can be virtualized and dynamically programmed,
and that is connected to the other infrastructure in the environment, needs
to be managed together with the things it connects to.

I suggest the following:
First, standardize how to describe and allocate a real or virtualized FPGA.
Specify the metadata and related filter rules.
Second, mirror the glance/nova process of loading an image onto a hypervisor
for loading a bitstream into a reprogrammable region (PR).
Third, keep the functions of the actual bitstream separate from the above
management, just like we do with VM or container functional capabilities.
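
As an illustration of the first step, here is a minimal Python sketch of what region metadata and a matching filter rule could look like. Everything in it is hypothetical: the Region fields, the host_passes rule, and the host and function names are made up for this example and are not an existing Nova filter or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    """Hypothetical metadata describing one FPGA region on a host."""
    reprogrammable: bool
    loaded_function: Optional[str] = None  # function of the loaded bitstream, if any
    in_use: bool = False


def host_passes(regions, wanted_function):
    """Filter rule: the host needs a free region that either already
    provides the wanted function or can be reprogrammed to provide it."""
    for region in regions:
        if region.in_use:
            continue
        if region.loaded_function == wanted_function or region.reprogrammable:
            return True
    return False


hosts = {
    "host-a": [Region(reprogrammable=False, loaded_function="ipsec", in_use=True)],
    "host-b": [Region(reprogrammable=True)],
    "host-c": [Region(reprogrammable=False, loaded_function="compression")],
}

eligible = sorted(
    name for name, regions in hosts.items() if host_passes(regions, "ipsec")
)
print(eligible)  # ['host-b']
```

The rule deliberately says nothing about what the bitstream does internally, which matches the third point: the function is just a label that management matches on.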

When the lifecycle of the PR is tied to a VM, just like ephemeral storage,
driving allocation from nova seems to make the most sense.

Am I way out of line?
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o-  http://virt-manager.org :|
> |: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org  -o-  http://live.gnome.org/gtk-vnc :|
>


Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-21 Thread Daniel P. Berrange
On Thu, Jul 21, 2016 at 07:54:48AM +0200, Roman Dobosz wrote:
> On Wed, 20 Jul 2016 10:07:12 +0100
> "Daniel P. Berrange"  wrote:
> 
> Hey Daniel, thanks for the feedback.
> 
> > > Thoughts?
> > 
> > I'd suggest you'll increase your chances of success with nova design
> > approval if you focus on implementing a really simple usage scheme for
> > FPGA as the first step in Nova.
> 
> This. Maybe I'm wrong, but for me the minimal use case for FPGA would
> be ability to schedule VM which need certain accelerator from multiple
> potential ones on available FPGA/fixed slot. How insane does it sound?
> 
> Providing fixed, prepared earlier by DC administrator accelerator
> resource, doesn't bring much value, beyond what we already have in
> Nova, since PCI/SR-IOV passthrough might be used for accelerators,
> which expose their functionality via VF.

IIUC, there's plenty of FPGAs which are not SRIOV based, so there's
still scope for Nova enhancement in this area.

The fact that some FPGAs are SRIOV & some are not, though, is also
why I'm suggesting that any work related to FPGA should be based around
refactoring of the existing PCI device assignment model to form a more
generic "Hardware device assignment" model.  If we end up having a
completely distinct data model for FPGAs that is a failure. We need to
have a generalized hardware assignment model that can be used for generic
PCI devices, NICs, FPGAs, TPMs, GPUs, etc regardless of whether they
are backed by SRIOV, or their own non-PCI virtual functions. Personally
I'll reject any spec proposal that ignores existing PCI framework and
introduces a separate model for FPGA.
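
To make the idea of a single assignment model concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not Nova's PCI code: the Kind and Backing enums, the Device record, the claim function and all addresses are invented for this example. The point it shows is that a request names only the kind of device, and the backing (PCI device, SR-IOV VF, or a non-PCI virtual function) is a property of the device rather than a separate data model.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    NIC = "nic"
    FPGA = "fpga"
    GPU = "gpu"
    TPM = "tpm"


class Backing(Enum):
    PCI_PF = "pci-pf"          # whole PCI device / physical function
    SRIOV_VF = "sriov-vf"      # SR-IOV virtual function
    NON_PCI_VF = "non-pci-vf"  # device-specific virtual function


@dataclass(frozen=True)
class Device:
    address: str   # opaque handle; its format depends on the backing
    kind: Kind
    backing: Backing


def claim(pool, assigned, kind, count=1):
    """Pick `count` unassigned devices of `kind`, whatever their backing."""
    free = [d for d in pool if d.kind is kind and d.address not in assigned]
    if len(free) < count:
        return []  # request cannot be satisfied; claim nothing
    chosen = free[:count]
    assigned.update(d.address for d in chosen)
    return chosen


pool = [
    Device("0000:03:00.1", Kind.NIC, Backing.SRIOV_VF),
    Device("0000:81:00.0", Kind.FPGA, Backing.PCI_PF),
    Device("fpga0/region2/vf0", Kind.FPGA, Backing.NON_PCI_VF),
]
assigned = set()
fpgas = claim(pool, assigned, Kind.FPGA, count=2)
print([d.backing.value for d in fpgas])  # ['pci-pf', 'non-pci-vf']
```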

> > All the threads I've see go well off into the weeds about trying to 
> > solve everybody's niche/edge cases  perfectly and as a result get 
> > very complicated.
> 
> The topic is complicated :)

Which is why I'm advising not to try to solve the perfect case and instead
focus on getting something simple & good enough for the common case.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-21 Thread Roman Dobosz
On Thu, 21 Jul 2016 08:56:07 +0800
Fei K Chen  wrote:

> > Unless you have one FPGA with 8 slots, which can become FPGA with 4
> > slots. From scheduling perspective you have to know, which FPGA
> > resources can be reconfigured, and which not, isn't it? Also, AFAIRC
> > to provide VM with VF, there is a need for providing libvirt with
> > address of such VF, right? That's why I've putted this last point.
> >
> > The whole idea of getting FPGA as resource is its ability to swap
> > resources on demand. So it can be thought of as several available
> > hardware (means - accelerators, consumable by VMs) which most of the
> > time are not programmed in certain moment.
> >
> Let's have more thought about the resource swapping. The number of 
> run-time accelerators is not limited by the number of region/slot. 
> Inside FPGA, there can be some self-scheduling logic to schedule 
> accelerators on regions by using the fast partial reconfiguration. 
> It is not new, there are lots of such design in FPGA academic.

Right, but not all devices have such functionality. And we are trying 
to make this solution common for most FPGAs, right?

-- 
Cheers,
Roman Dobosz



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-21 Thread Roman Dobosz
On Thu, 21 Jul 2016 08:44:21 +0800
Fei K Chen  wrote:

> > 4. It might be also necessary to track every VF individually, although
> >I didn't assumed it will be needed, nevertheless with nested
> >resources it should be easy to handle it.
> You need. For example you have 4 region and 8 VF. Some region is 
> configured with an accelerator so it can be shared to multi-VM (each 
> consume a VF). But some other region is configured with private 
> exclusive accelerator so it can only be bind to one VF. That's why 
> we need to track both region and VF.

Well, it depends. If there is no difference between the VFs (all of them
provide the same functionality) and we don't really care about the
placement (an external entity would take care of it), then we don't need
this level. All the information will be held by the resource inventory
and allocation.

OTOH, if we need to store the information about which VF is passed to
which VM, then we probably need this level, or we need to store VF
addresses in the inventory/allocation in some new field.
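
The two options can be sketched side by side. This is a minimal Python illustration with made-up class names and PCI-style addresses, not a proposed schema: the first class keeps only a counter, the second records which address went to which consumer.

```python
from dataclasses import dataclass, field


@dataclass
class CountedInventory:
    """Option 1: VFs are interchangeable, so only a counter is kept."""
    total: int
    used: int = 0

    def allocate(self, amount):
        if self.used + amount > self.total:
            raise ValueError("not enough free VFs")
        self.used += amount


@dataclass
class TrackedInventory:
    """Option 2: each VF address is recorded against its consumer."""
    addresses: list
    owner: dict = field(default_factory=dict)  # address -> consumer

    def allocate(self, consumer, amount):
        free = [a for a in self.addresses if a not in self.owner]
        if len(free) < amount:
            raise ValueError("not enough free VFs")
        chosen = free[:amount]
        for address in chosen:
            self.owner[address] = consumer
        return chosen


counted = CountedInventory(total=8)
counted.allocate(3)

# Hypothetical VF addresses, functions .0 to .7 of one device.
tracked = TrackedInventory(addresses=[f"0000:05:00.{i}" for i in range(8)])
vm_a_vfs = tracked.allocate("vm-a", 2)
print(counted.used, vm_a_vfs)
```

With the counter, the scheduler knows that 5 VFs remain but something else has to pick the concrete addresses; with per-VF tracking, the address to hand over is part of the allocation itself.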

-- 
Cheers,
Roman Dobosz



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-20 Thread Roman Dobosz
On Wed, 20 Jul 2016 10:07:12 +0100
"Daniel P. Berrange"  wrote:

Hey Daniel, thanks for the feedback.

> > Thoughts?
> 
> I'd suggest you'll increase your chances of success with nova design
> approval if you focus on implementing a really simple usage scheme for
> FPGA as the first step in Nova.

This. Maybe I'm wrong, but for me the minimal use case for FPGA would
be the ability to schedule a VM which needs a certain accelerator, chosen
from multiple potential ones, on an available FPGA/fixed slot. How insane
does that sound?

Providing a fixed accelerator resource, prepared earlier by the DC
administrator, doesn't bring much value beyond what we already have in
Nova, since PCI/SR-IOV passthrough can be used for accelerators which
expose their functionality via VFs.

> All the threads I've see go well off into the weeds about trying to 
> solve everybody's niche/edge cases  perfectly and as a result get 
> very complicated.

The topic is complicated :)

> For both NUMA and PCI dev assignment we got initial success by cutting
> back scope and focusing on the doing the minimum possible to satisfy
> the 90% common use cases, and ignoring the less common 10% initially.
> Yes this is not optimal, but it is good enough to keep most people
> happy without introducing massive complexity into the designs & impl.
> 
> For FPGA, I'd like to see an initial proposal that assumed the FPGA
> is pre-programmed & pre-divided into a fixed number of slots and simply
> deal with this. This is similar to how we dealt with PCI SR-IOV initially
> where we assumed the dev is in VF-mode only. Only later did we start to
> add cleverness around switching VF vs PF mode. For FPGA I think any kind
> of dynamic re-allocation/re-configuration is better done as a stage 2

Okay. That sounds reasonable.

-- 
Cheers,
Roman Dobosz



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-20 Thread Fei K Chen


Roman Dobosz <roman.dob...@intel.com> wrote on 2016/07/20 15:25:28:

> From: Roman Dobosz <roman.dob...@intel.com>
> To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev@lists.openstack.org>
> Cc: Ed Leafe <e...@leafe.com>
> Date: 2016/07/20 15:30
> Subject: Re: [openstack-dev] FPGA as a dynamic nested resources
>
>
> > > 4. It might also be necessary to track every VF individually. I
> > >    didn't assume it would be needed, but with nested resources it
> > >    should be easy to handle.
> >
> > I’m still not seeing the need for nesting. If you have a single FPGA
> > with 8 slots, when you program the slots with accelerators, you now
> > have 8 consumable resources. The fact that they came from a
> > particular FPGA unit doesn’t seem relevant from a scheduling
> > perspective.
>
> Unless you have one FPGA with 8 slots, which can become FPGA with 4
> slots. From scheduling perspective you have to know, which FPGA
> resources can be reconfigured, and which not, isn't it? Also, AFAIRC
> to provide VM with VF, there is a need for providing libvirt with
> address of such VF, right? That's why I've putted this last point.
>
> The whole idea of getting FPGA as resource is its ability to swap
> resources on demand. So it can be thought of as several available
> hardware (means - accelerators, consumable by VMs) which most of the
> time are not programmed in certain moment.
>
Let's think more about resource swapping. The number of run-time
accelerators is not limited by the number of regions/slots. Inside the FPGA,
there can be self-scheduling logic that schedules accelerators onto regions
using fast partial reconfiguration. This is not new; there are many such
designs in FPGA academic research.


-- Fei Chen


Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-20 Thread Fei K Chen


Roman Dobosz <roman.dob...@intel.com> wrote on 2016/07/20 02:03:28:

> From: Roman Dobosz <roman.dob...@intel.com>
> To: openstack-dev <openstack-dev@lists.openstack.org>
> Date: 2016/07/20 02:07
> Subject: [openstack-dev] FPGA as a dynamic nested resources
>
> Hi all,
>
> Some time ago Jay Pipes published an etherpad[1] with ideas around
> modelling nested resources, taking NUMA as an example. I was also
> encouraged ;) to start this thread at the last Nova scheduler meeting.
>
> I read the mentioned etherpad, and what struck me was that the
> described scenario with NUMA cells resembles, to some extent, the way
> an FPGA can be managed.
>
> A NUMA cell can be treated as a vessel for memory cells, and it is
> expressed as a number of MB. So it is possible to extract the
> information from existing data and add another level of aggregation
> using only a cleverly prepared SQL query.
>
> I think the problem might be broader than what the existing model,
> tweaked a bit, can cover. If we look at the resources which an FPGA
> may expose, there can be a couple of levels, and each of them can be
> treated as a resource.
>
> Three levels of FPGA resources can be identified, each nested within
> the previous one:
>
> 1. Whole FPGA. A discrete FPGA can be passed through to the VM even
>    today.
>
> 2. Region in FPGA. Some FPGA models can be divided into regions or
>    slots. Some models also allow such a region to be (re)programmed
>    individually; in that case an entire slot can be passed to the VM,
>    so that the VM can reprogram the slot and use the algorithm from
>    within the VM.
>
> 3. Accelerator in region/FPGA. An accelerator programmed into a slot
>    may provide Virtual Functions (similar to SR-IOV); every available
>    VF can then be treated as a resource.
>
> 4. It might also be necessary to track every VF individually. I
>    didn't assume it would be needed, but with nested resources it
>    should be easy to handle.
You do need it. For example, say you have 4 regions and 8 VFs. Some regions
are configured with an accelerator that can be shared among multiple VMs
(each consuming a VF), but other regions are configured with a private,
exclusive accelerator that can only be bound to one VF. That's why we need
to track both regions and VFs.

>
> The relationship between these resources is a bit different from
> NUMA. In the NUMA case it is possible either to schedule a VM with
> some amount of memory, or to request memory within a NUMA cell. With
> an FPGA, if a slot is taken, or an accelerator is already programmed
> and in use, there is no way to offer the FPGA as a whole to the
> tenant until all accelerators and slots are free.
>
> I've followed Jay's idea about nested resources and, with the
> blueprint[2] on dynamic resources in mind, worked out how FPGAs fit
> in.
>
> The tables are unchanged; they are a copy-paste from the etherpad[1]:
>
>
> CREATE TABLE resource_providers (
>     id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>     uuid CHAR(36) NOT NULL,
>     name VARCHAR(100) NULL,
>     root_provider_id INT NULL,
>     parent_provider_id INT NULL
> );
>
> CREATE TABLE inventories (
>     id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>     resource_provider_id INT NOT NULL,
>     resource_class_id INT NOT NULL,
>     total INT NOT NULL,
>     reserved INT NOT NULL,
>     min_unit INT NOT NULL,
>     max_unit INT NOT NULL,
>     step_size INT NOT NULL,
>     allocation_ratio INT NOT NULL
> );
>
> CREATE TABLE allocations (
>     id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>     resource_provider_id INT NOT NULL,
>     consumer_uuid CHAR(36) NOT NULL,
>     resource_class_id INT NOT NULL,
>     used INT NOT NULL
> );
>
>
> Then let's fill the tables with data of the following structure:
>
> -- FPGA-1
> --   +- FPGA-1 slot1 (taken), resource_provider_id:
> --   +- FPGA-1 slot2
> --   +- FPGA-1 slot2 acceleratorX
> --   +- FPGA-1 slot2 acceleratorX VF1 (taken)
> --   +- FPGA-1 slot2 acceleratorX VF2 (taken)
> --   +- FPGA-1 slot2 acceleratorX VF3 (taken)
> --   +- FPGA-1 slot2 acceleratorX VF4 (taken)
> --   +- FPGA-1 slot2 acceleratorX VF5
> --   +- ..
> --   +- FPGA-1 slot2 acceleratorX VF32
> --   +- FPGA-1 slot3
> -- FPGA-2
> --   +- FPGA-2 slot1
>
> where FPGA-1 and FPGA-2 are hosts with an FPGA on board. It is also
> assumed that new dynamic resource classes have been created: id 1666
> means 'FPGA' (although it might simply be a standard class, hardcoded
> in the ENUM), 1667 means 'FPGA slot' and 1668 'FPGA accelerator'.
>
>
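
The structure above, together with the rule that an FPGA can be offered as a whole only when nothing in its tree is allocated, can be tried out in SQLite. This is a minimal sketch under several assumptions: the tables are reduced to the columns used here, SQLite's INTEGER PRIMARY KEY stands in for the auto-increment id, the provider ids and consumer names are invented, each root provider stores its own id in root_provider_id, and the 32 VFs are modelled as inventory on the accelerator rather than as child providers.

```python
import sqlite3

FPGA, SLOT, ACCEL = 1666, 1667, 1668  # resource class ids from the text

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    root_provider_id INT,
    parent_provider_id INT
);
CREATE TABLE inventories (
    resource_provider_id INT, resource_class_id INT, total INT
);
CREATE TABLE allocations (
    resource_provider_id INT, consumer_uuid TEXT,
    resource_class_id INT, used INT
);
""")

# (id, name, root, parent); ids are made up for this example.
db.executemany("INSERT INTO resource_providers VALUES (?, ?, ?, ?)", [
    (1, "FPGA-1", 1, None),
    (2, "FPGA-1 slot1", 1, 1),
    (3, "FPGA-1 slot2", 1, 1),
    (4, "FPGA-1 slot2 acceleratorX", 1, 3),
    (5, "FPGA-1 slot3", 1, 1),
    (6, "FPGA-2", 6, None),
    (7, "FPGA-2 slot1", 6, 6),
])
db.executemany("INSERT INTO inventories VALUES (?, ?, ?)", [
    (1, FPGA, 1), (2, SLOT, 1), (3, SLOT, 1), (4, ACCEL, 32),
    (5, SLOT, 1), (6, FPGA, 1), (7, SLOT, 1),
])
# slot1 is taken, and 4 of acceleratorX's VFs are taken.
db.executemany("INSERT INTO allocations VALUES (?, ?, ?, ?)", [
    (2, "vm-a", SLOT, 1),
    (4, "vm-b", ACCEL, 4),
])

# Root providers with no allocation anywhere in their tree.
WHOLE_FPGA = """
SELECT rp.name
FROM resource_providers rp
WHERE rp.parent_provider_id IS NULL
AND NOT EXISTS (
    SELECT 1
    FROM resource_providers member
    JOIN allocations al ON al.resource_provider_id = member.id
    WHERE member.root_provider_id = rp.id
)
ORDER BY rp.name
"""

whole_fpgas = [row[0] for row in db.execute(WHOLE_FPGA)]
print(whole_fpgas)  # ['FPGA-2']
```

FPGA-1 is excluded because one of its slots and some of its accelerator's VFs are in use; once both allocations are deleted, the same query returns both FPGAs.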

Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-20 Thread Ed Leafe
On Jul 20, 2016, at 2:07 AM, Daniel P. Berrange  wrote:

> For FPGA, I'd like to see an initial proposal that assumed the FPGA
> is pre-programmed & pre-divided into a fixed number of slots and simply
> deal with this. This is similar to how we dealt with PCI SR-IOV initially
> where we assumed the dev is in VF-mode only. Only later did we start to
> add cleverness around switching VF vs PF mode. For FPGA I think any kind
> of dynamic re-allocation/re-configuration is better done as a stage 2

+1 to this approach. I’m not convinced yet that Nova should be in the business 
of FPGA management, but once we get the basic functionality supporting FPGA 
working well, seeing what would be needed to add it would be much easier, and 
we could make a clearer determination as to whether this is feasible or not.


-- Ed Leafe






__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-20 Thread Daniel P. Berrange
On Tue, Jul 19, 2016 at 08:03:28PM +0200, Roman Dobosz wrote:
> Hi all,
> 
> Some time ago Jay Pipes published etherpad[1] with ideas around
> modelling nested resources, taking NUMA as an example. I was also
> encouraged ;) to start this thread, on last Nova scheduler meeting.
> 
> I was read mentioned etherpad and what hits me was that described
> scenario with NUMA cells resembles the way how FPGA can be managed. In
> some extent.
> 
> NUMA cell can be treated as a vessel for memory cells, and it is
> expressed as number of MB. So it is possible to extract the
> information from existing data and add another level of aggregation
> using only clever prepared SQL query.
> 
> I think, that problem might be broader, than using existing, tweaked a
> bit model. If we take a look into resources, which FPGA may expose,
> than it can be couple of levels, and each of them can be treated as
> resource.
> 
> It can identified 3 levels of FPGA resources, which can be nested one
> on the others:
> 
> 1. Whole FPGA. If used discrete FPGA, than even today it might be pass
>    through to the VM.
> 
> 2. Region in FPGA. Some of the FPGA models can be divided into regions
>    or slots. Also, for some model it is possible to (re)program such
>    region individually - in this case there is a possibility to pass
>    entire slot to the VM, so that it might be possible to reprogram
>    such slot, and utilize the algorithm within the VM.
> 
> 3. Accelerator in region/FPGA. If there is an accelerator programmed
>    in the slot, it is possible, that such accelerator provides us with
>    Virtual Functions (similar to the SR-IOV), than every available VF
>    can be treated as a resource.
> 
> 4. It might be also necessary to track every VF individually, although
>    I didn't assumed it will be needed, nevertheless with nested
>    resources it should be easy to handle it.
> 
> Correlation between such resources are a bit different from NUMA -
> while in NUMA case there is a possibility to either schedule a VM with
> some memory specified, or request memory within NUMA cell, in FPGA if
> there is slot taken, or accelerator already programmed and used, there
> is no way to offer FPGA as a whole to the tenant, until all
> accelerators and slots are free.
> 
> I've followed Jay idea about nested resources and having in mind
> blueprint[2] regarding dynamic resources I've prepared how it fit in.

[snip lots of complicated modelling]

> Thoughts?

I'd suggest you'll increase your chances of success with nova design
approval if you focus on implementing a really simple usage scheme for
FPGA as the first step in Nova. All the threads I've seen go well off
into the weeds about trying to solve everybody's niche/edge cases
perfectly and as a result get very complicated.

For both NUMA and PCI dev assignment we got initial success by cutting
back scope and focusing on doing the minimum possible to satisfy
the 90% common use cases, and ignoring the less common 10% initially.
Yes this is not optimal, but it is good enough to keep most people
happy without introducing massive complexity into the designs & impl.

For FPGA, I'd like to see an initial proposal that assumed the FPGA
is pre-programmed & pre-divided into a fixed number of slots and simply
deal with this. This is similar to how we dealt with PCI SR-IOV initially
where we assumed the dev is in VF-mode only. Only later did we start to
add cleverness around switching VF vs PF mode. For FPGA I think any kind
of dynamic re-allocation/re-configuration is better done as a stage 2.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-20 Thread Roman Dobosz
On Tue, 19 Jul 2016 15:51:26 -0700
Ed Leafe  wrote:

> >> Why would a VM program the slot? Wouldn’t it usually be at the
> >> host level?
> >
> > Are there no cases where a VM might want to download a proprietary
> > program into an FPGA?
>
> That doesn’t sound right to me, but maybe I’m just not that familiar
> with FPGA specifics. In general, VMs don’t control their hosts. It
> would also bring up some complications, such as what should happen
> when you delete that VM: does the FPGA have to be reset to its
> original state?

Technically, it is not necessary to "erase" the FPGA. It might be left
untouched and simply shown as free in the resource tracker, so that it
can then be programmed with another accelerator, or passed to another
VM if needed. It may also be zeroed (programmed with an empty IP) by an
external entity, which might be the preferred option.

> >> I’m still not seeing the need for nesting. If you have a single
> >> FPGA with 8 slots, when you program the slots with accelerators,
> >> you now have 8 consumable resources. The fact that they came from
> >> a particular FPGA unit doesn’t seem relevant from a scheduling
> >> perspective.
> >
> > If you want to be able to provide an FPGA as either a whole
> > un-programmed FPGA or as pre-programmed resources, you'd
> > presumably need to know which whole FPGAs are available and which
> > have been fractionally allocated, no?
>
> An unprogrammed FPGA is a particular resource class. When you
> program it, you are removing one of that class and creating one or
> more of a new resource class (e.g., an encryption accelerator
> program). There isn’t a need to nest anything.

But you still have to track *where* a potential accelerator can be
scheduled, don't you? A certain type of IP will need a suitable slot, so
that also has to be tracked. Nesting isn't strictly necessary, but it
might be helpful for managing the state of your resources.

> > I agree that if you are only going to have the host program the
> > FPGA and then make the resources available then the scheduler
> > doesn't need to know about whole FPGAs.
>
> That was where we left the discussion in Austin, so that was my
> assumption.

… as the first step, right? No one is pushing to have this in
Newton. Even the Ocata time frame seems unrealistic.

-- 
Cheers,
Roman Dobosz



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-20 Thread Roman Dobosz
On Tue, 19 Jul 2016 12:40:50 -0700
Ed Leafe  wrote:

> > It can identified 3 levels of FPGA resources, which can be nested one
> > on the others:
> >
> > 1. Whole FPGA. If used discrete FPGA, than even today it might be pass
> >   through to the VM.
> Can you explain why this would ever be useful? IOW, what can a VM do
> with an entire FPGA?

Private cloud, development purposes. It could be treated as FPGA-aaS.
The same goes for the unoccupied slots. Of course, because reprogramming
affects real, not virtualized, hardware, there are security concerns
about allowing users to do that, for example in public clouds.

The other reason, which is much more significant, is that there could
be IPs so big that they take up most of the FPGA, so reconfiguration is
needed (although reconfiguration is out of scope for Nova), and we don't
want to wipe out another accelerator which might currently be in use.
That's why slots are also treated as dynamic resources here.

> > 3. Accelerator in region/FPGA. If there is an accelerator programmed
> >   in the slot, it is possible, that such accelerator provides us with
> >   Virtual Functions (similar to the SR-IOV), than every available VF
> >   can be treated as a resource.
>
> This is my understanding of what would be consumable: the slot / VF,
> which the VM could take advantage of.

Yes. That's the obvious scenario.

> > 4. It might be also necessary to track every VF individually, although
> >   I didn't assumed it will be needed, nevertheless with nested
> >   resources it should be easy to handle it.
>
> I’m still not seeing the need for nesting. If you have a single FPGA
> with 8 slots, when you program the slots with accelerators, you now
> have 8 consumable resources. The fact that they came from a
> particular FPGA unit doesn’t seem relevant from a scheduling
> perspective.

Unless you have one FPGA with 8 slots which can become an FPGA with 4
slots. From a scheduling perspective you have to know which FPGA
resources can be reconfigured and which cannot, don't you? Also, AFAIR,
to provide a VM with a VF, libvirt has to be given the address of that
VF, right? That's why I added this last point.

The whole point of treating the FPGA as a resource is its ability to
swap resources on demand. It can be thought of as several pieces of
available hardware (that is, accelerators consumable by VMs), most of
which are not programmed at any given moment.

So, let's assume that we have two hosts, HostA and HostB, each with an
FPGA capable of providing 2 accelerators which exclusively use the
entire chip (let's call them AX1 and AX2), and one other which can use
one of the 2 possible slots (AY). So the situation is that we have 3
possible accelerators to use and, in the worst-case scenario, only two
places where they can be placed.

Initially, there is no accelerator in use. The cloud administrator
defines all the IPs available (somehow - this part isn't defined yet,
but let's assume it is in place).

Now a user requests a VM with a certain flavor/image with AX1. The
scheduler knows that it will fit on either HostA or HostB, so HostA is
chosen, the FPGA is magically™ prepared to hold the AX1 accelerator, and
the VM is started. Now we have the resource tree HostA FPGA->slot->AX1
and HostB FPGA.

Next, the user requests another VM with the AY accelerator. The
scheduler should now know that the only available option is HostB, so
the magic happens again, and we get this resource tree:

HostA FPGA
 +- slot1
     +- AX1

HostB FPGA
 +- slot1
     +- AY
 +- slot2


Now, what should happen if the user removes the VM with the AY
accelerator and requests another VM with AX2?
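
To make the scenario concrete, here is a minimal sketch of the placement
rule it implies. Everything in it is an assumption made for
illustration, not anything Nova provides: the Fpga class, the
NEEDS_WHOLE_CHIP catalogue and the schedule() helper are hypothetical,
and it assumes a slot becomes free again as soon as the VM using it is
deleted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical catalogue: accelerator name -> does it need the whole chip?
NEEDS_WHOLE_CHIP = {"AX1": True, "AX2": True, "AY": False}


@dataclass
class Fpga:
    host: str
    slot_count: int = 2
    whole_chip: Optional[str] = None                 # accelerator using the entire chip
    slots: List[str] = field(default_factory=list)   # accelerators held in slots

    def can_host(self, accel: str) -> bool:
        if self.whole_chip is not None:
            return False                  # chip is exclusively taken
        if NEEDS_WHOLE_CHIP[accel]:
            return not self.slots         # every slot must be free
        return len(self.slots) < self.slot_count

    def place(self, accel: str) -> None:
        if NEEDS_WHOLE_CHIP[accel]:
            self.whole_chip = accel
        else:
            self.slots.append(accel)

    def release(self, accel: str) -> None:
        if self.whole_chip == accel:
            self.whole_chip = None
        else:
            self.slots.remove(accel)


def schedule(hosts: List[Fpga], accel: str) -> Optional[str]:
    """Place accel on the first host that can take it; return the host name."""
    for fpga in hosts:
        if fpga.can_host(accel):
            fpga.place(accel)
            return fpga.host
    return None


host_a, host_b = Fpga("HostA"), Fpga("HostB")
hosts = [host_a, host_b]

first = schedule(hosts, "AX1")     # lands on HostA
second = schedule(hosts, "AY")     # HostA is full, so HostB
blocked = schedule(hosts, "AX2")   # no host fits while AY is in use
host_b.release("AY")               # the VM using AY is deleted
third = schedule(hosts, "AX2")     # HostB is whole again
```

Under these assumptions AX2 cannot be placed anywhere while AY is in
use, and fits on HostB once AY is released. Whether the emptied slot
should stay in the tree as a provider or be collapsed back into the
whole FPGA is exactly the open question.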

-- 
Cheers,
Roman Dobosz



Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-19 Thread Ed Leafe
On Jul 19, 2016, at 2:58 PM, Chris Friesen  wrote:

>> Why would a VM program the slot? Wouldn’t it usually be at the host level?
> 
> Are there no cases where a VM might want to download a proprietary program 
> into an FPGA?

That doesn’t sound right to me, but maybe I’m just not that familiar with FPGA 
specifics. In general, VMs don’t control their hosts. It would also bring up 
some complications, such as what should happen when you delete that VM: does 
the FPGA have to be reset to its original state?

>> I’m still not seeing the need for nesting. If you have a single FPGA with 8 
>> slots, when you program the slots with accelerators, you now have 8 
>> consumable resources. The fact that they came from a particular FPGA unit 
>> doesn’t seem relevant from a scheduling perspective.
> 
> If you want to be able to provide an FPGA as either a whole un-programmed 
> FPGA or as pre-programmed resources, you'd presumably need to know which 
> whole FPGAs are available and which have been fractionally allocated, no?

An unprogrammed FPGA is a particular resource class. When you program it, you 
are removing one of that class and creating one or more of a new resource class 
(e.g., an encryption accelerator program). There isn’t a need to nest anything.
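
A minimal sketch of that flat model, only to show the bookkeeping: the
class names FPGA_UNPROGRAMMED and ACCEL_ENCRYPTION and the program()
helper are made up for the example, and 8 is simply the slot count used
earlier in this thread.

```python
from collections import Counter

# Flat inventory for one host: resource class -> units available.
inventory = Counter({"FPGA_UNPROGRAMMED": 1})


def program(inv: Counter, source_class: str, new_class: str, units: int) -> None:
    """Consume one unit of source_class and expose `units` of new_class."""
    if inv[source_class] < 1:
        raise ValueError(f"no {source_class} left to program")
    inv[source_class] -= 1
    inv[new_class] += units


# Programming the FPGA removes the unprogrammed unit and creates 8
# consumable accelerator units, with no parent/child relation recorded.
program(inventory, "FPGA_UNPROGRAMMED", "ACCEL_ENCRYPTION", 8)
```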

> I agree that if you are only going to have the host program the FPGA and then 
> make the resources available then the scheduler doesn't need to know about 
> whole FPGAs.

That was where we left the discussion in Austin, so that was my assumption.


-- Ed Leafe








Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-19 Thread Chris Friesen

On 07/19/2016 01:40 PM, Ed Leafe wrote:
> On Jul 19, 2016, at 11:03 AM, Roman Dobosz  wrote:
>
>> It can identified 3 levels of FPGA resources, which can be nested one
>> on the others:
>>
>> 1. Whole FPGA. If used discrete FPGA, than even today it might be pass
>>    through to the VM.
>
> Can you explain why this would ever be useful? IOW, what can a VM do with an
> entire FPGA?
>
>> 2. Region in FPGA. Some of the FPGA models can be divided into regions
>>    or slots. Also, for some model it is possible to (re)program such
>>    region individually - in this case there is a possibility to pass
>>    entire slot to the VM, so that it might be possible to reprogram
>>    such slot, and utilize the algorithm within the VM.
>
> Why would a VM program the slot? Wouldn’t it usually be at the host level?

Are there no cases where a VM might want to download a proprietary program into
an FPGA?

>> 3. Accelerator in region/FPGA. If there is an accelerator programmed
>>    in the slot, it is possible, that such accelerator provides us with
>>    Virtual Functions (similar to the SR-IOV), than every available VF
>>    can be treated as a resource.
>
> This is my understanding of what would be consumable: the slot / VF, which the
> VM could take advantage of.
>
>> 4. It might be also necessary to track every VF individually, although
>>    I didn't assumed it will be needed, nevertheless with nested
>>    resources it should be easy to handle it.
>
> I’m still not seeing the need for nesting. If you have a single FPGA with 8
> slots, when you program the slots with accelerators, you now have 8 consumable
> resources. The fact that they came from a particular FPGA unit doesn’t seem
> relevant from a scheduling perspective.

If you want to be able to provide an FPGA as either a whole un-programmed FPGA
or as pre-programmed resources, you'd presumably need to know which whole FPGAs
are available and which have been fractionally allocated, no?

I agree that if you are only going to have the host program the FPGA and then
make the resources available then the scheduler doesn't need to know about whole
FPGAs.


Chris




Re: [openstack-dev] FPGA as a dynamic nested resources

2016-07-19 Thread Ed Leafe
On Jul 19, 2016, at 11:03 AM, Roman Dobosz  wrote:

> It can identified 3 levels of FPGA resources, which can be nested one
> on the others:
> 
> 1. Whole FPGA. If used discrete FPGA, than even today it might be pass
>   through to the VM.

Can you explain why this would ever be useful? IOW, what can a VM do with an 
entire FPGA?

> 2. Region in FPGA. Some of the FPGA models can be divided into regions
>   or slots. Also, for some model it is possible to (re)program such
>   region individually - in this case there is a possibility to pass
>   entire slot to the VM, so that it might be possible to reprogram
>   such slot, and utilize the algorithm within the VM.

Why would a VM program the slot? Wouldn’t it usually be at the host level? 

> 3. Accelerator in region/FPGA. If there is an accelerator programmed
>   in the slot, it is possible, that such accelerator provides us with
>   Virtual Functions (similar to the SR-IOV), than every available VF
>   can be treated as a resource.

This is my understanding of what would be consumable: the slot / VF, which the 
VM could take advantage of.

> 4. It might be also necessary to track every VF individually, although
>   I didn't assumed it will be needed, nevertheless with nested
>   resources it should be easy to handle it.

I’m still not seeing the need for nesting. If you have a single FPGA with 8 
slots, when you program the slots with accelerators, you now have 8 consumable 
resources. The fact that they came from a particular FPGA unit doesn’t seem 
relevant from a scheduling perspective.

-- Ed Leafe








[openstack-dev] FPGA as a dynamic nested resources

2016-07-19 Thread Roman Dobosz
Hi all,

Some time ago Jay Pipes published an etherpad[1] with ideas around
modelling nested resources, taking NUMA as an example. I was also
encouraged ;) to start this thread at the last Nova scheduler meeting.

I read that etherpad, and what struck me was that the scenario
described for NUMA cells resembles, to some extent, the way an FPGA can
be managed.

A NUMA cell can be treated as a vessel for memory cells, and it is
expressed as a number of MB. So it is possible to extract the
information from existing data and add another level of aggregation
using only a cleverly prepared SQL query.

I think the problem might be broader than what the existing model,
tweaked a bit, can cover. If we look at the resources which an FPGA may
expose, there can be a couple of levels, and each of them can be
treated as a resource.

We can identify 3 levels of FPGA resources, which can be nested one
inside another:

1. Whole FPGA. If a discrete FPGA is used, then even today it might be
   passed through to the VM.

2. Region in FPGA. Some FPGA models can be divided into regions or
   slots. Also, for some models it is possible to (re)program such a
   region individually - in this case it is possible to pass an entire
   slot to the VM, so that the slot can be reprogrammed and the
   algorithm utilized from within the VM.

3. Accelerator in region/FPGA. If there is an accelerator programmed
   in the slot, it is possible that the accelerator provides us with
   Virtual Functions (similar to SR-IOV); then every available VF can
   be treated as a resource.

4. It might also be necessary to track every VF individually. I didn't
   assume it would be needed, but with nested resources it should be
   easy to handle.

The correlation between such resources is a bit different from NUMA.
In the NUMA case it is possible either to schedule a VM with some
memory specified, or to request memory within a NUMA cell. With an
FPGA, if a slot is taken, or an accelerator is already programmed and
in use, there is no way to offer the FPGA as a whole to the tenant
until all accelerators and slots are free.

I've followed Jay's idea about nested resources and, with blueprint[2]
regarding dynamic resources in mind, I've sketched how FPGAs would fit
in.

Tables are unchanged - it is a copy-paste from the etherpad[1]:


CREATE TABLE resource_providers (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    uuid CHAR(36) NOT NULL,
    name VARCHAR(100) NULL,
    root_provider_id INT NULL,    -- top of the tree this provider is in
    parent_provider_id INT NULL   -- NULL for a root provider
);

CREATE TABLE inventories (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL,
    total INT NOT NULL,
    reserved INT NOT NULL,
    min_unit INT NOT NULL,
    max_unit INT NOT NULL,
    step_size INT NOT NULL,
    allocation_ratio FLOAT NOT NULL   -- the rows below store 1.0
);

CREATE TABLE allocations (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    resource_provider_id INT NOT NULL,
    consumer_uuid CHAR(36) NOT NULL,
    resource_class_id INT NOT NULL,
    used INT NOT NULL
);


Then let's fill the tables with data of the following structure:

-- FPGA-1
--   +- FPGA-1 slot1 (taken), resource_provider_id:
--   +- FPGA-1 slot2
--       +- FPGA-1 slot2 acceleratorX
--           +- FPGA-1 slot2 acceleratorX VF1 (taken)
--           +- FPGA-1 slot2 acceleratorX VF2 (taken)
--           +- FPGA-1 slot2 acceleratorX VF3 (taken)
--           +- FPGA-1 slot2 acceleratorX VF4 (taken)
--           +- FPGA-1 slot2 acceleratorX VF5
--           +- ..
--           +- FPGA-1 slot2 acceleratorX VF32
--   +- FPGA-1 slot3
-- FPGA-2
--   +- FPGA-2 slot1

where FPGA-1 and FPGA-2 are hosts with an FPGA on board. It is also
assumed that new dynamic resource classes have been created: id 1666
means 'FPGA' (although it might simply be a standard class, hardcoded
in the ENUM), 1667 means 'FPGA slot' and 1668 'FPGA accelerator'.


INSERT INTO resource_providers VALUES
(1, '', 'FPGA-1', 1, NULL),
(2, '', 'FPGA-1 slot 1', 1, 1),
(3, '', 'FPGA-1 slot 2', 1, 1),
(4, '', 'FPGA-1 slot 3', 1, 1),
(5, '', 'FPGA-1 slot 2 acceleratorX', 1, 3),
(6, '', 'FPGA-2', 6, NULL),
(7, '', 'FPGA-2 slot', 6, 6);


INSERT INTO inventories VALUES
(1, 1, 1666, 1, 0, 1, 1, 1, 1.0),
(2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
(3, 3, 1667, 1, 0, 1, 1, 1, 1.0),
(4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
(5, 5, 1668, 32, 0, 1, 32, 1, 1.0),
(6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
(7, 7, 1667, 1, 0, 1, 1, 1, 1.0);

INSERT INTO allocations VALUES
(1, 5, '', 1668, 4),
(2, 2, '', 1667, 1);
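
With rows like the ones above, the rule that a whole FPGA can only be
offered while nothing beneath it is in use can be checked through
root_provider_id. The following is just a sketch under assumptions: it
runs on SQLite through Python's sqlite3 module, trims the tables down
to the columns the check needs, and the query itself is mine, not
something taken from the etherpad.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    root_provider_id INTEGER,
    parent_provider_id INTEGER
);
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY,
    resource_provider_id INTEGER NOT NULL,
    resource_class_id INTEGER NOT NULL,
    used INTEGER NOT NULL
);
INSERT INTO resource_providers VALUES
    (1, 'FPGA-1', 1, NULL),
    (2, 'FPGA-1 slot 1', 1, 1),
    (3, 'FPGA-1 slot 2', 1, 1),
    (4, 'FPGA-1 slot 3', 1, 1),
    (5, 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, 'FPGA-2', 6, NULL),
    (7, 'FPGA-2 slot', 6, 6);
INSERT INTO allocations VALUES
    (1, 5, 1668, 4),
    (2, 2, 1667, 1);
""")

# A root provider is offerable as a whole only if no provider in its
# tree (including the root itself) has anything allocated.
FREE_WHOLE_FPGAS = """
SELECT root.name
FROM resource_providers root
WHERE root.parent_provider_id IS NULL
  AND NOT EXISTS (
      SELECT 1
      FROM resource_providers rp
      JOIN allocations al ON al.resource_provider_id = rp.id
      WHERE rp.root_provider_id = root.id
        AND al.used > 0
  )
ORDER BY root.name
"""

free_fpgas = [row[0] for row in conn.execute(FREE_WHOLE_FPGAS)]
print(free_fpgas)
```

FPGA-1 drops out because slot 1 and the accelerator in slot 2 both have
allocations, so only FPGA-2 is returned.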


To get the id of a resource provider of type acceleratorX on which 8
VFs can still be allocated:


SELECT rp.id
FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
LEFT JOIN allocations al ON al.resource_provider_id = rp.id
    AND al.resource_class_id = iv.resource_class_id
WHERE iv.resource_class_id = 1668
GROUP BY rp.id, iv.total
HAVING (iv.total - COALESCE(SUM(al.used), 0)) >= 8;
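
A runnable sketch of this kind of lookup, assuming SQLite via Python's
sqlite3 module and tables trimmed to the columns involved. The three
consumers ('vm-a', 'vm-b', 'vm-c') and the providers_with_free() helper
are hypothetical. Summing `used` over all consumers matters as soon as
more than one VM holds VFs from the same accelerator: here each
allocation on its own leaves at least 8 VFs, yet only 6 are really
free.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventories (
    id INTEGER PRIMARY KEY,
    resource_provider_id INTEGER NOT NULL,
    resource_class_id INTEGER NOT NULL,
    total INTEGER NOT NULL
);
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY,
    resource_provider_id INTEGER NOT NULL,
    consumer_uuid TEXT NOT NULL,
    resource_class_id INTEGER NOT NULL,
    used INTEGER NOT NULL
);
-- acceleratorX (provider 5) exposes 32 VFs of class 1668
INSERT INTO inventories VALUES (5, 5, 1668, 32);
INSERT INTO allocations VALUES
    (1, 5, 'vm-a', 1668, 4),
    (2, 5, 'vm-b', 1668, 12),
    (3, 5, 'vm-c', 1668, 10);
""")

PROVIDERS_WITH_FREE_UNITS = """
SELECT iv.resource_provider_id
FROM inventories iv
LEFT JOIN allocations al
       ON al.resource_provider_id = iv.resource_provider_id
      AND al.resource_class_id = iv.resource_class_id
WHERE iv.resource_class_id = ?
GROUP BY iv.resource_provider_id, iv.total
HAVING iv.total - COALESCE(SUM(al.used), 0) >= ?
"""


def providers_with_free(resource_class_id, wanted):
    """Return ids of providers with at least `wanted` unallocated units."""
    rows = conn.execute(PROVIDERS_WITH_FREE_UNITS, (resource_class_id, wanted))
    return [row[0] for row in rows]


fits_six = providers_with_free(1668, 6)     # 32 - (4 + 12 + 10) = 6 free
fits_eight = providers_with_free(1668, 8)   # more than what is left
```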


Note that I don't have to calculate the total number of available VFs
in this case, although it might happen that a user schedules a VM which
requests a number of VFs that exceeds the available VFs