Roman Dobosz <roman.dob...@intel.com> wrote on 2016/07/20 02:03:28:
> From: Roman Dobosz <roman.dob...@intel.com>
> To: openstack-dev <openstack-dev@lists.openstack.org>
> Date: 2016/07/20 02:07
> Subject: [openstack-dev] FPGA as a dynamic nested resources
>
> Hi all,
>
> Some time ago Jay Pipes published an etherpad[1] with ideas around
> modelling nested resources, taking NUMA as an example. I was also
> encouraged ;) to start this thread at the last Nova scheduler meeting.
>
> I have read the mentioned etherpad, and what struck me was that the
> described scenario with NUMA cells resembles, to some extent, the way
> an FPGA can be managed.
>
> A NUMA cell can be treated as a vessel for memory cells, and it is
> expressed as a number of MB. So it is possible to extract the
> information from existing data and add another level of aggregation
> using only a cleverly prepared SQL query.
>
> I think that the problem might be broader than what the existing
> model, tweaked a bit, can cover. If we take a look at the resources
> which an FPGA may expose, there can be a couple of levels, and each of
> them can be treated as a resource.
>
> Three levels of FPGA resources can be identified, each nested in the
> previous one:
>
> 1. Whole FPGA. If a discrete FPGA is used, then even today it might be
>    passed through to the VM.
>
> 2. Region in FPGA. Some FPGA models can be divided into regions or
>    slots. Also, for some models it is possible to (re)program such a
>    region individually - in this case there is a possibility to pass
>    the entire slot to the VM, so that the slot can be reprogrammed and
>    the algorithm utilized from within the VM.
>
> 3. Accelerator in region/FPGA. If there is an accelerator programmed
>    in the slot, it is possible that such an accelerator provides us
>    with Virtual Functions (similar to SR-IOV); then every available VF
>    can be treated as a resource.
>
> 4. It might also be necessary to track every VF individually. I didn't
>    assume it would be needed, but with nested resources it should be
>    easy to handle.

You do need it.
For example, say you have 4 regions and 8 VFs. One region is configured
with an accelerator that can be shared by multiple VMs (each consuming a
VF), while another region is configured with a private, exclusive
accelerator that can be bound to only one VF. That's why we need to track
both regions and VFs.

> Correlation between such resources is a bit different from NUMA -
> while in the NUMA case there is a possibility to either schedule a VM
> with some memory specified, or request memory within a NUMA cell, with
> an FPGA, if there is a slot taken or an accelerator already programmed
> and used, there is no way to offer the FPGA as a whole to the tenant
> until all accelerators and slots are free.
>
> I've followed Jay's idea about nested resources and, having in mind
> the blueprint[2] regarding dynamic resources, I've prepared an example
> of how it fits in.
>
> Tables are unchanged - it is a copy-paste from the etherpad[1]:
>
>
> CREATE TABLE resource_providers (
>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
>     uuid CHAR(36) NOT NULL,
>     name VARCHAR(100) NULL,
>     root_provider_id INT NULL,
>     parent_provider_id INT NULL
> );
>
> CREATE TABLE inventories (
>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
>     resource_provider_id INT NOT NULL,
>     resource_class_id INT NOT NULL,
>     total INT NOT NULL,
>     reserved INT NOT NULL,
>     min_unit INT NOT NULL,
>     max_unit INT NOT NULL,
>     step_size INT NOT NULL,
>     allocation_ratio INT NOT NULL
> );
>
> CREATE TABLE allocations (
>     id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
>     resource_provider_id INT NOT NULL,
>     consumer_uuid CHAR(36) NOT NULL,
>     resource_class_id INT NOT NULL,
>     used INT NOT NULL
> );
>
>
> Then let's fill the tables with data of the following structure:
>
> -- FPGA-1
> --   +- FPGA-1 slot1 (taken), resource_provider_id:
> --   +- FPGA-1 slot2
> --        +- FPGA-1 slot2 acceleratorX
> --             +- FPGA-1 slot2 acceleratorX VF1 (taken)
> --             +- FPGA-1 slot2 acceleratorX VF2 (taken)
> --             +- FPGA-1 slot2 acceleratorX VF3 (taken)
> --             +- FPGA-1 slot2 acceleratorX VF4 (taken)
> --             +- FPGA-1 slot2 acceleratorX VF5
> --             +- ..
> --             +- FPGA-1 slot2 acceleratorX VF32
> --   +- FPGA-1 slot3
> -- FPGA-2
> --   +- FPGA-2 slot1
>
> where FPGA-1 and FPGA-2 are hosts with an FPGA on board. It is also
> assumed that new dynamic resources are created: id 1666 means 'FPGA'
> (although it might simply be a standard class, hardcoded in the ENUM),
> 1667 means 'FPGA slot' and 1668 'FPGA accelerator'.
>
>
> INSERT INTO resource_providers VALUES
>     (1, '<UUID>', 'FPGA-1', 1, NULL),
>     (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
>     (3, '<UUID>', 'FPGA-1 slot 2', 1, 1),
>     (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
>     (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
>     (6, '<UUID>', 'FPGA-2', 6, NULL),
>     (7, '<UUID>', 'FPGA-2 slot', 6, 6);
>
>
> INSERT INTO inventories VALUES
>     (1, 1, 1666, 1, 0, 1, 1, 1, 1.0),
>     (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
>     (3, 3, 1667, 1, 0, 1, 1, 1, 1.0),
>     (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
>     (5, 5, 1668, 32, 0, 1, 32, 1, 1.0),
>     (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
>     (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
>
> INSERT INTO allocations VALUES
>     (1, 5, '<UUID>', 1668, 4),
>     (2, 2, '<UUID>', 1667, 1);
>
>
> To get the id of a resource of type acceleratorX from which 8 VFs can
> be allocated:
>
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN allocations al ON al.resource_provider_id = rp.id
> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> WHERE al.resource_class_id = 1668
> AND (iv.total - COALESCE(al.used, 0)) >= 8;
>
>
> Note that I don't have to calculate the total number of available VFs
> in this case, although it might happen that a user schedules a VM
> which requests a number of VFs that exceeds the VFs available in a
> single accelerator; then such a calculation will be needed.
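
A rough sketch of how that calculation could look, assuming SQLite
through Python's sqlite3 purely so that it runs standalone. In the query
as quoted, the class filter sits on the allocations table, so an
accelerator with no allocation rows at all drops out of the result, and
several allocation rows for one provider are not added up. The variant
below filters on inventories and sums "used" per provider. The names
build_db, providers_with_free and total_free are made up for this
example, the UUIDs are placeholders, and subtracting "reserved" while
ignoring allocation_ratio is my assumption, not something taken from the
etherpad.

```python
import sqlite3

# Schema and rows follow the quoted mail; SQLite types stand in for the
# original ones and the UUID strings are placeholders (assumptions).
SCHEMA_AND_DATA = """
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY, uuid TEXT NOT NULL, name TEXT,
    root_provider_id INTEGER, parent_provider_id INTEGER);
CREATE TABLE inventories (
    id INTEGER PRIMARY KEY, resource_provider_id INTEGER NOT NULL,
    resource_class_id INTEGER NOT NULL, total INTEGER NOT NULL,
    reserved INTEGER NOT NULL, min_unit INTEGER NOT NULL,
    max_unit INTEGER NOT NULL, step_size INTEGER NOT NULL,
    allocation_ratio REAL NOT NULL);
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY, resource_provider_id INTEGER NOT NULL,
    consumer_uuid TEXT NOT NULL, resource_class_id INTEGER NOT NULL,
    used INTEGER NOT NULL);
INSERT INTO resource_providers VALUES
    (1, 'uuid-1', 'FPGA-1', 1, NULL),
    (2, 'uuid-2', 'FPGA-1 slot 1', 1, 1),
    (3, 'uuid-3', 'FPGA-1 slot 2', 1, 1),
    (4, 'uuid-4', 'FPGA-1 slot 3', 1, 1),
    (5, 'uuid-5', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, 'uuid-6', 'FPGA-2', 6, NULL),
    (7, 'uuid-7', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1, 0, 1, 1, 1, 1.0),
    (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
    (3, 3, 1667, 1, 0, 1, 1, 1, 1.0),
    (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
    (5, 5, 1668, 32, 0, 1, 32, 1, 1.0),
    (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
    (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);
INSERT INTO allocations VALUES
    (1, 5, 'uuid-a', 1668, 4),
    (2, 2, 'uuid-b', 1667, 1);
"""

# Free units per provider of one resource class. The LEFT JOIN keeps
# providers without any allocation row; SUM adds up multiple rows.
PER_PROVIDER_FREE = """
SELECT iv.resource_provider_id AS provider_id,
       iv.total - iv.reserved - COALESCE(SUM(al.used), 0) AS free
FROM inventories iv
LEFT JOIN allocations al
       ON al.resource_provider_id = iv.resource_provider_id
      AND al.resource_class_id = iv.resource_class_id
WHERE iv.resource_class_id = ?
GROUP BY iv.id, iv.resource_provider_id, iv.total, iv.reserved
"""


def build_db():
    """Return an in-memory database filled with the example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def providers_with_free(conn, resource_class_id, needed):
    """Ids of providers that can serve `needed` units on their own."""
    sql = (f"SELECT provider_id FROM ({PER_PROVIDER_FREE}) "
           "WHERE free >= ? ORDER BY provider_id")
    return [row[0] for row in conn.execute(sql, (resource_class_id, needed))]


def total_free(conn, resource_class_id):
    """Free units of one class summed over all providers."""
    sql = f"SELECT COALESCE(SUM(free), 0) FROM ({PER_PROVIDER_FREE})"
    return conn.execute(sql, (resource_class_id,)).fetchone()[0]
```

This only looks at each provider's own inventory. It does not know that
a slot hosting a busy accelerator cannot be handed out whole, so that
check still has to be done separately.
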
>
> Getting more VFs than available will not return any records:
>
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN allocations al ON al.resource_provider_id = rp.id
> LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
> WHERE al.resource_class_id = 1668
> AND (iv.total - COALESCE(al.used, 0)) >= 29;
>
>
> Nothing fancy here. More interesting cases would be getting all
> unallocated slots:
>
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN inventories iv on iv.resource_provider_id = rp.id
> WHERE iv.resource_class_id = 1667
> AND rp.id not in (
>     SELECT rp.parent_provider_id as id
>     FROM allocations al
>     LEFT JOIN inventories iv on al.resource_provider_id =
>         iv.resource_provider_id
>     LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
>     WHERE al.resource_class_id = 1668
>     UNION
>     SELECT iv.resource_provider_id as id
>     FROM allocations al
>     LEFT JOIN inventories iv on al.resource_provider_id =
>         iv.resource_provider_id
>     LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
>     WHERE al.resource_class_id = 1667
> );
>
>
> Or getting all unallocated whole FPGAs:
>
>
> SELECT rp.id
> FROM resource_providers rp
> LEFT JOIN inventories iv on rp.id = iv.resource_provider_id
> WHERE iv.resource_class_id = 1666
> AND rp.id NOT in (
>     SELECT rp.parent_provider_id
>     FROM resource_providers rp
>     LEFT JOIN inventories iv on iv.resource_provider_id = rp.id
>     WHERE iv.resource_class_id = 1667
>     AND rp.id in (
>         SELECT rp.parent_provider_id as id
>         FROM allocations al
>         LEFT JOIN inventories iv on al.resource_provider_id =
>             iv.resource_provider_id
>         LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
>         WHERE al.resource_class_id = 1668
>         UNION
>         SELECT iv.resource_provider_id as id
>         FROM allocations al
>         LEFT JOIN inventories iv on al.resource_provider_id =
>             iv.resource_provider_id
>         LEFT JOIN resource_providers rp on rp.id = iv.resource_provider_id
>         WHERE al.resource_class_id = 1667
>     )
> );
>
>
> Those two queries are similar in that, if a user requests a slot or a
> whole FPGA, we have to check that there is no accelerator (in use)
> which might occupy the slot in the case of the slot query, and do the
> same check plus an additional one for slot usage when querying for a
> free FPGA.
>
> There is another topic which I haven't thought through yet -
> potentially available resources, that is, an accelerator/IP which
> might be requested during VM boot but doesn't exist yet. In the case
> of an FPGA, it might simply be brought up by an external entity
> (presumably a library or service) which will take on the burden of
> preparing such an accelerator/IP on a free slot, and take care of
> updating the information about allocations and dynamic resources.
>
> Thoughts?
>
> --
> Cheers,
> Roman Dobosz
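
Coming back to the two "unallocated" queries: both spell out the depth of
the tree by hand (slot, then accelerator). A hedged sketch of an
alternative, assuming a database with recursive CTEs (SQLite through
Python's sqlite3 here, only to make it runnable): mark a provider as busy
if it or any of its descendants has an allocation, then pick the
providers of the wanted class that are not busy. The names build_db and
free_providers are hypothetical, the UUIDs are placeholders, and the
inventories table is cut down to the columns the query touches.

```python
import sqlite3

# Provider tree and allocations as in the quoted mail; UUID strings are
# placeholders and inventories carries only the columns used below.
SCHEMA_AND_DATA = """
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY, uuid TEXT NOT NULL, name TEXT,
    root_provider_id INTEGER, parent_provider_id INTEGER);
CREATE TABLE inventories (
    id INTEGER PRIMARY KEY, resource_provider_id INTEGER NOT NULL,
    resource_class_id INTEGER NOT NULL, total INTEGER NOT NULL);
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY, resource_provider_id INTEGER NOT NULL,
    consumer_uuid TEXT NOT NULL, resource_class_id INTEGER NOT NULL,
    used INTEGER NOT NULL);
INSERT INTO resource_providers VALUES
    (1, 'uuid-1', 'FPGA-1', 1, NULL),
    (2, 'uuid-2', 'FPGA-1 slot 1', 1, 1),
    (3, 'uuid-3', 'FPGA-1 slot 2', 1, 1),
    (4, 'uuid-4', 'FPGA-1 slot 3', 1, 1),
    (5, 'uuid-5', 'FPGA-1 slot 2 acceleratorX', 1, 3),
    (6, 'uuid-6', 'FPGA-2', 6, NULL),
    (7, 'uuid-7', 'FPGA-2 slot', 6, 6);
INSERT INTO inventories VALUES
    (1, 1, 1666, 1), (2, 2, 1667, 1), (3, 3, 1667, 1), (4, 4, 1667, 1),
    (5, 5, 1668, 32), (6, 6, 1666, 1), (7, 7, 1667, 1);
INSERT INTO allocations VALUES
    (1, 5, 'uuid-a', 1668, 4),
    (2, 2, 'uuid-b', 1667, 1);
"""

# "busy" starts with every provider that has an allocation and then
# walks up parent_provider_id, so all ancestors become busy as well.
FREE_PROVIDERS_SQL = """
WITH RECURSIVE busy(id) AS (
    SELECT resource_provider_id FROM allocations WHERE used > 0
    UNION
    SELECT rp.parent_provider_id
    FROM resource_providers rp
    JOIN busy b ON rp.id = b.id
    WHERE rp.parent_provider_id IS NOT NULL
)
SELECT rp.id
FROM resource_providers rp
JOIN inventories iv ON iv.resource_provider_id = rp.id
WHERE iv.resource_class_id = ?
  AND rp.id NOT IN (SELECT id FROM busy)
ORDER BY rp.id
"""


def build_db():
    """Return an in-memory database filled with the example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def free_providers(conn, resource_class_id):
    """Ids of providers of the class with nothing allocated at or below them."""
    rows = conn.execute(FREE_PROVIDERS_SQL, (resource_class_id,))
    return [row[0] for row in rows]
```

The same statement then serves slots (1667) and whole FPGAs (1666), and
it would keep working if another level were added below the accelerator,
for instance individually tracked VFs.
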
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev