Hi all,

Some time ago Jay Pipes published an etherpad[1] with ideas around modelling nested resources, taking NUMA as an example. I was also encouraged ;) to start this thread at the last Nova scheduler meeting.

I have read the etherpad, and what struck me is that the NUMA scenario described there resembles, to some extent, the way FPGAs can be managed.

A NUMA cell can be treated as a vessel for memory cells, expressed as a number of MB. So it is possible to extract the information from existing data and add another level of aggregation using only a cleverly prepared SQL query. I think the problem might be broader than what the existing model, tweaked a bit, can cover.

If we look at the resources an FPGA may expose, there can be a couple of levels, and each of them can be treated as a resource. Three levels of FPGA resources can be identified, each nested in the previous one:

1. Whole FPGA. A discrete FPGA can be passed through to the VM even today.
2. Region in FPGA. Some FPGA models can be divided into regions or slots. For some models it is also possible to (re)program such a region individually. In that case an entire slot can be passed to the VM, so that the slot can be reprogrammed and the algorithm used from within the VM.
3. Accelerator in region/FPGA. If an accelerator is programmed into the slot, it may provide Virtual Functions (similar to SR-IOV), and then every available VF can be treated as a resource.
4. It might also be necessary to track every VF individually. I have not assumed this will be needed, but with nested resources it should be easy to handle.

The correlation between these resources is a bit different from NUMA. In the NUMA case it is possible either to schedule a VM with some amount of memory, or to request memory within a specific NUMA cell. With an FPGA, if a slot is taken, or an accelerator is already programmed and in use, there is no way to offer the FPGA as a whole to a tenant until all of its accelerators and slots are free.

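To make the "whole FPGA only when everything below it is free" rule concrete, here is a minimal in-memory sketch. The `Provider` class, the helper names and the sample tree are hypothetical and only illustrate the nesting; this is not how Nova or the etherpad models it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provider:
    """Hypothetical in-memory stand-in for a nested resource provider."""
    name: str
    total: int = 1   # units this provider exposes (e.g. 32 VFs)
    used: int = 0    # units currently allocated from this provider
    children: List["Provider"] = field(default_factory=list)


def subtree_in_use(provider: Provider) -> bool:
    """True if this provider or anything nested below it has allocations."""
    return provider.used > 0 or any(subtree_in_use(c) for c in provider.children)


def can_offer_whole(provider: Provider) -> bool:
    """A provider can be handed out as a whole only if its subtree is idle."""
    return not subtree_in_use(provider)


# Illustrative tree: one busy FPGA and one idle FPGA.
accelerator = Provider("FPGA-1 slot2 acceleratorX", total=32, used=4)
fpga1 = Provider("FPGA-1", children=[
    Provider("FPGA-1 slot1", used=1),
    Provider("FPGA-1 slot2", children=[accelerator]),
    Provider("FPGA-1 slot3"),
])
fpga2 = Provider("FPGA-2", children=[Provider("FPGA-2 slot1")])

free_slots = [s.name for f in (fpga1, fpga2) for s in f.children
              if can_offer_whole(s)]
free_fpgas = [f.name for f in (fpga1, fpga2) if can_offer_whole(f)]
```

In this sketch a slot holding a used accelerator counts as busy, and so does the FPGA that contains it, which is the difference from the NUMA case described above.
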
I've followed Jay's idea about nested resources and, having in mind the blueprint[2] regarding dynamic resources, I've worked out how FPGAs fit in. The tables are unchanged; they are a copy-paste from the etherpad[1]:

    CREATE TABLE resource_providers (
        id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
        uuid CHAR(36) NOT NULL,
        name VARCHAR(100) NULL,
        root_provider_id INT NULL,
        parent_provider_id INT NULL
    );

    CREATE TABLE inventories (
        id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
        resource_provider_id INT NOT NULL,
        resource_class_id INT NOT NULL,
        total INT NOT NULL,
        reserved INT NOT NULL,
        min_unit INT NOT NULL,
        max_unit INT NOT NULL,
        step_size INT NOT NULL,
        allocation_ratio INT NOT NULL
    );

    CREATE TABLE allocations (
        id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
        resource_provider_id INT NOT NULL,
        consumer_uuid CHAR(36) NOT NULL,
        resource_class_id INT NOT NULL,
        used INT NOT NULL
    );

Then let's fill the tables with data of the following structure:

    -- FPGA-1
    --  +- FPGA-1 slot1 (taken), resource_provider_id:
    --  +- FPGA-1 slot2
    --      +- FPGA-1 slot2 acceleratorX
    --          +- FPGA-1 slot2 acceleratorX VF1 (taken)
    --          +- FPGA-1 slot2 acceleratorX VF2 (taken)
    --          +- FPGA-1 slot2 acceleratorX VF3 (taken)
    --          +- FPGA-1 slot2 acceleratorX VF4 (taken)
    --          +- FPGA-1 slot2 acceleratorX VF5
    --          +- ..
    --          +- FPGA-1 slot2 acceleratorX VF32
    --  +- FPGA-1 slot3
    -- FPGA-2
    --  +- FPGA-2 slot1

where FPGA-1 and FPGA-2 are hosts with an FPGA on board. It is also assumed that new dynamic resource classes have been created: id 1666 means 'FPGA' (although it might simply be a standard class, hardcoded in the ENUM), 1667 means 'FPGA slot' and 1668 means 'FPGA accelerator'.

    INSERT INTO resource_providers VALUES
        (1, '<UUID>', 'FPGA-1', 1, NULL),
        (2, '<UUID>', 'FPGA-1 slot 1', 1, 1),
        (3, '<UUID>', 'FPGA-1 slot 2', 1, 1),
        (4, '<UUID>', 'FPGA-1 slot 3', 1, 1),
        (5, '<UUID>', 'FPGA-1 slot 2 acceleratorX', 1, 3),
        (6, '<UUID>', 'FPGA-2', 6, NULL),
        (7, '<UUID>', 'FPGA-2 slot', 6, 6);

    INSERT INTO inventories VALUES
        (1, 1, 1666, 1, 0, 1, 1, 1, 1.0),
        (2, 2, 1667, 1, 0, 1, 1, 1, 1.0),
        (3, 3, 1667, 1, 0, 1, 1, 1, 1.0),
        (4, 4, 1667, 1, 0, 1, 1, 1, 1.0),
        (5, 5, 1668, 32, 0, 1, 32, 1, 1.0),
        (6, 6, 1666, 1, 0, 1, 1, 1, 1.0),
        (7, 7, 1667, 1, 0, 1, 1, 1, 1.0);

    INSERT INTO allocations VALUES
        (1, 5, '<UUID>', 1668, 4),
        (2, 2, '<UUID>', 1667, 1);

To get the id of a resource of type acceleratorX from which 8 VFs can be allocated:

    SELECT rp.id
    FROM resource_providers rp
    LEFT JOIN allocations al ON al.resource_provider_id = rp.id
    LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
    -- filter on the inventory class, so that an accelerator without
    -- any allocation row is still matched by the LEFT JOIN
    WHERE iv.resource_class_id = 1668
    AND (iv.total - COALESCE(al.used, 0)) >= 8;

Note that I don't have to calculate the total number of available VFs in this case. However, a user might schedule a VM that requests more VFs than a single accelerator has available, and then such a calculation will be needed.

Requesting more VFs than are available returns no records:

    SELECT rp.id
    FROM resource_providers rp
    LEFT JOIN allocations al ON al.resource_provider_id = rp.id
    LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
    WHERE iv.resource_class_id = 1668
    AND (iv.total - COALESCE(al.used, 0)) >= 29;

Nothing fancy here.

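The VF queries above compare each allocation row separately against the inventory total, which works as long as a provider has a single allocation row. A possible variant that sums allocations per provider is sketched below, run against an in-memory SQLite database. The trimmed schema, the SQLite-specific column types, the `providers_with_free_vfs` helper and the two consumer rows are my assumptions for illustration, not part of the etherpad.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed, SQLite-flavoured copies of the tables (assumption: only the
-- columns needed by the query are kept).
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY, uuid TEXT, name TEXT,
    root_provider_id INT, parent_provider_id INT);
CREATE TABLE inventories (
    id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    resource_class_id INT NOT NULL, total INT NOT NULL);
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY, resource_provider_id INT NOT NULL,
    consumer_uuid TEXT NOT NULL, resource_class_id INT NOT NULL,
    used INT NOT NULL);
INSERT INTO resource_providers VALUES
    (3, 'uuid-3', 'FPGA-1 slot 2', 1, 1),
    (5, 'uuid-5', 'FPGA-1 slot 2 acceleratorX', 1, 3);
INSERT INTO inventories VALUES (3, 3, 1667, 1), (5, 5, 1668, 32);
-- Two consumers holding 2 VFs each, instead of one row with used = 4.
INSERT INTO allocations VALUES
    (1, 5, 'consumer-a', 1668, 2),
    (2, 5, 'consumer-b', 1668, 2);
""")

# Sum usage per provider before comparing with the inventory total.
FREE_VF_QUERY = """
SELECT iv.resource_provider_id
FROM inventories iv
LEFT JOIN allocations al
       ON al.resource_provider_id = iv.resource_provider_id
      AND al.resource_class_id = iv.resource_class_id
WHERE iv.resource_class_id = 1668
GROUP BY iv.resource_provider_id, iv.total
HAVING iv.total - COALESCE(SUM(al.used), 0) >= ?
"""


def providers_with_free_vfs(requested):
    """Return ids of accelerators with at least `requested` free VFs."""
    return [row[0] for row in conn.execute(FREE_VF_QUERY, (requested,))]
```

With 4 of 32 VFs taken, `providers_with_free_vfs(8)` finds accelerator 5, while `providers_with_free_vfs(29)` finds nothing, matching the two queries above.
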
More interesting cases are getting all unallocated slots:

    SELECT rp.id
    FROM resource_providers rp
    LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
    WHERE iv.resource_class_id = 1667
    AND rp.id NOT IN (
        SELECT rp.parent_provider_id AS id
        FROM allocations al
        LEFT JOIN inventories iv
            ON al.resource_provider_id = iv.resource_provider_id
        LEFT JOIN resource_providers rp
            ON rp.id = iv.resource_provider_id
        WHERE al.resource_class_id = 1668
        UNION
        SELECT iv.resource_provider_id AS id
        FROM allocations al
        LEFT JOIN inventories iv
            ON al.resource_provider_id = iv.resource_provider_id
        LEFT JOIN resource_providers rp
            ON rp.id = iv.resource_provider_id
        WHERE al.resource_class_id = 1667
    );

Or getting all unallocated whole FPGAs:

    SELECT rp.id
    FROM resource_providers rp
    LEFT JOIN inventories iv ON rp.id = iv.resource_provider_id
    WHERE iv.resource_class_id = 1666
    AND rp.id NOT IN (
        SELECT rp.parent_provider_id
        FROM resource_providers rp
        LEFT JOIN inventories iv ON iv.resource_provider_id = rp.id
        WHERE iv.resource_class_id = 1667
        AND rp.id IN (
            SELECT rp.parent_provider_id AS id
            FROM allocations al
            LEFT JOIN inventories iv
                ON al.resource_provider_id = iv.resource_provider_id
            LEFT JOIN resource_providers rp
                ON rp.id = iv.resource_provider_id
            WHERE al.resource_class_id = 1668
            UNION
            SELECT iv.resource_provider_id AS id
            FROM allocations al
            LEFT JOIN inventories iv
                ON al.resource_provider_id = iv.resource_provider_id
            LEFT JOIN resource_providers rp
                ON rp.id = iv.resource_provider_id
            WHERE al.resource_class_id = 1667
        )
    );

The two queries are similar in that, when a user requests a slot or a whole FPGA, we have to check that no accelerator in use occupies the slot (for the slot query), and do the same check plus an additional one for slot usage when querying for a free FPGA.

There is another topic which I haven't thought through yet: potentially available resources, that is, an accelerator/IP which might be requested during VM boot but doesn't exist yet. In the case of an FPGA, it might simply be brought up by an external entity (presumably a library or service), which would take on the burden of preparing such an accelerator/IP on a free slot and of updating the allocations and dynamic resources information.

Thoughts?

-- 
Cheers,
Roman Dobosz

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev