Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

Balázs Gibizer Wed, 30 May 2018 04:08:07 -0700

On Tue, May 29, 2018 at 3:12 PM, Sylvain Bauza <sba...@redhat.com> wrote:

On Tue, May 29, 2018 at 2:21 PM, Balázs Gibizer <balazs.gibi...@ericsson.com> wrote:
On Tue, May 29, 2018 at 1:47 PM, Sylvain Bauza <sba...@redhat.com> wrote:
On Tue, May 29, 2018 at 11:02, Balázs Gibizer <balazs.gibi...@ericsson.com> wrote:
On Tue, May 29, 2018 at 9:38 AM, Sylvain Bauza <sba...@redhat.com> wrote:
>
>
> On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA
> <nakamura.tets...@lab.ntt.co.jp> wrote
>
>> > In that situation, say for example with VGPU inventories, that would mean
>> > that the compute node would stop reporting inventories for its root RP, but
>> > would rather report inventories for at least one single child RP.
>> > In that model, do we reconcile the allocations that were already made
>> > against the "root RP" inventory?
>>
>> It would be nice to see Eric and Jay comment on this,
>> but if I'm not mistaken, when the virt driver stops reporting
>> inventories for its root RP, placement would try to delete that
>> inventory inside and raise InventoryInUse exception if any
>> allocations still exist on that resource.
>>
>> ```
>> update_from_provider_tree() (nova/compute/resource_tracker.py)
>>   + _set_inventory_for_provider() (nova/scheduler/client/report.py)
>>       + put() - PUT /resource_providers/<rp_uuid>/inventories with new inventories (scheduler/client/report.py)
>>           + set_inventories() (placement/handlers/inventory.py)
>>               + _set_inventory() (placement/objects/resource_provider.py)
>>                   + _delete_inventory_from_provider() (placement/objects/resource_provider.py)
>>                       -> raise exception.InventoryInUse
>> ```
>>
>> So do we need some trick, something like deleting the VGPU allocations
>> before upgrading and setting the allocations again on the newly created
>> child after upgrading?
>>
>
> I wonder if we should keep the existing inventory in the root RP, and
> somehow just reserve the remaining resources (so Placement wouldn't pass
> that root RP for queries, but would still have allocations). But
> then, where and how to do this? By the resource tracker?
>
AFAIK it is the virt driver that decides to model the VGPU resource at a different place in the RP tree, so I think it is the responsibility of the same virt driver to move any existing allocation from the old place to the new place during this change.

Cheers,
gibi
Why not leave the allocation where it is and instead have the virt driver update the root RP by setting the reserved value to the total size?
That way, the virt driver wouldn't need to ask for an allocation but would rather continue to provide inventories...
Thoughts?
Keeping the old allocation at the old RP and adding a similarly sized reservation in the new RP feels hackish, as those are not really reserved GPUs but used GPUs, just from the old RP. If somebody sums up the total reported GPUs in this setup via the placement API, then she will get more GPUs in total than what is physically visible to the hypervisor, as the GPUs that are part of the old allocation are reported twice, in two different total values. Could we just report less GPU inventory to the new RP for as long as the old RP has GPU allocations?
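
To make the double-counting concern concrete, here is a minimal sketch that compares the two reporting options with a toy model. The numbers, the provider names and the assumption that the root RP keeps exactly enough inventory to cover its old allocations are all hypothetical; nothing here calls the real placement API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Inventory:
    total: int
    reserved: int = 0
    used: int = 0  # sum of allocations against this inventory

    @property
    def free(self) -> int:
        return self.total - self.reserved - self.used


PHYSICAL_VGPUS = 8  # hypothetical: what the hypervisor really has
OLD_ALLOCATED = 3   # hypothetical: pre-upgrade allocations left on the root RP


def reserve_on_new_rp() -> Dict[str, Inventory]:
    """Child RP reports every vGPU and reserves the ones used on the root RP."""
    return {
        "root": Inventory(total=OLD_ALLOCATED, used=OLD_ALLOCATED),
        "child": Inventory(total=PHYSICAL_VGPUS, reserved=OLD_ALLOCATED),
    }


def report_less_on_new_rp() -> Dict[str, Inventory]:
    """Child RP only reports the vGPUs that are not allocated on the root RP."""
    return {
        "root": Inventory(total=OLD_ALLOCATED, used=OLD_ALLOCATED),
        "child": Inventory(total=PHYSICAL_VGPUS - OLD_ALLOCATED),
    }


def summed_total(tree: Dict[str, Inventory]) -> int:
    return sum(inv.total for inv in tree.values())


def summed_free(tree: Dict[str, Inventory]) -> int:
    return sum(inv.free for inv in tree.values())
```

In this model both options leave the same number of vGPUs schedulable, but only the second keeps the summed totals equal to the physical count; the first counts the old allocations once in each total.
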
We could keep the old inventory in the root RP for the previous vGPU type already supported in Queens and just add other inventories for the other vGPU types now supported. That looks like possibly the simplest option, as the virt driver knows that.

That works for me. Can we somehow deprecate the previous, already supported vGPU types to eventually get rid of the split inventory?
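
As a rough illustration of the split layout being agreed on here, the sketch below keeps the already supported vGPU type on the root RP and puts every other type on its own nested RP. The function name, the child RP naming scheme and the vGPU type names used with it are hypothetical and are not taken from the libvirt or Xen drivers.

```python
from typing import Dict


def plan_vgpu_inventories(
    root_name: str,
    legacy_type: str,
    capacity_by_type: Dict[str, int],
) -> Dict[str, Dict[str, int]]:
    """Return a mapping of provider name -> {resource class: total}.

    The legacy vGPU type stays on the root provider so that existing
    allocations remain valid; every other type gets a child provider.
    """
    plan: Dict[str, Dict[str, int]] = {root_name: {}}
    for vgpu_type, total in capacity_by_type.items():
        if vgpu_type == legacy_type:
            plan[root_name]["VGPU"] = total
        else:
            # Hypothetical naming scheme for the nested provider.
            plan[f"{root_name}_{vgpu_type}"] = {"VGPU": total}
    return plan
```

Deprecating the legacy type later would then amount to emptying the root entry once no allocations are left against it.
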

Some alternatives from my jetlagged brain:
a) Implement a move inventory/allocation API in placement. Given a resource class, a source RP uuid and a destination RP uuid, placement moves the inventory and allocations of that resource class from the source RP to the destination RP. Then the virt driver can call this API to move the allocation. This has an impact on the fast forward upgrade, as it needs a running virt driver to do the allocation move.
Instead of having the virt driver do that (TBH, I don't like that, given both the Xen and libvirt drivers have the same problem), we could write a nova-manage upgrade call for it that would call the Placement API, sure.

nova-manage is another possible way, similar to my idea c), but there I imagined the logic in placement-manage instead of nova-manage.

b) For this I assume that live migrating an instance that has a GPU allocation on the old RP will allocate the GPU for that instance from the new RP. In the virt driver, do not report GPUs to the new RP while there are allocations for such GPUs in the old RP. Let the deployer live migrate the instances away. When the virt driver detects that there are no more GPU allocations on the old RP, it can delete the inventory from the old RP and report it to the new RP.
For the moment, vGPUs don't support live migration, even within QEMU. I haven't checked that, but IIUC when you live-migrate an instance that has vGPUs, it will just migrate it without recreating the vGPUs.

If there is no live migration support for vGPUs then this option can be ignored.

Now, the problem is with the VGPU allocation; we should delete it then. Maybe a new bug report?


Sounds like a bug report to me :)

c) For this I assume that there is no support for live migration of an instance that has a GPU. If there is a GPU allocation in the old RP, then the virt driver does not report GPU inventory to the new RP; it just creates the new nested RPs. Provide a placement-manage command to do the inventory + allocation copy from the old RP to the new RP.
What's the difference from the first alternative?

I think after you mentioned nova-manage for the first alternative, the only difference became whether to do it from nova-manage or from placement-manage. The placement-manage solution has the benefit of being a pure DB operation, moving inventory and allocations between two RPs, while nova-manage would need to call a new placement API.
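
The move discussed in alternatives a) and c) can be sketched with an in-memory model. This is only an illustration of the intended semantics: the `Provider` class and `move_resource_class` helper are hypothetical, and the `InventoryInUse` check merely mimics the behaviour described earlier in the thread, it is not the placement code.

```python
from dataclasses import dataclass, field
from typing import Dict


class InventoryInUse(Exception):
    """Raised when inventory that still has allocations would be deleted."""


@dataclass
class Provider:
    name: str
    # resource class -> total
    inventories: Dict[str, int] = field(default_factory=dict)
    # consumer -> {resource class: amount}
    allocations: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def used(self, rc: str) -> int:
        return sum(res.get(rc, 0) for res in self.allocations.values())

    def delete_inventory(self, rc: str) -> None:
        if self.used(rc) > 0:
            raise InventoryInUse(f"{rc} on {self.name} still has allocations")
        self.inventories.pop(rc, None)


def move_resource_class(src: Provider, dst: Provider, rc: str) -> None:
    """Move the inventory and all allocations of rc from src to dst."""
    if rc not in src.inventories:
        return  # nothing to move
    # Add the inventory first so dst has capacity for the allocations.
    dst.inventories[rc] = dst.inventories.get(rc, 0) + src.inventories[rc]
    for consumer, resources in list(src.allocations.items()):
        amount = resources.pop(rc, 0)
        if amount:
            dst_alloc = dst.allocations.setdefault(consumer, {})
            dst_alloc[rc] = dst_alloc.get(rc, 0) + amount
        if not resources:
            del src.allocations[consumer]
    # No allocations of rc remain on src, so the delete cannot raise.
    src.delete_inventory(rc)
```

In a real deployment the three steps would have to happen in one DB transaction (or behind one API call) so that the scheduler never sees the intermediate state; the sketch ignores concurrency entirely.
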

Anyway, looks like it's pretty simple to just keep the inventory for the already existing vGPU type in the root RP, and just add nested RPs for other vGPU types.
Oh, and btw. we could possibly have the same problem when we implement the NUMA spec that I need to rework: https://review.openstack.org/#/c/552924/

If we want to move the VCPU resources from the root to the nested NUMA RP then yes, that feels like the same problem.


gibi


-Sylvain

Cheers,
gibi


> -Sylvain
>


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)

Unsubscribe:openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


