Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

Sylvain Bauza Fri, 11 Oct 2013 05:24:03 -0700

Long story short, it sounds like we do have the same concerns here in Climate.

I'll be present at the Summit. Is there any chance of holding an unconference meeting among all parties?


Thanks,
-Sylvain

On 11/10/2013 08:25, Mike Spreitzer wrote:

Regarding Alex's question of which component does holistic infrastructure scheduling, I hesitate to simply answer "heat". Heat is about orchestration, and infrastructure scheduling is another matter. I have attempted to draw pictures to sort this out, see https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U and https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g. In those you will see that I identify holistic infrastructure scheduling as separate functionality from infrastructure orchestration (the main job of today's heat engine) and also separate from software orchestration concerns. However, I also see a close relationship between holistic infrastructure scheduling and heat, as should be evident in those pictures too.
Alex made a remark about the needed inputs, and I agree but would like to expand a little on the topic. One thing any scheduler needs is knowledge of the amount, structure, and capacity of the hosting thingies (I wish I could say "resources", but that would be confusing) onto which the workload is to be scheduled. Scheduling decisions are made against available capacity. I think the most practical way to determine available capacity is to separately track raw capacity and current (plus already planned!) allocations from that capacity, finally subtracting the latter from the former.
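A minimal sketch of that subtraction for a single host, in Python. The dimension names (`vcpus`, `ram_mb`) and the plain-dict layout are illustrative assumptions, not the actual capacity report format of Nova or any other service.

```python
from collections import Counter


def available_capacity(raw, allocations):
    """Available capacity of one host: raw capacity minus all allocations.

    raw: dimension -> raw capacity, e.g. {"vcpus": 16, "ram_mb": 32768}
    allocations: iterable of dimension -> amount, covering both current and
                 already-planned allocations on this host
    """
    claimed = Counter()
    for amounts in allocations:
        claimed.update(amounts)  # adds each allocation's amounts per dimension
    # A negative result would mean the host is already over-committed.
    return {dim: raw[dim] - claimed[dim] for dim in raw}
```

Keeping raw capacity and allocations as separate inputs means either can be refreshed independently, which matters when they come from different sources.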
In Nova, for example, sensing raw capacity is handled by the various nova-compute agents reporting that information. I think a holistic infrastructure scheduler should get that information from the various individual services (Nova, Cinder, etc.) that it is concerned with (presumably they have it anyway).
A holistic infrastructure scheduler can keep track of the allocations it has planned (regardless of whether they have been executed yet). However, there may also be allocations that did not originate in the holistic infrastructure scheduler. The individual underlying services should be able to report (to the holistic infrastructure scheduler, even if lowly users are not so authorized) all the allocations currently in effect. An accurate union of the current and planned allocations is what we want to subtract from raw capacity to get available capacity.
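To make "accurate union" concrete: an allocation that was planned and has since been executed shows up both in the scheduler's own plan and in the service's report, and must be counted once. The sketch below assumes, hypothetically, that both sides identify an allocation by the same ID, and lets the service's report win when they disagree.

```python
from collections import Counter


def allocation_union(current, planned):
    """Union of allocations, keyed by allocation ID (assumed shared by both sides).

    current: allocations the services report as in effect (any origin)
    planned: allocations the holistic scheduler has planned
    """
    merged = dict(planned)
    merged.update(current)  # for an ID known to both, the service's report wins
    return merged


def total_claimed(allocations):
    """Sum the resource amounts of a set of allocations, per dimension."""
    used = Counter()
    for amounts in allocations.values():
        used.update(amounts)
    return dict(used)
```

With `vm-a` both planned and running, `vm-b` planned only, and `vm-x` started outside the scheduler, the union holds exactly three allocations; summing the two maps separately would count `vm-a` twice.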
If there is a long delay between planning and executing an allocation, there can be nasty surprises from competitors --- if there are any competitors. Actually, there can be nasty surprises anyway. Any scheduler should be prepared for nasty surprises, and react with some sensible retrying. If nasty surprises are rare, we are pretty much done. If nasty surprises due to the presence of competing managers are common, we may be able to combat the problem by changing the long delay to a short one --- by moving the allocation execution earlier into a stage that is only about locking in allocations, leaving all the other work involved in creating virtual resources to later (perhaps Climate will be good for this). If the delay between planning and executing an allocation is short and there are many nasty surprises due to competing managers, then you have too much competition between managers --- don't do that.
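The "sensible retrying" could take the shape of an optimistic plan-then-execute loop: plan against the current view, try to lock the allocation in, and re-plan if a competitor got there first. This is only a sketch; `CapacityConflict`, `plan`, `execute`, and `max_attempts` are hypothetical names, and `plan` is assumed to read a fresh view of available capacity every time it is called.

```python
class CapacityConflict(Exception):
    """Raised by `execute` when a planned allocation can no longer be honoured."""


def place_with_retry(plan, execute, max_attempts=3):
    """Plan an allocation, try to lock it in, and re-plan after a conflict.

    plan: callable returning a placement decision from current capacity
    execute: callable that locks in the decision or raises CapacityConflict
    """
    last_error = None
    for _ in range(max_attempts):
        decision = plan()
        try:
            execute(decision)
            return decision
        except CapacityConflict as err:
            last_error = err  # a competitor took the capacity first; re-plan
    raise RuntimeError(f"no placement after {max_attempts} attempts") from last_error
```

Bounding the attempts fits the last point above: if conflicts keep exhausting the retries even with a short plan-to-execute delay, the problem is too much competition between managers rather than the retry policy.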
Debo wants a simpler nova-centric story. OK, how about the following. This is for the first step in the roadmap, where scheduling decisions are still made independently for each VM instance. For the client/service interface, I think we can do this with a simple clean two-phase interface when traditional software orchestration is in play, and a one-phase interface when slick new software orchestration is used. Let me outline the two-phase flow. We extend the Nova API with CRUD operations on VRTs (top-level groups). For example, the CREATE operation takes a definition of a top-level group and all its nested groups, definitions (excepting stuff like userdata) of all the resources (only VM instances, for now) contained in those groups, all the relationships among those groups/resources, and all the applications of policy to those groups, resources, and relationships. This is a REST-style interface; the CREATE operation takes a definition of the thing (a top-level group and all that it contains) being created; the UPDATE operation takes a revised definition of the whole thing. Nova records the presented information; the familiar stuff is stored essentially as it is today (but marked as being in some new sort of tentative state), and the grouping, relationship, and policy stuff is stored according to a model like the one Debo & Yathi wrote. The CREATE operation returns a UUID for the newly created top-level group. The invocation of the top-level group CRUD is a single operation and it is the first of the two phases. In the second phase of a CREATE flow, the client creates individual resources with the same calls as are used today, except that each VM instance create call is augmented with a pointer into the policy information. That pointer consists of (1) the UUID of the relevant top-level group and (2) the name used within that group to identify the resource now being created. (Obviously we would need resources to be named uniquely among all the things ultimately contained anywhere in the same top-level group. That could be done, e.g., with path names and a requirement only that siblings have distinct names. Or we could simply require that names be unique without mandating any particular structure. We could call them IDs rather than names.) The way Nova handles a VM-create call can now be enhanced to reference and use the policy information that is associated with the newly passed policy pointer.
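A toy, in-memory sketch of the two-phase CREATE flow, using path names (siblings distinct, names free of "/") as the unique member names. Everything here, including `GroupStore`, `create_vm`, the nested-dict tree layout, and the policy dict shape, is hypothetical; it only illustrates how a (group UUID, member name) pointer could be resolved to policy and is not the Nova API.

```python
import uuid


def flatten_members(group, prefix=""):
    """Map a path name to each resource definition under `group`.

    Assumed layout: a group is a dict of child name -> child; a dict child is
    a nested group, anything else is a resource definition. Dict keys are
    distinct, so siblings always have distinct names and the joined paths are
    unique within the whole top-level group.
    """
    members = {}
    for name, child in group.items():
        if "/" in name:
            raise ValueError(f"name {name!r} must not contain '/'")
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            members.update(flatten_members(child, path))
        else:
            members[path] = child
    return members


class GroupStore:
    """Hypothetical in-memory stand-in for top-level group (VRT) CRUD."""

    def __init__(self):
        self._groups = {}

    def create_group(self, tree, policies):
        """Phase 1: record the whole definition as tentative; return its UUID."""
        group_id = str(uuid.uuid4())
        self._groups[group_id] = {
            "members": flatten_members(tree),
            "policies": policies,  # each: {"type": str, "applies_to": [paths]}
            "state": "tentative",
        }
        return group_id

    def policies_for(self, group_id, path):
        """Resolve a (group UUID, member name) pointer to its policy types."""
        group = self._groups[group_id]
        if path not in group["members"]:
            raise KeyError(f"{path!r} is not a member of group {group_id}")
        return [p["type"] for p in group["policies"] if path in p["applies_to"]]


def create_vm(store, flavor, group_id, path):
    """Phase 2: an ordinary VM create, augmented with the policy pointer."""
    policies = store.policies_for(group_id, path)
    # A policy-informed scheduler would filter and weigh hosts here.
    return {"name": path, "flavor": flavor, "policies": policies}
```

For example, with `web` and `db` sub-groups that each contain a `vm1`, the two VMs get the distinct names `/web/vm1` and `/db/vm1` even though the sibling-level names repeat.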
The UPDATE flow is similar: first UPDATE the top-level group, then update individual resources.
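For the second phase of UPDATE, a client holding both the old and the revised whole definitions could work out which per-resource calls to make. This sketch assumes members are keyed by their unique names and, going slightly beyond the text, that members added or dropped by the revision are created or deleted; `plan_member_changes` is a hypothetical helper.

```python
def plan_member_changes(old_members, new_members):
    """Derive phase-2 calls from the old and revised group definitions.

    Both arguments map a member's unique name to its resource definition.
    """
    old_names, new_names = set(old_members), set(new_members)
    return {
        "create": sorted(new_names - old_names),
        "delete": sorted(old_names - new_names),
        "update": sorted(
            name for name in old_names & new_names
            if old_members[name] != new_members[name]
        ),
    }
```

Because UPDATE takes a revised definition of the whole thing, the diff can be computed from the two definitions alone, without tracking the individual edits that produced the new one.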
For the definition of a top-level group and all that it contains we need some language. I think the obvious answer is an extended version of the HOT language, which is why I have proposed such an extension. It is not because I am confused about what the heat engine should do; it is because I want something else (the policy-informed scheduler) to have an input language with sufficient content. This is the role played by "HOT+" in the first of my two pictures cited above. The same sort of language is needed in the first step of the roadmap, where it is only Nova that is policy-informed and scheduling is not yet joint --- but at this early step of the roadmap the resources+policy language is input to Nova rather than to a separate holistic infrastructure scheduler.
Regards,
Mike


_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
