Long story short, it sounds like we have the same concerns here in Climate.
I'll be at the Summit. Any chance of holding an unconference meeting
with all parties?
Thanks,
-Sylvain
On 11/10/2013 08:25, Mike Spreitzer wrote:
Regarding Alex's question of which component does holistic
infrastructure scheduling, I hesitate to simply answer "heat". Heat
is about orchestration, and infrastructure scheduling is another
matter. I have attempted to draw pictures to sort this out, see
https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U and
https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g.
In those you will see that I identify holistic infrastructure
scheduling as separate functionality from infrastructure orchestration
(the main job of today's heat engine) and also separate from software
orchestration concerns. However, I also see a close relationship
between holistic infrastructure scheduling and heat, as should be
evident in those pictures too.
Alex made a remark about the needed inputs, and I agree but would like
to expand a little on the topic. One thing any scheduler needs is
knowledge of the amount, structure, and capacity of the hosting
thingies (I wish I could say "resources", but that would be confusing)
onto which the workload is to be scheduled. Scheduling decisions are
made against available capacity. I think the most practical way to
determine available capacity is to separately track raw capacity and
current (plus already planned!) allocations from that capacity,
finally subtracting the latter from the former.
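As a rough illustration of that bookkeeping, here is a minimal sketch. The `Capacity` type, the `available` function, and the vCPU/RAM dimensions are assumptions made up for this example; they are not taken from Nova or any other service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    """Hypothetical two-dimensional capacity of one host."""
    vcpus: int
    ram_mb: int

    def __add__(self, other):
        return Capacity(self.vcpus + other.vcpus, self.ram_mb + other.ram_mb)

    def __sub__(self, other):
        return Capacity(self.vcpus - other.vcpus, self.ram_mb - other.ram_mb)


def available(raw, current, planned):
    """Available capacity = raw capacity - (current + planned allocations)."""
    allocated = sum(list(current) + list(planned), Capacity(0, 0))
    return raw - allocated


raw = Capacity(vcpus=16, ram_mb=65536)
current = [Capacity(4, 8192), Capacity(2, 4096)]   # already executed
planned = [Capacity(2, 4096)]                      # decided, not yet executed
free = available(raw, current, planned)
```

Raw capacity and allocations are tracked separately and only combined at decision time, so a planned-but-unexecuted allocation reduces what the next decision sees.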
In Nova, for example, sensing raw capacity is handled by the various
nova-compute agents reporting that information. I think a holistic
infrastructure scheduler should get that information from the various
individual services (Nova, Cinder, etc.) that it is concerned with
(presumably they have it anyway).
A holistic infrastructure scheduler can keep track of the allocations
it has planned (regardless of whether they have been executed yet).
However, there may also be allocations that did not originate in the
holistic infrastructure scheduler. The individual underlying services
should be able to report (to the holistic infrastructure scheduler,
even if lowly users are not so authorized) all the allocations
currently in effect. An accurate union of the current and planned
allocations is what we want to subtract from raw capacity to get
available capacity.
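One way to picture the "accurate union" is to key every allocation by an identifier so that an allocation both planned by the scheduler and reported by a service is counted once. The sketch below assumes such identifiers exist; the IDs, the single vCPU dimension, and the function name are hypothetical.

```python
def union_allocations(planned, reported):
    """Merge planned and reported allocations, counting each ID once.

    Both arguments map a hypothetical allocation ID to the amount
    (here just a vCPU count) that the allocation consumes.
    """
    merged = dict(planned)
    # A reported allocation describes what is actually in effect, so it
    # takes precedence over the plan when both mention the same ID.
    merged.update(reported)
    return merged


planned = {"a1": 2, "a2": 4}    # planned by the holistic scheduler
reported = {"a2": 4, "x9": 1}   # in effect; x9 originated elsewhere
allocations = union_allocations(planned, reported)

raw_vcpus = 16
available_vcpus = raw_vcpus - sum(allocations.values())
```

Here `a2` appears in both sources but is subtracted only once, and `x9` is subtracted even though the scheduler never planned it.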
If there is a long delay between planning and executing an allocation,
there can be nasty surprises from competitors --- if there are any
competitors. Actually, there can be nasty surprises anyway. Any
scheduler should be prepared for nasty surprises, and react by some
sensible retrying. If nasty surprises are rare, we are pretty much
done. If nasty surprises due to the presence of competing managers
are common, we may be able to combat the problem by changing the long
delay to a short one --- by moving the allocation execution earlier
into a stage that is only about locking in allocations, leaving all
the other work involved in creating virtual resources to later
(perhaps Climate will be good for this). If the delay between
planning and executing an allocation is short and there are many nasty
surprises due to competing managers, then you have too much
competition between managers --- don't do that.
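The "sensible retrying" could look something like the sketch below: plan, try to execute, and re-plan if the capacity was taken in the meantime. The exception type and the `plan`/`execute` callables are placeholders invented for the example, not real service calls.

```python
class AllocationConflict(Exception):
    """Hypothetical error: the planned capacity was taken by someone else."""


def schedule_with_retry(plan, execute, max_attempts=3):
    """Plan, try to execute, and re-plan when a competitor got there first.

    `plan` and `execute` are caller-supplied callables standing in for
    whatever the real scheduler and service calls would be.
    """
    for attempt in range(1, max_attempts + 1):
        placement = plan()
        try:
            return execute(placement)
        except AllocationConflict:
            if attempt == max_attempts:
                raise
            # Otherwise loop: re-plan against refreshed capacity data.


# Toy usage: host-1 was grabbed by a competing manager after planning.
hosts = iter(["host-1", "host-2"])
taken = {"host-1"}


def plan():
    return next(hosts)


def execute(host):
    if host in taken:
        raise AllocationConflict(host)
    return host


chosen = schedule_with_retry(plan, execute)
```

Bounding the number of attempts keeps a heavily contended system from retrying forever, which matches the point that too much competition between managers is a problem retrying alone cannot fix.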
Debo wants a simpler nova-centric story. OK, how about the following.
This is for the first step in the roadmap, where scheduling decisions
are still made independently for each VM instance. For the
client/service interface, I think we can do this with a simple, clean
two-phase interface when traditional software orchestration is in
play, and a one-phase interface when slick new software orchestration is
used. Let me outline the two-phase flow. We extend the Nova API with
CRUD operations on VRTs (top-level groups). For example, the CREATE
operation takes a definition of a top-level group and all its nested
groups, definitions (excepting stuff like userdata) of all the
resources (only VM instances, for now) contained in those groups, all
the relationships among those groups/resources, and all the
applications of policy to those groups, resources, and relationships.
This is a REST-style interface; the CREATE operation takes a
definition of the thing (a top-level group and all that it contains)
being created; the UPDATE operation takes a revised definition of the
whole thing. Nova records the presented information; the familiar
stuff is stored essentially as it is today (but marked as being in
some new sort of tentative state), and the grouping, relationship, and
policy stuff is stored according to a model like the one Debo and Yathi
wrote. The CREATE operation returns a UUID for the newly created
top-level group. The invocation of the top-level group CRUD is a
single operation and it is the first of the two phases. In the second
phase of a CREATE flow, the client creates individual resources with
the same calls as are used today, except that each VM instance create
call is augmented with a pointer into the policy information. That
pointer consists of (1) the UUID of the relevant top-level group and
(2) the name used within that group to identify the resource now being
created. (Obviously we would need resources to be named uniquely
among all the things ultimately contained anywhere in the same
top-level group. That could be done, e.g., with path names and a
requirement only that siblings have distinct names. Or we could
simply require that names be unique without mandating any particular
structure. We could call them IDs rather than names.) The way Nova
handles a VM-create call can now be enhanced to reference and use the
policy information that is associated with the newly passed policy
pointer.
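To make the two-phase CREATE flow concrete, here is a toy in-memory sketch. Everything in it is hypothetical: `FakeNova`, its method names, the shape of the group definition, and the "anti-affinity" policy type are invented for illustration and are not an existing or proposed Nova API.

```python
import uuid


class FakeNova:
    """In-memory stand-in for the proposed interface; not real Nova code."""

    def __init__(self):
        self.groups = {}     # group UUID -> stored definition
        self.instances = {}  # instance UUID -> record

    def create_group(self, definition):
        """Phase 1: record the whole top-level group, return its UUID."""
        group_id = str(uuid.uuid4())
        self.groups[group_id] = definition
        return group_id

    def create_server(self, flavor, image, group_id, member_name):
        """Phase 2: an ordinary VM create plus the policy pointer."""
        group = self.groups[group_id]
        if member_name not in group["resources"]:
            raise KeyError(member_name)
        # Look up the policies that apply to this named member.
        policies = [p for p in group["policies"]
                    if member_name in p["applies_to"]]
        instance_id = str(uuid.uuid4())
        self.instances[instance_id] = {
            "flavor": flavor, "image": image,
            "group": group_id, "name": member_name,
            "policies": policies,
        }
        return instance_id


nova = FakeNova()
definition = {
    "resources": {"web-1": {"flavor": "m1.small"},
                  "web-2": {"flavor": "m1.small"}},
    "policies": [{"type": "anti-affinity",
                  "applies_to": ["web-1", "web-2"]}],
}
gid = nova.create_group(definition)                              # phase 1
vm = nova.create_server("m1.small", "image-id", gid, "web-1")    # phase 2
```

The policy pointer is the pair (`gid`, `"web-1"`): the group UUID returned by phase 1 plus the name that is unique within that group.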
The UPDATE flow is similar: first UPDATE the top-level group, then
update individual resources.
For the definition of a top-level group and all that it contains we
need some language. I think the obvious answer is an extended version
of the HOT language, which is why I have proposed such an extension.
It is not because I am confused about what the heat engine should do;
it is because I want something else (the policy-informed scheduler) to
have an input language with sufficient content. This is the role
played by "HOT+" in the first of my two pictures cited above. The
same sort of language is needed in the first step of the roadmap,
where it is only Nova that is policy-informed and scheduling is not
yet joint --- but at this early step of the roadmap the
resources+policy language is input to Nova rather than to a separate
holistic infrastructure scheduler.
Regards,
Mike
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev