OK, let's take the holistic infrastructure scheduling out of Heat.  It 
really belongs at a lower level anyway.  Think of it as something you slap 
on top of Nova, Cinder, Neutron, etc. and everything that is going to use 
them goes first through the holistic scheduler, to give it a chance to 
make some joint decisions.  Zane has been worried about conflicting 
decisions being made, but if everything goes through the holistic 
infrastructure scheduling service then there does not need to be an issue 
with other parallel decision-making services (more on this below).  For a 
public cloud, think of this holistic infrastructure scheduling as part of 
the service that the cloud offers to the public; the public says what it 
wants, and the various levels of schedulers work on delivering it; the 
internals are not exposed to the public.  For example, a cloud user may 
say "spread my cluster across at least two racks, not too unevenly"; you 
do not want that public cloud customer to be in the business of knowing 
how many racks are in the cloud, knowing how heavily each one is currently 
being used, and picking which rack will contain which members of the 
cluster.  For a private cloud, the holistic infrastructure scheduler 
should have the same humility as the lower schedulers: offer enough 
visibility and control to the clients that they can make decisions if they 
want to (thus, nobody needs to "go around" the holistic infrastructure 
scheduler if they already know what they want).

You do not want to ask the holistic infrastructure scheduler to schedule 
resources one by one; you want to ask it to allocate a whole 
pattern/template/topology.  There is thus no need for infrastructure 
orchestration prior to holistic infrastructure scheduling.
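
To make the "whole pattern" idea a bit more concrete, here is a minimal sketch in Python of how the rack-spreading request above might be stated as a policy on the pattern and checked against a candidate placement.  The names (SpreadPolicy, satisfies, max_imbalance) are made up for illustration and are not an existing Heat or Nova API; "not too unevenly" is interpreted here, as an assumption, as a bound on the gap between the fullest and emptiest rack in use.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SpreadPolicy:
    """Hypothetical policy: 'at least min_racks racks, not too unevenly'."""
    min_racks: int = 2
    # Assumed reading of "not too unevenly": largest allowed gap between
    # the fullest and the emptiest rack that holds any member.
    max_imbalance: int = 1


def satisfies(placement: Dict[str, str], policy: SpreadPolicy) -> bool:
    """Check a whole placement (member name -> rack name) against the policy."""
    per_rack = Counter(placement.values())
    if len(per_rack) < policy.min_racks:
        return False
    return max(per_rack.values()) - min(per_rack.values()) <= policy.max_imbalance
```

The point of the sketch is that the policy can only be evaluated against the placement of the whole cluster, which is why asking for resources one by one does not work.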

Once the holistic infrastructure scheduler has done its job, there is a 
need for infrastructure orchestration.  What should we use for that?

OK, more on the business of conflicting decisions.  For the sake of 
scalability and modularity, the holistic infrastructure scheduler should 
delegate as much decision-making as it can to more specific services.  The 
job of the holistic infrastructure scheduler is to make joint decisions 
when there are strong interactions between services.  You can fudge this 
either way (have the holistic infrastructure scheduler make more or fewer 
decisions than ideal), but if you want the best result then I think the 
principle I stated should be the guide.  So what if a delegated decision conflicts 
with a holistic decision?  Don't do that.  Divide the decision-making 
responsibilities into distinct domains, for example with the holistic 
scheduler making relatively big-picture decisions and individual resource 
services filling in the details.

That said, there can still be nasty surprises from lower layers.  Even if 
the design has carefully partitioned decision-making responsibilities, 
irregular things can still happen (e.g., authorized people can do 
something unexpected).  Even if nothing intentionally does anything 
irregular, there remains the possibility of bugs.  The holistic 
infrastructure scheduler should be prepared for nasty surprises, and 
should get information that is as authoritative as possible to begin with 
(promptness doesn't hurt either).

Then there is the question of the scalability of the holistic 
infrastructure scheduler.  One hard kernel of that is solving the 
optimization problem.  Nobody should expect the scheduler to find the 
truly optimal solution; this is an NP-hard problem.  However, there exist 
optimization algorithms that produce pretty good approximations in modest 
amounts of time.  Additionally: if the patterns are small relative to the 
size of the whole zone being scheduled then it should be possible to do 
concurrent decision-making with optimistic concurrency control (as Clint 
has mentioned).
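
A rough sketch of the optimistic concurrency idea, in Python.  Everything here is hypothetical (ZoneState, schedule, the slot-counting capacity model, the greedy choice of host); it only illustrates the shape: decide against a snapshot, then commit only if nothing changed in the meantime, and otherwise retry.

```python
import threading


class ZoneState:
    """Hypothetical in-memory capacity view of one zone, with a version counter."""

    def __init__(self, free):
        self._free = dict(free)  # host -> free slots
        self._version = 0
        self._lock = threading.Lock()

    def snapshot(self):
        """Return (version, copy of free capacity) for lock-free decision-making."""
        with self._lock:
            return self._version, dict(self._free)

    def commit(self, seen_version, claims):
        """Apply claims only if no other commit happened since seen_version."""
        with self._lock:
            if seen_version != self._version:
                return False  # someone else won the race; caller should retry
            if any(self._free.get(h, 0) < n for h, n in claims.items()):
                return False
            for host, n in claims.items():
                self._free[host] -= n
            self._version += 1
            return True


def schedule(state, members, max_retries=5):
    """Place `members` identical members; return host -> count, or None."""
    for _ in range(max_retries):
        version, free = state.snapshot()
        if not free:
            return None
        claims = {}
        for _ in range(members):
            host = max(free, key=free.get)  # greedy: host with most free slots
            if free[host] == 0:
                return None  # not enough capacity in this snapshot
            free[host] -= 1
            claims[host] = claims.get(host, 0) + 1
        if state.commit(version, claims):
            return claims
    return None
```

A single version number for the whole zone is the crudest possible conflict check: two patterns that touch disjoint hosts still force a retry.  A finer-grained check (per host or per rack) would cut down the retries, and that is where small patterns relative to the zone size pay off.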

You would not want one holistic infrastructure scheduler for a whole 
geographically distributed cloud.  You could use a hierarchical 
arrangement, with one top-level decision-maker dividing a pattern between 
availability zones (by which I mean the sort of large independent domains 
that are typically known by that term) and then a subsidiary scheduler for 
each availability zone.
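
As a sketch of the top level of such a hierarchy, in Python: the function below only divides the members of a pattern between availability zones, and leaves everything inside a zone to that zone's own scheduler.  The function name, the per-zone capacity numbers, and the round-robin policy are all assumptions made for illustration; a real top-level decision-maker would apply whatever policies the pattern carries.

```python
def split_across_zones(members, zone_capacity):
    """Deal members out to zones round-robin, skipping zones that are full.

    members:       list of member names in the pattern
    zone_capacity: dict of zone name -> number of members it can still take
    Returns a dict of zone name -> list of members handed to that zone.
    """
    remaining = dict(zone_capacity)
    assignment = {zone: [] for zone in zone_capacity}
    zones = list(zone_capacity)
    i = 0
    for member in members:
        # Try each zone at most once per member, continuing the rotation.
        for _ in range(len(zones)):
            zone = zones[i % len(zones)]
            i += 1
            if remaining[zone] > 0:
                remaining[zone] -= 1
                assignment[zone].append(member)
                break
        else:
            raise ValueError("not enough capacity in any zone")
    return assignment
```

Each per-zone list would then be handed to the subsidiary scheduler for that zone, which makes the detailed placement decisions on its own.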

Regards,
Mike
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev