On 11/19/2013 01:51 PM, Clint Byrum wrote:
> Excerpts from Chris Friesen's message of 2013-11-19 11:37:02 -0800:
>> On 11/19/2013 12:35 PM, Clint Byrum wrote:

>>> Each scheduler process can own a different set of resources. If they
>>> each grab instance requests in a round-robin fashion, then they will
>>> fill their resources up in a relatively well balanced way until one
>>> scheduler's resources are exhausted. At that time it should bow out of
>>> taking new instances. If it can't fit a request in, it should kick the
>>> request out for retry on another scheduler.

>>> In this way, they only need to be in sync in that they need a way to
>>> agree on who owns which resources. A distributed hash table that gets
>>> refreshed whenever schedulers come and go would be fine for that.
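
Just to make sure I'm picturing the same thing, here's a toy sketch of that partitioning (everything below is invented for illustration; a plain hash of the host name stands in for the distributed hash table):

import hashlib

def owner_of(host, schedulers):
    # Map a compute host to one scheduler by hashing its name onto the
    # current scheduler list.  "Refreshing" here just means recomputing
    # the mapping whenever the scheduler list changes.
    digest = int(hashlib.md5(host.encode()).hexdigest(), 16)
    return schedulers[digest % len(schedulers)]

schedulers = ["scheduler-1", "scheduler-2", "scheduler-3"]
hosts = ["compute-%d" % i for i in range(10)]

partition = {}
for host in hosts:
    partition.setdefault(owner_of(host, schedulers), []).append(host)

for sched, owned in sorted(partition.items()):
    print(sched, owned)

A real implementation would presumably want a proper consistent-hash ring so that adding or removing a scheduler only moves a small fraction of the hosts, but the idea is the same: each scheduler places instances only on the hosts it owns.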

>> That has some potential, but at high occupancy you could end up refusing
>> to schedule something because no one scheduler has sufficient resources
>> even if the cluster as a whole does.


> I'm not sure what you mean here. What resource spans multiple compute
> hosts?

Imagine the cluster is running close to full occupancy, with each scheduler having room for 40 more instances. Now I come along and issue a single request to boot 50 instances. The cluster has room for that, but no individual scheduler does.
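
In toy numbers (say three schedulers with 40 free slots each):

free_per_scheduler = {"scheduler-1": 40, "scheduler-2": 40, "scheduler-3": 40}
request_size = 50

print(sum(free_per_scheduler.values()) >= request_size)             # True: the cluster has room
print(any(v >= request_size for v in free_per_scheduler.values()))  # False: no single scheduler does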

>> This gets worse once you start factoring in things like heat and
>> instance groups that will want to schedule whole sets of resources
>> (instances, IP addresses, network links, cinder volumes, etc.) at once
>> with constraints on where they can be placed relative to each other.

> Actually that is rather simple. Such requests have to be serialized
> into a work-flow. So if you say "give me 2 instances in 2 different
> locations" then you allocate 1 instance, and then another one with
> 'not_in_location(1)' as a condition.
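
If I sketch that serialized work-flow in code (all names here are invented), it comes out something like the following, where each placement feeds its location back in as a constraint on the next:

def pick_host(hosts, excluded_locations):
    # Return the first host with capacity that is not in an excluded location.
    for name, (location, free) in hosts.items():
        if free > 0 and location not in excluded_locations:
            return name, location
    return None, None

hosts = {
    "compute-1": ("rack-A", 1),   # (location, free slots)
    "compute-2": ("rack-A", 5),
    "compute-3": ("rack-B", 1),
}

placements = []
excluded = set()
for _ in range(2):                    # "give me 2 instances in 2 different locations"
    name, location = pick_host(hosts, excluded)
    if name is None:
        raise RuntimeError("no host satisfies the constraints")
    placements.append(name)
    excluded.add(location)            # the not_in_location(...) condition for the next pass
    loc, free = hosts[name]
    hosts[name] = (loc, free - 1)

print(placements)                     # ['compute-1', 'compute-3']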

Actually, you don't want to serialize it; you want to hand the whole set of resource requests and constraints to the scheduler all at once.

If you do them one at a time, then early decisions made with less-than-complete knowledge can cause later scheduling requests to fail because their constraints can no longer be met, even though the cluster actually has sufficient resources.

The "VM ensembles" document at
https://docs.google.com/document/d/1bAMtkaIFn4ZSMqqsXjs_riXofuRvApa--qo4UTwsmhw/edit?pli=1 has a good example of how one-at-a-time scheduling can cause spurious failures.
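
A contrived toy version of that failure mode (my own numbers, not the doc's example): one instance with no placement constraint and one pinned to rack-B, scheduled one at a time by a most-free-RAM-first heuristic:

import copy

hosts = {
    "hostA": {"rack": "rack-A", "free_gb": 4},
    "hostB": {"rack": "rack-B", "free_gb": 5},
}

# Two 4 GB instances; the second one must land in rack-B.
request = [
    {"ram_gb": 4, "rack": None},
    {"ram_gb": 4, "rack": "rack-B"},
]

def greedy_one_at_a_time(hosts, request):
    # Place each instance, in order, on the host with the most free RAM
    # that satisfies its constraints.
    placements = []
    for inst in request:
        candidates = [name for name, h in hosts.items()
                      if h["free_gb"] >= inst["ram_gb"]
                      and (inst["rack"] is None or h["rack"] == inst["rack"])]
        if not candidates:
            return None
        best = max(candidates, key=lambda name: hosts[name]["free_gb"])
        hosts[best]["free_gb"] -= inst["ram_gb"]
        placements.append(best)
    return placements

# The first instance greedily takes hostB (most free RAM), so the second,
# which can only go to rack-B, no longer fits and the request fails...
print(greedy_one_at_a_time(copy.deepcopy(hosts), request))    # None
# ...even though placing them as (hostA, hostB) would have satisfied both.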

And if you're handing the whole set of requests to a scheduler all at once, then you want that scheduler to have access to as many resources as possible so that it has the best chance of satisfying the request given the constraints.

Chris

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
