Excerpts from Chris Friesen's message of 2013-11-19 12:18:16 -0800:
> On 11/19/2013 01:51 PM, Clint Byrum wrote:
> > Excerpts from Chris Friesen's message of 2013-11-19 11:37:02 -0800:
> >> On 11/19/2013 12:35 PM, Clint Byrum wrote:
> >>
> >>> Each scheduler process can own a different set of resources. If they
> >>> each grab instance requests in a round-robin fashion, then they will
> >>> fill their resources up in a relatively well balanced way until one
> >>> scheduler's resources are exhausted. At that time it should bow out of
> >>> taking new instances. If it can't fit a request in, it should kick the
> >>> request out for retry on another scheduler.
> >>>
> >>> In this way, they only need to be in sync in that they need a way to
> >>> agree on who owns which resources. A distributed hash table that gets
> >>> refreshed whenever schedulers come and go would be fine for that.
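
Roughly, something like the toy hash ring below is what I have in mind for
"agree on who owns which resources". All of the names here (SchedulerRing,
owner_of, the scheduler and host ids) are made up for illustration; nothing
like this exists in Nova today:

    import bisect
    import hashlib


    def _hash(key):
        # Map a string onto a point on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


    class SchedulerRing(object):
        """Assign each compute host to exactly one scheduler.

        Rebuilt whenever schedulers join or leave, so every scheduler can
        independently answer "do I own this compute host?".
        """

        def __init__(self, scheduler_ids, replicas=100):
            self._ring = sorted(
                (_hash("%s-%d" % (sid, i)), sid)
                for sid in scheduler_ids
                for i in range(replicas))
            self._points = [point for point, _ in self._ring]

        def owner_of(self, compute_host):
            idx = bisect.bisect(self._points, _hash(compute_host))
            return self._ring[idx % len(self._ring)][1]


    # Three schedulers dividing the compute hosts between them.
    ring = SchedulerRing(["sched-1", "sched-2", "sched-3"])
    print(ring.owner_of("compute-042"))
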
> >>
> >> That has some potential, but at high occupancy you could end up refusing
> >> to schedule something because no one scheduler has sufficient resources
> >> even if the cluster as a whole does.
> >>
> >
> > I'm not sure what you mean here. What resource spans multiple compute
> > hosts?
> 
> Imagine the cluster is running close to full occupancy, and each scheduler 
> has room for 40 more instances.  Now I come along and issue a single 
> request to boot 50 instances.  The cluster has room for that, but none 
> of the schedulers do.
> 
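
To put that arithmetic in code form (assuming three schedulers purely for
the sake of having concrete numbers):

    # Each scheduler owns hosts with room for 40 more instances.
    per_scheduler_free = [40, 40, 40]
    request = 50

    # The cloud as a whole has room for the request...
    print(sum(per_scheduler_free) >= request)                    # True
    # ...but no single scheduler can place all of it by itself.
    print(any(free >= request for free in per_scheduler_free))   # False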

You're assuming that all 50 come in at once. That is only one use case
and not at all the most common.

> >> This gets worse once you start factoring in things like heat and
> >> instance groups that will want to schedule whole sets of resources
> >> (instances, IP addresses, network links, cinder volumes, etc.) at once
> >> with constraints on where they can be placed relative to each other.
> 
> > Actually that is rather simple. Such requests have to be serialized
> > into a work-flow. So if you say "give me 2 instances in 2 different
> > locations" then you allocate 1 instance, and then another one with
> > 'not_in_location(1)' as a condition.
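
Something along these lines, where schedule_one() and its
excluded_locations argument are placeholders rather than any real API:

    def schedule_group_serially(schedule_one, num_instances):
        # schedule_one(excluded_locations) is assumed to return the
        # location picked for one instance, or raise an exception such
        # as NoValidHost if nothing fits.
        placed = []
        for _ in range(num_instances):
            # "not_in_location(previous ones)" expressed as an exclusion
            # list built from the placements made so far.
            location = schedule_one(excluded_locations=list(placed))
            placed.append(location)
        return placed
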
> 
> Actually, you don't want to serialize it; you want to hand the whole set 
> of resource requests and constraints to the scheduler all at once.
> 
> If you do them one at a time, then early decisions made with 
> less-than-complete knowledge can result in later scheduling requests 
> failing due to being unable to meet constraints, even if there are 
> actually sufficient resources in the cluster.
> 
> The "VM ensembles" document at
> https://docs.google.com/document/d/1bAMtkaIFn4ZSMqqsXjs_riXofuRvApa--qo4UTwsmhw/edit?pli=1
> has a good example of how one-at-a-time scheduling can cause spurious 
> failures.
> 
> And if you're handing the whole set of requests to a scheduler all at 
> once, then you want the scheduler to have access to as many resources as 
> possible so that it has the highest likelihood of being able to satisfy 
> the request given the constraints.
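
To make that failure mode concrete, here is a toy example (the hosts, the
free-slot counts, and the "pack" policy are all invented): placing an
unconstrained instance first strands an anti-affinity pair that a scheduler
looking at the whole request would have placed without trouble.

    # Toy free-slot counts per compute host.
    free = {"host-a": 2, "host-b": 1}


    def pack(hosts):
        # Pick the host with the fewest free slots that still has room.
        candidates = [h for h, n in hosts.items() if n > 0]
        return min(candidates, key=hosts.get) if candidates else None


    # One at a time: the unconstrained instance lands on host-b, and the
    # anti-affinity pair then needs two different hosts with room, but
    # only host-a has any left, so the request fails.
    hosts = dict(free)
    hosts[pack(hosts)] -= 1
    print(len([h for h, n in hosts.items() if n > 0]) >= 2)      # False


    def feasible(capacity, assignment):
        # Check an explicit assignment against the free-slot counts.
        counts = dict(capacity)
        for host in assignment:
            if counts.get(host, 0) <= 0:
                return False
            counts[host] -= 1
        return True


    # The whole request at once: unconstrained -> host-a, and the pair
    # split across host-a and host-b.  Three instances, three slots.
    batch = ["host-a", "host-a", "host-b"]
    print(feasible(free, batch) and batch[1] != batch[2])        # True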

This use case is real and valid, which is why I think there is room for
multiple approaches. For instance, the situation you describe can also be
dealt with by simply keeping the cloud under-utilized and accepting that
spurious failures will happen once utilization climbs past a certain
percentage. The ext3 filesystem on Linux has a similar solution: don't
fill it up, or suffer a huge performance penalty.
