Ian Wells wrote:
On 10 October 2015 at 23:47, Clint Byrum <cl...@fewbar.com
<mailto:cl...@fewbar.com>> wrote:
> Per before, my suggestion was that every scheduler tries to maintain a
> copy of the cloud's state in memory (in much the same way, per the
> previous example, as every router on the internet tries to make a route
> table out of what it learns from BGP). They don't have to be perfect.
> They don't have to be in sync. As long as there's some variability in
> the decision making, they don't have to update when another scheduler
> schedules something (and you can make the compute node send an
> immediate update when a new VM is run, anyway). They all stand a good
> chance of scheduling VMs well simultaneously.
>
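A rough sketch of that scheme, for concreteness: each scheduler works from its own local, possibly stale view of host capacity, and the "variability in the decision making" comes from choosing randomly among the best few candidates instead of always taking the top one. All the names here (`pick_host`, `local_view`, `top_k`) are illustrative, not any real nova interface:

```python
import random

def pick_host(local_view, vm_ram, top_k=5):
    """Pick a host from a local (possibly stale) view of free RAM.

    local_view: dict mapping host name -> believed free RAM in MB.
    Returns a host name, or None if nothing appears to fit (caller
    would retry after refreshing its view).
    """
    candidates = [h for h, free in local_view.items() if free >= vm_ram]
    if not candidates:
        return None
    # Sort by believed free capacity, then choose randomly among the
    # best few, so concurrent schedulers don't all pile onto the same
    # "best" host.
    candidates.sort(key=lambda h: local_view[h], reverse=True)
    return random.choice(candidates[:top_k])
```

The random choice among the top few is what keeps two schedulers with identical stale views from making identical (and therefore colliding) decisions.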
I'm quite in favor of eventual consistency and retries. Even if we had
a system of perfect updating of all state records everywhere, it would
break sometimes and I'd still want to not trust any record of state as
being correct for the entire distributed system. However, there is an
efficiency win gained by staying _close_ to correct. It is actually a
function of the expected entropy. The more concurrent schedulers, the
more entropy there will be to deal with.
... and the fewer the servers in total, the larger the entropy as a
proportion of the whole system (if that's a thing, it's a long time
since I did physical chemistry). But consider the use cases:
1. I have a small cloud, I run two schedulers for redundancy. There's a
good possibility that, when the cloud is loaded, the schedulers make
poor decisions occasionally. We'd have to consider how likely that was,
certainly.
2. I have a large cloud, and I run 20 schedulers for redundancy.
There's a good chance that a scheduler is out of date in its
information. But there could be several hundred hosts willing to
satisfy a scheduling request, and even among the hosts it has
incorrect information about, there's a low chance that any are close
to the threshold where they won't run the VM in question, so there are
good odds it will pick a host that's happy to satisfy the request.
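Since we're admittedly throwing made-up numbers around, here's a toy Monte Carlo to put rough figures on the two cases: S schedulers place one VM each in the same instant, picking uniformly among hosts, and a "collision" means some host gets more simultaneous placements than it has headroom for. Everything here is illustrative (the uniform-pick model and `slots_per_host` are assumptions), not a measurement of any real cloud:

```python
import random

def collision_rate(num_hosts, num_schedulers, slots_per_host=1,
                   trials=10000):
    """Estimate how often concurrent placements overload some host."""
    collisions = 0
    for _ in range(trials):
        picks = [random.randrange(num_hosts)
                 for _ in range(num_schedulers)]
        load = {}
        for p in picks:
            load[p] = load.get(p, 0) + 1
        # A trial "collides" if any host got more placements than it
        # had room for.
        if any(c > slots_per_host for c in load.values()):
            collisions += 1
    return collisions / trials
```

Plugging in different host counts, scheduler counts, and per-host headroom gives a rough feel for how often a retry would actually be needed in each of the cases above.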
> But to be fair, we're throwing made up numbers around at this point.
> Maybe it's time to work out how to test this for scale in a harness -
> which is the bit of work we all really need to do this properly, or
> there's no proof we've actually helped - and leave people to code
> their ideas up?
I'm working on adding meters for rates and amounts of messages and
queries that the system does right now for performance purposes.
Rally, though, is the place where I'd go to ask "how fast can we
schedule things right now?".
My only concern is that we're testing a real cloud at scale and I
haven't got any more firstborn to sell for hardware, so I wonder if we
can fake up a compute node in our test harness.
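One hedged sketch of what "faking up a compute node" might look like: a stub that does the two things the scheduler discussion above cares about (periodic state reports plus an immediate update when a VM "boots") without running anything. `report` stands in for whatever transport the harness would use (RPC, message bus, HTTP); none of this is a real nova interface:

```python
import threading

class FakeComputeNode:
    """A compute node stub for scale testing: reports state, runs nothing."""

    def __init__(self, name, ram_mb, report, interval=10.0):
        self.name = name
        self.free_ram_mb = ram_mb
        self.report = report          # callable(name, free_ram_mb)
        self.interval = interval
        self._stop = threading.Event()

    def spawn(self, ram_mb):
        """Pretend to boot a VM: just decrement the bookkeeping."""
        if ram_mb > self.free_ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        # Immediate update on boot, as suggested earlier in the thread.
        self.report(self.name, self.free_ram_mb)
        return True

    def run(self):
        """Periodic state report, like a real compute node's update loop."""
        while not self._stop.wait(self.interval):
            self.report(self.name, self.free_ram_mb)

    def stop(self):
        self._stop.set()
```

Thousands of these could run in a handful of processes, which is the appeal: the schedulers and the message traffic are real, only the hypervisors are imaginary.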
Does the openstack foundation have access to a scaling area that can be
used by the community for this kind of experimental work? It seems like
infra or others should be able to make that possible? Maybe we could
sacrifice a summit, and instead of spending the money on that we (as a
community) could spend it on a really nice scale lab for the
community ;)
--
Ian.
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev