Ian Wells wrote:
On 10 October 2015 at 23:47, Clint Byrum <cl...@fewbar.com> wrote:

    >  Per before, my suggestion was that every scheduler tries to maintain
    >  a copy of the cloud's state in memory (in much the same way, per the
    >  previous example, as every router on the internet tries to make a
    >  route table out of what it learns from BGP).  They don't have to be
    >  perfect.  They don't have to be in sync.  As long as there's some
    >  variability in the decision making, they don't have to update when
    >  another scheduler schedules something (and you can make the compute
    >  node send an immediate update when a new VM is run, anyway).  They
    >  all stand a good chance of scheduling VMs well simultaneously.
    >

    I'm quite in favor of eventual consistency and retries. Even if we had
    a system of perfect updating of all state records everywhere, it would
    break sometimes and I'd still want to not trust any record of state as
    being correct for the entire distributed system. However, there is an
    efficiency win gained by staying _close_ to correct. It is actually a
    function of the expected entropy. The more concurrent schedulers, the
    more entropy there will be to deal with.


... and the fewer the servers in total, the larger the entropy as a
proportion of the whole system (if that's a thing, it's a long time
since I did physical chemistry).  But consider the use cases:

1. I have a small cloud, I run two schedulers for redundancy.  There's a
good possibility that, when the cloud is loaded, the schedulers make
poor decisions occasionally.  We'd have to consider how likely that was,
certainly.

2. I have a large cloud, and I run 20 schedulers for redundancy.
There's a good chance that a scheduler is out of date on its
information.  But there could be several hundred hosts willing to
satisfy a scheduling request, and even among the hosts it has incorrect
information about, there's a low chance that any of them are close to
the threshold where they won't run the VM in question, so the odds are
good that it will pick a host that's happy to satisfy the request.
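
To put rough numbers on that intuition, here's a back-of-the-envelope
simulation - plain Python, nothing to do with the real Nova filters, and
all of the host counts, RAM sizes and staleness figures are made up - of
a scheduler picking at random among hosts that look like they fit, while
its view is missing a handful of recent placements:

# A toy simulation (not Nova code): the scheduler keeps a possibly-stale
# copy of per-host free RAM and picks a random host that appears to fit.
# All host counts, RAM sizes and staleness figures below are made up.
import random

HOSTS = 300        # hosts in the cloud
STALE_VMS = 5      # recent placements this scheduler hasn't heard about yet
VM_RAM = 4         # GB asked for by the request being scheduled
TRIALS = 10000

def run_trial():
    # actual free RAM per host
    real = {h: random.choice([8, 16, 32, 64]) for h in range(HOSTS)}
    # this scheduler's view, which is missing a few recent placements
    view = dict(real)
    for _ in range(STALE_VMS):
        real[random.randrange(HOSTS)] -= VM_RAM
    # schedule one request: pick randomly among hosts that *look* ok
    candidates = [h for h, free in view.items() if free >= VM_RAM]
    choice = random.choice(candidates)
    # did the chosen host really have room?
    return real[choice] >= VM_RAM

successes = sum(run_trial() for _ in range(TRIALS))
print("first-try success rate: %.4f" % (successes / TRIALS))

With several hundred hosts and only a handful of unseen placements, the
first-try success rate should come out very close to 1, which is case 2
above; shrink HOSTS to a handful and it falls off, which is case 1.  The
real filters are obviously more involved, but it's a cheap way to
sanity-check the "how likely is a poor decision" question before anyone
builds a harness.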


    >  But to be fair, we're throwing made up numbers around at this point.
    >  Maybe it's time to work out how to test this for scale in a harness -
    >  which is the bit of work we all really need to do this properly, or
    >  there's no proof we've actually helped - and leave people to code
    >  their ideas up?

    I'm working on adding meters for rates and amounts of messages and
    queries that the system does right now for performance purposes.
    Rally, though, is the place where I'd go to ask "how fast can we
    schedule things right now?".


My only concern is that we're testing a real cloud at scale and I
haven't got any more firstborn to sell for hardware, so I wonder if we
can fake up a compute node in our test harness.
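
For what it's worth, a fake compute node doesn't have to be much.  Here's
a toy sketch - not Nova's actual fake virt driver, and all the names here
are invented for illustration - of a process that reports invented
resources and "boots" instances by doing nothing but bookkeeping, so one
test box can pretend to be thousands of hosts:

# A toy sketch of the "fake compute node" idea; names are invented for
# illustration.  spawn() never touches a hypervisor, it just updates
# counters, and report_state() stands in for the periodic resource report
# the scheduler consumes.
import itertools


class FakeComputeNode:
    def __init__(self, name, ram_mb=128 * 1024, vcpus=64):
        self.name = name
        self.free_ram_mb = ram_mb
        self.free_vcpus = vcpus
        self.instances = {}
        self._ids = itertools.count()

    def spawn(self, ram_mb, vcpus):
        """Pretend to boot a VM: no hypervisor involved, just counters."""
        if ram_mb > self.free_ram_mb or vcpus > self.free_vcpus:
            raise RuntimeError("fake host %s is full" % self.name)
        self.free_ram_mb -= ram_mb
        self.free_vcpus -= vcpus
        instance_id = next(self._ids)
        self.instances[instance_id] = (ram_mb, vcpus)
        return instance_id

    def report_state(self):
        """Return the made-up resource view this fake host advertises."""
        return {"host": self.name,
                "free_ram_mb": self.free_ram_mb,
                "free_vcpus": self.free_vcpus,
                "running": len(self.instances)}


if __name__ == "__main__":
    nodes = [FakeComputeNode("fake-%04d" % i) for i in range(1000)]
    nodes[0].spawn(ram_mb=4096, vcpus=2)
    print(nodes[0].report_state())

Nova does ship a fake virt driver in-tree that does roughly this already,
I believe, so the harness might only need wiring and configuration rather
than anything new.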

Does the OpenStack Foundation have access to a scaling area that can be
used by the community for this kind of experimental work?  It seems like
infra or others should be able to make that possible.  Maybe we could
sacrifice a summit and, instead of spending the money on that, we (as a
community) could spend it on a really nice scale lab for the community ;)

--
Ian.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
