Clint Byrum wrote:
Excerpts from Joshua Harlow's message of 2015-10-08 08:38:57 -0700:
Joshua Harlow wrote:
On Thu, 8 Oct 2015 10:43:01 -0400
Monty Taylor <mord...@inaugust.com> wrote:

On 10/08/2015 09:01 AM, Thierry Carrez wrote:
Maish Saidel-Keesing wrote:
Operational overhead has a cost - maintaining 3 different database
tools, backing them up, providing HA, etc. all adds operational cost.

This is not to say that this cannot be overcome, but it should be
taken into consideration.

And *if* they can be consolidated into an agreed solution across
the whole of OpenStack - that would be highly beneficial (IMHO).
Agreed, and that ties into the similar discussion we recently had
about picking a common DLM. Ideally we'd only add *one* general
dependency and use it for locks / leader election / syncing status
around.

++

All of the proposed DLM tools can fill this space successfully. There
is definitely not a need for multiple.
On this point, and just thinking out loud: if we consider saving
compute_node information into, say, a node in said DLM backend (for
example a znode in zookeeper[1]), this information would be updated
periodically by that compute_node *itself* (it would, say, contain
information about what VMs are running on it, what their utilization is,
and so on).

For example the following layout could be used:

/nova/compute_nodes/<hypervisor-hostname>

<hypervisor-hostname> data could be:

{
      vms: [],
      memory_free: XYZ,
      cpu_usage: ABC,
      memory_used: MNO,
      ...
}
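
For illustration, a very rough sketch of the compute-node side of this,
using the kazoo python client (the path layout, field names and the
publish_state() helper are all just assumptions following the example
above, not anything nova actually does today):

import json
import socket

from kazoo.client import KazooClient

zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
zk.start()

# Hypothetical znode path following the layout above.
hypervisor_hostname = socket.gethostname()
path = '/nova/compute_nodes/%s' % hypervisor_hostname

def publish_state(vms, memory_free, memory_used, cpu_usage):
    # Serialize this node's current view of itself and write it to
    # its own znode (create on first run, update afterwards).
    data = json.dumps({
        'vms': vms,
        'memory_free': memory_free,
        'memory_used': memory_used,
        'cpu_usage': cpu_usage,
    }).encode('utf-8')
    if zk.exists(path):
        zk.set(path, data)
    else:
        zk.create(path, data, makepath=True)

The compute node would just call publish_state(...) from its existing
periodic task instead of (or alongside) writing to the nova database.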

Now if we imagine each/all schedulers having watches
on /nova/compute_nodes/ ([2]; consul and etcd have equivalent concepts
afaik), then when a compute_node updates that information a push
notification (the watch being triggered) will be sent to the
scheduler(s), and the scheduler(s) could then update a local in-memory
cache of the data about all the hypervisors that can be selected from
for scheduling. This avoids any reading of a large set of data in the
first place (besides an initial read-once on startup to fetch the
initial list + set up the watches); in a way it's similar to push
notifications. Then when scheduling a VM -> hypervisor there isn't any
need to query anything but the local in-memory representation that the
scheduler is maintaining (and updating as watches are triggered)...
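
A rough sketch of that scheduler side with kazoo (again, the paths and
the nodes/watched structures are just hypothetical, following the layout
above):

import json

from kazoo.client import KazooClient

zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
zk.start()
zk.ensure_path('/nova/compute_nodes')

nodes = {}       # hostname -> latest data published by that hypervisor
watched = set()  # hostnames we already have a DataWatch registered for

def watch_node(hostname):
    watched.add(hostname)

    # DataWatch fires once immediately and then on every update/delete
    # of that hypervisor's znode.
    @zk.DataWatch('/nova/compute_nodes/%s' % hostname)
    def updated(data, stat):
        if data is None:
            nodes.pop(hostname, None)   # znode gone -> drop from cache
        else:
            nodes[hostname] = json.loads(data.decode('utf-8'))

# ChildrenWatch fires when hypervisors register or disappear, which also
# covers the initial read-once at startup.
@zk.ChildrenWatch('/nova/compute_nodes')
def members_changed(children):
    for hostname in children:
        if hostname not in watched:
            watch_node(hostname)

Scheduling decisions then only ever touch the local nodes dict;
zookeeper pushes changes into it as they happen.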

So this is why I was wondering about what capabilities of cassandra are
being used here, because the above are, I think, unique capabilities of
DLM-like systems (zookeeper, consul, etcd) that could be advantageous
here...

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes

[2]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches


And here's a final super-awesomeness:

Use the very existence of that znode + its information (perhaps using
ephemeral znodes or equivalent) to determine if a hypervisor is 'alive'
or 'dead', thus removing the need to do queries and periodic writes to
the nova database just to determine whether a hypervisor's nova-compute
service is alive or dead (those reads currently go via
https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py#L33
and other similar code scattered in nova)...
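
Continuing the kazoo sketch from earlier: if the compute node creates
its znode as ephemeral, zookeeper removes it automatically when that
node's session dies, so liveness becomes an existence check (or a watch
firing) rather than periodic DB heartbeats. The service_is_up() helper
is purely hypothetical, just to show where the DB-driver check could be
swapped out:

# Compute node: same registration as before, but ephemeral, so the znode
# disappears when nova-compute's zookeeper session is lost.
zk.create(path, data, ephemeral=True, makepath=True)

# Scheduler / servicegroup side: 'alive' == 'the znode exists'.
def service_is_up(hostname):
    return zk.exists('/nova/compute_nodes/%s' % hostname) is not None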


^^ THIS is the kind of architectural thinking I'd like to see us do more
of.

This isn't "hey I have a better database" it is "I have a way to reduce
the most common operations to O(1) complexity".

Ed, for all of the promise of your experiment, I'd actually rather see
time spent on Josh's idea above. In fact, I might spend time on Josh's
idea above. :)

Go for it!

We (at yahoo) are also brainstorming this idea (or something like it), and as we hit more performance issues pushing 1000+ hypervisors in a single cluster (no cells) (one of our many clusters), we will start adjusting what needs to be fixed/tweaked/altered to continue to push these boundaries (and hopefully doing more blogging, upstreaming and all that along the way).

Collab. and all that is welcome too, of course :)

P.S.

The DLM spec @ https://review.openstack.org/#/c/209661/ (rendered nicely at http://docs-draft.openstack.org/61/209661/29/check/gate-openstack-specs-docs/2ff62fa//doc/build/html/specs/chronicles-of-a-dlm.html) mentions 'Such a consensus being built will also influence the future functionality and capabilities of OpenStack at large so we need to be especially careful, thoughtful, and explicit here.'

This statement was really targeted at cases like this: when we (as a community) choose a DLM solution, we affect the larger capabilities of openstack, not just for locking but for scheduling (and likely for other functionality I can't even think of/predict...)

