Re: [openstack-dev] Scheduler proposal

2015-10-16 Thread Julien Danjou
On Fri, Oct 16 2015, Joshua Harlow wrote:

> Another idea is to use numpy and start representing filters as linear
> equations, then use something like
> https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve
> to solve linear equations given some data.
>
> Another idea, turn each filter into a constraint equation (which it sorta is
> anyway) and use a known fast constraint solver on that data...
>
> Lots of ideas here that can be possible, likely endless :)

Already pasted it on Twitter, but just in case, OptaPlanner:

  http://community.redhat.com/blog/2014/11/smart-vm-scheduling-in-ovirt-clusters/

-- 
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info




Re: [openstack-dev] Scheduler proposal

2015-10-16 Thread Clint Byrum
Excerpts from Ed Leafe's message of 2015-10-15 11:56:24 -0700:
> Wow, I seem to have unleashed a bunch of pent-up frustration in the 
> community! It's great to see everyone coming forward with their ideas and 
> insights for improving the way Nova (and, by extension, all of OpenStack) can 
> potentially scale.
> 
> I do have a few comments on the discussion:
> 
> 1) This isn't a proposal to simply add some sort of DLM to Nova as a magic 
> cure-all. The concerns about Nova's ability to scale have to do a lot more 
> with the overall internal communication design.
> 

In this, we agree.

> 2) I really liked the comment about "made-up numbers". It's so true: we are 
> all impressed by such examples of speed that we sometimes forget whether 
> speeding up X will improve the overall process to any significant degree. The 
> purpose of my original email back in July, and the question I asked at the 
> Nova midcycle, is if we could get some numbers that would be a target to 
> shoot for with any of these experiments. Sure, I could come up with a test 
> that shows a zillion transactions per second, but if that doesn't result in a 
> cloud being able to schedule more efficiently, what's the point?
>

Speed is only 1 dimension. Efficiency and simplicity are two others that
I think are harder to quantify, but are also equally important in any
component of OpenStack.

> 3) I like the idea of something like ZooKeeper, but my concern is how to 
> efficiently query the data. If, for example, we had records for 100K compute 
> nodes, would it be possible to do the equivalent of "SELECT * FROM resources 
> WHERE resource_type = 'compute' AND free_ram_mb >= 2048 AND …" - well, you 
> get the idea. Are complex data queries possible in ZK? I haven't been able to 
> find that information anywhere.
>

You don't do complex queries, because you have all of the data in RAM,
in an efficient in-RAM format. Even if each record is 50KB, we can do
100,000 of them in 5GB. That's a drop in the bucket.

> 4) It is true that even in a very large deployment, it is possible to keep 
> all the relevant data needed for scheduling in memory. My concern is how to 
> efficiently search that data, much like in the ZK scenario.
> 

There are a bunch of ways to do this. My favorite is to have filter
plugins in the scheduler define what they need to index, and then
build a B-tree for each filter as each record arrives in the main data
structure. When scheduling requests come in, they simply walk through
each B-tree and turn that into a set. Then read each piece of the set
out of the main structure and sort based on whichever you want (less
full for load balancing, most full for efficient stacking).
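A rough sketch of that flow, just to illustrate the walk/intersect/sort
steps (a plain sorted list stands in for a real B-tree, and the field
names are made up, so treat it as a sketch rather than an implementation):

    import bisect
    from operator import itemgetter

    class RangeIndex(object):
        """Poor-man's B-tree: a sorted list of (value, host) pairs."""
        def __init__(self):
            self._items = []

        def insert(self, value, host):
            # A real index would also handle updates/removals per host.
            bisect.insort(self._items, (value, host))

        def at_least(self, minimum):
            """Set of hosts whose indexed value is >= minimum."""
            i = bisect.bisect_left(self._items, (minimum, ''))
            return set(host for _, host in self._items[i:])

    hosts = {}                # the main data structure: host -> full record
    ram_index = RangeIndex()  # one index per thing the filters care about
    disk_index = RangeIndex()

    def on_host_update(record):
        hosts[record['host']] = record
        ram_index.insert(record['free_ram_mb'], record['host'])
        disk_index.insert(record['free_disk_gb'], record['host'])

    def schedule(ram_mb, disk_gb, stack=True):
        # Walk each index and intersect the resulting sets...
        candidates = ram_index.at_least(ram_mb) & disk_index.at_least(disk_gb)
        # ...then sort: most full first for stacking, least full first for
        # load balancing.
        records = [hosts[h] for h in candidates]
        records.sort(key=itemgetter('free_ram_mb'), reverse=not stack)
        return records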

> 5) Concerns about Cassandra running with OpenJDK instead of the Oracle JVM 
> are troubling. I sent an email about this to one of the people I know at 
> DataStax, but so far have not received a response. And while it would be 
> great to have people contribute to OpenJDK to make it compatible, keep in 
> mind that that would be an ongoing commitment, not just a one-time effort.
> 

There are a few avenues to success with Cassandra but I don't think any
of them pass very close to OpenStack's current neighborhood.

> 6) I remember discussions back in the Austin-Bexar time frame about what 
> Thierry referred to as 'flavor-based schedulers', and they were immediately 
> discounted as not sophisticated enough to handle the sort of complex 
> scheduling requests that were expected. I'd be interested in finding out from 
> the big cloud providers what percentage of their requests would fall into 
> this simple structure, and what percent are more complicated than that. 
> Having hosts listening to queues that they know they can satisfy removes the 
> raciness from the process, although it would require some additional handling 
> for the situation where no host accepts the request. Still, it has the 
> advantage of being dead simple. Unfortunately, this would probably require a 
> bigger architectural change than integrating Cassandra into the Scheduler 
> would.
> 

No host accepting the request means your cloud is, more or less, full. If
you have flavors that aren't proper factors of smaller flavors, this
will indeed happen even when it isn't 100% utilized. If you have other
constraints that you allow your users to specify, then you are letting
them dictate how your hardware is utilized, which I think is a foolhardy
business decision. This is no different than any other manufacturing batch
size problem: sometimes parts of your process are under utilized, and
you have to make choices about rejecting certain workloads if they will
end up costing you more than you're willing to pay for the happy customer.

Note that the "efficient stacking" model I talked about can't really
work in the queue-based approach. If you want to fill up the most full
hosts before filling more, you need some awareness of what host is most
full and the compute nodes can't really 

Re: [openstack-dev] Scheduler proposal

2015-10-16 Thread Alec Hothan (ahothan)





On 10/15/15, 11:11 PM, "Clint Byrum"  wrote:

>Excerpts from Ed Leafe's message of 2015-10-15 11:56:24 -0700:
>> Wow, I seem to have unleashed a bunch of pent-up frustration in the 
>> community! It's great to see everyone coming forward with their ideas and 
>> insights for improving the way Nova (and, by extension, all of OpenStack) 
>> can potentially scale.
>> 
>> I do have a few comments on the discussion:
>> 
>> 1) This isn't a proposal to simply add some sort of DLM to Nova as a magic 
>> cure-all. The concerns about Nova's ability to scale have to do a lot more 
>> with the overall internal communication design.
>> 
>
>In this, we agree.
>
>> 2) I really liked the comment about "made-up numbers". It's so true: we are 
>> all impressed by such examples of speed that we sometimes forget whether 
>> speeding up X will improve the overall process to any significant degree. 
>> The purpose of my original email back in July, and the question I asked at 
>> the Nova midcycle, is if we could get some numbers that would be a target to 
>> shoot for with any of these experiments. Sure, I could come up with a test 
>> that shows a zillion transactions per second, but if that doesn't result in 
>> a cloud being able to schedule more efficiently, what's the point?
>>
>
>Speed is only 1 dimension. Efficiency and simplicity are two others that
>I think are harder to quantify, but are also equally important in any
>component of OpenStack.

Monty did suggest a goal of 100K nodes - which I think is a moon-expedition 
kind of goal given how far we are from it, but a goal nevertheless ;-)
OpenStack does not provide any number today beyond "massive scale", and that 
can be a problem for designers and implementors.
I think OpenStack is now mature enough that it has to worry seriously about 
scale, and we have a very long way to go at that level.

I agree on the importance of simplicity and efficiency. But let's also add 
operational requirements such as ease of deployment and ease of 
troubleshooting. It is more difficult for Ops to deal with too many different 
technologies under the cover.
My concern is that we may not have sufficient oversight (from the TC) for this 
kind of project to keep it within reasonable complexity for the given 
requirements, and that is hard to achieve when the requirements are very vague.
It looks like the main area where we might need faster nova scheduling would be 
the big deployments that use nova networking (thousands of nodes), simply 
because neutron deployments may not have enough nodes to require such a rate. 
And nobody seems to know what the targeted rate is (schedules per second) or 
what the exact problem to solve is (by exact I mean: what numbers do we have 
to show that the current nova scheduling is too slow or does not scale?).


>
>> 3) I like the idea of something like ZooKeeper, but my concern is how to 
>> efficiently query the data. If, for example, we had records for 100K compute 
>> nodes, would it be possible to do the equivalent of "SELECT * FROM resources 
>> WHERE resource_type = 'compute' AND free_ram_mb >= 2048 AND …" - well, you 
>> get the idea. Are complex data queries possible in ZK? I haven't been able 
>> to find that information anywhere.
>>
>
>You don't do complex queries, because you have all of the data in RAM,
>in an efficient in-RAM format. Even if each record is 50KB, we can do
>100,000 of them in 5GB. That's a drop in the bucket.

Yes

>
>> 4) It is true that even in a very large deployment, it is possible to keep 
>> all the relevant data needed for scheduling in memory. My concern is how to 
>> efficiently search that data, much like in the ZK scenario.
>> 
>
>There are a bunch of ways to do this. My favorite is to have filter
>plugins in the scheduler define what they need to index, and then
>build a B-tree for each filter as each record arrives in the main data
>structure. When scheduling requests come in, they simply walk through
>each B-tree and turn that into a set. Then read each piece of the set
>out of the main structure and sort based on whichever you want (less
>full for load balancing, most full for efficient stacking).

There are clearly things you should be doing to scale properly. Python is not 
very speedy but can be made good enough at scale using the proper algorithms 
(such as the one you propose).
Furthermore, it can be made to run much faster - close to native speed - with 
proper design and the use of the right libraries. So there is a lot that can 
be done to speed things up without necessarily having to increase complexity 
and scale out everything.


>
>> 5) Concerns about Cassandra running with OpenJDK instead of the Oracle JVM 
>> are troubling. I sent an email about this to one of the people I know at 
>> DataStax, but so far have not received a response. And while it would be 
>> great to have people contribute to OpenJDK to make it compatible, keep in 
>> mind that that would be an ongoing 

Re: [openstack-dev] Scheduler proposal

2015-10-16 Thread Joshua Harlow

Clint Byrum wrote:

Excerpts from Ed Leafe's message of 2015-10-15 11:56:24 -0700:

Wow, I seem to have unleashed a bunch of pent-up frustration in the
community! It's great to see everyone coming forward with their
ideas and insights for improving the way Nova (and, by extension,
all of OpenStack) can potentially scale.

I do have a few comments on the discussion:

1) This isn't a proposal to simply add some sort of DLM to Nova as
a magic cure-all. The concerns about Nova's ability to scale have
to do a lot more with the overall internal communication design.



In this, we agree.


2) I really liked the comment about "made-up numbers". It's so
true: we are all impressed by such examples of speed that we
sometimes forget whether speeding up X will improve the overall
process to any significant degree. The purpose of my original email
back in July, and the question I asked at the Nova midcycle, is if
we could get some numbers that would be a target to shoot for with
any of these experiments. Sure, I could come up with a test that
shows a zillion transactions per second, but if that doesn't result
in a cloud being able to schedule more efficiently, what's the
point?



Speed is only 1 dimension. Efficiency and simplicity are two others
that I think are harder to quantify, but are also equally important
in any component of OpenStack.


3) I like the idea of something like ZooKeeper, but my concern is
how to efficiently query the data. If, for example, we had records
for 100K compute nodes, would it be possible to do the equivalent
of "SELECT * FROM resources WHERE resource_type = 'compute' AND
free_ram_mb>= 2048 AND …" - well, you get the idea. Are complex
data queries possible in ZK? I haven't been able to find that
information anywhere.



You don't do complex queries, because you have all of the data in
RAM, in an efficient in-RAM format. Even if each record is 50KB, we
can do 100,000 of them in 5GB. That's a drop in the bucket.


4) It is true that even in a very large deployment, it is possible
to keep all the relevant data needed for scheduling in memory. My
concern is how to efficiently search that data, much like in the ZK
scenario.



There are a bunch of ways to do this. My favorite is to have filter
plugins in the scheduler define what they need to index, and then
build a B-tree for each filter as each record arrives in the main
data structure. When scheduling requests come in, they simply walk
through each B-tree and turn that into a set. Then read each piece of
the set out of the main structure and sort based on whichever you
want (less full for load balancing, most full for efficient
stacking).


Another idea is to use numpy and start representing filters as linear 
equations, then use something like 
https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve 
to solve linear equations given some data.


Another idea, turn each filter into a constraint equation (which it 
sorta is anyway) and use a known fast constraint solver on that data...


Lots of ideas here that can be possible, likely endless :)
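For instance, a toy sketch of the "filters as constraints over a host 
matrix" idea (made-up resource columns, and plain vectorized inequality 
checks rather than a real constraint solver):

    import numpy as np

    # Each row is a host, each column a resource:
    # [free_ram_mb, free_disk_gb, free_vcpus]
    hosts = np.array([
        [2048,  80,  4],
        [ 512,  10,  1],
        [8192, 200, 16],
    ], dtype=float)

    # A request as a lower bound per resource; each "filter" is then just
    # one element-wise constraint, AND-ed across columns in one operation.
    demand = np.array([1024, 40, 2], dtype=float)
    feasible = np.all(hosts >= demand, axis=1)

    # Weighing example: prefer the most-packed feasible host (stacking) by
    # scoring the headroom left after placement (units mixed for brevity).
    headroom = (hosts - demand).sum(axis=1)
    headroom[~feasible] = np.inf
    best = int(np.argmin(headroom))
    print(feasible, best)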




5) Concerns about Cassandra running with OpenJDK instead of the
Oracle JVM are troubling. I sent an email about this to one of the
people I know at DataStax, but so far have not received a response.
And while it would be great to have people contribute to OpenJDK to
make it compatible, keep in mind that that would be an ongoing
commitment, not just a one-time effort.



There are a few avenues to success with Cassandra but I don't think
any of them pass very close to OpenStack's current neighborhood.


6) I remember discussions back in the Austin-Bexar time frame about
what Thierry referred to as 'flavor-based schedulers', and they
were immediately discounted as not sophisticated enough to handle
the sort of complex scheduling requests that were expected. I'd be
interested in finding out from the big cloud providers what
percentage of their requests would fall into this simple structure,
and what percent are more complicated than that. Having hosts
listening to queues that they know they can satisfy removes the
raciness from the process, although it would require some
additional handling for the situation where no host accepts the
request. Still, it has the advantage of being dead simple.
Unfortunately, this would probably require a bigger architectural
change than integrating Cassandra into the Scheduler would.



No host accepting the request means your cloud is, more or less,
full. If you have flavors that aren't proper factors of smaller
flavors, this will indeed happen even when it isn't 100% utilized. If
you have other constraints that you allow your users to specify, then
you are letting them dictate how your hardware is utilized, which I
think is a foolhardy business decision. This is no different than any
other manufacturing batch size problem: sometimes parts of your
process are under utilized, and you have to make choices about
rejecting 

Re: [openstack-dev] Scheduler proposal

2015-10-15 Thread Joshua Harlow

Ed Leafe wrote:

Wow, I seem to have unleashed a bunch of pent-up frustration in the
community! It's great to see everyone coming forward with their ideas
and insights for improving the way Nova (and, by extension, all of
OpenStack) can potentially scale.

I do have a few comments on the discussion:

1) This isn't a proposal to simply add some sort of DLM to Nova as a
magic cure-all. The concerns about Nova's ability to scale have to do
a lot more with the overall internal communication design.

2) I really liked the comment about "made-up numbers". It's so true:
we are all impressed by such examples of speed that we sometimes
forget whether speeding up X will improve the overall process to any
significant degree. The purpose of my original email back in July,
and the question I asked at the Nova midcycle, is if we could get
some numbers that would be a target to shoot for with any of these
experiments. Sure, I could come up with a test that shows a zillion
transactions per second, but if that doesn't result in a cloud being
able to schedule more efficiently, what's the point?

3) I like the idea of something like ZooKeeper, but my concern is how
to efficiently query the data. If, for example, we had records for
100K compute nodes, would it be possible to do the equivalent of
"SELECT * FROM resources WHERE resource_type = 'compute' AND
free_ram_mb>= 2048 AND …" - well, you get the idea. Are complex data
queries possible in ZK? I haven't been able to find that information
anywhere.


The idea is that you wouldn't do these queries against any remote source 
in the first place. Instead a scheduler would get notified (via a 
concept like 
http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataMode_watches) 
when a hypervisor updates its data in zookeeper (or another equivalent 
system); when that notification happens the scheduler reads the data and 
updates some *local* data source with that information (this could be an 
in-memory dict, a local sqlite, or something else better optimized for fast 
searching), and from that point on that local source is used to do queries 
on. This way a hypervisor (compute-node) is performing *nearly* the 
equivalent of a push notification (like on your phone) to schedulers.
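A minimal sketch of that watch-and-cache pattern, assuming the kazoo 
ZooKeeper client and a made-up /computes layout (one JSON znode per 
hypervisor):

    import json

    from kazoo.client import KazooClient

    local_cache = {}   # hypervisor name -> last known resource record
    watched = set()

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()
    zk.ensure_path('/computes')

    def watch_node(name):
        # DataWatch fires once right away and then on every data change,
        # so the cache stays current without the scheduler ever polling.
        @zk.DataWatch('/computes/%s' % name)
        def _update(data, stat):
            if data is not None:
                local_cache[name] = json.loads(data.decode('utf-8'))
            else:
                local_cache.pop(name, None)   # znode gone: host went away

    @zk.ChildrenWatch('/computes')
    def _new_hosts(children):
        # Fires whenever hypervisors register or unregister under /computes.
        for name in children:
            if name not in watched:
                watched.add(name)
                watch_node(name)

    # Scheduling queries then run purely against local_cache, e.g.:
    # [h for h, r in local_cache.items() if r['free_ram_mb'] >= 2048]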




4) It is true that even in a very large deployment, it is possible to
keep all the relevant data needed for scheduling in memory. My
concern is how to efficiently search that data, much like in the ZK
scenario.


See above.



5) Concerns about Cassandra running with OpenJDK instead of the
Oracle JVM are troubling. I sent an email about this to one of the
people I know at DataStax, but so far have not received a response.
And while it would be great to have people contribute to OpenJDK to
make it compatible, keep in mind that that would be an ongoing
commitment, not just a one-time effort.

6) I remember discussions back in the Austin-Bexar time frame about
what Thierry referred to as 'flavor-based schedulers', and they were
immediately discounted as not sophisticated enough to handle the sort
of complex scheduling requests that were expected. I'd be interested
in finding out from the big cloud providers what percentage of their
requests would fall into this simple structure, and what percent are
more complicated than that. Having hosts listening to queues that
they know they can satisfy removes the raciness from the process,
although it would require some additional handling for the situation
where no host accepts the request. Still, it has the advantage of
being dead simple. Unfortunately, this would probably require a
bigger architectural change than integrating Cassandra into the
Scheduler would.


Another discussion that also should get talked about, but is again much 
larger in scope: https://review.openstack.org/#/c/210549/ (still WIP but 
the idea/problem/issue hopefully is clear).




I hope that those of us who will be at the Tokyo Summit and are
interested in these ideas can get together for an informal
discussion, and come up with some ideas for grand experiments and
reality checks. ;-)

BTW, I started playing around with some ideas, and thought that if
anyone wanted to also try Cassandra, I'd write up a quick how-to for
setting up a small cluster:
http://blog.leafe.com/small-scale-cassandra/. Using docker images
makes it a breeze!


-- Ed Leafe








Re: [openstack-dev] Scheduler proposal

2015-10-15 Thread Ed Leafe
Wow, I seem to have unleashed a bunch of pent-up frustration in the community! 
It's great to see everyone coming forward with their ideas and insights for 
improving the way Nova (and, by extension, all of OpenStack) can potentially 
scale.

I do have a few comments on the discussion:

1) This isn't a proposal to simply add some sort of DLM to Nova as a magic 
cure-all. The concerns about Nova's ability to scale have to do a lot more with 
the overall internal communication design.

2) I really liked the comment about "made-up numbers". It's so true: we are all 
impressed by such examples of speed that we sometimes forget whether speeding 
up X will improve the overall process to any significant degree. The purpose of 
my original email back in July, and the question I asked at the Nova midcycle, 
is if we could get some numbers that would be a target to shoot for with any of 
these experiments. Sure, I could come up with a test that shows a zillion 
transactions per second, but if that doesn't result in a cloud being able to 
schedule more efficiently, what's the point?

3) I like the idea of something like ZooKeeper, but my concern is how to 
efficiently query the data. If, for example, we had records for 100K compute 
nodes, would it be possible to do the equivalent of "SELECT * FROM resources 
WHERE resource_type = 'compute' AND free_ram_mb >= 2048 AND …" - well, you get 
the idea. Are complex data queries possible in ZK? I haven't been able to find 
that information anywhere.

4) It is true that even in a very large deployment, it is possible to keep all 
the relevant data needed for scheduling in memory. My concern is how to 
efficiently search that data, much like in the ZK scenario.

5) Concerns about Cassandra running with OpenJDK instead of the Oracle JVM are 
troubling. I sent an email about this to one of the people I know at DataStax, 
but so far have not received a response. And while it would be great to have 
people contribute to OpenJDK to make it compatible, keep in mind that that 
would be an ongoing commitment, not just a one-time effort.

6) I remember discussions back in the Austin-Bexar time frame about what 
Thierry referred to as 'flavor-based schedulers', and they were immediately 
discounted as not sophisticated enough to handle the sort of complex scheduling 
requests that were expected. I'd be interested in finding out from the big 
cloud providers what percentage of their requests would fall into this simple 
structure, and what percent are more complicated than that. Having hosts 
listening to queues that they know they can satisfy removes the raciness from 
the process, although it would require some additional handling for the 
situation where no host accepts the request. Still, it has the advantage of 
being dead simple. Unfortunately, this would probably require a bigger 
architectural change than integrating Cassandra into the Scheduler would.

I hope that those of us who will be at the Tokyo Summit and are interested in 
these ideas can get together for an informal discussion, and come up with some 
ideas for grand experiments and reality checks. ;-)

BTW, I started playing around with some ideas, and thought that if anyone 
wanted to also try Cassandra, I'd write up a quick how-to for setting up a 
small cluster: http://blog.leafe.com/small-scale-cassandra/. Using docker 
images makes it a breeze!


-- Ed Leafe









Re: [openstack-dev] Scheduler proposal

2015-10-14 Thread Dulko, Michal
On Tue, 2015-10-13 at 08:47 -0700, Joshua Harlow wrote:
> Well great!
> 
> When is that going to be accessible :-P
> 
> Dulko, Michal wrote:
> > On Mon, 2015-10-12 at 10:58 -0700, Joshua Harlow wrote:
> >> Just a related thought/question. It really seems we (as a community)
> >> need some kind of scale testing ground. Internally at yahoo we were/are
> >> going to use a 200 hypervisor cluster for some of this and then expand
> >> that into 200 * X by using nested virtualization and/or fake drivers and
> >> such. But this is a 'lab' that not everyone can have, and therefore
> >> isn't suited toward community work IMHO. Has there been any thought on
> >> such a 'lab' that is directly in the community, perhaps trystack.org can
> >> be this? (users get free VMs, but then we can tell them this area is a
> >> lab, so don't expect things to always work, free isn't free after all...)
> >>
> >> With such a lab, there could be these kinds of experiments, graphs,
> >> tweaks and such...
> >
> > https://www.mirantis.com/blog/intel-rackspace-want-cloud/
> >
> > "The plan is to build out an OpenStack developer cloud that consists of
> > two 1,000 node clusters available for use by anyone in the OpenStack
> > community for scaling, performance, and code testing. Rackspace plans to
> > have the cloud available within the next six months."
> >
> > Stuff you've described is actually being worked on for a few months. :)

Judging from the 6-month ETA and the fact that the work started in August, it
seems that the answer is: the beginning of 2016.


Re: [openstack-dev] Scheduler proposal

2015-10-14 Thread Thomas Goirand
On 10/12/2015 07:10 PM, Monty Taylor wrote:
> On 10/12/2015 12:43 PM, Clint Byrum wrote:
>> Excerpts from Thomas Goirand's message of 2015-10-12 05:57:26 -0700:
>>> On 10/11/2015 02:53 AM, Davanum Srinivas wrote:
 Thomas,

 i am curious as well. AFAIK, cassandra works well with OpenJDK. Can you
 please elaborate on what your concerns are for #1?

 Thanks,
 Dims
>>>
>>> s/works well/works/
>>>
>>> Upstream doesn't test against OpenJDK, and they close bugs without
>>> fixing them when it only affects OpenJDK and it isn't grave. I know this
>>> from one of the upstream from Cassandra, who is also a Debian developer.
>>> Because of this state of things, he gave up on packaging Cassandra in
>>> Debian (and for other reasons too, like not having enough time to work
>>> on the packaging).
>>>
>>> I trust what this Debian developer told me. If I remember correctly,
>>> it's Eric Evans  (ie, the author of the ITP at
>>> https://bugs.debian.org/585905) that I'm talking about.
>>>
>>
>> Indeed, I once took a crack at packaging it for Debian/Ubuntu too.
>> There's a reason 'apt-cache search cassandra' returns 0 results on Debian
>> and Ubuntu.
> 
> There is a different reason too - which is that (at least at one point
> in the past) upstream expressed frustration with the idea of distro
> packages of Cassandra because it led to people coming to them with
> complaints about the software which had been fixed in newer versions but
> which, because of distro support policies, were not present in the
> user's software version. (I can sympathize)

This is free software. We don't need to ask for permission from upstream
first.

Thomas Goirand (zigo)




Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Dulko, Michal
On Mon, 2015-10-12 at 10:58 -0700, Joshua Harlow wrote:
> Just a related thought/question. It really seems we (as a community) 
> need some kind of scale testing ground. Internally at yahoo we were/are 
> going to use a 200 hypervisor cluster for some of this and then expand 
> that into 200 * X by using nested virtualization and/or fake drivers and 
> such. But this is a 'lab' that not everyone can have, and therefore 
> isn't suited toward community work IMHO. Has there been any thought on 
> such a 'lab' that is directly in the community, perhaps trystack.org can 
> be this? (users get free VMs, but then we can tell them this area is a 
> lab, so don't expect things to always work, free isn't free after all...)
> 
> With such a lab, there could be these kinds of experiments, graphs, 
> tweaks and such...

https://www.mirantis.com/blog/intel-rackspace-want-cloud/

"The plan is to build out an OpenStack developer cloud that consists of
two 1,000 node clusters available for use by anyone in the OpenStack
community for scaling, performance, and code testing. Rackspace plans to
have the cloud available within the next six months."

Stuff you've described is actually being worked on for a few months. :)


Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Dulko, Michal
On Mon, 2015-10-12 at 10:13 -0700, Clint Byrum wrote:
> Zookeeper sits in a very different space from Cassandra. I have had good
> success with it on OpenJDK as well.
> 
> That said, we need to maybe go through some feature/risk matrices and
> compare to etcd and Consul (this might be good to do as part of filling
> out the DLM spec). The jvm issues go away with both of those, but then
> we get to deal with Go issues.
> 
> Also, ZK has one other advantage over those: It is already in Debian and
> Ubuntu, making access for developers much easier.

What about RHEL/CentOS? Maybe I'm mistaken, but I think these two
don't have it packaged.


Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Jeremy Stanley
On 2015-10-12 20:49:44 -0700 (-0700), Joshua Harlow wrote:
> Does the openstack foundation have access to a scaling area that
> can be used by the community for this kind of experimental work?

The OpenStack Foundation has a staff of fewer than 20 full-time
employees, with a primary focus on event planning and preserving the
community's trademarks. If instead you mean the member companies who
make up the OpenStack Foundation, then I agree with the other reply
on the thread that it sounds like the effort already underway at
Intel and Rackspace.

> It seems like infra or others should be able to make that possible?
[...]

The Infrastructure team is in the process of standing up a
community-managed deployment of OpenStack, but it's not even within
an order of magnitude of being 1k host scale (and at that, it's
still a multi-cycle plan just to reach viability).
-- 
Jeremy Stanley



Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Alec Hothan (ahothan)





On 10/12/15, 12:05 PM, "Monty Taylor"  wrote:

>On 10/12/2015 02:45 PM, Joshua Harlow wrote:
>> Alec Hothan (ahothan) wrote:
>>>
>>>
>>>
>>>
>
>I want to do 100k hypervisors. No, that's not hyperbole.
>
>Also, I do not think that ZK/consul/etcd are very costly for small 
>deployments. Given the number of simple dev-oriented projects that start 
>with "so install ZK/consul/etcd" I think they've all proven their 
>ability to scale _down_ - and I'm also pretty sure all of them have 
>installations that clear 100k nodes.
>
>This:
>
>to produce the ubiquitous Open Source Cloud Computing platform that will 
>meet the needs of public and private clouds regardless of size, by being 
>simple to implement and massively scalable.
>
>is what we're doing.
>
>Our mission is NOT "produce a mid-range cloud that is too complex for 
>small deployments and tops out before you get to big ones"
>
>I don't think "handle massive clouds" has ever NOT been on the list of 
>stated goals. (that mission statement has not changed since we started 
>the project - although I agree with Joe, it's in need of an update- 
>there is no mention of users)

Then it would be great to have an official statement from the TC about the 
scale objectives, and if possible to put some numbers on them; "massive cloud" 
is ambiguous for folks who actually have to make sure they scale to spec.
It should say, for example, "OpenStack should scale from 1 node to 100K 
nodes" - as long as everybody is fully aware of how far we are today from 
that lofty goal.
This clearly will have an impact on how we need to design services and how we 
should change the way we test them. It will be tricky to get a 1,000 node 
lab up and running just for OpenStack developers; it is just not practical. 
The only practical way will be to do proper unit testing at scale (e.g. 
emulate a 10K node cloud for unit testing any given service).
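For example, a trivial, made-up illustration of the kind of in-process 
emulation I mean (fake host records only, no real hypervisors, and a 
deliberately naive filter):

    import random
    import time

    # Fake the inventory a 10K node cloud would report.
    random.seed(42)
    fake_hosts = [
        {'host': 'node-%05d' % i,
         'free_ram_mb': random.choice([2048, 4096, 8192, 16384]),
         'free_vcpus': random.randint(0, 32)}
        for i in range(10000)
    ]

    def naive_filter(hosts, ram_mb, vcpus):
        return [h for h in hosts
                if h['free_ram_mb'] >= ram_mb and h['free_vcpus'] >= vcpus]

    # Time a burst of scheduling decisions against the emulated inventory;
    # the point is repeatable "schedules per second" numbers in a test,
    # not a statement about any real deployment.
    start = time.time()
    for _ in range(100):
        candidates = naive_filter(fake_hosts, 4096, 2)
    elapsed = time.time() - start
    print('%d candidates, %.1f schedules/sec' % (len(candidates),
                                                 100 / elapsed))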


>
>BTW - Infra currently runs against clouds rate-limited at 
>roughly 10 api calls / second. That's just one tenant - but it's a 
>perfectly manageable rate. Now, if the cloud could continue to add nodes 
>and users without that rate degrading I think we'd be in really good shape.

I think that rate limit only applies to REST APIs; I don't think there is any 
rate limit for oslo messaging.
Even just 10 API calls per second per tenant can be a challenge with a large 
number of tenants. I don't think there is any provision today, for example, to 
ensure fairness across tenants.






Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Joshua Harlow

Jeremy Stanley wrote:

On 2015-10-12 20:49:44 -0700 (-0700), Joshua Harlow wrote:

Does the openstack foundation have access to a scaling area that
can be used by the community for this kind of experimental work?


The OpenStack Foundation has a staff of fewer than 20 full-time
employees, with a primary focus on event planning and preserving the
community's trademarks. If instead you mean the member companies who
make up the OpenStack Foundation, then I agree with the other reply
on the thread that it sounds like the effort already underway at
Intel and Rackspace.


Sure, that is its *current* primary focus, but this could be an addition.

I've also been thinking that long-term cross-project changes should 
really be guided by the foundation as well. Something akin to 
keeping long-term changes (ones that require years of work, such as 
cross-project quotas, or...) on track even when member companies come and 
go (because IMHO expecting otherwise leaves things halfway done, or not 
done at all).





It seems like infra or others should be able to make that possible?

[...]

The Infrastructure team is in the process of standing up a
community-managed deployment of OpenStack, but it's not even within
an order of magnitude of being 1k host scale (and at that, it's
still a multi-cycle plan just to reach viability).


Well go big or go home :-P

-Josh



Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Joshua Harlow

Well great!

When is that going to be accessible :-P

Dulko, Michal wrote:

On Mon, 2015-10-12 at 10:58 -0700, Joshua Harlow wrote:

Just a related thought/question. It really seems we (as a community)
need some kind of scale testing ground. Internally at yahoo we were/are
going to use a 200 hypervisor cluster for some of this and then expand
that into 200 * X by using nested virtualization and/or fake drivers and
such. But this is a 'lab' that not everyone can have, and therefore
isn't suited toward community work IMHO. Has there been any thought on
such a 'lab' that is directly in the community, perhaps trystack.org can
be this? (users get free VMs, but then we can tell them this area is a
lab, so don't expect things to always work, free isn't free after all...)

With such a lab, there could be these kinds of experiments, graphs,
tweaks and such...


https://www.mirantis.com/blog/intel-rackspace-want-cloud/

"The plan is to build out an OpenStack developer cloud that consists of
two 1,000 node clusters available for use by anyone in the OpenStack
community for scaling, performance, and code testing. Rackspace plans to
have the cloud available within the next six months."

Stuff you've described is actually being worked on for a few months. :)


Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Ian Wells
On 12 October 2015 at 21:18, Clint Byrum  wrote:

> We _would_ keep a local cache of the information in the schedulers. The
> centralized copy of it is to free the schedulers from the complexity of
> having to keep track of it as state, rather than as a cache. We also don't
> have to provide a way for on-demand stat fetching to seed scheduler 0.
>

I'm not sure that actually changes.  On restart of a scheduler, it wouldn't
have enough knowledge to schedule, but the other schedulers are unaffected and
can service requests while it waits for data.  Using ZK, that takes fewer
seconds because it can get a braindump, but during that window in either
case the system works at (n-1)/n capacity, assuming queries are only done in
memory.

Also, you were seeming to tout the ZK option would take less memory, but it
seems it would take more.  You can't schedule without a relatively complete
set of information or some relatively intricate query language, which I
didn't think ZK was up to (but I'm open to correction there, certainly).
That implies that when you notify a scheduler of a change to the data
model, it's going to grab the fresh data and keep it locally.


> > Also, the notification path here is that the compute host notifies ZK and
> > ZK notifies many schedulers, assuming they're all capable of handling all
> > queries.  That is in fact N * (M+1) messages, which is slightly more than
> > if there's no central node, as it happens.  There are fewer *channels*,
> but
> > more messages.  (I feel like I'm overlooking something here, but I can't
> > pick out the flaw...)  Yes, RMQ will suck at this - but then let's talk
> > about better messaging rather than another DB type.
> >
>
> You're calling transactions messages, and that's not really fair to
> messaging or transactions. :)
>

I was actually talking about the number of messages crossing the network.
Your point is that the transaction with ZK is heavier weight than the
update processing at the schedulers, I think.  But then removing ZK as a
nexus removes that transaction, so both the number of messages and the
number of transactions goes down.

However, it's important to note that in
> this situation, compute nodes do not have to send anything anywhere if
> nothing has changed, which is very likely the case for "full" compute
> nodes, and certainly will save many many redundant messages.


Now that's a fair comment, certainly, and would drastically reduce the
number of messages in the system if we can keep the nodes from updating
just because their free memory has changed by a couple of pages.


> Forgive me
> if nova already makes this optimization somehow, it didn't seem to when
> I was tinkering a year ago.
>

Not as far as I know, it doesn't.

There is also the complexity of designing a scheduler which is fault
> tolerant and scales economically. What we have now will overtax the
> message bus and the database as the number of compute nodes increases.
> We want to get O(1) complexity out of that, but we're getting O(N)
> right now.
>

O(N) will work providing O is small. ;)

I think our cost currently lies in doing 1 MySQL DB update per node per
minute, and one really quite mad query per schedule.  I agree that ZK would
be less costly for that in both respects, which is really more about
lowering O than N.  I'm wondering if we can do better still, that's all,
but we both agree that this approach would work.
-- 
Ian.


Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Joshua Harlow

Clint Byrum wrote:

Excerpts from Ian Wells's message of 2015-10-13 09:24:42 -0700:

On 12 October 2015 at 21:18, Clint Byrum  wrote:


We _would_ keep a local cache of the information in the schedulers. The
centralized copy of it is to free the schedulers from the complexity of
having to keep track of it as state, rather than as a cache. We also don't
have to provide a way for on-demand stat fetching to seed scheduler 0.


I'm not sure that actually changes.  On restart of a scheduler, it wouldn't
have enough knowledge to schedule, but the other schedulers are unaffected and
can service requests while it waits for data.  Using ZK, that takes fewer
seconds because it can get a braindump, but during that window in either
case the system works at (n-1)/n capacity, assuming queries are only done in
memory.



Yeah, I'd put this as a 3 on the 1-10 scale of optimizations. Not a
reason to do it, but an assessment that it improves the efficiency of
starting new schedulers. It also has the benefit that if you do choose
to just run 1 scheduler, you can just start a new one and it will walk
the tree and start scheduling immediately thereafter.


Also, you were seeming to tout the ZK option would take less memory, but it
seems it would take more.  You can't schedule without a relatively complete
set of information or some relatively intricate query language, which I
didn't think ZK was up to (but I'm open to correction there, certainly).
That implies that when you notify a scheduler of a change to the data
model, it's going to grab the fresh data and keep it locally.



If I did that, I was being unclear and I'm sorry for that. I do think
the cache of potential scheduling targets and stats should fit in RAM
easily for even 100,000 nodes, including indexes for fast lookups.
The intermediary is entirely to alleviate the need for complicated sync
protocols to be implemented in the scheduler and compute agent. RAM is
cheap, time is not.


+1

Servers come with many tens or hundreds of gigabytes of memory nowadays, and 
if we locally cache with various levels of indexing (perhaps even using 
some other db-like library to help here) then I'd hope we can fit as 
many nodes as we desire.





Also, the notification path here is that the compute host notifies ZK and
ZK notifies many schedulers, assuming they're all capable of handling all
queries.  That is in fact N * (M+1) messages, which is slightly more than
if there's no central node, as it happens.  There are fewer *channels*,

but

more messages.  (I feel like I'm overlooking something here, but I can't
pick out the flaw...)  Yes, RMQ will suck at this - but then let's talk
about better messaging rather than another DB type.


You're calling transactions messages, and that's not really fair to
messaging or transactions. :)


I was actually talking about the number of messages crossing the network.
Your point is that the transaction with ZK is heavier weight than the
update processing at the schedulers, I think.  But then removing ZK as a
nexus removes that transaction, so both the number of messages and the
number of transactions goes down.



Point taken and agreed.


However, it's important to note that in

this situation, compute nodes do not have to send anything anywhere if
nothing has changed, which is very likely the case for "full" compute
nodes, and certainly will save many many redundant messages.


Now that's a fair comment, certainly, and would drastically reduce the
number of messages in the system if we can keep the nodes from updating
just because their free memory has changed by a couple of pages.



Indeed, an optimization like this is actually orthogonal to the management
of the corpus of state from all hosts. Hosts should in fact be able
to optimize for this already. Of course, then you lose the heartbeat..
which might be more valuable than the savings in communication load.


Forgive me
if nova already makes this optimization somehow, it didn't seem to when
I was tinkering a year ago.


Not as far as I know, it doesn't.

There is also the complexity of designing a scheduler which is fault

tolerant and scales economically. What we have now will overtax the
message bus and the database as the number of compute nodes increases.
We want to get O(1) complexity out of that, but we're getting O(N)
right now.


O(N) will work providing O is small. ;)

I think our cost currently lies in doing 1 MySQL DB update per node per
minute, and one really quite mad query per schedule.  I agree that ZK would
be less costly for that in both respects, which is really more about
lowering O than N.  I'm wondering if we can do better still, that's all,
but we both agree that this approach would work.


Right, I think it is worth an experiment if for no other reason than
MySQL can't really go much faster for this. We could move the mad query
out into RAM, but then we get the problem of how to keep a useful dataset
in RAM and we're back to syncing or polling the 

Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Jeremy Stanley
On 2015-10-13 09:15:02 -0700 (-0700), Clint Byrum wrote:
> Excerpts from Jeremy Stanley's message of 2015-10-13 06:13:32 -0700:
[...]
> > it's not even within an order of magnitude of being 1k host
> > scale (and at that, it's still a multi-cycle plan just to reach
> > viability).
> 
> Infra-cloud currently has about 200 total real servers donated by
> HP.
[...]

I stand corrected--it _is_ within an order of magnitude of being 1k
host scale!

Anyway, my point was that to build and manage something similar for
scalability experiments would require a lot of extra hardware,
people and time to implement and manage.
-- 
Jeremy Stanley



Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Jeremy Stanley
On 2015-10-13 10:17:26 -0700 (-0700), Joshua Harlow wrote:
[...]
> Interesting, doesn't the foundation have money? I was under the
> assumption it does (but I'm not a finance person); seeing that the
> membership fee to become a member afaik is not cheap, and there
> seems to be quite a-lot of members
> (https://www.openstack.org/foundation/companies/) one could
> speculate that resources (compute, lab clouds, people to help
> manage all of these) shouldn't really be a problem...

Yep, there is some money. I have no idea whether there is surplus
sufficient to sustain facilities and staff for a 1k-node service
provider with no customer revenue, but I have my doubts. On the
other hand it might be easier for data center space, hardware and
humans to be donated from 0.1% of 5 different 200k-node service
providers since they already have experience doing that at some
economy of scale (where the OpenStack Foundation does not).

> Anyway, perhaps this is for another conversation...

Indeed, and one better had with the OpenStack Foundation Board of
Directors rather than the developer community/Infra team.
-- 
Jeremy Stanley



Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Clint Byrum
Excerpts from Jeremy Stanley's message of 2015-10-13 06:13:32 -0700:
> On 2015-10-12 20:49:44 -0700 (-0700), Joshua Harlow wrote:
> > Does the openstack foundation have access to a scaling area that
> > can be used by the community for this kind of experimental work?
> 
> The OpenStack Foundation has a staff of fewer than 20 full-time
> employees, with a primary focus on event planning and preserving the
> community's trademarks. If instead you mean the member companies who
> make up the OpenStack Foundation, then I agree with the other reply
> on the thread that it sounds like the effort already underway at
> Intel and Rackspace.
> 
> > It seems like infra or others should be able to make that possible?
> [...]
> 
> The Infrastructure team is in the process of standing up a
> community-managed deployment of OpenStack, but it's not even within
> an order of magnitude of being 1k host scale (and at that, it's
> still a multi-cycle plan just to reach viability).

Infra-cloud currently has about 200 total real servers donated by HP.
The primary focus is on adding nodes for nodepool, so that we can
keep ahead of the milestone surges and general widening of the scope
of OpenStack. Doing it the same way infra does their other apps also
means that infra can fix this cloud when it isn't suited to their needs,
instead of having to work around public cloud quirks.

However, it was always a secondary goal of infra-cloud to provide a cloud
that is 100% visible to the entire community, including operators, so
that the community can collaborate on improving said cloud which should
drive quality.

We're currently going very slow, mostly because there are basically
3 people working about 5-30 percent of their time on it. As the needs
listed above grow, I imagine infra-cloud will rise in our priorities.

If a member company were to donate _more_ nodes in a single place so that
we could push the bounds of a single region/az/cell, that would be great,
but I don't think those could be capitalized on without a donation of
more staff to the infra team as well.



Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Clint Byrum
Excerpts from Ian Wells's message of 2015-10-13 09:24:42 -0700:
> On 12 October 2015 at 21:18, Clint Byrum  wrote:
> 
> > We _would_ keep a local cache of the information in the schedulers. The
> > centralized copy of it is to free the schedulers from the complexity of
> > having to keep track of it as state, rather than as a cache. We also don't
> > have to provide a way for on-demand stat fetching to seed scheduler 0.
> >
> 
> I'm not sure that actually changes.  On restart of a scheduler, it wouldn't
> have enough knowledge to schedule, but the other schedulers are unaffected
> and can service requests while it waits for data.  Using ZK, that takes
> fewer seconds because it can get a braindump, but during that window in
> either case the system works at (n-1)/n capacity, assuming queries are only
> done in memory.
> 

Yeah, I'd put this as a 3 on the 1-10 scale of optimizations. Not a
reason to do it, but an assessment that it improves the efficiency of
starting new schedulers. It also has the benefit that if you do choose
to just run 1 scheduler, you can just start a new one and it will walk
the tree and start scheduling immediately thereafter.

> Also, you were seeming to tout the ZK option would take less memory, but it
> seems it would take more.  You can't schedule without a relatively complete
> set of information or some relatively intricate query language, which I
> didn't think ZK was up to (but I'm open to correction there, certainly).
> That implies that when you notify a scheduler of a change to the data
> model, it's going to grab the fresh data and keep it locally.
> 

If I did that, I was being unclear and I'm sorry for that. I do think
the cache of potential scheduling targets and stats should fit in RAM
easily for even 100,000 nodes, including indexes for fast lookups.
The intermediary is entirely to alleviate the need for complicated sync
protocols to be implemented in the scheduler and compute agent. RAM is
cheap, time is not.

> > > Also, the notification path here is that the compute host notifies ZK and
> > > ZK notifies many schedulers, assuming they're all capable of handling all
> > > queries.  That is in fact N * (M+1) messages, which is slightly more than
> > > if there's no central node, as it happens.  There are fewer *channels*,
> > but
> > > more messages.  (I feel like I'm overlooking something here, but I can't
> > > pick out the flaw...)  Yes, RMQ will suck at this - but then let's talk
> > > about better messaging rather than another DB type.
> > >
> >
> > You're calling transactions messages, and that's not really fair to
> > messaging or transactions. :)
> >
> 
> I was actually talking about the number of messages crossing the network.
> Your point is that the transaction with ZK is heavier weight than the
> update processing at the schedulers, I think.  But then removing ZK as a
> nexus removes that transaction, so both the number of messages and the
> number of transactions goes down.
> 

Point taken and agreed.

> However, it's important to note that in
> > this situation, compute nodes do not have to send anything anywhere if
> > nothing has changed, which is very likely the case for "full" compute
> > nodes, and certainly will save many many redundant messages.
> 
> 
> Now that's a fair comment, certainly, and would drastically reduce the
> number of messages in the system if we can keep the nodes from updating
> just because their free memory has changed by a couple of pages.
> 

Indeed, an optimization like this is actually orthogonal to the management
of the corpus of state from all hosts. Hosts should in fact be able
to optimize for this already. Of course, then you lose the heartbeat..
which might be more valuable than the savings in communication load.
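Something like this on the compute side would do it - a hypothetical helper, 
sketched only, with a periodic report kept so the heartbeat isn't lost:

    import time

    REPORT_INTERVAL = 60    # heartbeat: always report at least this often
    RAM_DELTA_MB = 256      # ignore free-RAM jitter smaller than this

    class StateReporter(object):
        """Send stats only when they changed meaningfully or the
        heartbeat interval has elapsed."""

        def __init__(self, send):
            self._send = send       # callable that publishes to ZK/RPC/...
            self._last_sent = None
            self._last_time = 0.0

        def maybe_report(self, stats):
            now = time.time()
            heartbeat_due = now - self._last_time >= REPORT_INTERVAL
            changed = (
                self._last_sent is None or
                stats['running_vms'] != self._last_sent['running_vms'] or
                abs(stats['free_ram_mb'] -
                    self._last_sent['free_ram_mb']) >= RAM_DELTA_MB
            )
            if changed or heartbeat_due:
                self._send(stats)
                self._last_sent = dict(stats)
                self._last_time = now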

> > Forgive me
> > if nova already makes this optimization somehow, it didn't seem to when
> > I was tinkering a year ago.
> >
> 
> Not as far as I know, it doesn't.
> 
> There is also the complexity of designing a scheduler which is fault
> > tolerant and scales economically. What we have now will overtax the
> > message bus and the database as the number of compute nodes increases.
> > We want to get O(1) complexity out of that, but we're getting O(N)
> > right now.
> >
> 
> O(N) will work providing O is small. ;)
> 
> I think our cost currently lies in doing 1 MySQL DB update per node per
> minute, and one really quite mad query per schedule.  I agree that ZK would
> be less costly for that in both respects, which is really more about
> lowering O than N.  I'm wondering if we can do better still, that's all,
> but we both agree that this approach would work.

Right, I think it is worth an experiment if for no other reason than
MySQL can't really go much faster for this. We could move the mad query
out into RAM, but then we get the problem of how to keep a useful dataset
in RAM and we're back to syncing or polling the database hard.


Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Joshua Harlow

Clint Byrum wrote:

Excerpts from Jeremy Stanley's message of 2015-10-13 06:13:32 -0700:

On 2015-10-12 20:49:44 -0700 (-0700), Joshua Harlow wrote:

Does the openstack foundation have access to a scaling area that
can be used by the community for this kind of experimental work?

The OpenStack Foundation has a staff of fewer than 20 full-time
employees, with a primary focus on event planning and preserving the
community's trademarks. If instead you mean the member companies who
make up the OpenStack Foundation, then I agree with the other reply
on the thread that it sounds like the effort already underway at
Intel and Rackspace.


It seems like infra or others should be able to make that possible?

[...]

The Infrastructure team is in the process of standing up a
community-managed deployment of OpenStack, but it's not even within
an order of magnitude of being 1k host scale (and at that, it's
still a multi-cycle plan just to reach viability).


Infra-cloud currently has about 200 total real servers donated by HP.
The primary focus is on adding nodes for nodepool, so that we can
keep ahead of the milestone surges and general widening of the scope
of OpenStack. Doing it the same way infra does their other apps also
means that infra can fix this cloud when it isn't suited to their needs,
instead of having to work around public cloud quirks.

However, it was always a secondary goal of infra-cloud to provide a cloud
that is 100% visible to the entire community, including operators, so
that the community can collaborate on improving said cloud which should
drive quality.

We're currently going very slow, mostly because there are basically
3 people working about 5-30 percent of their time on it. As the needs
listed above grow, I imagine infra-cloud will rise in our priorities.

If a member company were to donate _more_ nodes in a single place so that
we could push the bounds of a single region/az/cell, that would be great,
but I don't think those could be capitalized on without a donation of
more staff to the infra team as well.


Interesting, doesn't the foundation have money? I was under the 
assumption it does (but I'm not a finance person); seeing that the 
membership fee to become a member afaik is not cheap, and there seems to 
be quite a lot of members 
(https://www.openstack.org/foundation/companies/) one could speculate 
that resources (compute, lab clouds, people to help manage all of 
these) shouldn't really be a problem...


Anyway, perhaps this is for another conversation...

-Josh



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Clint Byrum
Excerpts from Dulko, Michal's message of 2015-10-13 03:49:44 -0700:
> On Mon, 2015-10-12 at 10:13 -0700, Clint Byrum wrote:
> > Zookeeper sits in a very different space from Cassandra. I have had good
> > success with it on OpenJDK as well.
> > 
> > That said, we need to maybe go through some feature/risk matrices and
> > compare to etcd and Consul (this might be good to do as part of filling
> > out the DLM spec). The JVM issues go away with both of those, but then
> > we get to deal with Go issues.
> > 
> > Also, ZK has one other advantage over those: It is already in Debian and
> > Ubuntu, making access for developers much easier.
> 
> What about RHEL/CentOS? Maybe I'm mistaken, but I think these two
> don't have it packaged.

I don't know about RHEL/CentOS, but Fedora packages Zookeeper:

https://apps.fedoraproject.org/packages/zookeeper/sources

Seems like that can be spun into RHEL/CentOS relatively easily, perhaps
through EPEL?

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Thierry Carrez
Adam Lawson wrote:
> I have a quick question: how is Amazon doing this? When choosing a next
> path forward that reliably scales, it would be interesting to know how this
> is already being done.

Well, those who know probably would be sued if they told.

Since they have a limited set of instance types and very limited
placement options, my bet would be that they do flavor-based scheduling
("let compute nodes grab node reservation requests directly
out of flavor based queues based on their own current observation of
their ability to service it" in Clint's own words).

This is the most efficient way to scale: you no longer rely on a
specific scheduler trying to keep an up-to-date view of your compute
nodes resource availability. As long as you are ready to abandon fancy
placement features, you can get simple, reliable and scalable
(non-)scheduling.
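As a toy, in-process sketch of that pattern (a real deployment would use
per-flavor message queues and a proper claim protocol, not Python Queue
objects; the numbers are made up):

    import queue
    import threading

    flavor_queues = {'m1.small': queue.Queue()}

    def compute_worker(host, capacity_mb, flavor, ram_mb):
        # Each node only asks for work while it can still fit that flavor,
        # based purely on its own local view of its free resources.
        free = capacity_mb
        q = flavor_queues[flavor]
        while free >= ram_mb:
            req = q.get()
            free -= ram_mb
            print('%s claimed %s (%d MB left)' % (host, req, free))
            q.task_done()

    for host in ('node1', 'node2'):
        threading.Thread(target=compute_worker,
                         args=(host, 8192, 'm1.small', 2048),
                         daemon=True).start()

    for i in range(6):
        flavor_queues['m1.small'].put('req-%d' % i)
    flavor_queues['m1.small'].join()   # demand fits, so this returns

No filtering, no central view: whichever node takes the request has
already decided it can run it.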

Personally as we explore the options we have in that space, I'd like to
consider options that still enable us to plug such a no-scheduler
solution without too much trouble. Just for those of us who are ready to
make that trade-off :)

-- 
Thierry Carrez (ttx)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Thierry Carrez
Clint Byrum wrote:
> Excerpts from Joshua Harlow's message of 2015-10-10 17:43:40 -0700:
>> I'm curious is there any more detail about #1 below anywhere online?
>>
>> Does cassandra use some features of the JVM that the openJDK version 
>> doesn't support? Something else?
> 
> This about sums it up:
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L153-L155
> 
> // There is essentially no QA done on OpenJDK builds, and
> // clusters running OpenJDK have seen many heap and load issues.
> logger.warn("OpenJDK is not recommended. Please upgrade to the newest 
> Oracle Java release");

Or:
https://twitter.com/mipsytipsy/status/596697501991702528

This is one of the reasons I'm generally negative about Java solutions
(Cassandra or Zookeeper): the free software JVM is still not on par with
the non-free one, so we indirectly force our users to use a non-free
dependency. I've been there before often enough to hear "did you
reproduce that bug under the {Sun,Oracle} JVM" quite a few times.

When the Java solution is the only solution for a problem space that
might still be a good trade-off (compared to reinventing the wheel for
example), but to share state or distribute locks, there are some pretty
good other options out there that don't suffer from the same fundamental
problem...

-- 
Thierry Carrez (ttx)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Joshua Harlow

Thierry Carrez wrote:

Clint Byrum wrote:

Excerpts from Joshua Harlow's message of 2015-10-10 17:43:40 -0700:

I'm curious is there any more detail about #1 below anywhere online?

Does cassandra use some features of the JVM that the openJDK version
doesn't support? Something else?

This about sums it up:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L153-L155

 // There is essentially no QA done on OpenJDK builds, and
 // clusters running OpenJDK have seen many heap and load issues.
 logger.warn("OpenJDK is not recommended. Please upgrade to the newest Oracle 
Java release");


Or:
https://twitter.com/mipsytipsy/status/596697501991702528

This is one of the reasons I'm generally negative about Java solutions
(Cassandra or Zookeeper): the free software JVM is still not on par with
the non-free one, so we indirectly force our users to use a non-free
dependency. I've been there before often enough to hear "did you
reproduce that bug under the {Sun,Oracle} JVM" quite a few times.


I'd be happy to 'fight' for (and even fix) any issues found with 
zookeeper + openjdk if needed; that twitter posting hopefully ended up 
in a bug being filed at https://issues.apache.org/jira/browse/ZOOKEEPER/ 
and hopefully things getting fixed...




When the Java solution is the only solution for a problem space that
might still be a good trade-off (compared to reinventing the wheel for
example), but to share state or distribute locks, there are some pretty
good other options out there that don't suffer from the same fundamental
problem...



IMHO it's the only 'mature' solution so far; but of course maturity is a 
relative thing (look at the project age, version number of zookeeper vs 
etcd, consul for a general idea around this); in general I'd really like 
the TC and the foundation to help make the right decision here, because 
this kind of choice affects the long-term future (and health) of 
openstack as a whole (or I believe it does).


-Josh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Clint Byrum
Excerpts from Thomas Goirand's message of 2015-10-12 05:57:26 -0700:
> On 10/11/2015 02:53 AM, Davanum Srinivas wrote:
> > Thomas,
> > 
> > i am curious as well. AFAIK, cassandra works well with OpenJDK. Can you
> > please elaborate what your concerns are for #1?
> > 
> > Thanks,
> > Dims
> 
> s/works well/works/
> 
> Upstream doesn't test against OpenJDK, and they close bugs without
> fixing them when it only affects OpenJDK and it isn't grave. I know this
> from one of the upstream developers of Cassandra, who is also a Debian developer.
> Because of this state of things, he gave up on packaging Cassandra in
> Debian (and for other reasons too, like not having enough time to work
> on the packaging).
> 
> I trust what this Debian developer told me. If I remember correctly,
> it's Eric Evans  (ie, the author of the ITP at
> https://bugs.debian.org/585905) that I'm talking about.
> 

Indeed, I once took a crack at packaging it for Debian/Ubuntu too.
There's a reason 'apt-cache search cassandra' returns 0 results on Debian
and Ubuntu.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Thomas Goirand
On 10/11/2015 02:53 AM, Davanum Srinivas wrote:
> Thomas,
> 
> i am curious as well. AFAIK, cassandra works well with OpenJDK. Can you
> please elaborate what your concerns are for #1?
> 
> Thanks,
> Dims

s/works well/works/

Upstream doesn't test against OpenJDK, and they close bugs without
fixing them when it only affects OpenJDK and it isn't grave. I know this
from one of the upstream developers of Cassandra, who is also a Debian developer.
Because of this state of things, he gave up on packaging Cassandra in
Debian (and for other reasons too, like not having enough time to work
on the packaging).

I trust what this Debian developer told me. If I remember correctly,
it's Eric Evans  (ie, the author of the ITP at
https://bugs.debian.org/585905) that I'm talking about.

On 10/12/2015 01:19 AM, Amrith Kumar wrote:
> This is not a requirement by any means. See [3].
>
http://stackoverflow.com/questions/21487354/does-latest-cassandra-support-openjdk

A *hard* requirement, probably not. But this doesn't mean that Cassandra
works *well* on OpenJDK.

Anyway, I'd prefer if nobody trusted me, and that this was seriously
checked.

On 10/11/2015 08:53 AM, Clint Byrum wrote:
>
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L153-L155
>
> // There is essentially no QA done on OpenJDK builds, and
> // clusters running OpenJDK have seen many heap and load issues.
> logger.warn("OpenJDK is not recommended. Please upgrade to the
> newest Oracle Java release");

Ah! Thanks for finding out the details I was missing... :)
With these kinds of problems, I don't think anyone would like to take the
responsibility to upload Cassandra to either Debian or Ubuntu. At least
*I* wouldn't.

That being said, maybe the issue that Clint quoted is fixable. Probably
the issue is that nobody really cares... (yet?)

Cheers,

Thomas Goirand (zigo)


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Jean-Daniel Bonnetot
Hi everyone,

What do you think about this proposal?
http://www.slideshare.net/viggates/openstack-india-meetupscheduler

It seems they found a real solution for a scaling scheduler.
The good idea is to move the intelligence onto the compute nodes.
Synchronisation is only needed for anti-affinity and things like that, which
can be managed in another way.

—
Jean-Daniel Bonnetot
http://www.ovh.com
@pilgrimstack



> On 12 Oct 2015 at 12:30, Thierry Carrez wrote:
> 
> Clint Byrum wrote:
>> Excerpts from Joshua Harlow's message of 2015-10-10 17:43:40 -0700:
>>> I'm curious is there any more detail about #1 below anywhere online?
>>> 
>>> Does cassandra use some features of the JVM that the openJDK version 
>>> doesn't support? Something else?
>> 
>> This about sums it up:
>> 
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L153-L155
>> 
>>// There is essentially no QA done on OpenJDK builds, and
>>// clusters running OpenJDK have seen many heap and load issues.
>>logger.warn("OpenJDK is not recommended. Please upgrade to the newest 
>> Oracle Java release");
> 
> Or:
> https://twitter.com/mipsytipsy/status/596697501991702528
> 
> This is one of the reasons I'm generally negative about Java solutions
> (Cassandra or Zookeeper): the free software JVM is still not on par with
> the non-free one, so we indirectly force our users to use a non-free
> dependency. I've been there before often enough to hear "did you
> reproduce that bug under the {Sun,Oracle} JVM" quite a few times.
> 
> When the Java solution is the only solution for a problem space that
> might still be a good trade-off (compared to reinventing the wheel for
> example), but to share state or distribute locks, there are some pretty
> good other options out there that don't suffer from the same fundamental
> problem...
> 
> -- 
> Thierry Carrez (ttx)
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Monty Taylor

On 10/12/2015 12:43 PM, Clint Byrum wrote:

Excerpts from Thomas Goirand's message of 2015-10-12 05:57:26 -0700:

On 10/11/2015 02:53 AM, Davanum Srinivas wrote:

Thomas,

i am curious as well. AFAIK, cassandra works well with OpenJDK. Can you
please elaborate what your concerns are for #1?

Thanks,
Dims


s/works well/works/

Upstream doesn't test against OpenJDK, and they close bugs without
fixing them when it only affects OpenJDK and it isn't grave. I know this
from one of the upstream developers of Cassandra, who is also a Debian developer.
Because of this state of things, he gave up on packaging Cassandra in
Debian (and for other reasons too, like not having enough time to work
on the packaging).

I trust what this Debian developer told me. If I remember correctly,
it's Eric Evans  (ie, the author of the ITP at
https://bugs.debian.org/585905) that I'm talking about.



Indeed, I once took a crack at packaging it for Debian/Ubuntu too.
There's a reason 'apt-cache search cassandra' returns 0 results on Debian
and Ubuntu.


There is a different reason too - which is that (at least at one point 
in the past) upstream expressed frustration with the idea of distro 
packages of Cassandra because it led to people coming to them with 
complaints about the software which had been fixed in newer versions but 
which, because of distro support policies, were not present in the 
user's software version. (I can sympathize)


I think they've been an excellent case study in how there is an 
impedance mismatch sometimes between the value that distros provide and 
the needs of particular communities. That's not a negative thought 
towards either of them - just that it's not purely limited to them.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Clint Byrum
Excerpts from Joshua Harlow's message of 2015-10-12 08:35:20 -0700:
> Thierry Carrez wrote:
> > Clint Byrum wrote:
> >> Excerpts from Joshua Harlow's message of 2015-10-10 17:43:40 -0700:
> >>> I'm curious is there any more detail about #1 below anywhere online?
> >>>
> >>> Does cassandra use some features of the JVM that the openJDK version
> >>> doesn't support? Something else?
> >> This about sums it up:
> >>
> >> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L153-L155
> >>
> >>  // There is essentially no QA done on OpenJDK builds, and
> >>  // clusters running OpenJDK have seen many heap and load issues.
> >>  logger.warn("OpenJDK is not recommended. Please upgrade to the newest 
> >> Oracle Java release");
> >
> > Or:
> > https://twitter.com/mipsytipsy/status/596697501991702528
> >
> > This is one of the reasons I'm generally negative about Java solutions
> > (Cassandra or Zookeeper): the free software JVM is still not on par with
> > the non-free one, so we indirectly force our users to use a non-free
> > dependency. I've been there before often enough to hear "did you
> > reproduce that bug under the {Sun,Oracle} JVM" quite a few times.
> 
> I'd be happy to 'fight' for (and even fix) any issues found with 
> zookeeper + openjdk if needed; that twitter posting hopefully ended up 
> in a bug being filed at https://issues.apache.org/jira/browse/ZOOKEEPER/ 
> and hopefully things getting fixed...
> 
> >
> > When the Java solution is the only solution for a problem space that
> > might still be a good trade-off (compared to reinventing the wheel for
> > example), but to share state or distribute locks, there are some pretty
> > good other options out there that don't suffer from the same fundamental
> > problem...
> >
> 
> IMHO it's the only 'mature' solution so far; but of course maturity is a 
> relative thing (look at the project age, version number of zookeeper vs 
> etcd, consul for a general idea around this); in general I'd really like 
> the TC and the foundation to help make the right decision here, because 
> this kind of choice affects the long-term future (and health) of 
> openstack as a whole (or I believe it does).
> 

Zookeeper sits in a very different space from Cassandra. I have had good
success with it on OpenJDK as well.

That said, we need to maybe go through some feature/risk matrices and
compare to etcd and Consul (this might be good to do as part of filling
out the DLM spec). The JVM issues go away with both of those, but then
we get to deal with Go issues.

Also, ZK has one other advantage over those: It is already in Debian and
Ubuntu, making access for developers much easier.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Clint Byrum
Excerpts from Boris Pavlovic's message of 2015-10-11 01:14:08 -0700:
> Clint,
> 
> There are many PROs and CONs to both approaches.
> 
> Reinventing the wheel (in this case it's quite a simple task) gives more
> flexibility and doesn't require
> the use of ZK/Consul (which will simplify integration with the current
> system).
> 
> Using ZK/Consul for a POC may save a lot of time, and we would also be
> delegating part of the work
> to other communities (which may lead to better supported/working code).
> 
> By the way, some of the parts (like sync of schedulers) are stuck in review
> in the Nova project.
> 
> Basically for a POC we can use anything, and using ZK/Consul may reduce
> the resources needed for development,
> which is good.
> 

Awesome, I think we are aligned.

So, let's try and come up with a set of next steps to see a POC.

1) Let's try and get some numbers at the upper bounds of the current
scheduler with one and multiple schedulers. We can actually turn this
into a gate test harness, as we don't _actually_ care about the vms,
so this is an excellent use for the fake virt driver. In addition to
"where it breaks", I'd also like to see graphs of what it does to the
database and MQ bus. This aligns with the performance discussions that
will be happening as a sub-group of the large operators group, so I
think we can gather support for such an effort there. (A rough sketch of
such a timing harness follows below, after item 3.)

2) Let's resolve which backend thing to use in the DLM spec. I have a
strong desire to consider the needs of DLM and the needs of scheduling
together. If the DLM discussion is tied, or nearly tied, on a few
choices, but one of the choices is better for the scheduler, it may
help the discussion. It may also hurt if one is more desirable for DLM,
and one is more desirable for scheduling. My gut says that they'll all
be suitable for both of these tasks, and it will boil down to binary
access and operator preference.

3) POC goes to the first person with free time. It's been my experience
that people come free at somewhat unexpected intervals, and I don't
want anyone to wait too long for consensus. So if anyone who agrees
with this direction gets time, I say, write a spec, get it out there,
and experiment with code.
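Regarding 1), a rough sketch of such a timing harness (boot() stands in
for whatever call submits one instance request against a cloud whose
computes run the fake virt driver; nothing here is existing gate code):

    import time
    from concurrent.futures import ThreadPoolExecutor

    def measure_schedule_rate(boot, count=1000, concurrency=20):
        """Fire `count` boot requests and report how many are placed per
        second; run it against one scheduler, then several, and compare."""
        start = time.time()
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            list(pool.map(lambda i: boot('scale-test-%d' % i), range(count)))
        return count / (time.time() - start)

The interesting output is less the single number than how it moves as we
add schedulers and compute nodes, alongside the DB and MQ graphs.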

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Joshua Harlow

Clint Byrum wrote:

Excerpts from Boris Pavlovic's message of 2015-10-11 01:14:08 -0700:

Clint,

There are many PROs and CONs to both approaches.

Reinventing the wheel (in this case it's quite a simple task) gives more
flexibility and doesn't require
the use of ZK/Consul (which will simplify integration with the current
system).

Using ZK/Consul for a POC may save a lot of time, and we would also be
delegating part of the work
to other communities (which may lead to better supported/working code).

By the way, some of the parts (like sync of schedulers) are stuck in review
in the Nova project.

Basically for a POC we can use anything, and using ZK/Consul may reduce
the resources needed for development,
which is good.



Awesome, I think we are aligned.

So, let's try and come up with a set of next steps to see a POC.

1) Let's try and get some numbers at the upper bounds of the current
scheduler with one and multiple schedulers. We can actually turn this
into a gate test harness, as we don't _actually_ care about the vms,
so this is an excellent use for the fake virt driver. In addition to
"where it breaks", I'd also like to see graphs of what it does to the
database and MQ bus. This aligns with the performance discussions that
will be happening as a sub-group of the large operators group, so I
think we can gather support for such an effort there.


Just a related thought/question. It really seems we (as a community) 
need some kind of scale testing ground. Internally at yahoo we were/are 
going to use a 200 hypervisor cluster for some of this and then expand 
that into 200 * X by using nested virtualization and/or fake drivers and 
such. But this is a 'lab' that not everyone can have, and therefore 
isn't suited toward community work IMHO. Has there been any thought on 
such a 'lab' that is directly in the community, perhaps trystack.org can 
be this? (users get free VMs, but then we can tell them this area is a 
lab, so don't expect things to always work, free isn't free after all...)


With such a lab, there could be these kinds of experiments, graphs, 
tweaks and such...




2) Let's resolve which backend thing to use in the DLM spec. I have a
strong desire to consider the needs of DLM and the needs of scheduling
together. If the DLM discussion is tied, or nearly tied, on a few
choices, but one of the choices is better for the scheduler, it may
help the discussion. It may also hurt if one is more desirable for DLM,
and one is more desirable for scheduling. My gut says that they'll all
be suitable for both of these tasks, and it will boil down to binary
access and operator preference.

3) POC goes to the first person with free time. It's been my experience
that people come free at somewhat unexpected intervals, and I don't
want anyone to wait too long for consensus. So if anyone who agrees
with this direction gets time, I say, write a spec, get it out there,
and experiment with code.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Alec Hothan (ahothan)





On 10/10/15, 11:35 PM, "Clint Byrum"  wrote:

>Excerpts from Alec Hothan (ahothan)'s message of 2015-10-09 21:19:14 -0700:
>> 
>> On 10/9/15, 6:29 PM, "Clint Byrum"  wrote:
>> 
>> >Excerpts from Chris Friesen's message of 2015-10-09 17:33:38 -0700:
>> >> On 10/09/2015 03:36 PM, Ian Wells wrote:
>> >> > On 9 October 2015 at 12:50, Chris Friesen > >> > > wrote:
>> >> >
>> >> > Has anybody looked at why 1 instance is too slow and what it would 
>> >> > take to
>> >> >
>> >> > make 1 scheduler instance work fast enough? This does not 
>> >> > preclude the
>> >> > use of
>> >> > concurrency for finer grain tasks in the background.
>> >> >
>> >> >
>> >> > Currently we pull data on all (!) of the compute nodes out of the 
>> >> > database
>> >> > via a series of RPC calls, then evaluate the various filters in 
>> >> > python code.
>> >> >
>> >> >
>> >> > I'll say again: the database seems to me to be the problem here.  Not to
>> >> > mention, you've just explained that they are in practice holding all 
>> >> > the data in
>> >> > memory in order to do the work so the benefit we're getting here is 
>> >> > really a
>> >> > N-to-1-to-M pattern with a DB in the middle (the store-to-DB is rather
>> >> > secondary, in fact), and that without incremental updates to the 
>> >> > receivers.
>> >> 
>> >> I don't see any reason why you couldn't have an in-memory scheduler.
>> >> 
>> >> Currently the database serves as the persistent storage for the resource 
>> >> usage, 
>> >> so if we take it out of the picture I imagine you'd want to have some way 
>> >> of 
>> >> querying the compute nodes for their current state when the scheduler 
>> >> first 
>> >> starts up.
>> >> 
>> >> I think the current code uses the fact that objects are remotable via the 
>> >> conductor, so changing that to do explicit posts to a known scheduler 
>> >> topic 
>> >> would take some work.
>> >> 
>> >
>> >Funny enough, I think thats exactly what Josh's "just use Zookeeper"
>> >message is about. Except in memory, it is "in an observable storage
>> >location".
>> >
>> >Instead of having the scheduler do all of the compute node inspection
>> >and querying though, you have the nodes push their stats into something
>> >like Zookeeper or consul, and then have schedulers watch those stats
>> >for changes to keep their in-memory version of the data up to date. So
>> >when you bring a new one online, you don't have to query all the nodes,
>> >you just scrape the data store, which all of these stores (etcd, consul,
>> >ZK) are built to support atomically querying and watching at the same
>> >time, so you can have a reasonable expectation of correctness.
>> >
>> >Even if you figured out how to make the in-memory scheduler crazy fast,
>> >There's still value in concurrency for other reasons. No matter how
>> >fast you make the scheduler, you'll be slave to the response time of
>> >a single scheduling request. If you take 1ms to schedule each node
>> >(including just reading the request and pushing out your scheduling
>> >result!) you will never achieve greater than 1000/s. 1ms is way lower
>> >than it's going to take just to shove a tiny message into RabbitMQ or
>> >even 0mq.
>> 
>> That is not what I have seen, measurements that I did or done by others show 
>> between 5000 and 1 send *per sec* (depending on mirroring, up to 1KB msg 
>> size) using oslo messaging/kombu over rabbitMQ.
>
>You're quoting throughput of RabbitMQ, but how many threads were
>involved? An in-memory scheduler that was multi-threaded would need to
>implement synchronization at a fairly granular level to use the same
>in-memory store, and we're right back to the extreme need for efficient
>concurrency in the design, though with much better latency on the
>synchronization.

These were single-threaded tests and you're correct that if you had multiple 
threads trying to send something you'd have some inefficiency.
However I'd question the likelihood of that happening as it is very likely that 
most of the cpu time will be spent outside of oslo messaging code.

Furthermore, Python does not need multiple threads to go faster. As a matter of 
fact, for in-memory operations, it could end up being slower because of the 
inherent design of the interpreter (and there are many independent measurements 
that have shown it).


>
>> And this is unmodified/highly unoptimized oslo messaging code.
>> If you remove the oslo messaging layer, you get 25000 to 45000 msg/sec with 
>> kombu/rabbitMQ (which shows how inefficient is oslo messaging layer itself)
>> 
>> > So I'm pretty sure this is o-k for small clouds, but would be
>> >a disaster for a large, busy cloud.
>> 
>> It all depends on how many sched/sec for the "large busy cloud"...
>> 
>
>I think there are two interesting things to discern. Of course, the
>exact rate would be great to have as a target, 

Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Joshua Harlow

Alec Hothan (ahothan) wrote:





On 10/10/15, 11:35 PM, "Clint Byrum"  wrote:


Excerpts from Alec Hothan (ahothan)'s message of 2015-10-09 21:19:14 -0700:

On 10/9/15, 6:29 PM, "Clint Byrum"  wrote:


Excerpts from Chris Friesen's message of 2015-10-09 17:33:38 -0700:

On 10/09/2015 03:36 PM, Ian Wells wrote:

On 9 October 2015 at 12:50, Chris Friesen wrote:

 Has anybody looked at why 1 instance is too slow and what it would take to

 make 1 scheduler instance work fast enough? This does not preclude the
 use of
 concurrency for finer grain tasks in the background.


 Currently we pull data on all (!) of the compute nodes out of the database
 via a series of RPC calls, then evaluate the various filters in python 
code.


I'll say again: the database seems to me to be the problem here.  Not to
mention, you've just explained that they are in practice holding all the data in
memory in order to do the work so the benefit we're getting here is really a
N-to-1-to-M pattern with a DB in the middle (the store-to-DB is rather
secondary, in fact), and that without incremental updates to the receivers.

I don't see any reason why you couldn't have an in-memory scheduler.

Currently the database serves as the persistent storage for the resource usage,
so if we take it out of the picture I imagine you'd want to have some way of
querying the compute nodes for their current state when the scheduler first
starts up.

I think the current code uses the fact that objects are remotable via the
conductor, so changing that to do explicit posts to a known scheduler topic
would take some work.


Funny enough, I think thats exactly what Josh's "just use Zookeeper"
message is about. Except in memory, it is "in an observable storage
location".

Instead of having the scheduler do all of the compute node inspection
and querying though, you have the nodes push their stats into something
like Zookeeper or consul, and then have schedulers watch those stats
for changes to keep their in-memory version of the data up to date. So
when you bring a new one online, you don't have to query all the nodes,
you just scrape the data store, which all of these stores (etcd, consul,
ZK) are built to support atomically querying and watching at the same
time, so you can have a reasonable expectation of correctness.

Even if you figured out how to make the in-memory scheduler crazy fast,
There's still value in concurrency for other reasons. No matter how
fast you make the scheduler, you'll be slave to the response time of
a single scheduling request. If you take 1ms to schedule each node
(including just reading the request and pushing out your scheduling
result!) you will never achieve greater than 1000/s. 1ms is way lower
than it's going to take just to shove a tiny message into RabbitMQ or
even 0mq.

That is not what I have seen, measurements that I did or done by others show 
between 5000 and 1 send *per sec* (depending on mirroring, up to 1KB msg 
size) using oslo messaging/kombu over rabbitMQ.

You're quoting throughput of RabbitMQ, but how many threads were
involved? An in-memory scheduler that was multi-threaded would need to
implement synchronization at a fairly granular level to use the same
in-memory store, and we're right back to the extreme need for efficient
concurrency in the design, though with much better latency on the
synchronization.


These were single-threaded tests and you're correct that if you had multiple 
threads trying to send something you'd have some inefficiency.
However I'd question the likelihood of that happening as it is very likely that 
most of the cpu time will be spent outside of oslo messaging code.

Furthermore, Python does not need multiple threads to go faster. As a matter of 
fact, for in-memory operations, it could end up being slower because of the 
inherent design of the interpreter (and there are many independent measurements 
that have shown it).



And this is unmodified/highly unoptimized oslo messaging code.
If you remove the oslo messaging layer, you get 25000 to 45000 msg/sec with 
kombu/rabbitMQ (which shows how inefficient is oslo messaging layer itself)


So I'm pretty sure this is o-k for small clouds, but would be
a disaster for a large, busy cloud.

It all depends on how many sched/sec for the "large busy cloud"...


I think there are two interesting things to discern. Of course, the
exact rate would be great to have as a target, but operational security
and just plain secrecy of business models will probably prevent us from
getting at many of these requirements.


I don't think that is the case. We have no visibility because nobody has really 
thought about these numbers. Ops should be ok to provide some rough requirement 
numbers if asked (everybody is in the same boat).



The second is the complexity model of scaling. We can just think about
the 

Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Monty Taylor

On 10/12/2015 02:45 PM, Joshua Harlow wrote:

Alec Hothan (ahothan) wrote:





On 10/10/15, 11:35 PM, "Clint Byrum"  wrote:


Excerpts from Alec Hothan (ahothan)'s message of 2015-10-09 21:19:14
-0700:

On 10/9/15, 6:29 PM, "Clint Byrum"  wrote:


Excerpts from Chris Friesen's message of 2015-10-09 17:33:38 -0700:

On 10/09/2015 03:36 PM, Ian Wells wrote:

On 9 October 2015 at 12:50, Chris
Friesen wrote:

 Has anybody looked at why 1 instance is too slow and what it
would take to

 make 1 scheduler instance work fast enough? This does
not preclude the
 use of
 concurrency for finer grain tasks in the background.


 Currently we pull data on all (!) of the compute nodes out
of the database
 via a series of RPC calls, then evaluate the various filters
in python code.


I'll say again: the database seems to me to be the problem here.
Not to
mention, you've just explained that they are in practice holding
all the data in
memory in order to do the work so the benefit we're getting here
is really a
N-to-1-to-M pattern with a DB in the middle (the store-to-DB is
rather
secondary, in fact), and that without incremental updates to the
receivers.

I don't see any reason why you couldn't have an in-memory scheduler.

Currently the database serves as the persistent storage for the
resource usage,
so if we take it out of the picture I imagine you'd want to have
some way of
querying the compute nodes for their current state when the
scheduler first
starts up.

I think the current code uses the fact that objects are remotable
via the
conductor, so changing that to do explicit posts to a known
scheduler topic
would take some work.


Funny enough, I think thats exactly what Josh's "just use Zookeeper"
message is about. Except in memory, it is "in an observable storage
location".

Instead of having the scheduler do all of the compute node inspection
and querying though, you have the nodes push their stats into
something
like Zookeeper or consul, and then have schedulers watch those stats
for changes to keep their in-memory version of the data up to date. So
when you bring a new one online, you don't have to query all the
nodes,
you just scrape the data store, which all of these stores (etcd,
consul,
ZK) are built to support atomically querying and watching at the same
time, so you can have a reasonable expectation of correctness.

Even if you figured out how to make the in-memory scheduler crazy
fast,
There's still value in concurrency for other reasons. No matter how
fast you make the scheduler, you'll be slave to the response time of
a single scheduling request. If you take 1ms to schedule each node
(including just reading the request and pushing out your scheduling
result!) you will never achieve greater than 1000/s. 1ms is way lower
than it's going to take just to shove a tiny message into RabbitMQ or
even 0mq.

That is not what I have seen, measurements that I did or done by
others show between 5000 and 1 send *per sec* (depending on
mirroring, up to 1KB msg size) using oslo messaging/kombu over
rabbitMQ.

You're quoting throughput of RabbitMQ, but how many threads were
involved? An in-memory scheduler that was multi-threaded would need to
implement synchronization at a fairly granular level to use the same
in-memory store, and we're right back to the extreme need for efficient
concurrency in the design, though with much better latency on the
synchronization.


These were single-threaded tests and you're correct that if you had
multiple threads trying to send something you'd have some inefficiency.
However I'd question the likelihood of that happening as it is very
likely that most of the cpu time will be spent outside of oslo
messaging code.

Furthermore, Python does not need multiple threads to go faster. As a
matter of fact, for in-memory operations, it could end up being slower
because of the inherent design of the interpreter (and there are many
independent measurements that have shown it).



And this is unmodified/highly unoptimized oslo messaging code.
If you remove the oslo messaging layer, you get 25000 to 45000
msg/sec with kombu/rabbitMQ (which shows how inefficient is oslo
messaging layer itself)


So I'm pretty sure this is o-k for small clouds, but would be
a disaster for a large, busy cloud.

It all depends on how many sched/sec for the "large busy cloud"...


I think there are two interesting things to discern. Of course, the
exact rate would be great to have as a target, but operational security
and just plain secrecy of business models will probably prevent us from
getting at many of these requirements.


I don't think that is the case. We have no visibility because nobody
has really thought about these numbers. Ops should be ok to provide
some rough requirement numbers if asked (everybody is in the same boat).



The second is the complexity model of 

Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Alec Hothan (ahothan)





On 10/12/15, 11:45 AM, "Joshua Harlow"  wrote:

>Alec Hothan (ahothan) wrote:
>>
>>
>>
>>
>> On 10/10/15, 11:35 PM, "Clint Byrum"  wrote:
>>
>>> Excerpts from Alec Hothan (ahothan)'s message of 2015-10-09 21:19:14 -0700:
 On 10/9/15, 6:29 PM, "Clint Byrum"  wrote:

> Excerpts from Chris Friesen's message of 2015-10-09 17:33:38 -0700:
>> On 10/09/2015 03:36 PM, Ian Wells wrote:
>>> On 9 October 2015 at 12:50, Chris Friesen>> >  wrote:
>>>
>>>  Has anybody looked at why 1 instance is too slow and what it would 
>>> take to
>>>
>>>  make 1 scheduler instance work fast enough? This does not 
>>> preclude the
>>>  use of
>>>  concurrency for finer grain tasks in the background.
>>>
>>>
>>>  Currently we pull data on all (!) of the compute nodes out of the 
>>> database
>>>  via a series of RPC calls, then evaluate the various filters in 
>>> python code.
>>>
>>>
>>> I'll say again: the database seems to me to be the problem here.  Not to
>>> mention, you've just explained that they are in practice holding all 
>>> the data in
>>> memory in order to do the work so the benefit we're getting here is 
>>> really a
>>> N-to-1-to-M pattern with a DB in the middle (the store-to-DB is rather
>>> secondary, in fact), and that without incremental updates to the 
>>> receivers.
>> I don't see any reason why you couldn't have an in-memory scheduler.
>>
>> Currently the database serves as the persistent storage for the resource 
>> usage,
>> so if we take it out of the picture I imagine you'd want to have some 
>> way of
>> querying the compute nodes for their current state when the scheduler 
>> first
>> starts up.
>>
>> I think the current code uses the fact that objects are remotable via the
>> conductor, so changing that to do explicit posts to a known scheduler 
>> topic
>> would take some work.
>>
> Funny enough, I think thats exactly what Josh's "just use Zookeeper"
> message is about. Except in memory, it is "in an observable storage
> location".
>
> Instead of having the scheduler do all of the compute node inspection
> and querying though, you have the nodes push their stats into something
> like Zookeeper or consul, and then have schedulers watch those stats
> for changes to keep their in-memory version of the data up to date. So
> when you bring a new one online, you don't have to query all the nodes,
> you just scrape the data store, which all of these stores (etcd, consul,
> ZK) are built to support atomically querying and watching at the same
> time, so you can have a reasonable expectation of correctness.
>
> Even if you figured out how to make the in-memory scheduler crazy fast,
> There's still value in concurrency for other reasons. No matter how
> fast you make the scheduler, you'll be slave to the response time of
> a single scheduling request. If you take 1ms to schedule each node
> (including just reading the request and pushing out your scheduling
> result!) you will never achieve greater than 1000/s. 1ms is way lower
> than it's going to take just to shove a tiny message into RabbitMQ or
> even 0mq.
 That is not what I have seen, measurements that I did or done by others 
 show between 5000 and 1 send *per sec* (depending on mirroring, up to 
 1KB msg size) using oslo messaging/kombu over rabbitMQ.
>>> You're quoting throughput of RabbitMQ, but how many threads were
>>> involved? An in-memory scheduler that was multi-threaded would need to
>>> implement synchronization at a fairly granular level to use the same
>>> in-memory store, and we're right back to the extreme need for efficient
>>> concurrency in the design, though with much better latency on the
>>> synchronization.
>>
>> These were single-threaded tests and you're correct that if you had multiple 
>> threads trying to send something you'd have some inefficiency.
>> However I'd question the likelihood of that happening as it is very likely 
>> that most of the cpu time will be spent outside of oslo messaging code.
>>
>> Furthermore, Python does not need multiple threads to go faster. As a matter 
>> of fact, for in-memory operations, it could end up being slower because of 
>> the inherent design of the interpreter (and there are many independent 
>> measurements that have shown it).
>>
>>
 And this is unmodified/highly unoptimized oslo messaging code.
 If you remove the oslo messaging layer, you get 25000 to 45000 msg/sec 
 with kombu/rabbitMQ (which shows how inefficient is oslo messaging layer 
 itself)

> So I'm pretty sure this is o-k for small clouds, but would be
> a 

Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Joshua Harlow

Alec Hothan (ahothan) wrote:





On 10/12/15, 11:45 AM, "Joshua Harlow"  wrote:


Alec Hothan (ahothan) wrote:




On 10/10/15, 11:35 PM, "Clint Byrum"   wrote:


Excerpts from Alec Hothan (ahothan)'s message of 2015-10-09 21:19:14 -0700:

On 10/9/15, 6:29 PM, "Clint Byrum"   wrote:


Excerpts from Chris Friesen's message of 2015-10-09 17:33:38 -0700:

On 10/09/2015 03:36 PM, Ian Wells wrote:

On 9 October 2015 at 12:50, Chris Friesen wrote:

  Has anybody looked at why 1 instance is too slow and what it would take to

  make 1 scheduler instance work fast enough? This does not preclude the
  use of
  concurrency for finer grain tasks in the background.


  Currently we pull data on all (!) of the compute nodes out of the database
  via a series of RPC calls, then evaluate the various filters in python 
code.


I'll say again: the database seems to me to be the problem here.  Not to
mention, you've just explained that they are in practice holding all the data in
memory in order to do the work so the benefit we're getting here is really a
N-to-1-to-M pattern with a DB in the middle (the store-to-DB is rather
secondary, in fact), and that without incremental updates to the receivers.

I don't see any reason why you couldn't have an in-memory scheduler.

Currently the database serves as the persistent storage for the resource usage,
so if we take it out of the picture I imagine you'd want to have some way of
querying the compute nodes for their current state when the scheduler first
starts up.

I think the current code uses the fact that objects are remotable via the
conductor, so changing that to do explicit posts to a known scheduler topic
would take some work.


Funny enough, I think thats exactly what Josh's "just use Zookeeper"
message is about. Except in memory, it is "in an observable storage
location".

Instead of having the scheduler do all of the compute node inspection
and querying though, you have the nodes push their stats into something
like Zookeeper or consul, and then have schedulers watch those stats
for changes to keep their in-memory version of the data up to date. So
when you bring a new one online, you don't have to query all the nodes,
you just scrape the data store, which all of these stores (etcd, consul,
ZK) are built to support atomically querying and watching at the same
time, so you can have a reasonable expectation of correctness.

Even if you figured out how to make the in-memory scheduler crazy fast,
There's still value in concurrency for other reasons. No matter how
fast you make the scheduler, you'll be slave to the response time of
a single scheduling request. If you take 1ms to schedule each node
(including just reading the request and pushing out your scheduling
result!) you will never achieve greater than 1000/s. 1ms is way lower
than it's going to take just to shove a tiny message into RabbitMQ or
even 0mq.

That is not what I have seen, measurements that I did or done by others show 
between 5000 and 1 send *per sec* (depending on mirroring, up to 1KB msg 
size) using oslo messaging/kombu over rabbitMQ.

You're quoting throughput of RabbitMQ, but how many threads were
involved? An in-memory scheduler that was multi-threaded would need to
implement synchronization at a fairly granular level to use the same
in-memory store, and we're right back to the extreme need for efficient
concurrency in the design, though with much better latency on the
synchronization.

These were single-threaded tests and you're correct that if you had multiple 
threads trying to send something you'd have some inefficiency.
However I'd question the likelihood of that happening as it is very likely that 
most of the cpu time will be spent outside of oslo messaging code.

Furthermore, Python does not need multiple threads to go faster. As a matter of 
fact, for in-memory operations, it could end up being slower because of the 
inherent design of the interpreter (and there are many independent measurements 
that have shown it).



And this is unmodified/highly unoptimized oslo messaging code.
If you remove the oslo messaging layer, you get 25000 to 45000 msg/sec with 
kombu/rabbitMQ (which shows how inefficient is oslo messaging layer itself)


So I'm pretty sure this is o-k for small clouds, but would be
a disaster for a large, busy cloud.

It all depends on how many sched/sec for the "large busy cloud"...


I think there are two interesting things to discern. Of course, the
exact rate would be great to have as a target, but operational security
and just plain secrecy of business models will probably prevent us from
getting at many of these requirements.

I don't think that is the case. We have no visibility because nobody has really 
thought about these numbers. Ops should be ok to provide some rough requirement 
numbers if asked 

Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Joshua Harlow

Ian Wells wrote:

On 10 October 2015 at 23:47, Clint Byrum wrote:

>  Per before, my suggestion was that every scheduler tries to
maintain a copy
>  of the cloud's state in memory (in much the same way, per the previous
>  example, as every router on the internet tries to make a route
table out of
>  what it learns from BGP).  They don't have to be perfect.  They
don't have
>  to be in sync.  As long as there's some variability in the
decision making,
>  they don't have to update when another scheduler schedules
something (and
>  you can make the compute node send an immediate update when a new
VM is
>  run, anyway).  They all stand a good chance of scheduling VMs well
>  simultaneously.
>

I'm quite in favor of eventual consistency and retries. Even if we had
a system of perfect updating of all state records everywhere, it would
break sometimes and I'd still want to not trust any record of state as
being correct for the entire distributed system. However, there is an
efficiency win gained by staying _close_ to correct. It is actually a
function of the expected entropy. The more concurrent schedulers, the
more entropy there will be to deal with.


... and the fewer the servers in total, the larger the entropy as a
proportion of the whole system (if that's a thing, it's a long time
since I did physical chemistry).  But consider the use cases:

1. I have a small cloud, I run two schedulers for redundancy.  There's a
good possibility that, when the cloud is loaded, the schedulers make
poor decisions occasionally.  We'd have to consider how likely that was,
certainly.

2. I have a large cloud, and I run 20 schedulers for redundancy.
There's a good chance that a scheduler is out of date on its
information.  But there could be several hundred hosts willing to
satisfy a scheduling request, and even of the ones with incorrect
information a low chance that any of those are close to the threshold
where they won't run the VM in question, so good odds it will pick a
host that's happy to satisfy the request.


>  But to be fair, we're throwing made up numbers around at this
point.  Maybe
>  it's time to work out how to test this for scale in a harness -
which is
>  the bit of work we all really need to do this properly, or there's
no proof
>  we've actually helped - and leave people to code their ideas up?

I'm working on adding meters for rates and amounts of messages and
queries that the system does right now for performance purposes. Rally
though, is the place where I'd go to ask "how fast can we schedule
things
right now?".


My only concern is that we're testing a real cloud at scale and I
haven't got any more firstborn to sell for hardware, so I wonder if we
can fake up a compute node in our test harness.


Does the openstack foundation have access to a scaling area that can be 
used by the community for this kind of experimental work? It seems like 
infra or others should be able to make that possible? Maybe we could 
sacrifice a summit and instead of spending the money on that we (as a 
community) could spend the money on a really nice scale lab for the 
community ;)



--
Ian.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Ian Wells
On 11 October 2015 at 00:23, Clint Byrum  wrote:

> I'm in, except I think this gets simpler with an intermediary service
> like ZK/Consul to keep track of this 1GB of data and replace the need
> for 6, and changes the implementation of 5 to "updates its record and
> signals its presence".
>

OK, so we're not keeping a copy of the information in the schedulers,
saving us 5GB of information, but we are notifying the schedulers of the
updated information so that they can update their copies?

Also, the notification path here is that the compute host notifies ZK and
ZK notifies many schedulers, assuming they're all capable of handling all
queries.  That is in fact N * (M+1) messages, which is slightly more than
if there's no central node, as it happens.  There are fewer *channels*, but
more messages.  (I feel like I'm overlooking something here, but I can't
pick out the flaw...)  Yes, RMQ will suck at this - but then let's talk
about better messaging rather than another DB type.

Again, the saving here seems to be that a freshly started scheduler can get
an infodump rather than waiting 60s to be useful.  I wonder if that's
necessary.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Ian Wells
On 10 October 2015 at 23:47, Clint Byrum  wrote:

> > Per before, my suggestion was that every scheduler tries to maintain a
> copy
> > of the cloud's state in memory (in much the same way, per the previous
> > example, as every router on the internet tries to make a route table out
> of
> > what it learns from BGP).  They don't have to be perfect.  They don't
> have
> > to be in sync.  As long as there's some variability in the decision
> making,
> > they don't have to update when another scheduler schedules something (and
> > you can make the compute node send an immediate update when a new VM is
> > run, anyway).  They all stand a good chance of scheduling VMs well
> > simultaneously.
> >
>
> I'm quite in favor of eventual consistency and retries. Even if we had
> a system of perfect updating of all state records everywhere, it would
> break sometimes and I'd still want to not trust any record of state as
> being correct for the entire distributed system. However, there is an
> efficiency win gained by staying _close_ to correct. It is actually a
> function of the expected entropy. The more concurrent schedulers, the
> more entropy there will be to deal with.
>

... and the fewer the servers in total, the larger the entropy as a
proportion of the whole system (if that's a thing, it's a long time since I
did physical chemistry).  But consider the use cases:

1. I have a small cloud, I run two schedulers for redundancy.  There's a
good possibility that, when the cloud is loaded, the schedulers make poor
decisions occasionally.  We'd have to consider how likely that was,
certainly.

2. I have a large cloud, and I run 20 schedulers for redundancy.  There's a
good chance that a scheduler is out of date on its information.  But there
could be several hundred hosts willing to satisfy a scheduling request, and
even for the ones with incorrect information there's a low chance that any of
those are close to the threshold where they won't run the VM in question, so
good odds it will pick a host that's happy to satisfy the request.


> But to be fair, we're throwing made up numbers around at this point.
> Maybe
> > it's time to work out how to test this for scale in a harness - which is
> > the bit of work we all really need to do this properly, or there's no
> proof
> > we've actually helped - and leave people to code their ideas up?
>
> I'm working on adding meters for rates and amounts of messages and
> queries that the system does right now for performance purposes. Rally
> though, is the place where I'd go to ask "how fast can we schedule things
> right now?".
>

My only concern is that we're testing a real cloud at scale and I haven't
got any more firstborn to sell for hardware, so I wonder if we can fake up
a compute node in our test harness.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Clint Byrum
Excerpts from Ian Wells's message of 2015-10-12 19:43:48 -0700:
> On 11 October 2015 at 00:23, Clint Byrum  wrote:
> 
> > I'm in, except I think this gets simpler with an intermediary service
> > like ZK/Consul to keep track of this 1GB of data and replace the need
> > for 6, and changes the implementation of 5 to "updates its record and
> > signals its presence".
> >
> 
> OK, so we're not keeping a copy of the information in the schedulers,
> saving us 5GB of information, but we are notifying the schedulers of the
> updated information to that they can update their copies?
> 

We _would_ keep a local cache of the information in the schedulers. The
centralized copy of it is to free the schedulers from the complexity of
having to keep track of it as state, rather than as a cache. We also don't
have to provide a way for on-demand stat fetching to seed scheduler 0.

> Also, the notification path here is that the compute host notifies ZK and
> ZK notifies many schedulers, assuming they're all capable of handling all
> queries.  That is in fact N * (M+1) messages, which is slightly more than
> if there's no central node, as it happens.  There are fewer *channels*, but
> more messages.  (I feel like I'm overlooking something here, but I can't
> pick out the flaw...)  Yes, RMQ will suck at this - but then let's talk
> about better messaging rather than another DB type.
> 

You're calling transactions messages, and that's not really fair to
messaging or transactions. :)

If N==Number of Schedulers, then the transaction which records a change
in available resources for a compute node results in 1 transaction, and
N "watches" to the schedulers. However, it's important to note that in
this situation, compute nodes do not have to send anything anywhere if
nothing has changed, which is very likely the case for "full" compute
nodes, and certainly will save many many redundant messages. Forgive me
if nova already makes this optimization somehow, it didn't seem to when
I was tinkering a year ago.
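
To make that concrete, here is a minimal sketch of the compute-node side of
that optimization, assuming a kazoo/ZooKeeper backend and a made-up
/nova/compute_nodes layout (illustrative only, not existing Nova code): the
node serializes its stats and only touches its znode when the payload
actually changes, so idle or "full" nodes generate no traffic at all.

import json

from kazoo.client import KazooClient

client = KazooClient(hosts='127.0.0.1:2181')
client.start()
client.ensure_path('/nova/compute_nodes')

last_sent = None

def report(host_name, stats):
    # Called from the periodic resource audit, or right after a VM
    # starts or stops on this node.
    global last_sent
    payload = json.dumps(stats, sort_keys=True).encode('utf-8')
    if payload == last_sent:
        return  # nothing changed, so send nothing at all
    path = '/nova/compute_nodes/%s' % host_name
    if client.exists(path):
        client.set(path, payload)    # fires the schedulers' watches
    else:
        client.create(path, payload)
    last_sent = payload

report('compute-1', {'free_ram_mb': 30720, 'free_disk_gb': 400, 'vms': 3})
report('compute-1', {'free_ram_mb': 30720, 'free_disk_gb': 400, 'vms': 3})  # no-op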

> Again, the saving here seems to be that a freshly started scheduler can get
> an infodump rather than waiting 60s to be useful.  I wonder if that's
> necessary.

There is also the complexity of designing a scheduler which is fault
tolerant and scales economically. What we have now will overtax the
message bus and the database as the number of compute nodes increases.
We want to get O(1) complexity out of that, but we're getting O(N)
right now.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Clint Byrum
Excerpts from Alec Hothan (ahothan)'s message of 2015-10-09 21:19:14 -0700:
> 
> On 10/9/15, 6:29 PM, "Clint Byrum"  wrote:
> 
> >Excerpts from Chris Friesen's message of 2015-10-09 17:33:38 -0700:
> >> On 10/09/2015 03:36 PM, Ian Wells wrote:
> >> > On 9 October 2015 at 12:50, Chris Friesen wrote:
> >> >
> >> > Has anybody looked at why 1 instance is too slow and what it would 
> >> > take to
> >> >
> >> > make 1 scheduler instance work fast enough? This does not 
> >> > preclude the
> >> > use of
> >> > concurrency for finer grain tasks in the background.
> >> >
> >> >
> >> > Currently we pull data on all (!) of the compute nodes out of the 
> >> > database
> >> > via a series of RPC calls, then evaluate the various filters in 
> >> > python code.
> >> >
> >> >
> >> > I'll say again: the database seems to me to be the problem here.  Not to
> >> > mention, you've just explained that they are in practice holding all the 
> >> > data in
> >> > memory in order to do the work so the benefit we're getting here is 
> >> > really a
> >> > N-to-1-to-M pattern with a DB in the middle (the store-to-DB is rather
> >> > secondary, in fact), and that without incremental updates to the 
> >> > receivers.
> >> 
> >> I don't see any reason why you couldn't have an in-memory scheduler.
> >> 
> >> Currently the database serves as the persistant storage for the resource 
> >> usage, 
> >> so if we take it out of the picture I imagine you'd want to have some way 
> >> of 
> >> querying the compute nodes for their current state when the scheduler 
> >> first 
> >> starts up.
> >> 
> >> I think the current code uses the fact that objects are remotable via the 
> >> conductor, so changing that to do explicit posts to a known scheduler 
> >> topic 
> >> would take some work.
> >> 
> >
> >Funny enough, I think thats exactly what Josh's "just use Zookeeper"
> >message is about. Except in memory, it is "in an observable storage
> >location".
> >
> >Instead of having the scheduler do all of the compute node inspection
> >and querying though, you have the nodes push their stats into something
> >like Zookeeper or consul, and then have schedulers watch those stats
> >for changes to keep their in-memory version of the data up to date. So
> >when you bring a new one online, you don't have to query all the nodes,
> >you just scrape the data store, which all of these stores (etcd, consul,
> >ZK) are built to support atomically querying and watching at the same
> >time, so you can have a reasonable expectation of correctness.
> >
> >Even if you figured out how to make the in-memory scheduler crazy fast,
> >There's still value in concurrency for other reasons. No matter how
> >fast you make the scheduler, you'll be slave to the response time of
> >a single scheduling request. If you take 1ms to schedule each node
> >(including just reading the request and pushing out your scheduling
> >result!) you will never achieve greater than 1000/s. 1ms is way lower
> >than it's going to take just to shove a tiny message into RabbitMQ or
> >even 0mq.
> 
> That is not what I have seen, measurements that I did or done by others show 
> between 5000 and 1 send *per sec* (depending on mirroring, up to 1KB msg 
> size) using oslo messaging/kombu over rabbitMQ.

You're quoting throughput of RabbitMQ, but how many threads were
involved? An in-memory scheduler that was multi-threaded would need to
implement synchronization at a fairly granular level to use the same
in-memory store, and we're right back to the extreme need for efficient
concurrency in the design, though with much better latency on the
synchronization.
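
As an illustration of what that granular synchronization could look like:
one lock per host in the shared in-memory store, so concurrent scheduler
threads only contend when they race for the same host (names and numbers are
made up, this is not proposed Nova code).

import threading

hosts = {
    'host-1': {'free_ram_mb': 4096, 'lock': threading.Lock()},
    'host-2': {'free_ram_mb': 8192, 'lock': threading.Lock()},
}

def try_claim(host_name, ram_mb):
    host = hosts[host_name]
    with host['lock']:                       # only this host is locked
        if host['free_ram_mb'] >= ram_mb:
            host['free_ram_mb'] -= ram_mb    # claim the resources
            return True
        return False                         # caller retries another host

# Two schedulers racing for *different* hosts never block each other.
print(try_claim('host-1', 2048))   # True
print(try_claim('host-1', 4096))   # False, only 2048 MB left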

> And this is unmodified/highly unoptimized oslo messaging code.
> If you remove the oslo messaging layer, you get 25000 to 45000 msg/sec with 
> kombu/rabbitMQ (which shows how inefficient is oslo messaging layer itself)
> 
> > So I'm pretty sure this is o-k for small clouds, but would be
> >a disaster for a large, busy cloud.
> 
> It all depends on how many sched/sec for the "large busy cloud"...
> 

I think there are two interesting things to discern. Of course, the
exact rate would be great to have as a target, but operational security
and just plain secrecy of business models will probably prevent us from
getting at many of these requirements.

The second is the complexity model of scaling. We can just think about
the actual cost benefit of running 1, 3, and more schedulers and come up
with some rough numbers for a lower bound for scheduler performance
that would make sense.

> >
> >If, however, you can have 20 schedulers that all take 10ms on average,
> >and have the occasional lock contention for a resource counter resulting
> >in 100ms, now you're at 2000/s minus the lock contention rate. This
> >strategy would scale better with the number of compute nodes, since
> >more nodes means more distinct locks, so you can scale out the number
> >of running servers separate from the number of scheduling requests.

Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Boris Pavlovic
2Everybody,

Just curious why we need such complexity.


Let's take a look from the other side:
1) Information about all hosts (even in the case of 100k hosts) will be less
than 1 GB
2) Usually servers that run the scheduler service have at least 64GB RAM and
more on the board
3) math.log(100000) < 12  (binary search per rule)
4) We have less than 20 rules for scheduling
5) Information about hosts is updated every 60 seconds (no updates means the
host is dead)


According to this information:
1) We can store everything in the RAM of a single server
2) We can use Python
3) Information about hosts is temporary data and shouldn't be stored in
persistent storage


Simplest architecture to cover this:
1) Single RPC service that has two methods: find_host(rules),
update_host(host, data)
2) Store information about hosts in a dict (host_name->data)
3) Create a binary tree for each rule and update it on each host update
4) Make an algorithm that will use the binary trees to find a host based on
the rules
5) Each service like compute node, volume node, or neutron will send
updates about the hosts that they manage (cross service scheduling)
6) Make an algorithm that will sync host stats in memory between different
schedulers
7) ...
8) PROFIT!

It's:
1) Simple to manage
2) Simple to understand
3) Simple to calc scalability limits
4) Simple to integrate in current OpenStack architecture
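
To make steps 1-4 above concrete, here is a rough sketch of such an in-memory
scheduler in Python, using a sorted list per rule and the bisect module in
place of an explicit binary tree (all names are illustrative, and the
per-rule index is naively rebuilt on every update, which a real
implementation would avoid):

import bisect


class InMemoryScheduler(object):
    def __init__(self, rules):
        self.rules = rules    # e.g. ['free_ram_mb', 'free_disk_gb']
        self.hosts = {}       # host_name -> stats dict
        self.indexes = dict((r, []) for r in rules)  # rule -> sorted [(value, host)]

    def update_host(self, host_name, data):
        self.hosts[host_name] = data
        for rule in self.rules:
            index = [(stats[rule], name) for name, stats in self.hosts.items()]
            index.sort()
            self.indexes[rule] = index

    def find_host(self, requirements):
        # requirements: rule -> minimum value, e.g. {'free_ram_mb': 2048}
        candidates = None
        for rule, minimum in requirements.items():
            index = self.indexes[rule]
            # binary search for the first host meeting the minimum
            start = bisect.bisect_left(index, (minimum, ''))
            matching = set(name for _, name in index[start:])
            candidates = matching if candidates is None else candidates & matching
        return sorted(candidates)[0] if candidates else None


sched = InMemoryScheduler(['free_ram_mb', 'free_disk_gb'])
sched.update_host('host-1', {'free_ram_mb': 1024, 'free_disk_gb': 80})
sched.update_host('host-2', {'free_ram_mb': 8192, 'free_disk_gb': 200})
print(sched.find_host({'free_ram_mb': 2048, 'free_disk_gb': 100}))  # host-2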


As a future bonus, we can implement scheduler-per-AZ functionality, so each
scheduler will store information
only about its AZ, and separate AZs can have their own rabbit servers, for
example, which will allow us to get
horizontal scalability in terms of AZs.


So do we really need Cassandra, Mongo, ... and other web-scale solutions for
such a simple task?


Best regards,
Boris Pavlovic

On Sat, Oct 10, 2015 at 11:19 PM, Clint Byrum  wrote:

> Excerpts from Chris Friesen's message of 2015-10-09 23:16:43 -0700:
> > On 10/09/2015 07:29 PM, Clint Byrum wrote:
> >
> > > Even if you figured out how to make the in-memory scheduler crazy fast,
> > > There's still value in concurrency for other reasons. No matter how
> > > fast you make the scheduler, you'll be slave to the response time of
> > > a single scheduling request. If you take 1ms to schedule each node
> > > (including just reading the request and pushing out your scheduling
> > > result!) you will never achieve greater than 1000/s. 1ms is way lower
> > > than it's going to take just to shove a tiny message into RabbitMQ or
> > > even 0mq. So I'm pretty sure this is o-k for small clouds, but would be
> > > a disaster for a large, busy cloud.
> > >
> > > If, however, you can have 20 schedulers that all take 10ms on average,
> > > and have the occasional lock contention for a resource counter
> resulting
> > > in 100ms, now you're at 2000/s minus the lock contention rate. This
> > > strategy would scale better with the number of compute nodes, since
> > > more nodes means more distinct locks, so you can scale out the number
> > > of running servers separate from the number of scheduling requests.
> >
> > As far as I can see, moving to an in-memory scheduler is essentially
> orthogonal
> > to allowing multiple schedulers to run concurrently.  We can do both.
> >
>
> Agreed, and I want to make sure we continue to be able to run concurrent
> schedulers.
>
> Going in memory won't reduce contention for the same resources. So it
> will definitely schedule faster, but it may also serialize with concurrent
> schedulers sooner, and thus turn into a situation where scaling out more
> nodes means the same, or even less throughput.
>
> Keep in mind, I actually think we give our users _WAY_ too much power
> over our clouds, and I actually think we should simply have flavor based
> scheduling and let compute nodes grab node reservation requests directly
> out of flavor based queues based on their own current observation of
> their ability to service it.
>
> But I understand that there are quite a few clouds now that have been
> given shiny dynamic scheduling tools and now we have to engineer for
> those.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Geoff O'Callaghan
On 11/10/2015 6:25 PM, "Clint Byrum"  wrote:
>
> Excerpts from Boris Pavlovic's message of 2015-10-11 00:02:39 -0700:
> > 2Everybody,
> >
> > Just curios why we need such complexity.
> >
> >
> > Let's take a look from other side:
> > 1) Information about all hosts (even in case of 100k hosts) will be less
> > then 1 GB
> > 2) Usually servers that runs scheduler service have at least 64GB RAM
and
> > more on the board
> > 3) math.log(10) < 12  (binary search per rule)
> > 4) We have less then 20 rules for scheduling
> > 5) Information about hosts is updated every 60 seconds (no updates host
is
> > dead)

[Snip]

>
> I'm in, except I think this gets simpler with an intermediary service
> like ZK/Consul to keep track of this 1GB of data and replace the need
> for 6, and changes the implementation of 5 to "updates its record and
> signals its presence".

I have to agree, something like ZK looks like it'd make things simpler, and
they're in general well-proven technology (esp. ZK).
They handle the centralized coordination well and all the hard resiliency
is thrown in.

Geoff
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Boris Pavlovic
Clint,

There are many PROS and CONS in both approaches.

Reinventing the wheel (in this case it's quite a simple task) gives more
flexibility and doesn't require
the usage of ZK/Consul (which will simplify integrating it with the current
system).

Using ZK/Consul for a POC may save a lot of time, and as well we are
delegating part of the work
to other communities (which may lead to better supported/working code).

By the way, some of the parts (like sync of schedulers) are stuck in review in
the Nova project.

Basically for a POC we can use anything, and using ZK/Consul may reduce the
resources needed for development,
which is good.

Best regards,
Boris Pavlovic

On Sun, Oct 11, 2015 at 12:23 AM, Clint Byrum  wrote:

> Excerpts from Boris Pavlovic's message of 2015-10-11 00:02:39 -0700:
> > 2Everybody,
> >
> > Just curios why we need such complexity.
> >
> >
> > Let's take a look from other side:
> > 1) Information about all hosts (even in case of 100k hosts) will be less
> > then 1 GB
> > 2) Usually servers that runs scheduler service have at least 64GB RAM and
> > more on the board
> > 3) math.log(10) < 12  (binary search per rule)
> > 4) We have less then 20 rules for scheduling
> > 5) Information about hosts is updated every 60 seconds (no updates host
> is
> > dead)
> >
> >
> > According to this information:
> > 1) We can store everything in RAM of single server
> > 2) We can use Python
> > 3) Information about hosts is temporary data and shouldn't be stored in
> > persistence storage
> >
> >
> > Simplest architecture to cover this:
> > 1) Single RPC service that has two methods: find_host(rules),
> > update_host(host, data)
> > 2) Store information about hosts  like a dict (host_name->data)
> > 3) Create for each rule binary tree and update it on each host update
> > 4) Make a algorithm that will use binary trees to find host based on
> rules
> > 5) Each service like compute node, volume node, or neutron will send
> > updates about host
> >that they managed (cross service scheduling)
> > 6) Make a algorithm that will sync host stats in memory between different
> > schedulers
>
> I'm in, except I think this gets simpler with an intermediary service
> like ZK/Consul to keep track of this 1GB of data and replace the need
> for 6, and changes the implementation of 5 to "updates its record and
> signals its presence".
>
> What you've described is where I'd like to experiment, but I don't want
> to reinvent ZK or Consul or etcd when they already exist and do such a
> splendid job keeping observers informed of small changes in small data
> sets. You still end up with the same in-memory performance, and this is
> in line with some published white papers from Google around their use
> of Chubby, which is their ZK/Consul.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Clint Byrum
Excerpts from Ian Wells's message of 2015-10-09 19:14:17 -0700:
> On 9 October 2015 at 18:29, Clint Byrum  wrote:
> 
> > Instead of having the scheduler do all of the compute node inspection
> > and querying though, you have the nodes push their stats into something
> > like Zookeeper or consul, and then have schedulers watch those stats
> > for changes to keep their in-memory version of the data up to date. So
> > when you bring a new one online, you don't have to query all the nodes,
> > you just scrape the data store, which all of these stores (etcd, consul,
> > ZK) are built to support atomically querying and watching at the same
> > time, so you can have a reasonable expectation of correctness.
> >
> 
> We have to be careful about our definition of 'correctness' here.  In
> practice, the data is never going to be perfect because compute hosts
> update periodically and the information is therefore always dated.  With
> ZK, it's going to be strictly consistent with regard to the updates from
> the compute hosts, but again that doesn't really matter too much because
> the scheduler is going to have to make a best effort job with a mixed bag
> of information anyway.
> 

I was actually thinking nodes would update ZK _when they are changed
themselves_. As in, the scheduler would reduce the available resources
upon allocating them, and the nodes would only update them after
reclaiming those resources or when they start fresh.

> In fact, putting ZK in the middle basically means that your compute hosts
> now synchronously update a majority of nodes in a minimum 3 node quorum -
> not the fastest form of update - and then the quorum will see to notifying
> the schedulers.  In practice this is just a store-and-fanout again. Once
> more it's not clear to me whether the store serves much use, and as for the
> fanout, I wonder if we'll need >>3 schedulers running so that this is
> reducing communication overhead.
> 

This is indeed store and fanout. Except unlike mysql+rabbitMQ, we're
using a service optimized for store and fanout. :)

All of the DLM-ish primitive things we've talked about can handle a
ton of churn in what turns out to be very small amounts of data. The
difference here is that instead of a scheduler querying for the data,
it has already received it because it was watching for changes. And
if some of it hasn't changed, there's no query, and there's no fanout,
and the local cache is just used.

So yes, if we did things the same as now, this would be terrible. But we
wouldn't. We'd let ZK or Consul do this for us, because they are better
than anything we can build to do this.

> Even if you figured out how to make the in-memory scheduler crazy fast,
> > There's still value in concurrency for other reasons. No matter how
> > fast you make the scheduler, you'll be slave to the response time of
> > a single scheduling request. If you take 1ms to schedule each node
> > (including just reading the request and pushing out your scheduling
> > result!) you will never achieve greater than 1000/s. 1ms is way lower
> > than it's going to take just to shove a tiny message into RabbitMQ or
> > even 0mq. So I'm pretty sure this is o-k for small clouds, but would be
> > a disaster for a large, busy cloud.
> >
> 
> Per before, my suggestion was that every scheduler tries to maintain a copy
> of the cloud's state in memory (in much the same way, per the previous
> example, as every router on the internet tries to make a route table out of
> what it learns from BGP).  They don't have to be perfect.  They don't have
> to be in sync.  As long as there's some variability in the decision making,
> they don't have to update when another scheduler schedules something (and
> you can make the compute node send an immediate update when a new VM is
> run, anyway).  They all stand a good chance of scheduling VMs well
> simultaneously.
> 

I'm quite in favor of eventual consistency and retries. Even if we had
a system of perfect updating of all state records everywhere, it would
break sometimes and I'd still want to not trust any record of state as
being correct for the entire distributed system. However, there is an
efficiency win gained by staying _close_ to correct. It is actually a
function of the expected entropy. The more concurrent schedulers, the
more entropy there will be to deal with.

> If, however, you can have 20 schedulers that all take 10ms on average,
> > and have the occasional lock contention for a resource counter resulting
> > in 100ms, now you're at 2000/s minus the lock contention rate. This
> > strategy would scale better with the number of compute nodes, since
> > more nodes means more distinct locks, so you can scale out the number
> > of running servers separate from the number of scheduling requests.
> >
> 
> If you have 20 schedulers that take 1ms on average, and there's absolutely
> no lock contention, then you're at 20,000/s.  (Unfair, granted, since what
> I'm suggesting is more likely to make rejected scheduling decisions, but
> they could be rare.)

Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Clint Byrum
Excerpts from Joshua Harlow's message of 2015-10-10 17:43:40 -0700:
> I'm curious is there any more detail about #1 below anywhere online?
> 
> Does cassandra use some features of the JVM that the openJDK version 
> doesn't support? Something else?
> 

This about sums it up:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L153-L155

// There is essentially no QA done on OpenJDK builds, and
// clusters running OpenJDK have seen many heap and load issues.
logger.warn("OpenJDK is not recommended. Please upgrade to the newest 
Oracle Java release");

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Clint Byrum
Excerpts from Boris Pavlovic's message of 2015-10-11 00:02:39 -0700:
> 2Everybody,
> 
> Just curios why we need such complexity.
> 
> 
> Let's take a look from other side:
> 1) Information about all hosts (even in case of 100k hosts) will be less
> then 1 GB
> 2) Usually servers that runs scheduler service have at least 64GB RAM and
> more on the board
> 3) math.log(10) < 12  (binary search per rule)
> 4) We have less then 20 rules for scheduling
> 5) Information about hosts is updated every 60 seconds (no updates host is
> dead)
> 
> 
> According to this information:
> 1) We can store everything in RAM of single server
> 2) We can use Python
> 3) Information about hosts is temporary data and shouldn't be stored in
> persistence storage
> 
> 
> Simplest architecture to cover this:
> 1) Single RPC service that has two methods: find_host(rules),
> update_host(host, data)
> 2) Store information about hosts  like a dict (host_name->data)
> 3) Create for each rule binary tree and update it on each host update
> 4) Make a algorithm that will use binary trees to find host based on rules
> 5) Each service like compute node, volume node, or neutron will send
> updates about host
>that they managed (cross service scheduling)
> 6) Make a algorithm that will sync host stats in memory between different
> schedulers

I'm in, except I think this gets simpler with an intermediary service
like ZK/Consul to keep track of this 1GB of data and replace the need
for 6, and changes the implementation of 5 to "updates its record and
signals its presence".

What you've described is where I'd like to experiment, but I don't want
to reinvent ZK or Consul or etcd when they already exist and do such a
splendid job keeping observers informed of small changes in small data
sets. You still end up with the same in-memory performance, and this is
in line with some published white papers from Google around their use
of Chubby, which is their ZK/Consul.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2015-10-09 23:16:43 -0700:
> On 10/09/2015 07:29 PM, Clint Byrum wrote:
> 
> > Even if you figured out how to make the in-memory scheduler crazy fast,
> > There's still value in concurrency for other reasons. No matter how
> > fast you make the scheduler, you'll be slave to the response time of
> > a single scheduling request. If you take 1ms to schedule each node
> > (including just reading the request and pushing out your scheduling
> > result!) you will never achieve greater than 1000/s. 1ms is way lower
> > than it's going to take just to shove a tiny message into RabbitMQ or
> > even 0mq. So I'm pretty sure this is o-k for small clouds, but would be
> > a disaster for a large, busy cloud.
> >
> > If, however, you can have 20 schedulers that all take 10ms on average,
> > and have the occasional lock contention for a resource counter resulting
> > in 100ms, now you're at 2000/s minus the lock contention rate. This
> > strategy would scale better with the number of compute nodes, since
> > more nodes means more distinct locks, so you can scale out the number
> > of running servers separate from the number of scheduling requests.
> 
> As far as I can see, moving to an in-memory scheduler is essentially 
> orthogonal 
> to allowing multiple schedulers to run concurrently.  We can do both.
> 

Agreed, and I want to make sure we continue to be able to run concurrent
schedulers.

Going in memory won't reduce contention for the same resources. So it
will definitely schedule faster, but it may also serialize with concurrent
schedulers sooner, and thus turn into a situation where scaling out more
nodes means the same, or even less throughput.

Keep in mind, I actually think we give our users _WAY_ too much power
over our clouds, and I actually think we should simply have flavor based
scheduling and let compute nodes grab node reservation requests directly
out of flavor based queues based on their own current observation of
their ability to service it.

But I understand that there are quite a few clouds now that have been
given shiny dynamic scheduling tools and now we have to engineer for
those.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Adam Lawson
I have a quick question: how is Amazon doing this? When choosing a next
path forward that reliably scales, it would be interesting to know how this is
already being done.
On Oct 9, 2015 10:12 AM, "Zane Bitter"  wrote:

> On 08/10/15 21:32, Ian Wells wrote:
>
>>
>> > 2. if many hosts suit the 5 VMs then this is *very* unlucky,because
>> we should be choosing a host at random from the set of
>> suitable hosts and that's a huge coincidence - so this is a tiny
>> corner case that we shouldn't be designing around
>>
>> Here is where we differ in our understanding. With the current
>> system of filters and weighers, 5 schedulers getting requests for
>> identical VMs and having identical information are *expected* to
>> select the same host. It is not a tiny corner case; it is the most
>> likely result for the current system design. By catching this
>> situation early (in the scheduling process) we can avoid multiple
>> RPC round-trips to handle the fail/retry mechanism.
>>
>>
>> And so maybe this would be a different fix - choose, at random, one of
>> the hosts above a weighting threshold, not choose the top host every
>> time? Technically, any host passing the filter is adequate to the task
>> from the perspective of an API user (and they can't prove if they got
>> the highest weighting or not), so if we assume weighting an operator
>> preference, and just weaken it slightly, we'd have a few more options.
>>
>
> The optimal way to do this would be a weighted random selection, where the
> probability of any given host being selected is proportional to its
> weighting. (Obviously this is limited by the accuracy of the weighting
> function in expressing your actual preferences - and it's at least
> conceivable that this could vary with the number of schedulers running.)
>
> In fact, the choice of the name 'weighting' would normally imply that it's
> done this way; hearing that the 'weighting' is actually used as a 'score'
> with the highest one always winning is quite surprising.
>
> cheers,
> Zane.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
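
For reference, the weighted random selection Zane describes above fits in a
few lines of Python; this is only an illustrative sketch, not existing
scheduler code:

import random

def pick_host(weighted_hosts):
    # weighted_hosts: list of (host_name, weight), weight >= 0; each host's
    # chance of winning is proportional to its weight, rather than the
    # highest score always winning.
    total = sum(weight for _, weight in weighted_hosts)
    point = random.uniform(0, total)
    for host, weight in weighted_hosts:
        point -= weight
        if point <= 0:
            return host
    return weighted_hosts[-1][0]   # guard against floating point drift

# host-a wins roughly 60% of the time, host-b 30%, host-c 10%
print(pick_host([('host-a', 6.0), ('host-b', 3.0), ('host-c', 1.0)]))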
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Davanum Srinivas
Thanks Clint!

On Sat, Oct 10, 2015 at 11:53 PM, Clint Byrum  wrote:

> Excerpts from Joshua Harlow's message of 2015-10-10 17:43:40 -0700:
> > I'm curious is there any more detail about #1 below anywhere online?
> >
> > Does cassandra use some features of the JVM that the openJDK version
> > doesn't support? Something else?
> >
>
> This about sums it up:
>
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L153-L155
>
> // There is essentially no QA done on OpenJDK builds, and
> // clusters running OpenJDK have seen many heap and load issues.
> logger.warn("OpenJDK is not recommended. Please upgrade to the newest
> Oracle Java release");
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Davanum Srinivas :: https://twitter.com/dims
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Joshua Harlow

Clint Byrum wrote:

Excerpts from Boris Pavlovic's message of 2015-10-11 00:02:39 -0700:

2Everybody,

Just curios why we need such complexity.


Let's take a look from other side:
1) Information about all hosts (even in case of 100k hosts) will be less
then 1 GB
2) Usually servers that runs scheduler service have at least 64GB RAM and
more on the board
3) math.log(10)<  12  (binary search per rule)
4) We have less then 20 rules for scheduling
5) Information about hosts is updated every 60 seconds (no updates host is
dead)


According to this information:
1) We can store everything in RAM of single server
2) We can use Python
3) Information about hosts is temporary data and shouldn't be stored in
persistence storage


Simplest architecture to cover this:
1) Single RPC service that has two methods: find_host(rules),
update_host(host, data)
2) Store information about hosts  like a dict (host_name->data)
3) Create for each rule binary tree and update it on each host update
4) Make a algorithm that will use binary trees to find host based on rules
5) Each service like compute node, volume node, or neutron will send
updates about host
that they managed (cross service scheduling)
6) Make a algorithm that will sync host stats in memory between different
schedulers


I'm in, except I think this gets simpler with an intermediary service
like ZK/Consul to keep track of this 1GB of data and replace the need
for 6, and changes the implementation of 5 to "updates its record and
signals its presence".

What you've described is where I'd like to experiment, but I don't want
to reinvent ZK or Consul or etcd when they already exist and do such a
splendid job keeping observers informed of small changes in small data
sets. You still end up with the same in-memory performance, and this is
in line with some published white papers from Google around their use
of Chubby, which is their ZK/Consul.



+1 let's not recreate this; the code @ paste.openstack.org/show/475941/ 
basically does 1-6 within about ~100 lines. It doesn't optimize 
things into a binary tree, but that's easily doable... for all I care put 
the information received into N trees (perhaps even using 
http://docs.openstack.org/developer/taskflow/types.html#module-taskflow.types.tree) 
and do searches across those as desired (and this is where u can get 
into considering something like numpy to help).



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-11 Thread Amrith Kumar
Dims,

Not that I know of; I believe that Cassandra works fine with OpenJDK. See [1] 
and [2].

From time to time, there have been questions about the supported JDK for 
Cassandra; a recent one (just one that I happen to remember) tries to make the 
case that you must use the Sun/Oracle JDK. This is not a requirement by any 
means. See [3].

To the best of my knowledge, OpenJDK is sufficient.

-amrith

[1] http://wiki.apache.org/cassandra/GettingStarted
[2] http://docs.datastax.com/en/cassandra/2.2/cassandra/install/installDeb.html
[3] 
http://stackoverflow.com/questions/21487354/does-latest-cassandra-support-openjdk


From: Davanum Srinivas [mailto:dava...@gmail.com]
Sent: Saturday, October 10, 2015 8:54 PM
To: OpenStack Development Mailing List (not for usage questions) 
<openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] Scheduler proposal

Not implying cassandra is the right option. Just curious about the assertion.

-- Dims

On Sat, Oct 10, 2015 at 5:53 PM, Davanum Srinivas <dava...@gmail.com> wrote:
Thomas,

i am curious as well. AFAIK, cassandra works well with OpenJDK. Can you please 
elaborate what you concerns are for #1?

Thanks,
Dims

On Sat, Oct 10, 2015 at 5:43 PM, Joshua Harlow <harlo...@fastmail.com> wrote:
I'm curious is there any more detail about #1 below anywhere online?

Does cassandra use some features of the JVM that the openJDK version doesn't 
support? Something else?

-Josh

Thomas Goirand wrote:
On 10/07/2015 07:36 PM, Ed Leafe wrote:
Several months ago I proposed an experiment [0] to see if switching
the data model for the Nova scheduler to use Cassandra as the backend
would be a significant improvement as opposed to the current design

This is probably right. I don't know, I'm not an expert in Nova, or its
scheduler. However, to make it possible for us (ie: downstream
distributions and/or OpenStack users) to use Cassandra, you have to
solve one of the below issues:

1/ Cassandra developers upstream should start caring about OpenJDK, and
make sure that it is also a good platform for it. They should stop
caring only about the Oracle JVM.

... or ...

2/ Oracle should make its JVM free software.

As there is no hope for any of the above, Cassandra is a no-go for
downstream distributions.

So, by all means, propose a new back-end, implement it, profit. But that
back-end cannot be Cassandra the way it is now.

Cheers,

Thomas Goirand (zigo)


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



--
Davanum Srinivas :: https://twitter.com/dims



--
Davanum Srinivas :: https://twitter.com/dims
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-10 Thread Joshua Harlow

I'm curious is there any more detail about #1 below anywhere online?

Does cassandra use some features of the JVM that the openJDK version 
doesn't support? Something else?


-Josh

Thomas Goirand wrote:

On 10/07/2015 07:36 PM, Ed Leafe wrote:

Several months ago I proposed an experiment [0] to see if switching
the data model for the Nova scheduler to use Cassandra as the backend
would be a significant improvement as opposed to the current design


This is probably right. I don't know, I'm not an expert in Nova, or its
scheduler. However, to make it possible for us (ie: downstream
distributions and/or OpenStack users) to use Cassandra, you have to
solve one of the below issues:

1/ Cassandra developers upstream should start caring about OpenJDK, and
make sure that it is also a good platform for it. They should stop
caring only about the Oracle JVM.

... or ...

2/ Oracle should make its JVM free software.

As there is no hope for any of the above, Cassandra is a no-go for
downstream distributions.

So, by all means, propose a new back-end, implement it, profit. But that
back-end cannot be Cassandra the way it is now.

Cheers,

Thomas Goirand (zigo)


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-10 Thread Thomas Goirand
On 10/07/2015 07:36 PM, Ed Leafe wrote:
> Several months ago I proposed an experiment [0] to see if switching
> the data model for the Nova scheduler to use Cassandra as the backend
> would be a significant improvement as opposed to the current design

This is probably right. I don't know, I'm not an expert in Nova, or its
scheduler. However, to make it possible for us (ie: downstream
distributions and/or OpenStack users) to use Cassandra, you have to
solve one of the below issues:

1/ Cassandra developers upstream should start caring about OpenJDK, and
make sure that it is also a good platform for it. They should stop
caring only about the Oracle JVM.

... or ...

2/ Oracle should make its JVM free software.

As there is no hope for any of the above, Cassandra is a no-go for
downstream distributions.

So, by all means, propose a new back-end, implement it, profit. But that
back-end cannot be Cassandra the way it is now.

Cheers,

Thomas Goirand (zigo)


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-10 Thread Davanum Srinivas
Thomas,

I am curious as well. AFAIK, cassandra works well with OpenJDK. Can you
please elaborate what your concerns are for #1?

Thanks,
Dims

On Sat, Oct 10, 2015 at 5:43 PM, Joshua Harlow 
wrote:

> I'm curious is there any more detail about #1 below anywhere online?
>
> Does cassandra use some features of the JVM that the openJDK version
> doesn't support? Something else?
>
> -Josh
>
> Thomas Goirand wrote:
>
>> On 10/07/2015 07:36 PM, Ed Leafe wrote:
>>
>>> Several months ago I proposed an experiment [0] to see if switching
>>> the data model for the Nova scheduler to use Cassandra as the backend
>>> would be a significant improvement as opposed to the current design
>>>
>>
>> This is probably right. I don't know, I'm not an expert in Nova, or its
>> scheduler. However, to make it possible for us (ie: downstream
>> distributions and/or OpenStack users) to use Cassandra, you have to
>> solve one of the below issues:
>>
>> 1/ Cassandra developers upstream should start caring about OpenJDK, and
>> make sure that it is also a good platform for it. They should stop
>> caring only about the Oracle JVM.
>>
>> ... or ...
>>
>> 2/ Oracle should make its JVM free software.
>>
>> As there is no hope for any of the above, Cassandra is a no-go for
>> downstream distributions.
>>
>> So, by all means, propose a new back-end, implement it, profit. But that
>> back-end cannot be Cassandra the way it is now.
>>
>> Cheers,
>>
>> Thomas Goirand (zigo)
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Davanum Srinivas :: https://twitter.com/dims
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-10 Thread Davanum Srinivas
Not implying cassandra is the right option. Just curious about the
assertion.

-- Dims

On Sat, Oct 10, 2015 at 5:53 PM, Davanum Srinivas  wrote:

> Thomas,
>
> i am curious as well. AFAIK, cassandra works well with OpenJDK. Can you
> please elaborate what you concerns are for #1?
>
> Thanks,
> Dims
>
> On Sat, Oct 10, 2015 at 5:43 PM, Joshua Harlow 
> wrote:
>
>> I'm curious is there any more detail about #1 below anywhere online?
>>
>> Does cassandra use some features of the JVM that the openJDK version
>> doesn't support? Something else?
>>
>> -Josh
>>
>> Thomas Goirand wrote:
>>
>>> On 10/07/2015 07:36 PM, Ed Leafe wrote:
>>>
 Several months ago I proposed an experiment [0] to see if switching
 the data model for the Nova scheduler to use Cassandra as the backend
 would be a significant improvement as opposed to the current design

>>>
>>> This is probably right. I don't know, I'm not an expert in Nova, or its
>>> scheduler. However, to make it possible for us (ie: downstream
>>> distributions and/or OpenStack users) to use Cassandra, you have to
>>> solve one of the below issues:
>>>
>>> 1/ Cassandra developers upstream should start caring about OpenJDK, and
>>> make sure that it is also a good platform for it. They should stop
>>> caring only about the Oracle JVM.
>>>
>>> ... or ...
>>>
>>> 2/ Oracle should make its JVM free software.
>>>
>>> As there is no hope for any of the above, Cassandra is a no-go for
>>> downstream distributions.
>>>
>>> So, by all means, propose a new back-end, implement it, profit. But that
>>> back-end cannot be Cassandra the way it is now.
>>>
>>> Cheers,
>>>
>>> Thomas Goirand (zigo)
>>>
>>>
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
>
> --
> Davanum Srinivas :: https://twitter.com/dims
>



-- 
Davanum Srinivas :: https://twitter.com/dims
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-10 Thread Chris Friesen

On 10/09/2015 07:29 PM, Clint Byrum wrote:


Even if you figured out how to make the in-memory scheduler crazy fast,
There's still value in concurrency for other reasons. No matter how
fast you make the scheduler, you'll be slave to the response time of
a single scheduling request. If you take 1ms to schedule each node
(including just reading the request and pushing out your scheduling
result!) you will never achieve greater than 1000/s. 1ms is way lower
than it's going to take just to shove a tiny message into RabbitMQ or
even 0mq. So I'm pretty sure this is o-k for small clouds, but would be
a disaster for a large, busy cloud.

If, however, you can have 20 schedulers that all take 10ms on average,
and have the occasional lock contention for a resource counter resulting
in 100ms, now you're at 2000/s minus the lock contention rate. This
strategy would scale better with the number of compute nodes, since
more nodes means more distinct locks, so you can scale out the number
of running servers separate from the number of scheduling requests.


As far as I can see, moving to an in-memory scheduler is essentially orthogonal 
to allowing multiple schedulers to run concurrently.  We can do both.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Chris Friesen

On 10/09/2015 03:36 PM, Ian Wells wrote:

On 9 October 2015 at 12:50, Chris Friesen wrote:

Has anybody looked at why 1 instance is too slow and what it would take to
make 1 scheduler instance work fast enough? This does not preclude the use of
concurrency for finer grain tasks in the background.


Currently we pull data on all (!) of the compute nodes out of the database
via a series of RPC calls, then evaluate the various filters in python code.


I'll say again: the database seems to me to be the problem here.  Not to
mention, you've just explained that they are in practice holding all the data in
memory in order to do the work so the benefit we're getting here is really a
N-to-1-to-M pattern with a DB in the middle (the store-to-DB is rather
secondary, in fact), and that without incremental updates to the receivers.


I don't see any reason why you couldn't have an in-memory scheduler.

Currently the database serves as the persistent storage for the resource usage, 
so if we take it out of the picture I imagine you'd want to have some way of 
querying the compute nodes for their current state when the scheduler first 
starts up.


I think the current code uses the fact that objects are remotable via the 
conductor, so changing that to do explicit posts to a known scheduler topic 
would take some work.


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Ian Wells
On 9 October 2015 at 18:29, Clint Byrum  wrote:

> Instead of having the scheduler do all of the compute node inspection
> and querying though, you have the nodes push their stats into something
> like Zookeeper or consul, and then have schedulers watch those stats
> for changes to keep their in-memory version of the data up to date. So
> when you bring a new one online, you don't have to query all the nodes,
> you just scrape the data store, which all of these stores (etcd, consul,
> ZK) are built to support atomically querying and watching at the same
> time, so you can have a reasonable expectation of correctness.
>

We have to be careful about our definition of 'correctness' here.  In
practice, the data is never going to be perfect because compute hosts
update periodically and the information is therefore always dated.  With
ZK, it's going to be strictly consistent with regard to the updates from
the compute hosts, but again that doesn't really matter too much because
the scheduler is going to have to make a best effort job with a mixed bag
of information anyway.

In fact, putting ZK in the middle basically means that your compute hosts
now synchronously update a majority of nodes in a minimum 3 node quorum -
not the fastest form of update - and then the quorum will see to notifying
the schedulers.  In practice this is just a store-and-fanout again. Once
more it's not clear to me whether the store serves much use, and as for the
fanout, I wonder if we'll need >>3 schedulers running so that this is
reducing communication overhead.

Even if you figured out how to make the in-memory scheduler crazy fast,
> There's still value in concurrency for other reasons. No matter how
> fast you make the scheduler, you'll be slave to the response time of
> a single scheduling request. If you take 1ms to schedule each node
> (including just reading the request and pushing out your scheduling
> result!) you will never achieve greater than 1000/s. 1ms is way lower
> than it's going to take just to shove a tiny message into RabbitMQ or
> even 0mq. So I'm pretty sure this is o-k for small clouds, but would be
> a disaster for a large, busy cloud.
>

Per before, my suggestion was that every scheduler tries to maintain a copy
of the cloud's state in memory (in much the same way, per the previous
example, as every router on the internet tries to make a route table out of
what it learns from BGP).  They don't have to be perfect.  They don't have
to be in sync.  As long as there's some variability in the decision making,
they don't have to update when another scheduler schedules something (and
you can make the compute node send an immediate update when a new VM is
run, anyway).  They all stand a good chance of scheduling VMs well
simultaneously.

If, however, you can have 20 schedulers that all take 10ms on average,
> and have the occasional lock contention for a resource counter resulting
> in 100ms, now you're at 2000/s minus the lock contention rate. This
> strategy would scale better with the number of compute nodes, since
> more nodes means more distinct locks, so you can scale out the number
> of running servers separate from the number of scheduling requests.
>

If you have 20 schedulers that take 1ms on average, and there's absolutely
no lock contention, then you're at 20,000/s.  (Unfair, granted, since what
I'm suggesting is more likely to make rejected scheduling decisions, but
they could be rare.)

But to be fair, we're throwing made up numbers around at this point.  Maybe
it's time to work out how to test this for scale in a harness - which is
the bit of work we all really need to do this properly, or there's no proof
we've actually helped - and leave people to code their ideas up?
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Joshua Harlow

Gregory Haynes wrote:

Excerpts from Joshua Harlow's message of 2015-10-08 15:24:18 +:

On this point, and just thinking out loud. If we consider saving
compute_node information into say a node in said DLM backend (for
example a znode in zookeeper[1]); this information would be updated
periodically by that compute_node *itself* (it would say contain
information about what VMs are running on it, what their utilization is
and so-on).

For example the following layout could be used:

/nova/compute_nodes/

  data could be:

{
 vms: [],
 memory_free: XYZ,
 cpu_usage: ABC,
 memory_used: MNO,
 ...
}

Now if we imagine each/all schedulers having watches
on /nova/compute_nodes/ ([2] consul and etc.d have equivalent concepts
afaik) then when a compute_node updates that information a push
notification (the watch being triggered) will be sent to the
scheduler(s) and the scheduler(s) could then update a local in-memory
cache of the data about all the hypervisors that can be selected from
for scheduling. This avoids any reading of a large set of data in the
first place (besides an initial read-once on startup to read the
initial list + setup the watches); in a way its similar to push
notifications. Then when scheduling a VM ->  hypervisor there isn't any
need to query anything but the local in-memory representation that the
scheduler is maintaining (and updating as watches are triggered)...

So this is why I was wondering about what capabilities of cassandra are
being used here; because the above I think are unique capababilties of
DLM like systems (zookeeper, consul, etcd) that could be advantageous
here...

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes

[2]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches


I wonder if we would even need to make something so specialized to get
this kind of local caching. I dont know what the current ZK tools are
but the original Chubby paper described that clients always have a
write-through cache for nodes which they set up subscriptions for in
order to break the cache.


Perhaps not; make it as simple as we want as long as people agree that 
the concept is useful. My idea is it would look something like:


(simplified obviously):

http://paste.openstack.org/show/475938/

Then resources (in this example compute_nodes) would register themselves 
via a call like:


>>> from kazoo import client
>>> import json
>>> c = client.KazooClient()
>>> c.start()
>>> n = "/node/compute_nodes"
>>> c.ensure_path(n)
>>> c.create("%s/h1.hypervisor.yahoo.com" % n, json.dumps({}))

^^^ the dictionary above would be whatever data to then put into the 
receivers caches...


Then, in the pasted program (running in a different shell/computer/...),
the cache would get updated, and a user of that cache can use it to find
resources to schedule things to.
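
For reference, a rough sketch of what that watcher side could look like (this
is illustrative only, not the actual pasted program; the cache structure and
the /node/compute_nodes path are just assumptions carried over from the
registration snippet above):

# Sketch of a scheduler-side watcher keeping a local in-memory cache.
import json

from kazoo import client

cache = {}  # hypervisor name -> latest reported data

c = client.KazooClient()
c.start()
path = "/node/compute_nodes"
c.ensure_path(path)


def watch_data(name):
    node_path = "%s/%s" % (path, name)

    def on_data(data, stat):
        # Re-fires every time the compute node rewrites its znode data.
        cache[name] = json.loads(data) if data else {}

    c.DataWatch(node_path, on_data)


def on_children(children):
    # Fired when compute nodes register or disappear; kazoo re-arms it.
    for name in children:
        if name not in cache:
            cache[name] = {}
            watch_data(name)
    for name in list(cache):
        if name not in children:
            del cache[name]


c.ChildrenWatch(path, on_children)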


The example should work; just get zookeeper set up:

http://packages.ubuntu.com/precise/zookeeperd should do all of that, and 
then try it out...




Also, re: etcd - The last time I checked their subscription API was
woefully inadequate for performing this type of thing without herding
issues.


Any idea on the consul watch capabilities?

Similar API(s) appear to exist (but I don't know how they work, if they 
do at all); https://www.consul.io/docs/agent/watches.html




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Joshua Harlow

And one last reply with more code:

http://paste.openstack.org/show/475941/ (a creator of services that 
dynamically creates services, and destroys them after a set amount of 
time is included in here, along with the prior resource watcher).


Works locally; it should work for you as well.

Output from example run of 'creator process'

http://paste.openstack.org/show/475942/

Output from example run of 'watcher process'

http://paste.openstack.org/show/475943/

Enjoy!

-josh

Joshua Harlow wrote:

Further example stuff,

Get kazoo installed (http://kazoo.readthedocs.org/)

Output from my local run (with no data)

$ python test.py
Kazoo client has changed to state: CONNECTED
Got data: '' for new resource /node/compute_nodes/h1.hypervisor.yahoo.com
Idling (ran for 0.00s).
Known resources:
- h1.hypervisor.yahoo.com => {}
Idling (ran for 1.00s).
Known resources:
- h1.hypervisor.yahoo.com => {}
Idling (ran for 2.00s).
Known resources:
- h1.hypervisor.yahoo.com => {}
Idling (ran for 3.00s).
Known resources:
- h1.hypervisor.yahoo.com => {}
Idling (ran for 4.00s).
Known resources:
- h1.hypervisor.yahoo.com => {}
Idling (ran for 5.00s).
Kazoo client has changed to state: LOST
Traceback (most recent call last):
File "test.py", line 72, in 
time.sleep(1.0)
KeyboardInterrupt

Joshua Harlow wrote:

Gregory Haynes wrote:

Excerpts from Joshua Harlow's message of 2015-10-08 15:24:18 +:

On this point, and just thinking out loud. If we consider saving
compute_node information into say a node in said DLM backend (for
example a znode in zookeeper[1]); this information would be updated
periodically by that compute_node *itself* (it would, say, contain
information about what VMs are running on it, what their utilization is,
and so on).

For example the following layout could be used:

/nova/compute_nodes/

 data could be:

{
vms: [],
memory_free: XYZ,
cpu_usage: ABC,
memory_used: MNO,
...
}

Now if we imagine each/all schedulers having watches
on /nova/compute_nodes/ ([2] consul and etcd have equivalent concepts
afaik) then when a compute_node updates that information a push
notification (the watch being triggered) will be sent to the
scheduler(s) and the scheduler(s) could then update a local in-memory
cache of the data about all the hypervisors that can be selected from
for scheduling. This avoids any reading of a large set of data in the
first place (besides an initial read-once on startup to read the
initial list + setup the watches); in a way it's similar to push
notifications. Then when scheduling a VM -> hypervisor there isn't any
need to query anything but the local in-memory representation that the
scheduler is maintaining (and updating as watches are triggered)...

So this is why I was wondering about what capabilities of cassandra are
being used here; because the above I think are unique capabilities of
DLM like systems (zookeeper, consul, etcd) that could be advantageous
here...

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes



[2]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches




I wonder if we would even need to make something so specialized to get
this kind of local caching. I don't know what the current ZK tools are
but the original Chubby paper described that clients always have a
write-through cache for nodes which they set up subscriptions for in
order to break the cache.


Perhaps not; make it as simple as we want, as long as people agree that
the concept is useful. My idea is it would look something like:

(simplified obviously):

http://paste.openstack.org/show/475938/

Then resources (in this example compute_nodes) would register themselves
via a call like:

>>> from kazoo import client
>>> import json
>>> c = client.KazooClient()
>>> c.start()
>>> n = "/node/compute_nodes"
>>> c.ensure_path(n)
>>> c.create("%s/h1.hypervisor.yahoo.com" % n, json.dumps({}))

^^^ the dictionary above would be whatever data should then be put into the
receivers' caches...

Then, in the pasted program (running in a different shell/computer/...),
the cache would get updated, and a user of that cache can use it to find
resources to schedule things to.

The example should work; just get zookeeper set up:

http://packages.ubuntu.com/precise/zookeeperd should do all of that, and
then try it out...



Also, re: etcd - The last time I checked their subscription API was
woefully inadequate for performing this type of thing without herding
issues.


Any idea on the consul watch capabilities?

Similar API(s) appear to exist (but I don't know how they work, if they
do at all); https://www.consul.io/docs/agent/watches.html



__


OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Neil Jerram
FWIW - and somewhat ironically given what you said just before - I couldn't 
parse your last sentence below... You might like to follow up with a corrected 
version.

(On the broad point, BTW, I really agree with you. So much OpenStack discussion 
is rendered difficult to get into by use of wrong or imprecise language.)

Regards,
 Neil


  Original Message
From: Clint Byrum
Sent: Friday, 9 October 2015 19:08
To: openstack-dev
Reply To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] Scheduler proposal


Excerpts from Chris Friesen's message of 2015-10-09 10:54:36 -0700:
> On 10/09/2015 11:09 AM, Zane Bitter wrote:
>
> > The optimal way to do this would be a weighted random selection, where the
> > probability of any given host being selected is proportional to its 
> > weighting.
> > (Obviously this is limited by the accuracy of the weighting function in
> > expressing your actual preferences - and it's at least conceivable that this
> > could vary with the number of schedulers running.)
> >
> > In fact, the choice of the name 'weighting' would normally imply that it's 
> > done
> > this way; hearing that the 'weighting' is actually used as a 'score' with 
> > the
> > highest one always winning is quite surprising.
>
> If you've only got one scheduler, there's no need to get fancy, you just pick
> the "best" host based on your weighing function.
>
> It's only when you've got parallel schedulers that things get tricky.
>

Note that I think you mean _concurrent_ not _parallel_ schedulers.

Parallel schedulers would be trying to solve the same unit of work by
breaking it up into smaller components and doing them at the same time.

Concurrent means they're just doing different things at the same time.

I know this is nit-picky, but we use the wrong word _A LOT_ and the
problem space is actually vastly different, as parallelizable problems
have a whole set of optimizations and advantages that generic concurrent
problems (especially those involving mutating state!) have a whole set
of race conditions that must be managed.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Joshua Harlow

Further example stuff,

Get kazoo installed (http://kazoo.readthedocs.org/)

Output from my local run (with no data)

$ python test.py
Kazoo client has changed to state: CONNECTED
Got data: '' for new resource /node/compute_nodes/h1.hypervisor.yahoo.com
Idling (ran for 0.00s).
Known resources:
 - h1.hypervisor.yahoo.com => {}
Idling (ran for 1.00s).
Known resources:
 - h1.hypervisor.yahoo.com => {}
Idling (ran for 2.00s).
Known resources:
 - h1.hypervisor.yahoo.com => {}
Idling (ran for 3.00s).
Known resources:
 - h1.hypervisor.yahoo.com => {}
Idling (ran for 4.00s).
Known resources:
 - h1.hypervisor.yahoo.com => {}
Idling (ran for 5.00s).
Kazoo client has changed to state: LOST
Traceback (most recent call last):
  File "test.py", line 72, in 
time.sleep(1.0)
KeyboardInterrupt

Joshua Harlow wrote:

Gregory Haynes wrote:

Excerpts from Joshua Harlow's message of 2015-10-08 15:24:18 +:

On this point, and just thinking out loud. If we consider saving
compute_node information into say a node in said DLM backend (for
example a znode in zookeeper[1]); this information would be updated
periodically by that compute_node *itself* (it would, say, contain
information about what VMs are running on it, what their utilization is,
and so on).

For example the following layout could be used:

/nova/compute_nodes/

 data could be:

{
vms: [],
memory_free: XYZ,
cpu_usage: ABC,
memory_used: MNO,
...
}

Now if we imagine each/all schedulers having watches
on /nova/compute_nodes/ ([2] consul and etcd have equivalent concepts
afaik) then when a compute_node updates that information a push
notification (the watch being triggered) will be sent to the
scheduler(s) and the scheduler(s) could then update a local in-memory
cache of the data about all the hypervisors that can be selected from
for scheduling. This avoids any reading of a large set of data in the
first place (besides an initial read-once on startup to read the
initial list + setup the watches); in a way it's similar to push
notifications. Then when scheduling a VM -> hypervisor there isn't any
need to query anything but the local in-memory representation that the
scheduler is maintaining (and updating as watches are triggered)...

So this is why I was wondering about what capabilities of cassandra are
being used here; because the above I think are unique capabilities of
DLM like systems (zookeeper, consul, etcd) that could be advantageous
here...

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes


[2]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches



I wonder if we would even need to make something so specialized to get
this kind of local caching. I don't know what the current ZK tools are
but the original Chubby paper described that clients always have a
write-through cache for nodes which they set up subscriptions for in
order to break the cache.


Perhaps not; make it as simple as we want, as long as people agree that
the concept is useful. My idea is it would look something like:

(simplified obviously):

http://paste.openstack.org/show/475938/

Then resources (in this example compute_nodes) would register themselves
via a call like:

 >>> from kazoo import client
 >>> import json
 >>> c = client.KazooClient()
 >>> c.start()
 >>> n = "/node/compute_nodes"
 >>> c.ensure_path(n)
 >>> c.create("%s/h1.hypervisor.yahoo.com" % n, json.dumps({}))

^^^ the dictionary above would be whatever data should then be put into the
receivers' caches...

Then, in the pasted program (running in a different shell/computer/...),
the cache would get updated, and a user of that cache can use it to find
resources to schedule things to.

The example should work; just get zookeeper set up:

http://packages.ubuntu.com/precise/zookeeperd should do all of that, and
then try it out...



Also, re: etcd - The last time I checked their subscription API was
woefully inadequate for performing this type of thing without herding
issues.


Any idea on the consul watch capabilities?

Similar API(s) appear to exist (but I don't know how they work, if they
do at all); https://www.consul.io/docs/agent/watches.html



__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Gregory Haynes
Excerpts from Chris Friesen's message of 2015-10-09 19:36:03 +:
> On 10/09/2015 12:55 PM, Gregory Haynes wrote:
> 
> > There is a more generalized version of this algorithm for concurrent
> > scheduling I've seen a few times - Pick N options at random, apply
> > heuristic over that N to pick the best, attempt to schedule at your
> > choice, retry on failure. As long as you have a fast heuristic and your
> > N is sufficiently smaller than the total number of options then the
> > retries are rare-ish and cheap. It also can scale out extremely well.
> 
> If you're looking for a resource that is relatively rare (say you want a 
> particular hardware accelerator, or a very large number of CPUs, or even to 
> be 
> scheduled "near" to a specific other instance) then you may have to retry 
> quite 
> a lot.
> 
> Chris
> 

Yep. You can either be fast or correct. There is no solution which will
both scale easily and allow you to schedule to a very precise node
efficiently or this would be a solved problem.

There is a not too bad middle ground here though - you can definitely do
some filtering beforehand efficiently (especially if you have some kind
of local cache similar to what Josh mentioned with ZK) and then this is
less of an issue. This is definitely a big step in complexity though...

Cheers,
Greg

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Ian Wells
On 9 October 2015 at 12:50, Chris Friesen wrote:

> Has anybody looked at why 1 instance is too slow and what it would take to
>
>> make 1 scheduler instance work fast enough? This does not preclude the
>> use of
>> concurrency for finer grain tasks in the background.
>>
>
> Currently we pull data on all (!) of the compute nodes out of the database
> via a series of RPC calls, then evaluate the various filters in python code.
>

I'll say again: the database seems to me to be the problem here.  Not to
mention, you've just explained that they are in practice holding all the
data in memory in order to do the work so the benefit we're getting here is
really an N-to-1-to-M pattern with a DB in the middle (the store-to-DB is
rather secondary, in fact), and that without incremental updates to the
receivers.

> I suspect it'd be a lot quicker if each filter was a DB query.
>

That's certainly one solution, but again, unless you can tell me *why* this
information will not all fit in memory per process (when it does right
now), I'm still not clear why a database is required at all, let alone a
central one.  Even if it doesn't fit, then a local DB might be reasonable
compared to a centralised one.  The schedulers don't need to work off of
precisely the same state, they just need to make different choices to each
other, which doesn't require a that's-mine-hands-off approach; and they
aren't going to have a perfect view of the state of a distributed system
anyway, so retries are inevitable.

On a different topic, on the weighted choice: it's not 'optimal', given
this is a packing problem, so there isn't a perfect solution.  In fact,
given we're trying to balance the choice of a preferable host with the
chance that multiple schedulers make different choices, it's likely worse
than even weighting.  (Technically I suspect we'd want to rethink whether
the weighting mechanism is actually getting us a benefit.)
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Alec Hothan (ahothan)





On 10/9/15, 6:29 PM, "Clint Byrum"  wrote:

>Excerpts from Chris Friesen's message of 2015-10-09 17:33:38 -0700:
>> On 10/09/2015 03:36 PM, Ian Wells wrote:
>> > On 9 October 2015 at 12:50, Chris Friesen wrote:
>> >
>> > Has anybody looked at why 1 instance is too slow and what it would 
>> > take to
>> >
>> > make 1 scheduler instance work fast enough? This does not preclude 
>> > the
>> > use of
>> > concurrency for finer grain tasks in the background.
>> >
>> >
>> > Currently we pull data on all (!) of the compute nodes out of the 
>> > database
>> > via a series of RPC calls, then evaluate the various filters in python 
>> > code.
>> >
>> >
>> > I'll say again: the database seems to me to be the problem here.  Not to
>> > mention, you've just explained that they are in practice holding all the 
>> > data in
>> > memory in order to do the work so the benefit we're getting here is really 
>> > a
>> > N-to-1-to-M pattern with a DB in the middle (the store-to-DB is rather
>> > secondary, in fact), and that without incremental updates to the receivers.
>> 
>> I don't see any reason why you couldn't have an in-memory scheduler.
>> 
>> Currently the database serves as the persistant storage for the resource 
>> usage, 
>> so if we take it out of the picture I imagine you'd want to have some way of 
>> querying the compute nodes for their current state when the scheduler first 
>> starts up.
>> 
>> I think the current code uses the fact that objects are remotable via the 
>> conductor, so changing that to do explicit posts to a known scheduler topic 
>> would take some work.
>> 
>
>Funny enough, I think thats exactly what Josh's "just use Zookeeper"
>message is about. Except in memory, it is "in an observable storage
>location".
>
>Instead of having the scheduler do all of the compute node inspection
>and querying though, you have the nodes push their stats into something
>like Zookeeper or consul, and then have schedulers watch those stats
>for changes to keep their in-memory version of the data up to date. So
>when you bring a new one online, you don't have to query all the nodes,
>you just scrape the data store, which all of these stores (etcd, consul,
>ZK) are built to support atomically querying and watching at the same
>time, so you can have a reasonable expectation of correctness.
>
>Even if you figured out how to make the in-memory scheduler crazy fast,
>There's still value in concurrency for other reasons. No matter how
>fast you make the scheduler, you'll be slave to the response time of
>a single scheduling request. If you take 1ms to schedule each node
>(including just reading the request and pushing out your scheduling
>result!) you will never achieve greater than 1000/s. 1ms is way lower
>than it's going to take just to shove a tiny message into RabbitMQ or
>even 0mq.

That is not what I have seen; measurements that I did or that were done by others show
between 5000 and 1 send *per sec* (depending on mirroring, up to 1KB msg 
size) using oslo messaging/kombu over rabbitMQ.
And this is unmodified/highly unoptimized oslo messaging code.
If you remove the oslo messaging layer, you get 25000 to 45000 msg/sec with 
kombu/rabbitMQ (which shows how inefficient the oslo messaging layer itself is)


> So I'm pretty sure this is o-k for small clouds, but would be
>a disaster for a large, busy cloud.

It all depends on how many sched/sec for the "large busy cloud"...

>
>If, however, you can have 20 schedulers that all take 10ms on average,
>and have the occasional lock contention for a resource counter resulting
>in 100ms, now you're at 2000/s minus the lock contention rate. This
>strategy would scale better with the number of compute nodes, since
>more nodes means more distinct locks, so you can scale out the number
>of running servers separate from the number of scheduling requests.

How many compute nodes are we talking about, max? How many schedulings per
second is the requirement? And where are we today with the latest nova scheduler?
My point is that without these numbers we could end up under-shooting,
over-shooting, or over-engineering, along with the cost of maintaining that extra
complexity over the lifetime of openstack.

I'll just make up some numbers for the sake of this discussion:

nova scheduler latest can do only 100 sched/sec for 1 instance (I guess the 
10ms average you bring out may not be that unrealistic)
the requirement is a sustained 500 sched/sec worst case with 10K nodes (that is 
5% of 10K and today we can barely launch 100VM/sec sustained)

Are we going to achieve 5x with just 3 instances, which is what most people
deploy? Not likely.
Is using more elaborate distributed infra/DLM like consul/zk/etcd going to
get us to that 500 mark with 3 instances? Maybe, but it will be at the expense
of the added complexity of the overall solution.
Can we instead optimize 

Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Gregory Haynes
Excerpts from Joshua Harlow's message of 2015-10-08 15:24:18 +:
> On this point, and just thinking out loud. If we consider saving
> compute_node information into say a node in said DLM backend (for
> example a znode in zookeeper[1]); this information would be updated
> periodically by that compute_node *itself* (it would, say, contain
> information about what VMs are running on it, what their utilization is,
> and so on).
> 
> For example the following layout could be used:
> 
> /nova/compute_nodes/
> 
>  data could be:
> 
> {
> vms: [],
> memory_free: XYZ,
> cpu_usage: ABC,
> memory_used: MNO,
> ...
> }
> 
> Now if we imagine each/all schedulers having watches
> on /nova/compute_nodes/ ([2] consul and etcd have equivalent concepts
> afaik) then when a compute_node updates that information a push
> notification (the watch being triggered) will be sent to the
> scheduler(s) and the scheduler(s) could then update a local in-memory
> cache of the data about all the hypervisors that can be selected from
> for scheduling. This avoids any reading of a large set of data in the
> first place (besides an initial read-once on startup to read the
> initial list + setup the watches); in a way it's similar to push
> notifications. Then when scheduling a VM -> hypervisor there isn't any
> need to query anything but the local in-memory representation that the
> scheduler is maintaining (and updating as watches are triggered)...
> 
> So this is why I was wondering about what capabilities of cassandra are
> being used here; because the above I think are unique capabilities of
> DLM like systems (zookeeper, consul, etcd) that could be advantageous
> here...
> 
> [1]
> https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes
> 
> [2]
> https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches

I wonder if we would even need to make something so specialized to get
this kind of local caching. I don't know what the current ZK tools are
but the original Chubby paper described that clients always have a
write-through cache for nodes which they set up subscriptions for in
order to break the cache.

Also, re: etcd - The last time I checked their subscription API was
woefully inadequate for performing this type of thing without herding
issues.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Alec Hothan (ahothan)

There are several ways to make python code that deals with a lot of data 
faster, especially when it comes to operating on DB fields from SQL tables (and 
that is not limited to the nova scheduler).
Pulling data from large SQL tables and operating on them through regular python 
code (using python loops) is extremely inefficient due to the nature of the 
python interpreter. If this is what nova scheduler code is doing today, the 
good thing is there is potentially huge room for improvement.


The approach to scale out, in practice, means a few instances (3 instances is
common), meaning the gain would be on the order of 3x (well under an order of
magnitude), but with sharply increased complexity to deal with concurrent
schedulers and potentially conflicting results (with the use of tools like ZK
or Consul...). But in essence we're basically just running the same unoptimized
code concurrently to achieve better throughput.
On the other hand optimizing something that is not very optimized to start with 
can yield a much better return than 3x, with the advantage of simplicity (one 
active scheduler, which could be backed by a standby for HA).

Python is actually one of the better languages to do *fast* in-memory big data 
processing using open source python scientific and data analysis libraries as 
they can provide native speed through cythonized libraries and powerful high 
level abstraction to do complex filters and vectorized operations. Not only
is it fast, but it also yields much smaller code.

I have used libraries such as numpy and pandas to operate on very large data 
sets (the equivalent of SQL tables with hundreds of thousands of rows) and 
there are easily 2 orders of magnitude of difference when operating on this data
in memory between plain python code with loops and python code using these 
libraries (that is without any DB access).
Ordering the filters to get the kind of reduction that you describe below
certainly helps, but it becomes second order when you use pandas filters
because they are extremely fast even for very large datasets.
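
As a purely illustrative sketch (the column names and sizes below are made up,
this is not nova code), filtering an in-memory table of 10K hosts becomes one
vectorized expression rather than a per-host python loop:

import numpy as np
import pandas as pd

n = 10000
hosts = pd.DataFrame({
    "host": ["compute-%05d" % i for i in range(n)],
    "free_ram_mb": np.random.randint(0, 256 * 1024, n),
    "vcpus_free": np.random.randint(0, 64, n),
    "aggregate": np.random.choice(["rack1", "rack2", "rack3"], n),
})

# Roughly the ram/core/aggregate filters over every host, but evaluated in
# native code by pandas/numpy instead of a python loop of filter calls.
candidates = hosts[(hosts.free_ram_mb >= 2048) &
                   (hosts.vcpus_free >= 2) &
                   (hosts.aggregate == "rack1")]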

I'm curious to know why this path was not explored more before embarking full
speed on concurrency/scale-out options, which is a very complex and treacherous
path, as we see in this discussion. It is clearly very attractive intellectually
to work with all these complex distributed frameworks, but the cost of
complexity is often overlooked.

Is there any data showing the performance of the current nova scheduler? How
many schedulings can nova do per second at scale with worst case filters?
When you think about it, 10,000 nodes and their associated properties is not 
such a big number if you use the right libraries.




On 10/9/15, 1:10 PM, "Joshua Harlow"  wrote:

>And also we should probably deprecate/not recommend:
>
>http://docs.openstack.org/developer/nova/api/nova.scheduler.filters.json_filter.html#nova.scheduler.filters.json_filter.JsonFilter
>
>That filter IMHO basically disallows optimizations like forming SQL 
>statements for each filter (and then letting the DB do the heavy 
>lifting) or say having each filter say 'oh my logic can be performed by 
>a prepared statement ABC and you should just use that instead' (and then
>letting the DB do the heavy lifting).
>
>Chris Friesen wrote:
>> On 10/09/2015 12:25 PM, Alec Hothan (ahothan) wrote:
>>>
>>> Still the point from Chris is valid. I guess the main reason openstack is
>>> going with multiple concurrent schedulers is to scale out by
>>> distributing the
>>> load between multiple instances of schedulers because 1 instance is too
>>> slow. This discussion is about coordinating the many instances of
>>> schedulers
>>> in a way that works and this is actually a difficult problem and will get
>>> worse as the number of variables for instance placement increases (for
>>> example NFV is going to require a lot more than just cpu pinning, huge
>>> pages
>>> and numa).
>>>
>>> Has anybody looked at why 1 instance is too slow and what it would
>>> take to
>>> make 1 scheduler instance work fast enough? This does not preclude the
>>> use of
>>> concurrency for finer grain tasks in the background.
>>
>> Currently we pull data on all (!) of the compute nodes out of the
>> database via a series of RPC calls, then evaluate the various filters in
>> python code.
>>
>> I suspect it'd be a lot quicker if each filter was a DB query.
>>
>> Also, ideally we'd want to query for the most "strict" criteria first,
>> to reduce the total number of comparisons. For example, if you want to
>> implement the "affinity" server group policy, you only need to test a
>> single host. If you're matching against host aggregate metadata, you
>> only need to test against hosts in matching aggregates.
>>
>> Chris
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2015-10-09 10:54:36 -0700:
> On 10/09/2015 11:09 AM, Zane Bitter wrote:
> 
> > The optimal way to do this would be a weighted random selection, where the
> > probability of any given host being selected is proportional to its 
> > weighting.
> > (Obviously this is limited by the accuracy of the weighting function in
> > expressing your actual preferences - and it's at least conceivable that this
> > could vary with the number of schedulers running.)
> >
> > In fact, the choice of the name 'weighting' would normally imply that it's 
> > done
> > this way; hearing that the 'weighting' is actually used as a 'score' with 
> > the
> > highest one always winning is quite surprising.
> 
> If you've only got one scheduler, there's no need to get fancy, you just pick 
> the "best" host based on your weighing function.
> 
> It's only when you've got parallel schedulers that things get tricky.
> 

Note that I think you mean _concurrent_ not _parallel_ schedulers.

Parallel schedulers would be trying to solve the same unit of work by
breaking it up into smaller components and doing them at the same time.

Concurrent means they're just doing different things at the same time.

I know this is nit-picky, but we use the wrong word _A LOT_ and the
problem space is actually vastly different, as parallelizable problems
have a whole set of optimizations and advantages that generic concurrent
problems (especially those involving mutating state!) have a whole set
of race conditions that must be managed.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Alec Hothan (ahothan)

Still the point from Chris is valid.
I guess the main reason openstack is going with multiple concurrent schedulers 
is to scale out by distributing the load between multiple instances of 
schedulers because 1 instance is too slow.
This discussion is about coordinating the many instances of schedulers in a way 
that works and this is actually a difficult problem and will get worse as the
number of variables for instance placement increases (for example NFV is going 
to require a lot more than just cpu pinning, huge pages and numa).

Has anybody looked at why 1 instance is too slow and what it would take to make 
1 scheduler instance work fast enough? This does not preclude the use of 
concurrency for finer grain tasks in the background.




On 10/9/15, 11:05 AM, "Clint Byrum"  wrote:

>Excerpts from Chris Friesen's message of 2015-10-09 10:54:36 -0700:
>> On 10/09/2015 11:09 AM, Zane Bitter wrote:
>> 
>> > The optimal way to do this would be a weighted random selection, where the
>> > probability of any given host being selected is proportional to its 
>> > weighting.
>> > (Obviously this is limited by the accuracy of the weighting function in
>> > expressing your actual preferences - and it's at least conceivable that 
>> > this
>> > could vary with the number of schedulers running.)
>> >
>> > In fact, the choice of the name 'weighting' would normally imply that it's 
>> > done
>> > this way; hearing that the 'weighting' is actually used as a 'score' with 
>> > the
>> > highest one always winning is quite surprising.
>> 
>> If you've only got one scheduler, there's no need to get fancy, you just 
>> pick 
>> the "best" host based on your weighing function.
>> 
>> It's only when you've got parallel schedulers that things get tricky.
>> 
>
>Note that I think you mean _concurrent_ not _parallel_ schedulers.
>
>Parallel schedulers would be trying to solve the same unit of work by
>breaking it up into smaller components and doing them at the same time.
>
>Concurrent means they're just doing different things at the same time.
>
>I know this is nit-picky, but we use the wrong word _A LOT_ and the
>problem space is actually vastly different, as parallelizable problems
>have a whole set of optimizations and advantages that generic concurrent
>problems (especially those involving mutating state!) have a whole set
>of race conditions that must be managed.
>
>__
>OpenStack Development Mailing List (not for usage questions)
>Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Chris Friesen

On 10/09/2015 11:09 AM, Zane Bitter wrote:


The optimal way to do this would be a weighted random selection, where the
probability of any given host being selected is proportional to its weighting.
(Obviously this is limited by the accuracy of the weighting function in
expressing your actual preferences - and it's at least conceivable that this
could vary with the number of schedulers running.)

In fact, the choice of the name 'weighting' would normally imply that it's done
this way; hearing that the 'weighting' is actually used as a 'score' with the
highest one always winning is quite surprising.
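
(For what it's worth, a weighted random pick along those lines is only a few
lines of python. A rough sketch, with the (host, weight) pairs standing in for
whatever the existing weighers actually produce, and assuming non-negative
weights:)

import random

def pick_host(weighed_hosts):
    # weighed_hosts: list of (host, weight) tuples, weights assumed >= 0
    total = sum(weight for _host, weight in weighed_hosts)
    if total <= 0:
        return random.choice(weighed_hosts)[0]
    r = random.uniform(0, total)
    running = 0.0
    for host, weight in weighed_hosts:
        running += weight
        if running >= r:
            return host
    return weighed_hosts[-1][0]  # guard against floating point rounding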


If you've only got one scheduler, there's no need to get fancy, you just pick 
the "best" host based on your weighing function.


It's only when you've got parallel schedulers that things get tricky.

Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Gregory Haynes
Excerpts from Zane Bitter's message of 2015-10-09 17:09:46 +:
> On 08/10/15 21:32, Ian Wells wrote:
> >
> > > 2. if many hosts suit the 5 VMs then this is *very* unlucky, because
> > we should be choosing a host at random from the set of
> > suitable hosts and that's a huge coincidence - so this is a tiny
> > corner case that we shouldn't be designing around
> >
> > Here is where we differ in our understanding. With the current
> > system of filters and weighers, 5 schedulers getting requests for
> > identical VMs and having identical information are *expected* to
> > select the same host. It is not a tiny corner case; it is the most
> > likely result for the current system design. By catching this
> > situation early (in the scheduling process) we can avoid multiple
> > RPC round-trips to handle the fail/retry mechanism.
> >
> >
> > And so maybe this would be a different fix - choose, at random, one of
> > the hosts above a weighting threshold, not choose the top host every
> > time? Technically, any host passing the filter is adequate to the task
> > from the perspective of an API user (and they can't prove if they got
> > the highest weighting or not), so if we assume weighting an operator
> > preference, and just weaken it slightly, we'd have a few more options.
> 
> The optimal way to do this would be a weighted random selection, where 
> the probability of any given host being selected is proportional to its 
> weighting. (Obviously this is limited by the accuracy of the weighting 
> function in expressing your actual preferences - and it's at least 
> conceivable that this could vary with the number of schedulers running.)
> 
> In fact, the choice of the name 'weighting' would normally imply that 
> it's done this way; hearing that the 'weighting' is actually used as a 
> 'score' with the highest one always winning is quite surprising.
> 
> cheers,
> Zane.
> 

There is a more generalized version of this algorithm for concurrent
scheduling I've seen a few times - Pick N options at random, apply
heuristic over that N to pick the best, attempt to schedule at your
choice, retry on failure. As long as you have a fast heuristic and your
N is sufficiently smaller than the total number of options then the
retries are rare-ish and cheap. It also can scale out extremely well.
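
(A rough sketch of that loop, with the filter/weigh/claim steps left as
placeholders for whatever a deployment actually uses - illustrative only:)

import random

def schedule(all_hosts, passes_filters, weigh, try_claim,
             sample_size=20, max_retries=5):
    for _attempt in range(max_retries):
        sample = random.sample(all_hosts, min(sample_size, len(all_hosts)))
        candidates = [h for h in sample if passes_filters(h)]
        if not candidates:
            continue
        best = max(candidates, key=weigh)
        # The claim can fail if another scheduler picked the same host first;
        # with a small sample out of many hosts that should be rare-ish.
        if try_claim(best):
            return best
    raise RuntimeError("no host claimed after %d attempts" % max_retries)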

Obviously you lose some of the ability to micro-manage where things are
placed with a scheduling setup like that, but if scaling up is the
concern I really hope that isnt a problem...

Cheers,
Greg

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Chris Friesen

On 10/09/2015 12:55 PM, Gregory Haynes wrote:


There is a more generalized version of this algorithm for concurrent
scheduling I've seen a few times - Pick N options at random, apply
heuristic over that N to pick the best, attempt to schedule at your
choice, retry on failure. As long as you have a fast heuristic and your
N is sufficiently smaller than the total number of options then the
retries are rare-ish and cheap. It also can scale out extremely well.


If you're looking for a resource that is relatively rare (say you want a 
particular hardware accelerator, or a very large number of CPUs, or even to be 
scheduled "near" to a specific other instance) then you may have to retry quite 
a lot.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Chris Friesen

On 10/09/2015 12:25 PM, Alec Hothan (ahothan) wrote:


Still the point from Chris is valid. I guess the main reason openstack is
going with multiple concurrent schedulers is to scale out by distributing the
load between multiple instances of schedulers because 1 instance is too
slow. This discussion is about coordinating the many instances of schedulers
in a way that works and this is actually a difficult problem and will get
worse as the number of variables for instance placement increases (for
example NFV is going to require a lot more than just cpu pinning, huge pages
and numa).

Has anybody looked at why 1 instance is too slow and what it would take to
make 1 scheduler instance work fast enough? This does not preclude the use of
concurrency for finer grain tasks in the background.


Currently we pull data on all (!) of the compute nodes out of the database via a 
series of RPC calls, then evaluate the various filters in python code.


I suspect it'd be a lot quicker if each filter was a DB query.

Also, ideally we'd want to query for the most "strict" criteria first, to reduce 
the total number of comparisons.  For example, if you want to implement the 
"affinity" server group policy, you only need to test a single host.  If you're 
matching against host aggregate metadata, you only need to test against hosts in 
matching aggregates.
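
(To illustrate - not actual nova code, and the table/column names are
invented - a couple of filters expressed as one query might look like this
with SQLAlchemy, letting the database use an index on the most selective
column rather than comparing every host in python:)

import sqlalchemy as sa

metadata = sa.MetaData()
compute_nodes = sa.Table(
    "compute_nodes", metadata,
    sa.Column("hypervisor_hostname", sa.String),
    sa.Column("free_ram_mb", sa.Integer),
    sa.Column("vcpus_free", sa.Integer),
    sa.Column("aggregate_id", sa.Integer),
)

# Roughly: aggregate filter + ram filter + core filter in a single round trip.
query = (sa.select([compute_nodes.c.hypervisor_hostname])
         .where(compute_nodes.c.aggregate_id == 42)
         .where(compute_nodes.c.free_ram_mb >= 2048)
         .where(compute_nodes.c.vcpus_free >= 2))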


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Joshua Harlow

And also we should probably deprecate/not recommend:

http://docs.openstack.org/developer/nova/api/nova.scheduler.filters.json_filter.html#nova.scheduler.filters.json_filter.JsonFilter

That filter IMHO basically disallows optimizations like forming SQL 
statements for each filter (and then letting the DB do the heavy 
lifting) or say having each filter say 'oh my logic can be performed by 
a prepared statement ABC and you should just use that instead' (and then
letting the DB do the heavy lifting).


Chris Friesen wrote:

On 10/09/2015 12:25 PM, Alec Hothan (ahothan) wrote:


Still the point from Chris is valid. I guess the main reason openstack is
going with multiple concurrent schedulers is to scale out by
distributing the
load between multiple instances of schedulers because 1 instance is too
slow. This discussion is about coordinating the many instances of
schedulers
in a way that works and this is actually a difficult problem and will get
worse as the number of variables for instance placement increases (for
example NFV is going to require a lot more than just cpu pinning, huge
pages
and numa).

Has anybody looked at why 1 instance is too slow and what it would
take to
make 1 scheduler instance work fast enough? This does not preclude the
use of
concurrency for finer grain tasks in the background.


Currently we pull data on all (!) of the compute nodes out of the
database via a series of RPC calls, then evaluate the various filters in
python code.

I suspect it'd be a lot quicker if each filter was a DB query.

Also, ideally we'd want to query for the most "strict" criteria first,
to reduce the total number of comparisons. For example, if you want to
implement the "affinity" server group policy, you only need to test a
single host. If you're matching against host aggregate metadata, you
only need to test against hosts in matching aggregates.

Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Maish Saidel-Keesing

Forgive the top-post.

Cross-posting to openstack-operators for their feedback as well.

Ed, the work seems very promising, and I am interested to see how this
evolves.


With my operator hat on I have one piece of feedback.

By adding in a new database solution (Cassandra) we are now up to three
different database solutions in use in OpenStack:


MySQL (practically everything)
MongoDB (Ceilometer)
Cassandra.

Not to mention two different message queues:
Kafka (Monasca)
RabbitMQ (everything else)

Operational overhead has a cost - maintaining 3 different database 
tools, backing them up, providing HA, etc. has operational cost.


This is not to say that this cannot be overseen, but it should be taken 
into consideration.


And *if* they can be consolidated into an agreed solution across the 
whole of OpenStack - that would be highly beneficial (IMHO).



--
Best Regards,
Maish Saidel-Keesing


On 10/08/15 03:24, Ed Leafe wrote:

On Oct 7, 2015, at 2:28 PM, Zane Bitter  wrote:


It seems to me (disclaimer: not a Nova dev) that which database to use is 
completely irrelevant to your proposal,

Well, not entirely. What Cassandra offers that separates it from other DBs is
exactly the feature that we need. The solution to the scheduler isn't to simply
"use a database".


which is really about moving the scheduling from a distributed collection of 
Python processes with ad-hoc (or sometimes completely missing) synchronisation 
into the database to take advantage of its well-defined semantics. But you've 
framed it in such a way as to guarantee that this never gets discussed, because 
everyone will be too busy arguing about whether or not Cassandra is better than 
Galera.

Understood - all one has to do is review the original thread from back in July 
to see this happening. But the reason that I framed it then as an experiment in 
which we would come up with measures of success we could all agree on up-front 
was so that if someone else thought that Product Foo would be even better, we 
could set up a similar test bed and try it out. IOW, instead of bikeshedding, 
if you want a different color, you build another shed and we can all have a 
look.


-- Ed Leafe




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Thierry Carrez
Maish Saidel-Keesing wrote:
> Operational overhead has a cost - maintaining 3 different database
> tools, backing them up, providing HA, etc. has operational cost.
> 
> This is not to say that this cannot be overseen, but it should be taken
> into consideration.
> 
> And *if* they can be consolidated into an agreed solution across the
> whole of OpenStack - that would be highly beneficial (IMHO).

Agreed, and that ties into the similar discussion we recently had about
picking a common DLM. Ideally we'd only add *one* general dependency and
use it for locks / leader election / syncing status around.

-- 
Thierry Carrez (ttx)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Joshua Harlow
On Thu, 8 Oct 2015 10:43:01 -0400
Monty Taylor  wrote:

> On 10/08/2015 09:01 AM, Thierry Carrez wrote:
> > Maish Saidel-Keesing wrote:
> >> Operational overhead has a cost - maintaining 3 different database
> >> tools, backing them up, providing HA, etc. has operational cost.
> >>
> >> This is not to say that this cannot be overseen, but it should be
> >> taken into consideration.
> >>
> >> And *if* they can be consolidated into an agreed solution across
> >> the whole of OpenStack - that would be highly beneficial (IMHO).
> >
> > Agreed, and that ties into the similar discussion we recently had
> > about picking a common DLM. Ideally we'd only add *one* general
> > dependency and use it for locks / leader election / syncing status
> > around.
> >
> 
> ++
> 
> All of the proposed DLM tools can fill this space successfully. There
> is definitely not a need for multiple.

On this point, and just thinking out loud. If we consider saving
compute_node information into say a node in said DLM backend (for
example a znode in zookeeper[1]); this information would be updated
periodically by that compute_node *itself* (it would, say, contain
information about what VMs are running on it, what their utilization is,
and so on).

For example the following layout could be used:

/nova/compute_nodes/

 data could be:

{
vms: [],
memory_free: XYZ,
cpu_usage: ABC,
memory_used: MNO,
...
}

Now if we imagine each/all schedulers having watches
on /nova/compute_nodes/ ([2] consul and etcd have equivalent concepts
afaik) then when a compute_node updates that information a push
notification (the watch being triggered) will be sent to the
scheduler(s) and the scheduler(s) could then update a local in-memory
cache of the data about all the hypervisors that can be selected from
for scheduling. This avoids any reading of a large set of data in the
first place (besides an initial read-once on startup to read the
initial list + setup the watches); in a way it's similar to push
notifications. Then when scheduling a VM -> hypervisor there isn't any
need to query anything but the local in-memory representation that the
scheduler is maintaining (and updating as watches are triggered)...

So this is why I was wondering about what capabilities of cassandra are
being used here; because the above I think are unique capabilities of
DLM like systems (zookeeper, consul, etcd) that could be advantageous
here...

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes

[2]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches


> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Joshua Harlow

Joshua Harlow wrote:

On Thu, 8 Oct 2015 10:43:01 -0400
Monty Taylor  wrote:


On 10/08/2015 09:01 AM, Thierry Carrez wrote:

Maish Saidel-Keesing wrote:

Operational overhead has a cost - maintaining 3 different database
tools, backing them up, providing HA, etc. has operational cost.

This is not to say that this cannot be overseen, but it should be
taken into consideration.

And *if* they can be consolidated into an agreed solution across
the whole of OpenStack - that would be highly beneficial (IMHO).

Agreed, and that ties into the similar discussion we recently had
about picking a common DLM. Ideally we'd only add *one* general
dependency and use it for locks / leader election / syncing status
around.


++

All of the proposed DLM tools can fill this space successfully. There
is definitely not a need for multiple.


On this point, and just thinking out loud. If we consider saving
compute_node information into say a node in said DLM backend (for
example a znode in zookeeper[1]); this information would be updated
periodically by that compute_node *itself* (it would, say, contain
information about what VMs are running on it, what their utilization is,
and so on).

For example the following layout could be used:

/nova/compute_nodes/

  data could be:

{
 vms: [],
 memory_free: XYZ,
 cpu_usage: ABC,
 memory_used: MNO,
 ...
}

Now if we imagine each/all schedulers having watches
on /nova/compute_nodes/ ([2] consul and etcd have equivalent concepts
afaik) then when a compute_node updates that information a push
notification (the watch being triggered) will be sent to the
scheduler(s) and the scheduler(s) could then update a local in-memory
cache of the data about all the hypervisors that can be selected from
for scheduling. This avoids any reading of a large set of data in the
first place (besides an initial read-once on startup to read the
initial list + setup the watches); in a way it's similar to push
notifications. Then when scheduling a VM ->  hypervisor there isn't any
need to query anything but the local in-memory representation that the
scheduler is maintaining (and updating as watches are triggered)...

So this is why I was wondering about what capabilities of cassandra are
being used here; because the above I think are unique capabilities of
DLM like systems (zookeeper, consul, etcd) that could be advantageous
here...

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes

[2]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches




And here's a final super-awesomeness,

Use the same existence of that znode + information (perhaps using 
ephemeral znodes or equivalent) to determine if a hypervisor is 'alive' 
or 'dead', thus removing the need to do queries and periodic writes to 
the nova database to determine if a hypervisor's nova-compute service is
alive or dead (with reads via 
https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py#L33 
and other similar code scattered in nova)...
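
(A rough sketch of that with kazoo - illustrative only, reusing the
/nova/compute_nodes layout from earlier in the thread; the data payload here
is made up:)

import json
import socket

from kazoo import client

c = client.KazooClient()
c.start()

# An ephemeral znode per compute node doubles as its liveness record: if the
# nova-compute process dies or its session expires, zookeeper deletes the
# znode and any watcher sees the hypervisor disappear.
me = "/nova/compute_nodes/%s" % socket.gethostname()
c.create(me, json.dumps({"memory_free": 2048}), ephemeral=True, makepath=True)

# A scheduler or servicegroup driver can then treat "znode exists" as "alive":
alive = c.exists(me) is not None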



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Clint Byrum
Excerpts from Joshua Harlow's message of 2015-10-08 08:38:57 -0700:
> Joshua Harlow wrote:
> > On Thu, 8 Oct 2015 10:43:01 -0400
> > Monty Taylor  wrote:
> >
> >> On 10/08/2015 09:01 AM, Thierry Carrez wrote:
> >>> Maish Saidel-Keesing wrote:
>  Operational overhead has a cost - maintaining 3 different database
>  tools, backing them up, providing HA, etc. has operational cost.
> 
>  This is not to say that this cannot be overseen, but it should be
>  taken into consideration.
> 
>  And *if* they can be consolidated into an agreed solution across
>  the whole of OpenStack - that would be highly beneficial (IMHO).
> >>> Agreed, and that ties into the similar discussion we recently had
> >>> about picking a common DLM. Ideally we'd only add *one* general
> >>> dependency and use it for locks / leader election / syncing status
> >>> around.
> >>>
> >> ++
> >>
> >> All of the proposed DLM tools can fill this space successfully. There
> >> is definitely not a need for multiple.
> >
> > On this point, and just thinking out loud. If we consider saving
> > compute_node information into say a node in said DLM backend (for
> > example a znode in zookeeper[1]); this information would be updated
> > periodically by that compute_node *itself* (it would, say, contain
> > information about what VMs are running on it, what their utilization is,
> > and so on).
> >
> > For example the following layout could be used:
> >
> > /nova/compute_nodes/
> >
> >   data could be:
> >
> > {
> >  vms: [],
> >  memory_free: XYZ,
> >  cpu_usage: ABC,
> >  memory_used: MNO,
> >  ...
> > }
> >
> > Now if we imagine each/all schedulers having watches
> > on /nova/compute_nodes/ ([2] consul and etcd have equivalent concepts
> > afaik) then when a compute_node updates that information a push
> > notification (the watch being triggered) will be sent to the
> > scheduler(s) and the scheduler(s) could then update a local in-memory
> > cache of the data about all the hypervisors that can be selected from
> > for scheduling. This avoids any reading of a large set of data in the
> > first place (besides an initial read-once on startup to read the
> > initial list + setup the watches); in a way it's similar to push
> > notifications. Then when scheduling a VM ->  hypervisor there isn't any
> > need to query anything but the local in-memory representation that the
> > scheduler is maintaining (and updating as watches are triggered)...
> >
> > So this is why I was wondering about what capabilities of cassandra are
> > being used here; because the above I think are unique capabilities of
> > DLM like systems (zookeeper, consul, etcd) that could be advantageous
> > here...
> >
> > [1]
> > https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes
> >
> > [2]
> > https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches
> >
> >
> 
> And here's a final super-awesomeness,
> 
> Use the same existence of that znode + information (perhaps using 
> ephemeral znodes or equivalent) to determine if a hypervisor is 'alive' 
> or 'dead', thus removing the need to do queries and periodic writes to 
> the nova database to determine if a hypervisor's nova-compute service is
> alive or dead (with reads via 
> https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py#L33
>  
> and other similar code scattered in nova)...
> 

^^ THIS is the kind of architectural thinking I'd like to see us do more
of.

This isn't "hey I have a better database" it is "I have a way to reduce
the most common operations to O(1) complexity".

Ed, for all of the promise of your experiment, I'd actually rather see
time spent on Josh's idea above. In fact, I might spend time on Josh's
idea above. :)
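
For what it's worth, here's a rough sketch of what that watch-driven local
cache could look like with kazoo (the Python ZooKeeper client). The paths and
record fields are just Josh's hypothetical layout above, not anything that
exists in nova today:

    import json

    from kazoo.client import KazooClient

    # In-memory view of every hypervisor, keyed by compute node name. It is
    # updated only when ZooKeeper pushes a change; never re-read at
    # schedule time.
    host_cache = {}
    watched = set()

    zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
    zk.start()

    def watch_node(name):
        path = '/nova/compute_nodes/' + name

        @zk.DataWatch(path)
        def _updated(data, stat):
            if data is None:
                host_cache.pop(name, None)   # znode gone -> treat host as gone
            else:
                # e.g. {"vms": [...], "memory_free": ..., "cpu_usage": ...}
                host_cache[name] = json.loads(data)

    @zk.ChildrenWatch('/nova/compute_nodes')
    def _membership(children):
        # One read per node at startup (or when a node first appears); pure
        # push notifications after that.
        for name in children:
            if name not in watched:
                watched.add(name)
                watch_node(name)

    def acceptable_hosts(ram_mb):
        # A scheduling request only touches the local copy -- no DB or ZK
        # round trip in the critical path.
        return [name for name, info in host_cache.items()
                if info.get('memory_free', 0) >= ram_mb]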



Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Monty Taylor

On 10/08/2015 09:01 AM, Thierry Carrez wrote:

Maish Saidel-Keesing wrote:

Operational overhead has a cost - maintaining 3 different database
tools, backing them up, providing HA, etc. has operational cost.

This is not to say that this cannot be overseen, but it should be taken
into consideration.

And *if* they can be consolidated into an agreed solution across the
whole of OpenStack - that would be highly beneficial (IMHO).


Agreed, and that ties into the similar discussion we recently had about
picking a common DLM. Ideally we'd only add *one* general dependency and
use it for locks / leader election / syncing status around.



++

All of the proposed DLM tools can fill this space successfully. There is 
definitely not a need for multiple.




Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Kevin L. Mitchell
On Wed, 2015-10-07 at 23:17 -0600, Chris Friesen wrote:
> Why is it inevitable?

Well, I would say that this is probably a consequence of the CAP[1]
theorem.

> Theoretically if the DB knew about what resources were originally available 
> and 
> what resources have been consumed, then it should be able to allocate 
> resources 
> race-free (possibly with some retries involved if racing against other 
> schedulers updating the DB, but that would be internal to the scheduler 
> itself).

The problem is, it can't.  The scheduler may be making the decision at
the same time that an update from a compute node is in flight, meaning
that the scheduler is missing (at least) one piece of information.  When
you include a database, that just makes the possibility of missing an
in-flight update worse, because you also have to factor in the latency
of the database update as well.  Also, we have to factor in the
possibility that there are multiple schedulers in play, which further
worsens the possibility of in-flight information critical to the
scheduling decision.  If you employ some sort of locking to try to
mitigate all this, you've just effectively thrown away the scalability
that deploying multiple schedulers was supposed to buy you.

[1] https://en.wikipedia.org/wiki/CAP_theorem
-- 
Kevin L. Mitchell 
Rackspace




Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ed Leafe
On Oct 8, 2015, at 8:01 AM, Thierry Carrez  wrote:

>> Operational overhead has a cost - maintaining 3 different database
>> tools, backing them up, providing HA, etc. has operational cost.
>> 
>> This is not to say that this cannot be overseen, but it should be taken
>> into consideration.
>> 
>> And *if* they can be consolidated into an agreed solution across the
>> whole of OpenStack - that would be highly beneficial (IMHO).
> 
> Agreed, and that ties into the similar discussion we recently had about
> picking a common DLM. Ideally we'd only add *one* general dependency and
> use it for locks / leader election / syncing status around.

Oh, yes, sorry, I left that out of this particular post, as it had been 
discussed at length back in July. But yes, introducing a new dependency has a 
high cost, and needs to be justified before anyone would ever consider taking 
on that added cost. That was in my original email [0] back in July:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
At this point I'm sure that most of you are filled with thoughts on
how this won't work, or how much trouble it will be to switch, or how
much more of a pain it will be, or how you hate non-relational DBs, or
any of a zillion other negative thoughts. FWIW, I have them too. But
instead of ranting, I would ask that we acknowledge for now that:

a) it will be disruptive and painful to switch something like this at
this point in Nova's development
b) it would have to provide *significant* improvement to make such a
change worthwhile

So what I'm asking from all of you is to help define the second part:
what we would want improved, and how to measure those benefits. In
other words, what results would you have to see in order to make you
reconsider your initial "nah, this'll never work" reaction, and start
to think that this will be a worthwhile change to make to Nova.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Whether we make this type of change, or some other type of change, or keep 
things the way they are, having the data to justify that decision is always 
important.

-- Ed Leafe

[0] http://lists.openstack.org/pipermail/openstack-dev/2015-July/069593.html





Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ed Leafe
On Oct 8, 2015, at 10:24 AM, Joshua Harlow  wrote:



> Now if we imagine each/all schedulers having watches
> on /nova/compute_nodes/ ([2] consul and etcd have equivalent concepts
> afaik) then when a compute_node updates that information a push
> notification (the watch being triggered) will be sent to the
> scheduler(s) and the scheduler(s) could then update a local in-memory
> cache of the data about all the hypervisors that can be selected from
> for scheduling. This avoids any reading of a large set of data in the
> first place (besides an initial read-once on startup to read the
> initial list + setup the watches); in a way its similar to push
> notifications. Then when scheduling a VM -> hypervisor there isn't any
> need to query anything but the local in-memory representation that the
> scheduler is maintaining (and updating as watches are triggered)...

You've hit upon the problem with the current design: multiple, and potentially 
out-of-sync copies of the data. What you're proposing doesn't really sound all 
that different than the current design, which has the compute nodes send the 
updates in their state to the scheduler both on a scheduled task, and in 
response to changes. The impetus for the Cassandra proposal was to eliminate 
this duplication, and have the resources being scheduled and the scheduler all 
working with the same data.

-- Ed Leafe









Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ed Leafe
On Oct 8, 2015, at 11:03 AM, Kevin L. Mitchell  
wrote:

>> Theoretically if the DB knew about what resources were originally available 
>> and
>> what resources have been consumed, then it should be able to allocate 
>> resources
>> race-free (possibly with some retries involved if racing against other
>> schedulers updating the DB, but that would be internal to the scheduler 
>> itself).
> 
> The problem is, it can't.  The scheduler may be making the decision at
> the same time that an update from a compute node is in flight, meaning
> that the scheduler is missing (at least) one piece of information.  When
> you include a database, that just makes the possibility of missing an
> in-flight update worse, because you also have to factor in the latency
> of the database update as well.  Also, we have to factor in the
> possibility that there are multiple schedulers in play, which further
> worsens the possibility of in-flight information critical to the
> scheduling decision.  If you employ some sort of locking to try to
> mitigate all this, you've just effectively thrown away the scalability
> that deploying multiple schedulers was supposed to buy you.

Yes, the multiple scheduler part is very problematic. Not only could an update 
from the compute node not be received yet, there could also be updates from 
other schedulers that aren't caught. One of the most problematic use cases is 
requests for several similar VMs being received in a short period of time, and 
all scheduling processes handling them picking the same host. In the Cassandra 
scenario, the first would "win", and others would fail their attempt to update 
the resource with the claim, forcing them to select a different host without 
having to first go through the fail/retry cycle of the current design.
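
To make that concrete, the claim in Cassandra would be a conditional update (a
lightweight transaction). Sketch only -- the table and columns are made up,
and I'm assuming the DataStax Python driver:

    from cassandra.cluster import Cluster

    # Illustrative schema: one row per host, keyed on (resource_type, host).
    session = Cluster(['cass1', 'cass2']).connect('scheduler')

    # Two schedulers both read free_ram_mb = 63488 and try to claim 2048 MB.
    # The IF clause turns the write into a compare-and-set: Cassandra applies
    # exactly one of them and tells the loser it lost.
    result = session.execute(
        "UPDATE resources SET free_ram_mb = %s "
        "WHERE resource_type = 'compute' AND host = %s IF free_ram_mb = %s",
        [63488 - 2048, 'compute-17', 63488])

    if not result.was_applied:
        # Another scheduler won the race; move straight on to the next host
        # in our ordered list instead of bouncing the request off the compute
        # node and retrying from scratch.
        pass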

-- Ed Leafe









Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ed Leafe
On Oct 8, 2015, at 10:54 AM, Clint Byrum  wrote:

> ^^ THIS is the kind of architectural thinking I'd like to see us do more
> of.

Agreed. If nothing else, I'm glad that I was able to get people thinking about 
new approaches.

> This isn't "hey I have a better database" it is "I have a way to reduce
> the most common operations to O(1) complexity".
> 
> Ed, for all of the promise of your experiment, I'd actually rather see
> time spent on Josh's idea above. In fact, I might spend time on Josh's
> idea above. :)

Cool! I don't really care if my particular ideas are selected; I just want to 
make OpenStack better.


-- Ed Leafe









Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Joshua Harlow

Clint Byrum wrote:


^^ THIS is the kind of architectural thinking I'd like to see us do more
of.

This isn't "hey I have a better database" it is "I have a way to reduce
the most common operations to O(1) complexity".

Ed, for all of the promise of your experiment, I'd actually rather see
time spent on Josh's idea above. In fact, I might spend time on Josh's
idea above. :)


Go for it!

We (at yahoo) are also brainstorming this idea (or something like it), 
and as we hit more performance issues pushing the 1000+ hypervisors in a 
single cluster (no cell/s) (one of our many cluster/s) we will start 
adjusting (and hopefully more blogging, upstreaming and all that) what 
needs to be fixed/tweaked/altered to continue to push these boundaries.


Collab. and all that is welcome too, of course :)

P.S.

The DLM spec @ https://review.openstack.org/#/c/209661/ (rendered nicely 
at 
http://docs-draft.openstack.org/61/209661/29/check/gate-openstack-specs-docs/2ff62fa//doc/build/html/specs/chronicles-of-a-dlm.html) 
mentions 'Such a consensus being built will also influence the future 
functionality and capabilities of OpenStack at large so we need to be 
especially careful, thoughtful, and explicit here.'


This statement was really targeted at cases like this, when we (as a 
community) choose a DLM solution we affect the larger capabilities of 
openstack, not just for locking but for scheduling (and likely for other 
functionality I can't even think of/predict...)
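
As a rough illustration of the compute-node half of that idea (again kazoo,
and the path/fields are made up): each nova-compute registers itself as an
ephemeral znode and refreshes its own record, so the "is this hypervisor
alive?" question needs no periodic DB writes or reads at all:

    import json
    import socket
    import time

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
    zk.start()

    path = '/nova/compute_nodes/' + socket.gethostname()

    def snapshot():
        # Placeholder numbers; a real node would pull these from the
        # hypervisor driver / resource tracker.
        return json.dumps({'vms': [], 'memory_free': 63488,
                           'cpu_usage': 0.12}).encode('utf-8')

    # Ephemeral: the znode disappears automatically if this process dies or
    # loses its ZooKeeper session, which is exactly the liveness signal the
    # servicegroup DB driver currently derives from periodic DB updates.
    zk.create(path, snapshot(), ephemeral=True, makepath=True)

    while True:
        time.sleep(60)
        zk.set(path, snapshot())   # fires the data watches in every scheduler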





Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ed Leafe
On Oct 8, 2015, at 1:38 PM, Ian Wells  wrote:

>> You've hit upon the problem with the current design: multiple, and 
>> potentially out-of-sync copies of the data.
> 
> Arguably, this is the *intent* of the current design, not a problem with it.

It may have been the intent, but that doesn't mean that we are where we need to 
be.

> The data can never be perfect (ever) so go with 'good enough' and run with 
> it, and deal with the corner cases.

It is defining what counts as "good enough" that is problematic.

> Truth be told, storing that data in MySQL is secondary to the correct 
> functioning of the scheduler.

I have no problem with MySQL (well, I do, but that's not relevant to this 
discussion). My issue is that the current system poorly replicates its data 
from MySQL to the places where it is needed.

> The one thing it helps with is when the scheduler restarts - it stands a 
> chance of making sensible decisions before it gets its full picture back.  
> (This is all very like route distribution protocols, you know: make the best 
> decision on the information you have to hand, assuming the rest of the system 
> will deal with your mistakes.  And hold times, and graceful restart, and…)

Yes, this is all well and good. My focus is on improving the information in 
hand when making that best decision.

> Is there any reason why the duplication (given it's not a huge amount of data 
> - megabytes, not gigabytes) is a problem?  Is there any reason why 
> inconsistency is a problem?

I'm sure that many of the larger deployments may have issues with the amount of 
data that must be managed in-memory by so many different parts of the system. 
Inconsistency is a problem, but one that has workarounds. The primary issue is 
scalability: with the current design, increasing the number of scheduler 
processes increases the raciness of the system.

> I do sympathise with your point in the following email where you have 5 VMs 
> scheduled by 5 schedulers to the same host, but consider:
> 
> 1. if only one host suits the 5 VMs this results in the same behaviour: 1 VM 
> runs, the rest don't.  There's more work to discover that but arguably less 
> work than maintaining a consistent database.

True, but in a large scale deployment this is an extremely rare case.

> 2. if many hosts suit the 5 VMs then this is *very* unlucky, because we 
> should be choosing a host at random from the set of suitable hosts and that's 
> a huge coincidence - so this is a tiny corner case that we shouldn't be 
> designing around

Here is where we differ in our understanding. With the current system of 
filters and weighers, 5 schedulers getting requests for identical VMs and 
having identical information are *expected* to select the same host. It is not 
a tiny corner case; it is the most likely result for the current system design. 
By catching this situation early (in the scheduling process) we can avoid 
multiple RPC round-trips to handle the fail/retry mechanism.

> The worst case, is, however
> 
> 3. we attempt to pick the optimal host, and the optimal host for all 5 VMs is 
> the same despite there being other less perfect choices out there.  That 
> would get you a stampeding herd and a bunch of retries.
> 
> I admit that the current system does not solve well for (3).

IMO, this is identical to (2).


-- Ed Leafe









Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ian Wells
On 8 October 2015 at 13:28, Ed Leafe  wrote:

> On Oct 8, 2015, at 1:38 PM, Ian Wells  wrote:
> > Truth be told, storing that data in MySQL is secondary to the correct
> functioning of the scheduler.
>
> I have no problem with MySQL (well, I do, but that's not relevant to this
> discussion). My issue is that the current system poorly replicates its data
> from MySQL to the places where it is needed.
>

Well, the issue is that the data shouldn't be replicated from the database
at all.  There doesn't need to be One True Copy of data here (though I
think the point further down is why we're differing on that).


> > Is there any reason why the duplication (given it's not a huge amount of
> data - megabytes, not gigabytes) is a problem?  Is there any reason why
> inconsistency is a problem?
>
> I'm sure that many of the larger deployments may have issues with the
> amount of data that must be managed in-memory by so many different parts of
> the system.
>

I wonder about that.  If I have a scheduler making a scheduling decision I
don't want it calling out to a database and the database calling out to
offline storage just to find the information, at least not if I can
possibly avoid it.  It's a critical path element in every boot call.

Given that what we're talking about is generally a bunch of resource values
for each host, I'm not sure how big this gets, even in the 100k host range,
but do you have a particularly sizeable structure in mind?


> Inconsistency is a problem, but one that has workarounds. The primary
> issue is scalability: with the current design, increasing the number of
> scheduler processes increases the raciness of the system.
>

And again, given your point below I see where you're coming from here, but
I think the key here is to make two schedulers considerably *less* likely
to make the same choice on the same information.

> I do sympathise with your point in the following email where you have 5
> VMs scheduled by 5 schedulers to the same host, but consider:
> >
> > 1. if only one host suits the 5 VMs this results in the same behaviour:
> 1 VM runs, the rest don't.  There's more work to discover that but arguably
> less work than maintaining a consistent database.
>
> True, but in a large scale deployment this is an extremely rare case.
>

Indeed; I'm trying to get that one out of the way.

> 2. if many hosts suit the 5 VMs then this is *very* unlucky, because we
> should be choosing a host at random from the set of suitable hosts and
> that's a huge coincidence - so this is a tiny corner case that we shouldn't
> be designing around
>
> Here is where we differ in our understanding. With the current system of
> filters and weighers, 5 schedulers getting requests for identical VMs and
> having identical information are *expected* to select the same host. It is
> not a tiny corner case; it is the most likely result for the current system
> design. By catching this situation early (in the scheduling process) we can
> avoid multiple RPC round-trips to handle the fail/retry mechanism.
>

And so maybe this would be a different fix - choose, at random, one of the
hosts above a weighting threshold, not choose the top host every time?
Technically, any host passing the filter is adequate to the task from the
perspective of an API user (and they can't prove if they got the highest
weighting or not), so if we assume weighting is an operator preference, and
just weaken it slightly, we'd have a few more options.

Again, we want to avoid overscheduling to a host, which will eventually
cause a decline and a reschedule.  But something that on balance probably
won't overschedule is adequate; overscheduling sucks but is not in fact the
end of the world as long as it's not every single time.
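
Something as simple as the following would already spread the herd (a sketch,
not nova code, though if memory serves nova's filter scheduler has a knob
along these lines in scheduler_host_subset_size):

    import random

    def choose_host(weighed_hosts, subset_size=5):
        # weighed_hosts: (host, weight) pairs that already passed the
        # filters, so any of them is acceptable to the API user. Picking
        # randomly among the top few keeps N schedulers from all converging
        # on the same node while still mostly respecting the operator's
        # weighting preference.
        ranked = sorted(weighed_hosts, key=lambda hw: hw[1], reverse=True)
        return random.choice(ranked[:subset_size])[0]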

I'm not averse to the central database if we need the central database, but
I'm not sure how much we do at this point, and a central database will
become a point of contention, I would think, beyond the cost of the above
idea.
 --
Ian.


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ian Wells
On 7 October 2015 at 22:17, Chris Friesen 
wrote:

> On 10/07/2015 07:23 PM, Ian Wells wrote:
>
>>
>> The whole process is inherently racy (and this is inevitable, and
>> correct),
>>
>>
> Why is it inevitable?
>

It's inevitable because everything takes time, and some things are
unpredictable.

The amount of free RAM on a machine - as we do it today - is, literally,
what the kernel reports to be free.  That's known by the host,
unpredictable, occasionally reported to the scheduler (which takes time),
and if you stored it in a database (which takes time) and recovered it from
a database (which takes time) the number you got would not be guaranteed to
be current.

Other things - like CPUs - can theoretically be centrally tracked, but the
whole thing is distributed at the moment - compute nodes are the source of
truth, not the database - which makes some sense when you consider that a
compute node knows best what VMs are running and what VMs have died at any
given moment.  In truth, if the central service is in any way wrong (for
instance, processes outside of OpenStack are using a lot of CPU, which you
can't predict, again) then it makes sense for the compute node to be the
final arbiter, so (occasional, infrequent) reschedules are probably
appropriate anyway.
-- 
Ian.


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ian Wells
On 8 October 2015 at 09:10, Ed Leafe  wrote:

> You've hit upon the problem with the current design: multiple, and
> potentially out-of-sync copies of the data.


Arguably, this is the *intent* of the current design, not a problem with
it.  The data can never be perfect (ever) so go with 'good enough' and run
with it, and deal with the corner cases.  Truth be told, storing that data
in MySQL is secondary to the correct functioning of the scheduler.  The one
thing it helps with is when the scheduler restarts - it stands a chance of
making sensible decisions before it gets its full picture back.  (This is
all very like route distribution protocols, you know: make the best
decision on the information you have to hand, assuming the rest of the
system will deal with your mistakes.  And hold times, and graceful restart,
and...)


> What you're proposing doesn't really sound all that different than the
> current design, which has the compute nodes send the updates in their state
> to the scheduler both on a scheduled task, and in response to changes. The
> impetus for the Cassandra proposal was to eliminate this duplication, and
> have the resources being scheduled and the scheduler all working with the
> same data.


Is there any reason why the duplication (given it's not a huge amount of
data - megabytes, not gigabytes) is a problem?  Is there any reason why
inconsistency is a problem?

What you propose is a change in behaviour.  The scheduler today is intended
to make the best decision based on the available information, without
locks, and on the assumption that other things might be scheduling at the
same time.  Your proposal comes across as making all schedulers work on one
accurate copy of information that they keep updated (not, I think, entirely
synchronously, so they can still be working on outdated information, but
rather closer to it).  But when you have hundreds of hosts willing to take
a machine then there's typically no one answer to a scheduling decision and
we can tolerate really quite a lot of variability.

I do sympathise with your point in the following email where you have 5 VMs
scheduled by 5 schedulers to the same host, but consider:

1. if only one host suits the 5 VMs this results in the same behaviour: 1
VM runs, the rest don't.  There's more work to discover that but arguably
less work than maintaining a consistent database.
2. if many hosts suit the 5 VMs then this is *very* unlucky, because we
should be choosing a host at random from the set of suitable hosts and
that's a huge coincidence - so this is a tiny corner case that we shouldn't
be designing around

The worst case, is, however

3. we attempt to pick the optimal host, and the optimal host for all 5 VMs
is the same despite there being other less perfect choices out there.  That
would get you a stampeding herd and a bunch of retries.

I admit that the current system does not solve well for (3).
-- 
Ian.


Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2015-10-07 12:28:36 -0700:
> On 07/10/15 13:36, Ed Leafe wrote:
> > Several months ago I proposed an experiment [0] to see if switching the 
> > data model for the Nova scheduler to use Cassandra as the backend would be 
> > a significant improvement as opposed to the current design using multiple 
> > copies of the same data (compute_node in MySQL DB, HostState in memory in 
> > the scheduler, ResourceTracker in memory in the compute node) and trying to 
> > keep them all in sync via passing messages.
> 
> It seems to me (disclaimer: not a Nova dev) that which database to use 
> is completely irrelevant to your proposal, which is really about moving 
> the scheduling from a distributed collection of Python processes with 
> ad-hoc (or sometimes completely missing) synchronisation into the 
> database to take advantage of its well-defined semantics. But you've 
> framed it in such a way as to guarantee that this never gets discussed, 
> because everyone will be too busy arguing about whether or not Cassandra 
> is better than Galera.
> 

Your point is valid Zane, that the idea is more about having a
synchronized view of the scheduling state, and not about Cassandra.

I think Cassandra makes the proposal more realistic and easier to think
about, though, as Cassandra is focused on problems of the scale that this
represents. Galera won't do this well at any kind of scale, without
the added complexity and inefficiency of cells. So whatever write churn a
single Galera node can handle for a truly synchronized scheduler would be
the maximum capacity of one cell.

I like the concrete nature of this proposal, and suggest people review
it as a whole, and not try to reduce it to its components without an
extremely strong reason to do so.



Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Chris Friesen

On 10/07/2015 11:36 AM, Ed Leafe wrote:


I've finally gotten around to finishing writing up that proposal [1], and I'd
like to hope that it would be the basis for future discussions about
addressing some of the underlying issues that exist in OpenStack for
historical reasons, and how we might rethink these choices today. I'd prefer
comments and discussion here on the dev list, so that all can see your ideas,
but I will be in Tokyo for the summit, and would also welcome some informal
discussion there, too.

-- Ed Leafe

 [1] http://blog.leafe.com/reimagining_scheduler/


I've wondered for a while (ever since I looked at the scheduler code, really) 
why we couldn't implement more of the scheduler as database transactions.


I haven't used Cassandra, so maybe you can clarify something about updates 
across a distributed DB.  I just read up on lightweight transactions, and it 
says that they're restricted to a single partition.  Is that an acceptable 
limitation for this usage?


Some points that might warrant further discussion:

1) Some resources (RAM) only require tracking amounts.  Other resources (CPUs, 
PCI devices) require tracking allocation of specific individual host resources 
(for CPU pinning, PCI device allocation, etc.).  Presumably for the latter we 
would have to actually do the allocation of resources at the time of the 
scheduling operation in order to update the database with the claimed resources 
in a race-free way.


2) Are you suggesting that all of nova switch to Cassandra, or just the 
scheduler and resource tracking portions?  If the latter, how would we handle 
things like pinned CPUs and PCI devices that are currently associated with 
specific instances in the nova DB?


3) The concept of the compute node updating the DB when things change is really 
orthogonal to the new scheduling model.  The current scheduling model would 
benefit from that as well.


4) It seems to me that to avoid races we need to do one of the following.  Which 
are you proposing?
a) Serialize the entire scheduling operation so that only one instance can 
schedule at once.
b) Make the evaluation of filters and claiming of resources a single atomic DB 
transaction.
c) Do a loop where we evaluate the filters, pick a destination, try to claim the 
resources in the DB, and retry the whole thing if the resources have already 
been claimed.


Chris



Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Ed Leafe
On Oct 7, 2015, at 6:00 PM, Chris Friesen  wrote:

> I've wondered for a while (ever since I looked at the scheduler code, really) 
> why we couldn't implement more of the scheduler as database transactions.
> 
> I haven't used Cassandra, so maybe you can clarify something about updates 
> across a distributed DB.  I just read up on lightweight transactions, and it 
> says that they're restricted to a single partition.  Is that an acceptable 
> limitation for this usage?

An implementation detail. A partition is defined by the partition key, not by 
any physical arrangement of nodes. The partition key would have to depend on 
the resource type, and whatever other columns would make such a query unique.

> Some points that might warrant further discussion:
> 
> 1) Some resources (RAM) only require tracking amounts.  Other resources 
> (CPUs, PCI devices) require tracking allocation of specific individual host 
> resources (for CPU pinning, PCI device allocation, etc.).  Presumably for the 
> latter we would have to actually do the allocation of resources at the time 
> of the scheduling operation in order to update the database with the claimed 
> resources in a race-free way.

Yes, that's correct. A lot of thought would have to be put into how to best 
represent these different types of resources, and that's something that I have 
ideas about, but would feel a whole lot better defining only after talking 
these concepts over with others who understand the underlying concepts better 
than I do.

> 2) Are you suggesting that all of nova switch to Cassandra, or just the 
> scheduler and resource tracking portions?  If the latter, how would we handle 
> things like pinned CPUs and PCI devices that are currently associated with 
> specific instances in the nova DB?

I am only thinking of the scheduler as a separate service. Perhaps Nova as a 
whole might benefit from switching to Cassandra for its database needs, but I 
haven't really thought about that at all.

> 3) The concept of the compute node updating the DB when things change is 
> really orthogonal to the new scheduling model.  The current scheduling model 
> would benefit from that as well.

Actually, it isn't that different. Compute nodes send updates to the scheduler 
when instances are created/deleted/resized/etc., so this isn't much of a 
stretch.

> 4) It seems to me that to avoid races we need to do one of the following.  
> Which are you proposing?
> a) Serialize the entire scheduling operation so that only one instance can 
> schedule at once.
> b) Make the evaluation of filters and claiming of resources a single atomic 
> DB transaction.
> c) Do a loop where we evaluate the filters, pick a destination, try to claim 
> the resources in the DB, and retry the whole thing if the resources have 
> already been claimed.

Probably a combination of b) and c). Filters would, for lack of a better term, 
add CQL WHERE clauses to the query, which would return a set of acceptable 
hosts. Weighers would order these hosts in terms of desirability, and then the 
claim would be attempted. If the claim failed because the host had changed, the 
next acceptable host would be selected, etc. I don't imagine that "retrying the 
whole thing" would be an efficient option, unless there were no other 
acceptable hosts returned from the original filtering query.

Put another way: if we are in a racy situation, and two scheduler processes are 
trying to place a similar instance, both processes would most likely come up 
with the same set of hosts ordered in the same way. One of those processes 
would "win", and claim the first choice. The other would fail the transaction, 
and would then claim the second choice on the list. IMO, this is how you best 
deal with race conditions.
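
A sketch of what I have in mind (the schema, query and weigher are all
illustrative -- a real layout would be designed so the common filters hit a
partition or an index rather than leaning on ALLOW FILTERING):

    from cassandra.cluster import Cluster

    session = Cluster(['cass1', 'cass2']).connect('scheduler')

    def schedule(ram_mb):
        # "Filters" become predicates on the query itself.
        rows = session.execute(
            "SELECT host, free_ram_mb FROM resources "
            "WHERE resource_type = 'compute' AND free_ram_mb >= %s "
            "ALLOW FILTERING", [ram_mb])

        # "Weighers" order the acceptable hosts; here, most free RAM first.
        for row in sorted(rows, key=lambda r: r.free_ram_mb, reverse=True):
            claimed = session.execute(
                "UPDATE resources SET free_ram_mb = %s "
                "WHERE resource_type = 'compute' AND host = %s "
                "IF free_ram_mb = %s",
                [row.free_ram_mb - ram_mb, row.host, row.free_ram_mb])
            if claimed.was_applied:
                return row.host     # our claim won
            # Otherwise another scheduler beat us to this host; fall through
            # and try the next acceptable host rather than re-running the
            # whole query.
        raise Exception('NoValidHost')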


-- Ed Leafe









Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Chris Friesen

On 10/07/2015 07:23 PM, Ian Wells wrote:

On 7 October 2015 at 16:00, Chris Friesen > wrote:

1) Some resources (RAM) only require tracking amounts.  Other resources
(CPUs, PCI devices) require tracking allocation of specific individual host
resources (for CPU pinning, PCI device allocation, etc.).  Presumably for
the latter we would have to actually do the allocation of resources at the
time of the scheduling operation in order to update the database with the
claimed resources in a race-free way.


The whole process is inherently racy (and this is inevitable, and correct),
which is why the scheduler works the way it does:

- scheduler guesses at a host based on (guaranteed - hello distributed systems!)
outdated information
- VM is scheduled to a host that looks like it might work, and host attempts to
run it
- VM run may fail (because the information was outdated or has become outdated),
in which case we retry the schedule


Why is it inevitable?

Theoretically if the DB knew about what resources were originally available and 
what resources have been consumed, then it should be able to allocate resources 
race-free (possibly with some retries involved if racing against other 
schedulers updating the DB, but that would be internal to the scheduler itself).


Or does that just not scale enough and we need to use inherently racy models?

Chris




Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Ian Wells
On 7 October 2015 at 16:00, Chris Friesen 
wrote:

> 1) Some resources (RAM) only require tracking amounts.  Other resources
> (CPUs, PCI devices) require tracking allocation of specific individual host
> resources (for CPU pinning, PCI device allocation, etc.).  Presumably for
> the latter we would have to actually do the allocation of resources at the
> time of the scheduling operation in order to update the database with the
> claimed resources in a race-free way.
>

The whole process is inherently racy (and this is inevitable, and correct),
which is why the scheduler works the way it does:

- scheduler guesses at a host based on (guaranteed - hello distributed
systems!) outdated information
- VM is scheduled to a host that looks like it might work, and host
attempts to run it
- VM run may fail (because the information was outdated or has become
outdated), in which case we retry the schedule

In fact, with PCI devices the code has been written rather carefully to
make sure that they fit into this model.  There is central per-device
tracking (which, fwiw, I argued against back in the day) but that's not how
allocation works (or, considering how long it is since I looked, worked).

PCI devices are actually allocated from pools of equivalent devices, and
allocation works in the same manner as other scheduling: you work out from
the nova boot call what constraints a host must satisfy (in this case, in
number of PCI devices in specific pools), you check your best guess at
global host state against those constraints, and you pick one of the hosts
that meets the constraints to schedule on.

So: yes, there is a central registry of devices, which we try to keep up to
date - but this is for admins to refer to, it's not a necessity of
scheduling.  The scheduler input is the pool counts, which work largely the
same way as the available memory works as regards scheduling and updating.
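
In other words the PCI check is just a counting filter over pool state,
something like this toy version (the data shapes are made up; IIRC the real
pools also key on NUMA node and device tags):

    def pci_filter(hosts, requested):
        # hosts: {'compute-1': {'pci_pools': {('8086', '10fb'): 4, ...}}, ...}
        # requested: pool key -> device count, e.g. {('8086', '10fb'): 2}
        acceptable = []
        for name, state in hosts.items():
            pools = state.get('pci_pools', {})
            if all(pools.get(key, 0) >= count
                   for key, count in requested.items()):
                acceptable.append(name)
        return acceptable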

No idea on CPUs, sorry, but again I'm not sure why the behaviour would be
any different: compare suspected host state against needs, schedule if it
fits, hope you got it right and tolerate if you didn't.

That being the case, it's worth noting that the database can be eventually
consistent and doesn't need to be transactional.  It's also worth
considering that the database can have multiple (mutually inconsistent)
copies.  There's no need to use a central datastore if you don't want to -
one theoretical example is to run multiple schedulers and let each
scheduler attempt to collate cloud state from unreliable messages from the
compute hosts.  This is not quite what happens today, because messages we
send over Rabbit are reliable and therefore costly.
-- 
Ian.


Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Ed Leafe
On Oct 7, 2015, at 2:28 PM, Zane Bitter  wrote:

> It seems to me (disclaimer: not a Nova dev) that which database to use is 
> completely irrelevant to your proposal,

Well, not entirely. The difference is that what Cassandra offers that separates 
it from other DBs is exactly the feature that we need. The solution to the 
scheduler isn't to simply "use a database".

> which is really about moving the scheduling from a distributed collection of 
> Python processes with ad-hoc (or sometimes completely missing) 
> synchronisation into the database to take advantage of its well-defined 
> semantics. But you've framed it in such a way as to guarantee that this never 
> gets discussed, because everyone will be too busy arguing about whether or 
> not Cassandra is better than Galera.

Understood - all one has to do is review the original thread from back in July 
to see this happening. But the reason that I framed it then as an experiment in 
which we would come up with measures of success we could all agree on up-front 
was so that if someone else thought that Product Foo would be even better, we 
could set up a similar test bed and try it out. IOW, instead of bikeshedding, 
if you want a different color, you build another shed and we can all have a 
look.


-- Ed Leafe









Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Fox, Kevin M
I think if you went ahead and did the experiment, and had good results from it, 
the discussion would start to progress whether or not folks were fond of 
Cassandra or ...

Thanks,
Kevin

From: Ed Leafe [e...@leafe.com]
Sent: Wednesday, October 07, 2015 5:24 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] Scheduler proposal

On Oct 7, 2015, at 2:28 PM, Zane Bitter <zbit...@redhat.com> wrote:

> It seems to me (disclaimer: not a Nova dev) that which database to use is 
> completely irrelevant to your proposal,

Well, not entirely. The difference is that what Cassandra offers that separates 
it from other DBs is exactly the feature that we need. The solution to the 
scheduler isn't to simply "use a database".

> which is really about moving the scheduling from a distributed collection of 
> Python processes with ad-hoc (or sometimes completely missing) 
> synchronisation into the database to take advantage of its well-defined 
> semantics. But you've framed it in such a way as to guarantee that this never 
> gets discussed, because everyone will be too busy arguing about whether or 
> not Cassandra is better than Galera.

Understood - all one has to do is review the original thread from back in July 
to see this happening. But the reason that I framed it then as an experiment in 
which we would come up with measures of success we could all agree on up-front 
was so that if someone else thought that Product Foo would be even better, we 
could set up a similar test bed and try it out. IOW, instead of bikeshedding, 
if you want a different color, you build another shed and we can all have a 
look.


-- Ed Leafe








Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Zane Bitter

On 07/10/15 13:36, Ed Leafe wrote:

Several months ago I proposed an experiment [0] to see if switching the data 
model for the Nova scheduler to use Cassandra as the backend would be a 
significant improvement as opposed to the current design using multiple copies 
of the same data (compute_node in MySQL DB, HostState in memory in the 
scheduler, ResourceTracker in memory in the compute node) and trying to keep 
them all in sync via passing messages.


It seems to me (disclaimer: not a Nova dev) that which database to use 
is completely irrelevant to your proposal, which is really about moving 
the scheduling from a distributed collection of Python processes with 
ad-hoc (or sometimes completely missing) synchronisation into the 
database to take advantage of its well-defined semantics. But you've 
framed it in such a way as to guarantee that this never gets discussed, 
because everyone will be too busy arguing about whether or not Cassandra 
is better than Galera.


cheers,
Zane.


