[openstack-dev] [nova][scheduler] A simple solution for better scheduler performance

2016-07-15 Thread Cheng, Yingxin
It is easy to understand that scheduling in the nova-scheduler service consists of two major phases:
A. Cache refresh, in code [1].
B. Filtering and weighing, in code [2].
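The two phases can be sketched roughly as follows (an illustrative paraphrase; the names and signatures here are simplified, not the real nova code):

```python
# Sketch of the two scheduling phases in FilterScheduler._schedule.
# refresh_host_states stands in for the host manager's DB query;
# filters/weighers are simplified to plain callables.

def schedule(spec, refresh_host_states, filters, weighers):
    # Phase A: cache refresh -- one DB round trip to rebuild the host view.
    hosts = refresh_host_states()
    # Phase B: filtering -- drop hosts that fail any filter; pure in-memory work.
    passing = [h for h in hosts if all(f(h, spec) for f in filters)]
    # Phase B continued: weighing -- rank the surviving hosts, best first.
    passing.sort(key=lambda h: sum(w(h, spec) for w in weighers), reverse=True)
    return passing
```

Phase A is the part measured below; phase B never touches the database.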

A couple of previous experiments [3][4] show that “cache-refresh” is the major 
bottleneck of the nova scheduler. For example, page 15 of presentation [3] 
shows that “cache-refresh” accounts for 98.5% of the time spent in the entire 
`_schedule` function [6] with 200-1000 nodes and 50+ concurrent requests. The 
latest experiments [5] in China Mobile’s 1000-node environment confirm the same 
conclusion; the figure is even 99.7% with 40+ concurrent requests.

Here are some existing solutions to the “cache-refresh” bottleneck:
I. Caching scheduler.
II. Scheduler filters in DB [7].
III. Eventually consistent scheduler host state [8].

I can discuss their merits and drawbacks in a separate thread, but here I want 
to show the simplest solution, based on my findings during the experiments [5]. 
I wrapped the expensive function [1] to observe the behavior of cache-refresh 
under pressure. Interestingly, a single cache-refresh costs only about 0.3 
seconds, but when there are concurrent cache-refresh operations this cost can 
suddenly jump to 8 seconds. I’ve even seen it reach 60 seconds for one 
cache-refresh under higher pressure. See the section below for details.
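A wrapper of the kind described might look like this (a sketch only; the real experiment instrumented the host-state query in [1], and `log` is a stand-in for whatever collected the measurements):

```python
import functools
import time

def timed(fn, log=print):
    """Record the wall-clock cost of each call to an expensive function.

    Sketch of the instrumentation described above: wrap the cache-refresh
    call and emit one timing line per invocation, even if it raises.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            # Report how long this single refresh took.
            log("%s took %.3fs" % (fn.__name__, time.monotonic() - start))
    return wrapper
```

Under concurrency, the per-call timings from such a wrapper are what reveal the 0.3s-to-8s blow-up.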

This raises a question about the current implementation: do we really need a 
cache-refresh operation [1] for *every* request? If those concurrent operations 
are replaced by one database query, the scheduler is still happy with the 
latest resource view from the database. The scheduler is even happier because 
those expensive cache-refresh operations are minimized and much faster (0.3 
seconds). I believe this is the simplest optimization to scheduler performance: 
it doesn’t require any changes in the filter scheduler, and minor improvements 
inside the host manager are enough.

[1] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L104
[2] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L112-L123
[3] https://www.openstack.org/assets/presentation-media/7129-Dive-into-nova-scheduler-performance-summit.pdf
[4] http://lists.openstack.org/pipermail/openstack-dev/2016-June/098202.html
[5] Please refer to Barcelona summit session ID 15334 later: “A tool to test 
and tune your OpenStack Cloud? Sharing our 1000 node China Mobile experience.”
[6] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L53
[7] https://review.openstack.org/#/c/300178/
[8] https://review.openstack.org/#/c/306844/


** Here is the discovery from latest experiments [5] **
https://docs.google.com/document/d/1N_ZENg-jmFabyE0kLMBgIjBGXfL517QftX3DW7RVCzU/edit?usp=sharing
 

Figure 1 illustrates the concurrent cache-refresh operations in a 
nova-scheduler service. At most, 23 requests are waiting for cache-refresh 
operations at time 43s.

Figure 2 illustrates the time cost of each request in the same experiment. It 
shows that the cost grows with concurrency, demonstrating a vicious circle: a 
request waits longer for the database when more requests are already waiting.

Figures 3 and 4 illustrate a worse case, in which the cost of a cache-refresh 
reaches 60 seconds because of excessive concurrent cache-refresh operations.


-- 
Regards
Yingxin

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][scheduler] A simple solution for better scheduler performance

2016-07-15 Thread John Garbutt
On 15 July 2016 at 09:26, Cheng, Yingxin  wrote:
> It is easy to understand that scheduling in nova-scheduler service consists 
> of 2 major phases:
> A. Cache refresh, in code [1].
> B. Filtering and weighing, in code [2].
>
> Couple of previous experiments [3] [4] shows that “cache-refresh” is the 
> major bottleneck of nova scheduler. For example, the 15th page of 
> presentation [3] says the time cost of “cache-refresh” takes 98.5% of time of 
> the entire `_schedule` function [6], when there are 200-1000 nodes and 50+ 
> concurrent requests. The latest experiments [5] in China Mobile’s 1000-node 
> environment also prove the same conclusion, and it’s even 99.7% when there’re 
> 40+ concurrent requests.
>
> Here’re some existing solutions for the “cache-refresh” bottleneck:
> I. Caching scheduler.
> II. Scheduler filters in DB [7].
> III. Eventually consistent scheduler host state [8].
>
> I can discuss their merits and drawbacks in a separate thread, but here I 
> want to show a simplest solution based on my findings during the experiments 
> [5]. I wrapped the expensive function [1] and tried to see the behavior of 
> cache-refresh under pressure. It is very interesting to see a single 
> cache-refresh only costs about 0.3 seconds. And when there’re concurrent 
> cache-refresh operations, this cost can be suddenly increased to 8 seconds. 
> I’ve seen it even reached 60 seconds for one cache-refresh under higher 
> pressure. See the below section for details.

I am curious: what DB driver are you using?
Using PyMySQL should remove a lot of those issues.
This is the driver we use in the gate now, but it wasn't always the default.

If you use the C-based MySQL driver, you will find it blocks the whole
process when making a DB call; eventlet then schedules the next DB call,
and so on, before looping back to let the Python code process the first
call's result. In extreme cases, the code processing the DB query
considers some of the hosts to be down, because so much time has passed
since the DB call returned.

Switching the driver should dramatically increase the performance of (II).
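For reference, the driver is selected by the SQLAlchemy connection URL in nova.conf; the credentials and host below are placeholders:

```ini
[database]
# C-based MySQLdb driver (blocks the whole process during each query):
# connection = mysql://nova:secret@127.0.0.1/nova

# Pure-Python PyMySQL driver (yields to eventlet during DB I/O):
connection = mysql+pymysql://nova:secret@127.0.0.1/nova
```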

> It raises a question in the current implementation: Do we really need a 
> cache-refresh operation [1] for *every* requests? If those concurrent 
> operations are replaced by one database query, the scheduler is still happy 
> with the latest resource view from database. Scheduler is even happier 
> because those expensive cache-refresh operations are minimized and much 
> faster (0.3 seconds). I believe it is the simplest optimization to scheduler 
> performance, which doesn’t make any changes in filter scheduler. Minor 
> improvements inside host manager is enough.

So it depends on the usage patterns in your cloud.

The caching scheduler is one way to avoid the cache-refresh operation
on every request. But the caching means you can only have a single
active nova-scheduler process, which puts an upper limit on throughput,
whereas (II) allows you to have multiple nova-scheduler workers to
increase the concurrency.

> [1] 
> https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L104
> [2] 
> https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L112-L123
> [3] 
> https://www.openstack.org/assets/presentation-media/7129-Dive-into-nova-scheduler-performance-summit.pdf
> [4] http://lists.openstack.org/pipermail/openstack-dev/2016-June/098202.html
> [5] Please refer to Barcelona summit session ID 15334 later: “A tool to test 
> and tune your OpenStack Cloud? Sharing our 1000 node China Mobile experience.”
> [6] 
> https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L53
> [7] https://review.openstack.org/#/c/300178/
> [8] https://review.openstack.org/#/c/306844/
>
>
> ** Here is the discovery from latest experiments [5] **
> https://docs.google.com/document/d/1N_ZENg-jmFabyE0kLMBgIjBGXfL517QftX3DW7RVCzU/edit?usp=sharing
>
> The figure 1 illustrates the concurrent cache-refresh operations in a nova 
> scheduler service. There’re at most 23 requests waiting for the cache-refresh 
> operations at time 43s.
>
> The figure 2 illustrates the time cost of every requests in the same 
> experiment. It shows that the cost is increased with the growth of 
> concurrency. It proves the vicious circle that a request will wait longer for 
> the database when there’re more waiting requests.
>
> The figure 3/4 illustrate a worse case when the cache-refresh operation costs 
> reach 60 seconds because of excessive cache-refresh operations.

Sorry, it's not clear to me whether this was using I, II, or III? It
seems like it's just using the default system?

This looks like the problems I have seen when you don't use PyMySQL
for your DB driver.

Thanks,
John


Re: [openstack-dev] [nova][scheduler] A simple solution for better scheduler performance

2016-07-15 Thread Cheng, Yingxin
Hi John,

Thanks for the reply.

There are two rounds of experiments:
Experiment A [3] was deployed with devstack: 1000 compute services with the 
fake virt driver. The DB driver is devstack’s default, PyMySQL, and the 
scheduler driver is the default filter scheduler.
Experiment B [4] is a real production environment from China Mobile with about 
600 active compute nodes. The DB driver is SQLAlchemy’s default, i.e. the 
C-based MySQL-Python, and the scheduler is also the filter scheduler.

In the analysis at 
https://docs.google.com/document/d/1N_ZENg-jmFabyE0kLMBgIjBGXfL517QftX3DW7RVCzU/edit?usp=sharing
figures 1/2 are from experiment B and figures 3/4 are from experiment A, so 
both kinds of DB drivers are covered.

My point is simple: when the host manager is querying host states for request 
A and another request B arrives, the host manager won’t launch a second 
cache-refresh; instead, it simply reuses the first one and returns the same 
result to both A and B. In this way, we reduce the expensive cache-refresh 
queries to a minimum while keeping scheduler host states fresh. This becomes 
even more effective with more compute nodes and heavier request pressure.

I also have runnable code that can better explain my idea: 
https://github.com/cyx1231st/making-food 

-- 
Regards
Yingxin

On 7/15/16, 17:19, "John Garbutt"  wrote:

On 15 July 2016 at 09:26, Cheng, Yingxin  wrote:
> It is easy to understand that scheduling in nova-scheduler service 
consists of 2 major phases:
> A. Cache refresh, in code [1].
> B. Filtering and weighing, in code [2].
>
> Couple of previous experiments [3] [4] shows that “cache-refresh” is the 
major bottleneck of nova scheduler. For example, the 15th page of presentation 
[3] says the time cost of “cache-refresh” takes 98.5% of time of the entire 
`_schedule` function [6], when there are 200-1000 nodes and 50+ concurrent 
requests. The latest experiments [5] in China Mobile’s 1000-node environment 
also prove the same conclusion, and it’s even 99.7% when there’re 40+ 
concurrent requests.
>
> Here’re some existing solutions for the “cache-refresh” bottleneck:
> I. Caching scheduler.
> II. Scheduler filters in DB [7].
> III. Eventually consistent scheduler host state [8].
>
> I can discuss their merits and drawbacks in a separate thread, but here I 
want to show a simplest solution based on my findings during the experiments 
[5]. I wrapped the expensive function [1] and tried to see the behavior of 
cache-refresh under pressure. It is very interesting to see a single 
cache-refresh only costs about 0.3 seconds. And when there’re concurrent 
cache-refresh operations, this cost can be suddenly increased to 8 seconds. 
I’ve seen it even reached 60 seconds for one cache-refresh under higher 
pressure. See the below section for details.

I am curious about what DB driver you are using?
Using PyMySQL should remove at lot of those issues.
This is the driver we use in the gate now, but it didn't used to be the 
default.

If you use the C based MySQL driver, you will find it locks the whole
process when making a DB call, then eventlet schedules the next DB
call, etc, etc, and then it loops back and allows the python code to
process the first db call, etc. In extreme cases you will find the
code processing the DB query considers some of the hosts to be down
since its so long since the DB call was returned.

Switching the driver should dramatically increase the performance of (II)

> It raises a question in the current implementation: Do we really need a 
cache-refresh operation [1] for *every* requests? If those concurrent 
operations are replaced by one database query, the scheduler is still happy 
with the latest resource view from database. Scheduler is even happier because 
those expensive cache-refresh operations are minimized and much faster (0.3 
seconds). I believe it is the simplest optimization to scheduler performance, 
which doesn’t make any changes in filter scheduler. Minor improvements inside 
host manager is enough.

So it depends on the usage patterns in your cloud.

The caching scheduler is one way to avoid the cache-refresh operation
on every request. It has an upper limit on throughput as you are
forced into having a single active nova-scheduler process.

But the caching means you can only have a single nova-scheduler
process, where as (II) allows you to have multiple nova-scheduler
workers to increase the concurrency.

> [1] 
https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L104
> [2] 
https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L112-L123
> [3] 
https://www.openstack.org/assets/presentation-media/7129-Dive-into-nova-scheduler-performance-summit.pdf
> [4] 
http://lists.openstack.org/pipermail/openstack-dev/2016-June/