Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity

2016-05-30 Thread Yuyang Du
On Tue, May 24, 2016 at 04:24:01AM +0800, Yuyang Du wrote:
> On Thu, May 19, 2016 at 04:36:38PM +0100, Morten Rasmussen wrote:
> > 
> > And this is exactly you get with this patch :-) load_above_capacity
> > (through max_pull) is multiplied by the group capacity to compute that
> > actual amount of [load] to remove:
> > 
> > env->imbalance  = load_above_capacity * busiest->group_capacity /
> > SCHED_CAPACITY_SCALE
> > 
> > = 1*NICE_0_LOAD * 3*SCHED_CAPACITY_SCALE /
> > SCHED_CAPACITY_SCALE
> > 
> > = 3*NICE_0_LOAD
> > 
> > I don't think we disagree on how it should work :-) Without the capacity
> > scaling in this patch you get: 
> > 
> > env->imbalance = (6*SCHED_CAPACITY_SCALE - 3*SCHED_CAPACITY_SCALE) *
> > 3*SCHED_CAPACITY_SCALE / SCHED_CAPACITY_SCALE
> > 
> > = 9*SCHED_CAPACITY_SCALE
> > 
> > Coming back to Yuyang's question. I think it should be NICE_0_LOAD to
> > ensure that the resulting imbalance has the proper unit [load].
> 
> Sorry, I'm still confused. After this patch, the unit is indeed [load], the
> load same as the weight visible to the user. However, you literally compared
> it with sg_lb_stats's avg_load and load_per_task, which has the unit of
> load_avg, which is scaled_load_down(NICE_0_LOAD). Am I missing something?

Hello?


Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity

2016-05-23 Thread Yuyang Du
On Thu, May 19, 2016 at 04:36:38PM +0100, Morten Rasmussen wrote:
> 
> And this is exactly you get with this patch :-) load_above_capacity
> (through max_pull) is multiplied by the group capacity to compute that
> actual amount of [load] to remove:
> 
> env->imbalance= load_above_capacity * busiest->group_capacity /
>   SCHED_CAPACITY_SCALE
> 
>   = 1*NICE_0_LOAD * 3*SCHED_CAPACITY_SCALE /
>   SCHED_CAPACITY_SCALE
> 
>   = 3*NICE_0_LOAD
> 
> I don't think we disagree on how it should work :-) Without the capacity
> scaling in this patch you get: 
> 
> env->imbalance = (6*SCHED_CAPACITY_SCALE - 3*SCHED_CAPACITY_SCALE) *
>   3*SCHED_CAPACITY_SCALE / SCHED_CAPACITY_SCALE
> 
>   = 9*SCHED_CAPACITY_SCALE
> 
> Coming back to Yuyang's question. I think it should be NICE_0_LOAD to
> ensure that the resulting imbalance has the proper unit [load].

Sorry, I'm still confused. After this patch, the unit is indeed [load], the
load same as the weight visible to the user. However, you literally compared
it with sg_lb_stats's avg_load and load_per_task, which has the unit of
load_avg, which is scaled_load_down(NICE_0_LOAD). Am I missing something?


Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity

2016-05-20 Thread Vincent Guittot
Hi Morten,

On 19 May 2016 at 17:36, Morten Rasmussen  wrote:
> On Fri, May 13, 2016 at 10:22:19AM +0200, Vincent Guittot wrote:
>> On 12 May 2016 at 23:48, Yuyang Du  wrote:
>> > On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen 
>> > wrote:
>> >> Commit-ID:  cfa10334318d8212d007da8c771187643c9cef35
>> >> Gitweb: 
>> >> http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35
>> >> Author: Morten Rasmussen 
>> >> AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100
>> >> Committer:  Ingo Molnar 
>> >> CommitDate: Thu, 12 May 2016 09:55:33 +0200
>> >>
>> >> sched/fair: Correct unit of load_above_capacity
>> >>
>> >> In calculate_imbalance() load_above_capacity currently has the unit
>> >> [capacity] while it is used as being [load/capacity]. Not only is it
>> >> wrong it also makes it unlikely that load_above_capacity is ever used
>> >> as the subsequent code picks the smaller of load_above_capacity and
>> >> the avg_load
>> >>
>> >> This patch ensures that load_above_capacity has the right unit
>> >> [load/capacity].
>>
>> load_above_capacity has the unit [load] and is then compared with other load
>
> I'm not sure if talk about the same code, I'm referring to:
>
> max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity);
>
> a bit further down. Here the first option has unit [load/capacity], and
> the subsequent code multiplies the result with [capacity] to get back to
> [load]:

My understand is that it multiplies and divides by capacity
so we select the minimum between
 max_pull * busiest->group_capacity/SCHED_CAPACITY_SCALE
and
(sds->avg_load - local->avg_load) * local->group_capacity  /
SCHED_CAPACITY_SCALE

so the unit of imbalance is the same as max_pull because we multiply
and divide by [capacity]
the imbalance's unit is [load] so max_pull also has a unit [load]
and max_pull has the same unit of load_above_capacity

>
> /* How much load to actually move to equalise the imbalance */
> env->imbalance = min(
> max_pull * busiest->group_capacity,
> (sds->avg_load - local->avg_load) * local->group_capacity
> ) / SCHED_CAPACITY_SCALE;
>
> That lead me to the conclusion that load_above_capacity would have to be
> 'per capacity', i.e. [load/capacity], as well for the code to make
> sense.
>
>>
>> >>
>> >> Signed-off-by: Morten Rasmussen 
>> >> [ Changed changelog to note it was in capacity unit; +rebase. ]
>> >> Signed-off-by: Peter Zijlstra (Intel) 
>> >> Cc: Dietmar Eggemann 
>> >> Cc: Linus Torvalds 
>> >> Cc: Mike Galbraith 
>> >> Cc: Peter Zijlstra 
>> >> Cc: Thomas Gleixner 
>> >> Cc: linux-kernel@vger.kernel.org
>> >> Link: 
>> >> http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com
>> >> Signed-off-by: Ingo Molnar 
>> >> ---
>> >>  kernel/sched/fair.c | 6 --
>> >>  1 file changed, 4 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> >> index 2338105..218f8e8 100644
>> >> --- a/kernel/sched/fair.c
>> >> +++ b/kernel/sched/fair.c
>> >> @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct 
>> >> lb_env *env, struct sd_lb_stats *s
>> >>   if (busiest->group_type == group_overloaded &&
>> >>   local->group_type   == group_overloaded) {
>> >>   load_above_capacity = busiest->sum_nr_running * 
>> >> SCHED_CAPACITY_SCALE;
>> >> - if (load_above_capacity > busiest->group_capacity)
>> >> + if (load_above_capacity > busiest->group_capacity) {
>> >>   load_above_capacity -= busiest->group_capacity;
>> >> - else
>> >> + load_above_capacity *= NICE_0_LOAD;
>> >> + load_above_capacity /= busiest->group_capacity;
>>
>> If I follow correctly the sequence,
>> load_above_capacity = (busiest->sum_nr_running * SCHED_CAPACITY_SCALE
>> - busiest->group_capacity) * NICE_0_LOAD / busiest->group_capacity
>> so
>> load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE /
>> busiest->group_capacity * NICE_0_LOAD - NICE_0_LOAD
>>
>> so the unit is [load]
>
> I'm with you for the equation, but not for the unit and I find it hard
> convince myself what the unit really is :-(

the sole NICE_0_LOAD in the equation above tends to confirms that it's
only [load]

>
> I think it is down to the fact that we are comparing [load] directly
> with [capacity] here, basically saying [load] == [capacity]. Most other
> places we scale [load] by [capacity], we don't compare the two or
> otherwise imply that they are the same unit. Do we have any other
> examples of this?

IIUC the goal of this part of calculate_imbalance, we make the
assumption that one task with NICE_0_LOAD should fit in one core with
SCHED_CAPACITY_SCALE capacity and we want to ensure that we will not
make a CPU idle

>
> The name load_above_capacity suggests that it should have unit [load],
> but we actually compute it by subtracting to values, where the latter
> clearly has unit [capacit

Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity

2016-05-19 Thread Morten Rasmussen
On Fri, May 13, 2016 at 10:22:19AM +0200, Vincent Guittot wrote:
> On 12 May 2016 at 23:48, Yuyang Du  wrote:
> > On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen 
> > wrote:
> >> Commit-ID:  cfa10334318d8212d007da8c771187643c9cef35
> >> Gitweb: 
> >> http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35
> >> Author: Morten Rasmussen 
> >> AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100
> >> Committer:  Ingo Molnar 
> >> CommitDate: Thu, 12 May 2016 09:55:33 +0200
> >>
> >> sched/fair: Correct unit of load_above_capacity
> >>
> >> In calculate_imbalance() load_above_capacity currently has the unit
> >> [capacity] while it is used as being [load/capacity]. Not only is it
> >> wrong it also makes it unlikely that load_above_capacity is ever used
> >> as the subsequent code picks the smaller of load_above_capacity and
> >> the avg_load
> >>
> >> This patch ensures that load_above_capacity has the right unit
> >> [load/capacity].
> 
> load_above_capacity has the unit [load] and is then compared with other load

I'm not sure if talk about the same code, I'm referring to:

max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity);

a bit further down. Here the first option has unit [load/capacity], and
the subsequent code multiplies the result with [capacity] to get back to
[load]:

/* How much load to actually move to equalise the imbalance */
env->imbalance = min(
max_pull * busiest->group_capacity,
(sds->avg_load - local->avg_load) * local->group_capacity
) / SCHED_CAPACITY_SCALE;

That lead me to the conclusion that load_above_capacity would have to be
'per capacity', i.e. [load/capacity], as well for the code to make
sense.

> 
> >>
> >> Signed-off-by: Morten Rasmussen 
> >> [ Changed changelog to note it was in capacity unit; +rebase. ]
> >> Signed-off-by: Peter Zijlstra (Intel) 
> >> Cc: Dietmar Eggemann 
> >> Cc: Linus Torvalds 
> >> Cc: Mike Galbraith 
> >> Cc: Peter Zijlstra 
> >> Cc: Thomas Gleixner 
> >> Cc: linux-kernel@vger.kernel.org
> >> Link: 
> >> http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com
> >> Signed-off-by: Ingo Molnar 
> >> ---
> >>  kernel/sched/fair.c | 6 --
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 2338105..218f8e8 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct 
> >> lb_env *env, struct sd_lb_stats *s
> >>   if (busiest->group_type == group_overloaded &&
> >>   local->group_type   == group_overloaded) {
> >>   load_above_capacity = busiest->sum_nr_running * 
> >> SCHED_CAPACITY_SCALE;
> >> - if (load_above_capacity > busiest->group_capacity)
> >> + if (load_above_capacity > busiest->group_capacity) {
> >>   load_above_capacity -= busiest->group_capacity;
> >> - else
> >> + load_above_capacity *= NICE_0_LOAD;
> >> + load_above_capacity /= busiest->group_capacity;
> 
> If I follow correctly the sequence,
> load_above_capacity = (busiest->sum_nr_running * SCHED_CAPACITY_SCALE
> - busiest->group_capacity) * NICE_0_LOAD / busiest->group_capacity
> so
> load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE /
> busiest->group_capacity * NICE_0_LOAD - NICE_0_LOAD
> 
> so the unit is [load]

I'm with you for the equation, but not for the unit and I find it hard
convince myself what the unit really is :-(

I think it is down to the fact that we are comparing [load] directly
with [capacity] here, basically saying [load] == [capacity]. Most other
places we scale [load] by [capacity], we don't compare the two or
otherwise imply that they are the same unit. Do we have any other
examples of this?

The name load_above_capacity suggests that it should have unit [load],
but we actually compute it by subtracting to values, where the latter
clearly has unit [capacity]. PeterZ's recent patch (1be0eb2a97
sched/fair: Clean up scale confusion) changes the former to be
[capacity].

load_above_capacity is later multiplied by [capacity] to determine the
imbalance which must have unit [load]. So working our way backwards,
load_above_capacity must have unit [load/capacity]. However, if [load] =
[capacity] then what we really have is just group-normalized [load].

As said in my other reply, the code only really makes sense for under
special circumstances where [load] == [capacity]. The easy, and possibly
best, way out of this mess would be to replace the code with something
like PeterZ's suggestion that I tried to implement in the patch included
in my other reply.

> Lets take the example of a group made of 2 cores with 2 threads per
> core and the capacity of each core is 1,5* SCHED_CAPACITY_SCALE so the
> capacity of the group is 3*SCHED_CAPACITY_SCALE.
> If we have 6 tasks on this group :
> lo

Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity

2016-05-13 Thread Vincent Guittot
On 12 May 2016 at 23:48, Yuyang Du  wrote:
> On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen wrote:
>> Commit-ID:  cfa10334318d8212d007da8c771187643c9cef35
>> Gitweb: 
>> http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35
>> Author: Morten Rasmussen 
>> AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100
>> Committer:  Ingo Molnar 
>> CommitDate: Thu, 12 May 2016 09:55:33 +0200
>>
>> sched/fair: Correct unit of load_above_capacity
>>
>> In calculate_imbalance() load_above_capacity currently has the unit
>> [capacity] while it is used as being [load/capacity]. Not only is it
>> wrong it also makes it unlikely that load_above_capacity is ever used
>> as the subsequent code picks the smaller of load_above_capacity and
>> the avg_load
>>
>> This patch ensures that load_above_capacity has the right unit
>> [load/capacity].

load_above_capacity has the unit [load] and is then compared with other load

>>
>> Signed-off-by: Morten Rasmussen 
>> [ Changed changelog to note it was in capacity unit; +rebase. ]
>> Signed-off-by: Peter Zijlstra (Intel) 
>> Cc: Dietmar Eggemann 
>> Cc: Linus Torvalds 
>> Cc: Mike Galbraith 
>> Cc: Peter Zijlstra 
>> Cc: Thomas Gleixner 
>> Cc: linux-kernel@vger.kernel.org
>> Link: 
>> http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com
>> Signed-off-by: Ingo Molnar 
>> ---
>>  kernel/sched/fair.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 2338105..218f8e8 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct lb_env 
>> *env, struct sd_lb_stats *s
>>   if (busiest->group_type == group_overloaded &&
>>   local->group_type   == group_overloaded) {
>>   load_above_capacity = busiest->sum_nr_running * 
>> SCHED_CAPACITY_SCALE;
>> - if (load_above_capacity > busiest->group_capacity)
>> + if (load_above_capacity > busiest->group_capacity) {
>>   load_above_capacity -= busiest->group_capacity;
>> - else
>> + load_above_capacity *= NICE_0_LOAD;
>> + load_above_capacity /= busiest->group_capacity;

If I follow correctly the sequence,
load_above_capacity = (busiest->sum_nr_running * SCHED_CAPACITY_SCALE
- busiest->group_capacity) * NICE_0_LOAD / busiest->group_capacity
so
load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE /
busiest->group_capacity * NICE_0_LOAD - NICE_0_LOAD

so the unit is [load]

Lets take the example of a group made of 2 cores with 2 threads per
core and the capacity of each core is 1,5* SCHED_CAPACITY_SCALE so the
capacity of the group is 3*SCHED_CAPACITY_SCALE.
If we have 6 tasks on this group :
load_above capacity = 1 *NICE_0_LOAD which means we want to remove no
more than 1 tasks from the group and let 5 tasks in the group whereas
we don't expect to have more than 4 tasks as we have 4 CPUs and
probably less because the compute capacity of each CPU is less than
the default one

So i would have expected
load_above_capacity = busiest->sum_nr_running * NICE_0_LOAD -
busiest->group_capacity * NICE_0_LOAD / SCHED_CAPACITY_SCALE
load_above capacity = 3*NICE_0_LOAD which means we want to remove no
more than 3 tasks and let 3 tasks in the group

Regards,
Vincent


>> + } else
>>   load_above_capacity = ~0UL;
>>   }
>
> Hi Morten,
>
> I got the feeling this might be wrong, the NICE_0_LOAD should be scaled down.
> But I hope I am wrong.
>
> Vincent, could you take a look?


Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity

2016-05-12 Thread Yuyang Du
On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen wrote:
> Commit-ID:  cfa10334318d8212d007da8c771187643c9cef35
> Gitweb: http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35
> Author: Morten Rasmussen 
> AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100
> Committer:  Ingo Molnar 
> CommitDate: Thu, 12 May 2016 09:55:33 +0200
> 
> sched/fair: Correct unit of load_above_capacity
> 
> In calculate_imbalance() load_above_capacity currently has the unit
> [capacity] while it is used as being [load/capacity]. Not only is it
> wrong it also makes it unlikely that load_above_capacity is ever used
> as the subsequent code picks the smaller of load_above_capacity and
> the avg_load
> 
> This patch ensures that load_above_capacity has the right unit
> [load/capacity].
> 
> Signed-off-by: Morten Rasmussen 
> [ Changed changelog to note it was in capacity unit; +rebase. ]
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: Dietmar Eggemann 
> Cc: Linus Torvalds 
> Cc: Mike Galbraith 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: linux-kernel@vger.kernel.org
> Link: 
> http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com
> Signed-off-by: Ingo Molnar 
> ---
>  kernel/sched/fair.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2338105..218f8e8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct lb_env 
> *env, struct sd_lb_stats *s
>   if (busiest->group_type == group_overloaded &&
>   local->group_type   == group_overloaded) {
>   load_above_capacity = busiest->sum_nr_running * 
> SCHED_CAPACITY_SCALE;
> - if (load_above_capacity > busiest->group_capacity)
> + if (load_above_capacity > busiest->group_capacity) {
>   load_above_capacity -= busiest->group_capacity;
> - else
> + load_above_capacity *= NICE_0_LOAD;
> + load_above_capacity /= busiest->group_capacity;
> + } else
>   load_above_capacity = ~0UL;
>   }
  
Hi Morten,

I got the feeling this might be wrong, the NICE_0_LOAD should be scaled down.
But I hope I am wrong.

Vincent, could you take a look?


[tip:sched/core] sched/fair: Correct unit of load_above_capacity

2016-05-12 Thread tip-bot for Morten Rasmussen
Commit-ID:  cfa10334318d8212d007da8c771187643c9cef35
Gitweb: http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35
Author: Morten Rasmussen 
AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100
Committer:  Ingo Molnar 
CommitDate: Thu, 12 May 2016 09:55:33 +0200

sched/fair: Correct unit of load_above_capacity

In calculate_imbalance() load_above_capacity currently has the unit
[capacity] while it is used as being [load/capacity]. Not only is it
wrong it also makes it unlikely that load_above_capacity is ever used
as the subsequent code picks the smaller of load_above_capacity and
the avg_load

This patch ensures that load_above_capacity has the right unit
[load/capacity].

Signed-off-by: Morten Rasmussen 
[ Changed changelog to note it was in capacity unit; +rebase. ]
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Dietmar Eggemann 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2338105..218f8e8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct lb_env 
*env, struct sd_lb_stats *s
if (busiest->group_type == group_overloaded &&
local->group_type   == group_overloaded) {
load_above_capacity = busiest->sum_nr_running * 
SCHED_CAPACITY_SCALE;
-   if (load_above_capacity > busiest->group_capacity)
+   if (load_above_capacity > busiest->group_capacity) {
load_above_capacity -= busiest->group_capacity;
-   else
+   load_above_capacity *= NICE_0_LOAD;
+   load_above_capacity /= busiest->group_capacity;
+   } else
load_above_capacity = ~0UL;
}