Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-07 Thread Dietmar Eggemann
On 04/09/15 00:51, Steve Muckle wrote:
> Hi Morten, Dietmar,
> 
> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
> ...
>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
> 
> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
> __update_load_avg(). If there is now an assumption that util_avg may be
> used directly as a capacity value, should it be changed to
> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
> always be or if they can be combined.

You're referring to the code line

2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;

in __update_load_avg()?

Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
load related.

LOAD (UTIL) and CAPACITY have the same SCALE and SHIFT values because
SCHED_LOAD_RESOLUTION is always defined to 0. scale_load() and
scale_load_down() are NOPs as well, so this area is probably worth a
separate clean-up. Beyond that, I'm not sure whether the current
functionality would break if we used different SCALE and SHIFT values
for LOAD and CAPACITY.
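
For context, the definitions in play here (abridged from
include/linux/sched.h of that era; a reference sketch, not a patch):

#if 0 /* BITS_PER_LONG > 32 -- the currently disabled high-res case */
# define SCHED_LOAD_RESOLUTION 10
# define scale_load(w)         ((w) << SCHED_LOAD_RESOLUTION)
# define scale_load_down(w)    ((w) >> SCHED_LOAD_RESOLUTION)
#else
# define SCHED_LOAD_RESOLUTION 0
# define scale_load(w)         (w)    /* NOP */
# define scale_load_down(w)    (w)    /* NOP */
#endif

#define SCHED_LOAD_SHIFT       (10 + SCHED_LOAD_RESOLUTION)
#define SCHED_LOAD_SCALE       (1L << SCHED_LOAD_SHIFT)

#define SCHED_CAPACITY_SHIFT   10
#define SCHED_CAPACITY_SCALE   (1L << SCHED_CAPACITY_SHIFT)

With SCHED_LOAD_RESOLUTION pinned to 0, SCHED_LOAD_SHIFT ==
SCHED_CAPACITY_SHIFT == 10 and the two scales are interchangeable.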

> 
>> + * capacity_orig is the cpu_capacity available at * the highest frequency
> 
> spurious *
> 
> thanks,
> Steve
> 

Fixed.

Thanks, 

-- Dietmar

-- >8 --

From: Dietmar Eggemann 
Date: Fri, 14 Aug 2015 17:23:13 +0100
Subject: [PATCH] sched/fair: Get rid of scaling utilization by capacity_orig

Utilization is currently scaled by capacity_orig, but since we now have
frequency and cpu invariant cfs_rq.avg.util_avg, frequency and cpu scaling
now happens as part of the utilization tracking itself.
So cfs_rq.avg.util_avg should no longer be scaled in cpu_util().

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Signed-off-by: Dietmar Eggemann 
Signed-off-by: Morten Rasmussen 
---
 kernel/sched/fair.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2074d45a67c2..a73ece2372f5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4824,33 +4824,39 @@ static int select_idle_sibling(struct task_struct *p, int target)
 done:
return target;
 }
+
 /*
  * cpu_util returns the amount of capacity of a CPU that is used by CFS
  * tasks. The unit of the return value must be the one of capacity so we can
  * compare the utilization with the capacity of the CPU that is available for
  * CFS task (ie cpu_capacity).
- * cfs.avg.util_avg is the sum of running time of runnable tasks on a
- * CPU. It represents the amount of utilization of a CPU in the range
- * [0..SCHED_LOAD_SCALE]. The utilization of a CPU can't be higher than the
- * full capacity of the CPU because it's about the running time on this CPU.
- * Nevertheless, cfs.avg.util_avg can be higher than SCHED_LOAD_SCALE
- * because of unfortunate rounding in util_avg or just
- * after migrating tasks until the average stabilizes with the new running
- * time. So we need to check that the utilization stays into the range
- * [0..cpu_capacity_orig] and cap if necessary.
- * Without capping the utilization, a group could be seen as overloaded (CPU0
- * utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
- * available capacity.
+ *
+ * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
+ * recent utilization of currently non-runnable tasks on a CPU. It represents
+ * the amount of utilization of a CPU in the range [0..capacity_orig] where
+ * capacity_orig is the cpu_capacity available at the highest frequency
+ * (arch_scale_freq_capacity()).
+ * The utilization of a CPU converges towards a sum equal to or less than the
+ * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
+ * the running time on this CPU scaled by capacity_curr.
+ *
+ * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
+ * higher than capacity_orig because of unfortunate rounding in
+ * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
+ * the average stabilizes with the new running time. We need to check that the
+ * utilization stays within the range of [0..capacity_orig] and cap it if
+ * necessary. Without utilization capping, a group could be seen as overloaded
+ * (CPU0 utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
+ * available capacity. We allow utilization to overshoot capacity_curr (but not
+ * capacity_orig) as it is useful for predicting the capacity required after task
+ * migrations (scheduler-driven DVFS).
  */
 static int cpu_util(int cpu)
 {
unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
unsigned long capacity = capacity_orig_of(cpu);
 
-   if (util >= SCHED_LOAD_SCALE)
-   return capacity;
-
-   return (util * capacity) >> SCHED_LOAD_SHIFT;
+   return (util >= capacity) ? capacity : util;
 }

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-07 Thread Vincent Guittot
On 7 September 2015 at 17:37, Dietmar Eggemann  wrote:
> On 04/09/15 00:51, Steve Muckle wrote:
>> Hi Morten, Dietmar,
>>
>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>> ...
>>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>
>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>> __update_load_avg(). If there is now an assumption that util_avg may be
>> used directly as a capacity value, should it be changed to
>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>> always be or if they can be combined.
>
> You're referring to the code line
>
> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>
> in __update_load_avg()?
>
> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
> load related.

I agree with Steve that there is an issue from a unit point of view.

sa->util_sum and LOAD_AVG_MAX have the same unit, so sa->util_avg is a
load because of the << SCHED_LOAD_SHIFT.

Before this patch, the translation from load to capacity unit was
done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT".

So you still have to change the unit from load to capacity with a
"/ SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere:

sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / SCHED_LOAD_SCALE *
SCHED_CAPACITY_SCALE / LOAD_AVG_MAX
             = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
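
To make the range argument concrete, a standalone sketch (plain C, not
kernel code; the 20/10 split is hypothetical since both shifts are 10
today):

/* Why '<< SCHED_LOAD_SHIFT' puts util_avg in the load range rather
 * than the capacity range once the two shifts diverge. */
#include <stdio.h>

#define LOAD_AVG_MAX            47742  /* converged geometric series */
#define SCHED_LOAD_SHIFT        20     /* hypothetical high-res load */
#define SCHED_CAPACITY_SHIFT    10

int main(void)
{
        unsigned long long util_sum = LOAD_AVG_MAX;  /* CPU 100% busy */

        /* current code: 1048576, a load-range value */
        unsigned long long as_load =
                (util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;

        /* proposed: 1024, directly comparable with cpu capacity */
        unsigned long long as_capacity =
                (util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;

        printf("as load: %llu, as capacity: %llu\n", as_load, as_capacity);
        return 0;
}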


Regards,
Vincent


>
> LOAD (UTIL) and CAPACITY have the same SCALE and SHIFT values because
> SCHED_LOAD_RESOLUTION is always defined to 0. scale_load() and
> scale_load_down() are also NOPs so this area is probably
> worth a separate clean-up.
> Beyond that, I'm not sure if the current functionality is
> broken if we use different SCALE and SHIFT values for LOAD and CAPACITY?
>
>>
>>> + * capacity_orig is the cpu_capacity available at * the highest frequency
>>
>> spurious *
>>
>> thanks,
>> Steve
>>
>
> Fixed.
>
> Thanks,
>
> -- Dietmar
>
> -- >8 --
>
> From: Dietmar Eggemann 
> Date: Fri, 14 Aug 2015 17:23:13 +0100
> Subject: [PATCH] sched/fair: Get rid of scaling utilization by capacity_orig
>
> Utilization is currently scaled by capacity_orig, but since we now have
> frequency and cpu invariant cfs_rq.avg.util_avg, frequency and cpu scaling
> now happens as part of the utilization tracking itself.
> So cfs_rq.avg.util_avg should no longer be scaled in cpu_util().
>
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Signed-off-by: Dietmar Eggemann 
> Signed-off-by: Morten Rasmussen 
> ---
>  kernel/sched/fair.c | 38 ++++++++++++++++++++++----------------
>  1 file changed, 22 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2074d45a67c2..a73ece2372f5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4824,33 +4824,39 @@ static int select_idle_sibling(struct task_struct *p, int target)
>  done:
> return target;
>  }
> +
>  /*
>   * cpu_util returns the amount of capacity of a CPU that is used by CFS
>   * tasks. The unit of the return value must be the one of capacity so we can
>   * compare the utilization with the capacity of the CPU that is available for
>   * CFS task (ie cpu_capacity).
> - * cfs.avg.util_avg is the sum of running time of runnable tasks on a
> - * CPU. It represents the amount of utilization of a CPU in the range
> - * [0..SCHED_LOAD_SCALE]. The utilization of a CPU can't be higher than the
> - * full capacity of the CPU because it's about the running time on this CPU.
> - * Nevertheless, cfs.avg.util_avg can be higher than SCHED_LOAD_SCALE
> - * because of unfortunate rounding in util_avg or just
> - * after migrating tasks until the average stabilizes with the new running
> - * time. So we need to check that the utilization stays into the range
> - * [0..cpu_capacity_orig] and cap if necessary.
> - * Without capping the utilization, a group could be seen as overloaded (CPU0
> - * utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
> - * available capacity.
> + *
> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
> + * recent utilization of currently non-runnable tasks on a CPU. It represents
> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
> + * capacity_orig is the cpu_capacity available at the highest frequency
> + * (arch_scale_freq_capacity()).
> + * The utilization of a CPU converges towards a sum equal to or less than the
> + * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
> + * the running time on this CPU scaled by capacity_curr.
> + *
> + * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
> + * higher than capacity_orig because of unfortunate rounding in
> + * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
>
> [...]

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-07 Thread Dietmar Eggemann
On 07/09/15 17:21, Vincent Guittot wrote:
> On 7 September 2015 at 17:37, Dietmar Eggemann  
> wrote:
>> On 04/09/15 00:51, Steve Muckle wrote:
>>> Hi Morten, Dietmar,
>>>
>>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>>> ...
 + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
 + * recent utilization of currently non-runnable tasks on a CPU. It represents
 + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>>
>>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>>> __update_load_avg(). If there is now an assumption that util_avg may be
>>> used directly as a capacity value, should it be changed to
>>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>>> always be or if they can be combined.
>>
>> You're referring to the code line
>>
>> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>
>> in __update_load_avg()?
>>
>> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
>> load related.
> 
> I agree with Steve that there is an issue from a unit point of view
> 
> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
> load because of << SCHED_LOAD_SHIFT)
> 
> Before this patch , the translation from load to capacity unit was
> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
> 
> So you still have to change the unit from load to capacity with a "/
> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
> 
> sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / SCHED_LOAD_SCALE *
> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX
>              = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;

I see the point but IMHO this will only be necessary if the
SCHED_LOAD_RESOLUTION stuff gets re-enabled again.

It's not really about utilization or capacity units but rather about
using the same SCALE/SHIFT values for both sides, right?

I always thought that scale_load_down() takes care of that.

So shouldn't:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3445d2fb38f4..b80f799aface 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2644,7 +2644,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
cfs_rq->runnable_load_avg =
div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
}
-   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
+   sa->util_avg = (sa->util_sum * scale_load_down(SCHED_LOAD_SCALE)) / LOAD_AVG_MAX;
}
 
return decayed;

fix that issue in case SCHED_LOAD_RESOLUTION != 0?
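
The identity this proposal relies on, with R = SCHED_LOAD_RESOLUTION:

        scale_load_down(SCHED_LOAD_SCALE) = (1 << (10 + R)) >> R
                                          = 1 << 10

i.e. the proposed right-hand side stays 1024 for any R, matching the
scale_load_down(SCHED_LOAD_SCALE) used to initialize util_avg.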

I would vote for removing this SCHED_LOAD_RESOLUTION thing completely so
that we can assume that load/util and capacity are always using 1024/10.

Cheers,

-- Dietmar

> 
> 
> Regards,
> Vincent
> 
> 
>>
>> LOAD (UTIL) and CAPACITY have the same SCALE and SHIFT values because
>> SCHED_LOAD_RESOLUTION is always defined to 0. scale_load() and
>> scale_load_down() are also NOPs so this area is probably
>> worth a separate clean-up.
>> Beyond that, I'm not sure if the current functionality is
>> broken if we use different SCALE and SHIFT values for LOAD and CAPACITY?
>>

[...]



Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-07 Thread Peter Zijlstra
On Mon, Sep 07, 2015 at 07:54:18PM +0100, Dietmar Eggemann wrote:
> I would vote for removing this SCHED_LOAD_RESOLUTION thing completely so
> that we can assume that load/util and capacity are always using 1024/10.

Ha!, I just requested Google look into moving it to 20 again ;-)


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Vincent Guittot
On 7 September 2015 at 20:54, Dietmar Eggemann  wrote:
> On 07/09/15 17:21, Vincent Guittot wrote:
>> On 7 September 2015 at 17:37, Dietmar Eggemann  
>> wrote:
>>> On 04/09/15 00:51, Steve Muckle wrote:
 Hi Morten, Dietmar,

 On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
 ...
> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
> + * recent utilization of currently non-runnable tasks on a CPU. It represents
> + * the amount of utilization of a CPU in the range [0..capacity_orig] where

 I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
 __update_load_avg(). If there is now an assumption that util_avg may be
 used directly as a capacity value, should it be changed to
 SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
 always be or if they can be combined.
>>>
>>> You're referring to the code line
>>>
>>> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>>
>>> in __update_load_avg()?
>>>
>>> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
>>> load related.
>>
>> I agree with Steve that there is an issue from a unit point of view
>>
>> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
>> load because of << SCHED_LOAD_SHIFT)
>>
>> Before this patch , the translation from load to capacity unit was
>> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
>>
>> So you still have to change the unit from load to capacity with a "/
>> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
>>
>> sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / SCHED_LOAD_SCALE *
>> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX
>>              = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>
> I see the point but IMHO this will only be necessary if the
> SCHED_LOAD_RESOLUTION stuff gets re-enabled again.
>
> It's not really about utilization or capacity units but rather about
> using the same SCALE/SHIFT values for both sides, right?

It's both a unit and a SCALE/SHIFT problem: SCHED_LOAD_SHIFT and
SCHED_CAPACITY_SHIFT are defined separately, so we must be sure to
scale the value in the right range. In the case of cpu_usage, which
returns sa->util_avg, it's the capacity range, not the load range.

>
> I always thought that scale_load_down() takes care of that.

AFAIU, scale_load_down is a way to increase the resolution of the
load, not to move from load to capacity.

>
> So shouldn't:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3445d2fb38f4..b80f799aface 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2644,7 +2644,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg 
> *sa,
> cfs_rq->runnable_load_avg =
> div_u64(cfs_rq->runnable_load_sum, 
> LOAD_AVG_MAX);
> }
> -   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / 
> LOAD_AVG_MAX;
> +   sa->util_avg = (sa->util_sum * 
> scale_load_down(SCHED_LOAD_SCALE)) / LOAD_AVG_MAX;
> }
>
> return decayed;
>
> fix that issue in case SCHED_LOAD_RESOLUTION != 0 ?


No, but
sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
will fix the unit issue.
I agree that it doesn't change the result, because both SCHED_LOAD_SHIFT
and SCHED_CAPACITY_SHIFT are set to 10, but as mentioned above they
are set separately, so it can make a difference if someone changes one
SHIFT value.

Regards,
Vincent

>
> I would vote for removing this SCHED_LOAD_RESOLUTION thing completely so
> that we can assume that load/util and capacity are always using 1024/10.
>
> Cheers,
>
> -- Dietmar
>
>>
>>
>> Regards,
>> Vincent
>>
>>
>>>
>>> LOAD (UTIL) and CAPACITY have the same SCALE and SHIFT values because
>>> SCHED_LOAD_RESOLUTION is always defined to 0. scale_load() and
>>> scale_load_down() are also NOPs so this area is probably
>>> worth a separate clean-up.
>>> Beyond that, I'm not sure if the current functionality is
>>> broken if we use different SCALE and SHIFT values for LOAD and CAPACITY?
>>>
>
> [...]
>


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Peter Zijlstra
On Mon, Sep 07, 2015 at 07:54:18PM +0100, Dietmar Eggemann wrote:
> I see the point but IMHO this will only be necessary if the
> SCHED_LOAD_RESOLUTION stuff gets re-enabled again.

Paul, Ben, gentle reminder to look at re-enabling this.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Peter Zijlstra
On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
> No, but
> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> will fix the unit issue.

Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.

And where load_sum already gets a factor 1024 from the weight
multiplication, util_sum does not get such a factor, and all the scaling
we do on it loses bits.

So at the moment we go compute the util_avg value, we need to inflate
util_sum with an extra factor 1024 in order to make it work.

And seeing that we do the shift up on sa->util_sum without consideration
of overflow, would it not make sense to add that factor before the
scaling and into the addition?

Now, given all that, units are a complete mess here, and I'd not mind
something like:

#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
#error "something usefull"
#endif

somewhere near here.




Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Dietmar Eggemann
On 07/09/15 20:47, Peter Zijlstra wrote:
> On Mon, Sep 07, 2015 at 07:54:18PM +0100, Dietmar Eggemann wrote:
>> I would vote for removing this SCHED_LOAD_RESOLUTION thing completely so
>> that we can assume that load/util and capacity are always using 1024/10.
> 
> Ha!, I just requested Google look into moving it to 20 again ;-)
> 

In this case Steve and Vincent have a point here.



Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Dietmar Eggemann
On 08/09/15 08:22, Vincent Guittot wrote:
> On 7 September 2015 at 20:54, Dietmar Eggemann  
> wrote:
>> On 07/09/15 17:21, Vincent Guittot wrote:
>>> On 7 September 2015 at 17:37, Dietmar Eggemann  
>>> wrote:
 On 04/09/15 00:51, Steve Muckle wrote:
> Hi Morten, Dietmar,
>
> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
> ...
>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>
> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
> __update_load_avg(). If there is now an assumption that util_avg may be
> used directly as a capacity value, should it be changed to
> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
> always be or if they can be combined.

 You're referring to the code line

 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;

 in __update_load_avg()?

 Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values 
 are
 load related.
>>>
>>> I agree with Steve that there is an issue from a unit point of view
>>>
>>> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
>>> load because of << SCHED_LOAD_SHIFT)
>>>
>>> Before this patch , the translation from load to capacity unit was
>>> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
>>>
>>> So you still have to change the unit from load to capacity with a "/
>>> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
>>>
>>> sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / SCHED_LOAD_SCALE *
>>> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX
>>>              = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>>
>> I see the point but IMHO this will only be necessary if the
>> SCHED_LOAD_RESOLUTION stuff gets re-enabled again.
>>
>> It's not really about utilization or capacity units but rather about
>> using the same SCALE/SHIFT values for both sides, right?
> 
> It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
> SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
> scale the value in the right range. In the case of cpu_usage which
> returns sa->util_avg , it's the capacity range not the load range.

Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
CAPACITY have no unit.

I agree that with the current patch-set we have a SHIFT/SCALE problem
once SCHED_LOAD_RESOLUTION is set to != 0.

> 
>>
>> I always thought that scale_load_down() takes care of that.
> 
> AFAIU, scale_load_down is a way to increase the resolution  of the
> load not to move from load to capacity

IMHO, increasing the resolution of the load is done by re-enabling the
'#define SCHED_LOAD_RESOLUTION 10' thing (or by setting
SCHED_LOAD_RESOLUTION to something other than 0).

I tried to figure out why we have this issue when comparing UTIL w/
CAPACITY and not LOAD w/ CAPACITY:

Both are initialized like that:

 sa->load_avg = scale_load_down(se->load.weight);
 sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
 sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
 sa->util_sum = LOAD_AVG_MAX;

and we use 'se->on_rq * scale_load_down(se->load.weight)' as 'unsigned
long weight' argument to call __update_load_avg() making sure the
scaling differences between LOAD and CAPACITY are respected while
updating sa->load_sum (and sa->load_avg).

OTOH, we don't apply a scale_load_down for sa->util_[sum/avg], only a
'<< SCHED_LOAD_SHIFT) / LOAD_AVG_MAX' on sa->util_avg.
So changing '<< SCHED_LOAD_SHIFT' to '* scale_load_down(SCHED_LOAD_SCALE)'
would be the logical thing to do.

I agree that '<< SCHED_CAPACITY_SHIFT' would have the same effect, but
why use a CAPACITY-related thing on the LOAD/UTIL side? The only
reason would be the unit problem, which I don't understand.
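
A compact restatement of the asymmetry described above (a sketch
ignoring freq/cpu invariance, not kernel code):

        load_sum accrues weight * delta  ->  carries the 1024 unit
                                             (weight of a nice-0 task)
        util_sum accrues delta only      ->  carries no unit factor

        load_avg = load_sum / LOAD_AVG_MAX          (weight range)
        util_avg = util_sum * unit / LOAD_AVG_MAX   (needs explicit unit)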

> 
>>
>> So shouldn't:
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 3445d2fb38f4..b80f799aface 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2644,7 +2644,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg 
>> *sa,
>> cfs_rq->runnable_load_avg =
>> div_u64(cfs_rq->runnable_load_sum, 
>> LOAD_AVG_MAX);
>> }
>> -   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / 
>> LOAD_AVG_MAX;
>> +   sa->util_avg = (sa->util_sum * 
>> scale_load_down(SCHED_LOAD_SCALE)) / LOAD_AVG_MAX;
>> }
>>
>> return decayed;
>>
>> fix that issue in case SCHED_LOAD_RESOLUTION != 0 ?
> 
> 
> No, but
> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> will fix the unit issue.
> I agree that it doesn't change the result, because both SCHED_LOAD_SHIFT
> and SCHED_CAPACITY_SHIFT are set to 10, but as mentioned above they
> are set separately, so it can make a difference if someone changes one
> SHIFT value.
>
> [...]

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Peter Zijlstra
On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
> > No, but
> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> > will fix the unit issue.
> 
> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> 
> And where load_sum already gets a factor 1024 from the weight
> multiplication, util_sum does not get such a factor, and all the scaling
> we do on it loses bits.
> 
> So at the moment we go compute the util_avg value, we need to inflate
> util_sum with an extra factor 1024 in order to make it work.
> 
> And seeing that we do the shift up on sa->util_sum without consideration
> of overflow, would it not make sense to add that factor before the
> scaling and into the addition?
> 
> Now, given all that, units are a complete mess here, and I'd not mind
> something like:
> 
> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> #error "something usefull"
> #endif
> 
> somewhere near here.

Something like the below..

Another thing to ponder; the downside of scaled_delta_w is that it's
fairly likely delta is small and you lose all bits, whereas the weight
is likely to be large and could lose a few bits without issue.

That is, in fixed point scaling like this, you want to start with the
biggest numbers, not the smallest, otherwise you lose too much.

The flip side is of course that now you can share a multiplication.

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
sa->load_avg = scale_load_down(se->load.weight);
sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
-   sa->util_sum = LOAD_AVG_MAX;
+   sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 
@@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
return contrib + runnable_avg_yN_sum[n];
 }
 
+#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
+#error "load tracking assumes 2^10 as unit"
+#endif
+
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
 /*
@@ -2599,7 +2603,7 @@ __update_load_avg(u64 now, int cpu, stru
}
}
if (running)
-   sa->util_sum += cap_scale(scaled_delta_w, scale_cpu);
+   sa->util_sum += scaled_delta_w * scale_cpu;
 
delta -= delta_w;
 
@@ -2623,7 +2627,7 @@ __update_load_avg(u64 now, int cpu, stru
cfs_rq->runnable_load_sum += weight * contrib;
}
if (running)
-   sa->util_sum += cap_scale(contrib, scale_cpu);
+   sa->util_sum += contrib * scale_cpu;
}
 
/* Remainder of delta accrued against u_0` */
@@ -2634,7 +2638,7 @@ __update_load_avg(u64 now, int cpu, stru
cfs_rq->runnable_load_sum += weight * scaled_delta;
}
if (running)
-   sa->util_sum += cap_scale(scaled_delta, scale_cpu);
+   sa->util_sum += scaled_delta * scale_cpu;
 
sa->period_contrib += delta;
 
@@ -2644,7 +2648,7 @@ __update_load_avg(u64 now, int cpu, stru
cfs_rq->runnable_load_avg =
div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
}
-   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
+   sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
}
 
return decayed;
@@ -2686,8 +2690,7 @@ static inline int update_cfs_rq_load_avg
if (atomic_long_read(&cfs_rq->removed_util_avg)) {
long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
sa->util_avg = max_t(long, sa->util_avg - r, 0);
-   sa->util_sum = max_t(s32, sa->util_sum -
-   ((r * LOAD_AVG_MAX) >> SCHED_LOAD_SHIFT), 0);
+   sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
}
 
decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
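
As a sanity check on the overflow concern, the new util_sum bound can
be verified standalone (a sketch; 47742 is LOAD_AVG_MAX, 1024 the
maximum scale_cpu factor):

#include <assert.h>
#include <stdint.h>

int main(void)
{
        /* util_sum now carries an extra factor of scale_cpu <= 1024 */
        uint64_t max_util_sum = 47742ULL * 1024;  /* ~48.9M */

        assert(max_util_sum <= INT32_MAX);  /* still fits the s32 math */
        return 0;
}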


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Vincent Guittot
On 8 September 2015 at 14:26, Peter Zijlstra  wrote:
> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
>> No, but
>> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> will fix the unit issue.
>
> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>
> And where load_sum already gets a factor 1024 from the weight
> multiplication, util_sum does not get such a factor, and all the scaling
> we do on it loose bits.

fair point

>
> So at the moment we go compute the util_avg value, we need to inflate
> util_sum with an extra factor 1024 in order to make it work.
>
> And seeing that we do the shift up on sa->util_sum without consideration
> of overflow, would it not make sense to add that factor before the
> scaling and into the addition?

Yes, this should save one left shift and one right shift.


>
> Now, given all that, units are a complete mess here, and I'd not mind
> something like:
>
> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> #error "something usefull"
> #endif

In this case why not simply do

#define SCHED_CAPACITY_SHIFT SCHED_LOAD_SHIFT

or the opposite?

>
> somewhere near here.
>
>


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Vincent Guittot
On 8 September 2015 at 14:50, Dietmar Eggemann  wrote:
> On 08/09/15 08:22, Vincent Guittot wrote:
>> On 7 September 2015 at 20:54, Dietmar Eggemann  
>> wrote:
>>> On 07/09/15 17:21, Vincent Guittot wrote:
 On 7 September 2015 at 17:37, Dietmar Eggemann  
 wrote:
> On 04/09/15 00:51, Steve Muckle wrote:
>> Hi Morten, Dietmar,
>>
>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>> ...
>>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>
>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>> __update_load_avg(). If there is now an assumption that util_avg may be
>> used directly as a capacity value, should it be changed to
>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>> always be or if they can be combined.
>
> You're referring to the code line
>
> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>
> in __update_load_avg()?
>
> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values 
> are
> load related.

 I agree with Steve that there is an issue from a unit point of view

 sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
 load because of << SCHED_LOAD_SHIFT)

 Before this patch , the translation from load to capacity unit was
 done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"

 So you still have to change the unit from load to capacity with a "/
 SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.

 sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / SCHED_LOAD_SCALE *
 SCHED_CAPACITY_SCALE / LOAD_AVG_MAX
              = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>>>
>>> I see the point but IMHO this will only be necessary if the
>>> SCHED_LOAD_RESOLUTION stuff gets re-enabled again.
>>>
>>> It's not really about utilization or capacity units but rather about
>>> using the same SCALE/SHIFT values for both sides, right?
>>
>> It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
>> SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
>> scale the value in the right range. In the case of cpu_usage which
>> returns sa->util_avg , it's the capacity range not the load range.
>
> Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> CAPACITY have no unit.

If you set SCHED_LOAD_SHIFT and SCHED_CAPACITY_SHIFT to 2 different
values for test purposes, you will see that util_avg will not use the
right range of values.

If, as a first step, we don't take freq and cpu invariance into account:

sa->util_sum is a load in the range [0..LOAD_AVG_MAX]. I say load
because of the max value.

the current implementation of util_avg is
sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX

so sa->util_avg is a load in the range [0..SCHED_LOAD_SCALE]

the current implementation of get_cpu_usage is
return (sa->util_avg * capacity_orig_of(cpu)) >> SCHED_LOAD_SHIFT

so the usage has the same unit and range as the capacity of the cpu and
can be compared with another capacity value.

Your patchset directly returns sa->util_avg, which is a load, to compare
it with a capacity value.

So you have to convert sa->util_avg from load to capacity, so if you have

sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX

sa->util_avg is now a capacity with the same range as your cpu, thanks
to the cpu invariance factor that patch 3 has added.

the << SCHED_CAPACITY_SHIFT above can be optimized with the >>
SCHED_CAPACITY_SHIFT included in
sa->util_sum += scale(contrib, scale_cpu);
as mentioned by Peter

Right now, SCHED_CAPACITY_SHIFT is set to 10 as well as SCHED_LOAD_SHIFT,
so using one instead of the other doesn't change the result, but if
that's no longer the case, we need to take care of the range/unit that
we use.
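
A worked example of that last point (a standalone sketch with made-up
numbers, assuming the cpu-invariant util_sum from patch 3):

#include <stdio.h>

#define LOAD_AVG_MAX            47742
#define SCHED_CAPACITY_SHIFT    10

int main(void)
{
        /* CPU with capacity_orig = 512 running a task 100% of the
         * time: each contribution is pre-scaled by 512/1024, so
         * util_sum saturates at about LOAD_AVG_MAX / 2 */
        unsigned long capacity_orig = 512;
        unsigned long util_sum = LOAD_AVG_MAX * capacity_orig / 1024;

        /* util_avg comes out directly in the capacity range and
         * saturates at capacity_orig, as cpu_util() expects */
        unsigned long util_avg =
                (util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;

        printf("util_avg = %lu (capacity_orig = %lu)\n",
               util_avg, capacity_orig);
        return 0;
}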

Regards,
Vincent


>
> I agree that with the current patch-set we have a SHIFT/SCALE problem
> once SCHED_LOAD_RESOLUTION is set to != 0.
>
>>
>>>
>>> I always thought that scale_load_down() takes care of that.
>>
>> AFAIU, scale_load_down is a way to increase the resolution  of the
>> load not to move from load to capacity
>
> IMHO, increasing the resolution of the load is done by re-enabling this
> define SCHED_LOAD_RESOLUTION  10 thing (or by setting
> SCHED_LOAD_RESOLUTION to something else than 0).
>
> I tried to figure out why we have this issue when comparing UTIL w/
> CAPACITY and not LOAD w/ CAPACITY:
>
> Both are initialized like that:
>
>  sa->load_avg = scale_load_down(se->load.weight);
>  sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>  sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>  sa->util_sum = LOAD_AVG_MAX;
>
> and we use 'se->on_rq * scale_load_down(se->load.weight)' as 'unsigned
> long weight' argument to call __update_load_avg() [...]

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Vincent Guittot
On 8 September 2015 at 14:52, Peter Zijlstra  wrote:
> On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
>> > No, but
>> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> > will fix the unit issue.
>>
>> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>>
>> And where load_sum already gets a factor 1024 from the weight
>> multiplication, util_sum does not get such a factor, and all the scaling
>> we do on it loses bits.
>>
>> So at the moment we go compute the util_avg value, we need to inflate
>> util_sum with an extra factor 1024 in order to make it work.
>>
>> And seeing that we do the shift up on sa->util_sum without consideration
>> of overflow, would it not make sense to add that factor before the
>> scaling and into the addition?
>>
>> Now, given all that, units are a complete mess here, and I'd not mind
>> something like:
>>
>> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
>> #error "something usefull"
>> #endif
>>
>> somewhere near here.
>
> Something like the below..
>
> Another thing to ponder; the downside of scaled_delta_w is that it's
> fairly likely delta is small and you lose all bits, whereas the weight
> is likely to be large and could lose a few bits without issue.
>
> That is, in fixed point scaling like this, you want to start with the
> biggest numbers, not the smallest, otherwise you lose too much.
>
> The flip side is of course that now you can share a multiplication.
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
> sa->load_avg = scale_load_down(se->load.weight);
> sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
> sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> -   sa->util_sum = LOAD_AVG_MAX;
> +   sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
> /* when this task enqueue'ed, it will contribute to its cfs_rq's 
> load_avg */
>  }
>
> @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
> return contrib + runnable_avg_yN_sum[n];
>  }
>
> +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> +#error "load tracking assumes 2^10 as unit"
> +#endif

so why don't we set SCHED_CAPACITY_SHIFT to SCHED_LOAD_SHIFT?

> +
>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>
>  /*
> @@ -2599,7 +2603,7 @@ __update_load_avg(u64 now, int cpu, stru
> }
> }
> if (running)
> -   sa->util_sum += cap_scale(scaled_delta_w, scale_cpu);
> +   sa->util_sum += scaled_delta_w * scale_cpu;
>
> delta -= delta_w;
>
> @@ -2623,7 +2627,7 @@ __update_load_avg(u64 now, int cpu, stru
> cfs_rq->runnable_load_sum += weight * contrib;
> }
> if (running)
> -   sa->util_sum += cap_scale(contrib, scale_cpu);
> +   sa->util_sum += contrib * scale_cpu;
> }
>
> /* Remainder of delta accrued against u_0` */
> @@ -2634,7 +2638,7 @@ __update_load_avg(u64 now, int cpu, stru
> cfs_rq->runnable_load_sum += weight * scaled_delta;
> }
> if (running)
> -   sa->util_sum += cap_scale(scaled_delta, scale_cpu);
> +   sa->util_sum += scaled_delta * scale_cpu;
>
> sa->period_contrib += delta;
>
> @@ -2644,7 +2648,7 @@ __update_load_avg(u64 now, int cpu, stru
> cfs_rq->runnable_load_avg =
> div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
> }
> -   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> +   sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> }
>
> return decayed;
> @@ -2686,8 +2690,7 @@ static inline int update_cfs_rq_load_avg
> if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
> sa->util_avg = max_t(long, sa->util_avg - r, 0);
> -   sa->util_sum = max_t(s32, sa->util_sum -
> -   ((r * LOAD_AVG_MAX) >> SCHED_LOAD_SHIFT), 0);
> +   sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);

looks good to me

> }
>
> decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Peter Zijlstra
On Tue, Sep 08, 2015 at 03:39:37PM +0200, Vincent Guittot wrote:
> > Now, given all that, units are a complete mess here, and I'd not mind
> > something like:
> >
> > #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> > #error "something usefull"
> > #endif
> 
> In this case why not simply doing
> #define SCHED_CAPACITY_SHIFT SCHED_LOAD_SHIFT
> or the opposite ?

Sadly not enough; aside from the fact that we really should do !0
LOAD_RESOLUTION on 64bit, the whole magic tables (runnable_avg_yN_*[])
and LOAD_AVG_MAX* values rely on the unit being 1<<10.

So regardless of defining one in terms of the other, we should check
both are in fact 10 and error out otherwise.

Changing them must involve recomputing these numbers or otherwise
mucking about with shifts to ensure it's back to 10 when we do this load
muck.
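
For reference, those magic numbers can be regenerated with a few lines
of userspace C (a sketch; values land within rounding of the
runnable_avg_yN_inv[] table and LOAD_AVG_MAX in fair.c):

#include <math.h>
#include <stdio.h>

int main(void)
{
        double y = pow(0.5, 1.0 / 32.0);  /* y^32 == 1/2 */
        double sum = 0.0;
        int n;

        /* runnable_avg_yN_inv[]: y^n in 0.32 fixed point */
        for (n = 0; n < 32; n++)
                printf("0x%08x,%c",
                       (unsigned int)(pow(y, n) * 0xffffffffUL),
                       (n % 6 == 5) ? '\n' : ' ');

        /* LOAD_AVG_MAX: the 1<<10 unit is baked in here, so changing
         * the unit means regenerating at least this constant */
        for (n = 0; n < 345; n++)  /* LOAD_AVG_MAX_N periods */
                sum += 1024.0 * pow(y, n);
        printf("\nLOAD_AVG_MAX ~= %.0f\n", sum);  /* integer version: 47742 */
        return 0;
}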


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Dietmar Eggemann


On 08/09/15 15:01, Vincent Guittot wrote:
> On 8 September 2015 at 14:50, Dietmar Eggemann  
> wrote:
>> On 08/09/15 08:22, Vincent Guittot wrote:
>>> On 7 September 2015 at 20:54, Dietmar Eggemann  
>>> wrote:
 On 07/09/15 17:21, Vincent Guittot wrote:
> On 7 September 2015 at 17:37, Dietmar Eggemann  
> wrote:
>> On 04/09/15 00:51, Steve Muckle wrote:
>>> Hi Morten, Dietmar,
>>>
>>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>>> ...
 + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
 + * recent utilization of currently non-runnable tasks on a CPU. It represents
 + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>>
>>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>>> __update_load_avg(). If there is now an assumption that util_avg may be
>>> used directly as a capacity value, should it be changed to
>>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>>> always be or if they can be combined.
>>
>> You're referring to the code line
>>
>> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>
>> in __update_load_avg()?
>>
>> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values 
>> are
>> load related.
>
> I agree with Steve that there is an issue from a unit point of view
>
> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
> load because of << SCHED_LOAD_SHIFT)
>
> Before this patch , the translation from load to capacity unit was
> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
>
> So you still have to change the unit from load to capacity with a "/
> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
>
> sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / SCHED_LOAD_SCALE *
> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX
>              = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;

 I see the point but IMHO this will only be necessary if the
 SCHED_LOAD_RESOLUTION stuff gets re-enabled again.

 It's not really about utilization or capacity units but rather about
 using the same SCALE/SHIFT values for both sides, right?
>>>
>>> It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
>>> SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
>>> scale the value in the right range. In the case of cpu_usage which
>>> returns sa->util_avg , it's the capacity range not the load range.
>>
>> Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
>> CAPACITY have no unit.
> 
> If you set 2 different values to SCHED_LOAD_SHIFT and
> SCHED_CAPACITY_SHIFT for test purpose, you will see that util_avg will
> not use the right range of value
> 
> If we don't take into account freq and cpu invariance in a 1st step
> 
> sa->util_sum is a load in the range [0..LOAD_AVG_MAX]. I say load
> because of the max value
> 
> the current implementation of util_avg is
> sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX
> 
> so sa->util_avg is a load in the range [0..SCHED_LOAD_SCALE]
> 
> the current implementation of get_cpu_usage is
> return (sa->util_avg * capacity_orig_of(cpu)) >> SCHED_LOAD_SHIFT
> 
> so the usage has the same unit and range as capacity of the cpu and
> can be compared with another capacity value
> 
> Your patchset returns directly sa->util_avg which is a load to compare
> it with capacity value
> 
> So you have to convert sa->util_avg from load to capacity so if you have
> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX
> 
> sa->util_avg is now a capacity with the same range as you cpu thanks
> to the cpu invariance factor that the patch 3 has added.
> 
> the << SCHED_CAPACITY_SHIFT above can be optimized with the >>
> SCHED_CAPACITY_SHIFT included in
> sa->util_sum += scale(contrib, scale_cpu);
> as mentioned by Peter
> 
> At now, SCHED_CAPACITY_SHIFT is set to 10 as well as SCHED_LOAD_SHIFT
> so using one instead of the other doesn't change the result but if
> it's no more the case, we need to take care of the range/unit that we
> use

No arguing here, I just called this a SHIFT/SCALE problem.

[...]



Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Morten Rasmussen
On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
> > > No, but
> > > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> > > will fix the unit issue.
> > 
> > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.

I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
1<<10, it is just the sum of the geometric series and the upper bound of
util_sum?

> > And where load_sum already gets a factor 1024 from the weight
> > multiplication, util_sum does not get such a factor, and all the scaling
> > we do on it loses bits.
> > 
> > So at the moment we go compute the util_avg value, we need to inflate
> > util_sum with an extra factor 1024 in order to make it work.

Agreed. Inflating the util_sum instead of util_avg like you do below
makes more sense. The load_sum/util_sum asymmetry is somewhat confusing.

> > And seeing that we do the shift up on sa->util_sum without consideration
> > of overflow, would it not make sense to add that factor before the
> > scaling and into the addition?

I don't think util_sum can overflow as it is bounded by LOAD_AVG_MAX
unless you shift it a lot, like << 20. The << SCHED_LOAD_SHIFT in the
existing code is wrong I think. Looking at the initialization of
util_avg = scale_load_down(SCHED_LOAD_SCALE), it is not using high
resolution load.

> > Now, given all that, units are a complete mess here, and I'd not mind
> > something like:
> > 
> > #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> > #error "something usefull"
> > #endif
> > 
> > somewhere near here.

Yes. As I see it, it all falls apart completely if that isn't true.

> 
> Something like the below..
> 
> Another thing to ponder; the downside of scaled_delta_w is that it's
> fairly likely delta is small and you lose all bits, whereas the weight
> is likely to be large and could lose a few bits without issue.

That issue applies both to load and util.

> 
> That is, in fixed point scaling like this, you want to start with the
> biggest numbers, not the smallest, otherwise you lose too much.
> 
> The flip side is of course that now you can share a multiplication.

But if we apply the scaling to the weight instead of time, we would only
have to apply it once and not three times like it is now? So maybe we
can end up with almost the same number of multiplications.

We might be losing bits for low priority tasks running on cpus at a low
frequency though.
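
The bit-loss argument is easy to demonstrate standalone (a sketch with
made-up numbers, not either patch):

#include <stdio.h>

#define SCHED_CAPACITY_SHIFT 10
#define cap_scale(v, s) ((v) * (s) >> SCHED_CAPACITY_SHIFT)

int main(void)
{
        unsigned long delta = 3;         /* small remainder chunk */
        unsigned long weight = 1024;     /* nice-0 task */
        unsigned long scale_freq = 300;  /* CPU at ~30% of max freq */

        /* scale the small delta first: 3 * 300 >> 10 == 0, the whole
         * contribution vanishes */
        unsigned long delta_first = weight * cap_scale(delta, scale_freq);

        /* scale the large weight first: (1024 * 300 >> 10) * 3 == 900 */
        unsigned long weight_first = cap_scale(weight, scale_freq) * delta;

        printf("delta first: %lu, weight first: %lu\n",
               delta_first, weight_first);
        return 0;
}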

> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
>   sa->load_avg = scale_load_down(se->load.weight);
>   sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>   sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> - sa->util_sum = LOAD_AVG_MAX;
> + sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>   /* when this task enqueue'ed, it will contribute to its cfs_rq's 
> load_avg */
>  }
>  
> @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
>   return contrib + runnable_avg_yN_sum[n];
>  }
>  
> +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> +#error "load tracking assumes 2^10 as unit"
> +#endif

As mentioned above. Does it have to be 10?


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Morten Rasmussen
On Tue, Sep 08, 2015 at 04:06:36PM +0200, Vincent Guittot wrote:
> On 8 September 2015 at 14:52, Peter Zijlstra  wrote:
> > On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
> >> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
> >> > No, but
> >> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> >> > will fix the unit issue.
> >>
> >> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> >>
> >> And where load_sum already gets a factor 1024 from the weight
> >> multiplication, util_sum does not get such a factor, and all the scaling
> >> we do on it loses bits.
> >>
> >> So at the moment we go compute the util_avg value, we need to inflate
> >> util_sum with an extra factor 1024 in order to make it work.
> >>
> >> And seeing that we do the shift up on sa->util_sum without consideration
> >> of overflow, would it not make sense to add that factor before the
> >> scaling and into the addition?
> >>
> >> Now, given all that, units are a complete mess here, and I'd not mind
> >> something like:
> >>
> >> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> >> #error "something usefull"
> >> #endif
> >>
> >> somewhere near here.
> >
> > Something like the below..
> >
> > Another thing to ponder; the downside of scaled_delta_w is that it's
> > fairly likely delta is small and you lose all bits, whereas the weight
> > is likely to be large and could lose a few bits without issue.
> >
> > That is, in fixed point scaling like this, you want to start with the
> > biggest numbers, not the smallest, otherwise you lose too much.
> >
> > The flip side is of course that now you can share a multiplication.
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
> > sa->load_avg = scale_load_down(se->load.weight);
> > sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
> > sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> > -   sa->util_sum = LOAD_AVG_MAX;
> > +   sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
> > /* when this task enqueue'ed, it will contribute to its cfs_rq's 
> > load_avg */
> >  }
> >
> > @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
> > return contrib + runnable_avg_yN_sum[n];
> >  }
> >
> > +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> > +#error "load tracking assumes 2^10 as unit"
> > +#endif
> 
> so why don't we set SCHED_CAPACITY_SHIFT to SCHED_LOAD_SHIFT ?

Don't you mean:

#define SCHED_LOAD_SHIFT (SCHED_CAPACITY_SHIFT + SCHED_LOAD_RESOLUTION)

?

Or do you want to increase the capacity resolution as well if you
increase the load resolution?


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Vincent Guittot
On 8 September 2015 at 16:35, Morten Rasmussen  wrote:
> On Tue, Sep 08, 2015 at 04:06:36PM +0200, Vincent Guittot wrote:
>> On 8 September 2015 at 14:52, Peter Zijlstra  wrote:
>> > On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
>> >> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
>> >> > No, but
>> >> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> >> > will fix the unit issue.
>> >>
>> >> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>> >>
>> >> And where load_sum already gets a factor 1024 from the weight
>> >> multiplication, util_sum does not get such a factor, and all the scaling
>> >> we do on it loses bits.
>> >>
>> >> So at the moment we go compute the util_avg value, we need to inflate
>> >> util_sum with an extra factor 1024 in order to make it work.
>> >>
>> >> And seeing that we do the shift up on sa->util_sum without consideration
>> >> of overflow, would it not make sense to add that factor before the
>> >> scaling and into the addition?
>> >>
>> >> Now, given all that, units are a complete mess here, and I'd not mind
>> >> something like:
>> >>
>> >> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
>> >> #error "something usefull"
>> >> #endif
>> >>
>> >> somewhere near here.
>> >
>> > Something like the below..
>> >
>> > Another thing to ponder; the downside of scaled_delta_w is that it's
>> > fairly likely delta is small and you lose all bits, whereas the weight
>> > is likely to be large and could lose a few bits without issue.
>> >
>> > That is, in fixed point scaling like this, you want to start with the
>> > biggest numbers, not the smallest, otherwise you lose too much.
>> >
>> > The flip side is of course that now you can share a multiplication.
>> >
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
>> > sa->load_avg = scale_load_down(se->load.weight);
>> > sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>> > sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>> > -   sa->util_sum = LOAD_AVG_MAX;
>> > +   sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>> > /* when this task enqueue'ed, it will contribute to its cfs_rq's 
>> > load_avg */
>> >  }
>> >
>> > @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
>> > return contrib + runnable_avg_yN_sum[n];
>> >  }
>> >
>> > +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
>> > +#error "load tracking assumes 2^10 as unit"
>> > +#endif
>>
>> so why don't we set SCHED_CAPACITY_SHIFT to SCHED_LOAD_SHIFT ?
>
> Don't you mean:
>
> #define SCHED_LOAD_SHIFT (SCHED_CAPACITY_SHIFT + SCHED_LOAD_RESOLUTION)

yes you're right

>
> ?
>
> Or do you want to increase the capacity resolution as well if you
> increase the load resolution?


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Vincent Guittot
On 8 September 2015 at 16:10, Peter Zijlstra  wrote:
> On Tue, Sep 08, 2015 at 03:39:37PM +0200, Vincent Guittot wrote:
>> > Now, given all that, units are a complete mess here, and I'd not mind
>> > something like:
>> >
>> > #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
>> > #error "something useful"
>> > #endif
>>
>> In this case why not simply doing
>> #define SCHED_CAPACITY_SHIFT SCHED_LOAD_SHIFT
>> or the opposite ?
>
> Sadly not enough; aside from the fact that we really should do !0
> LOAD_RESOLUTION on 64bit, the whole magic tables (runnable_avg_yN_*[])
> and LOAD_AVG_MAX* values rely on the unit being 1<<10.

Ah yes, I forgot to take into account the LOAD_RESOLUTION.

So after some more thinking, I finally don't see where in the code we
would have an issue if SCHED_CAPACITY_SHIFT is not equal to
(SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION), or not equal to 10, as long
as we use values that don't overflow the variables.

Regards,
Vincent

>
> So regardless of defining one in terms of the other, we should check
> both are in fact 10 and error out otherwise.
>
> Changing them must involve recomputing these numbers or otherwise
> mucking about with shifts to ensure it's back to 10 when we do this load
> muck.
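
As a quick sanity check of the overflow side of that question, the
bound on util_sum can be computed in isolation. This is only a sketch,
assuming LOAD_AVG_MAX = 47742 and per-period contributions capped at
SCHED_CAPACITY_SCALE, as in the code being discussed:

#include <stdio.h>

#define LOAD_AVG_MAX		47742ULL  /* geometric series upper bound */
#define SCHED_CAPACITY_SCALE	1024ULL

int main(void)
{
	/* util_sum accumulates at most LOAD_AVG_MAX fully scaled segments */
	unsigned long long max_util_sum = LOAD_AVG_MAX * SCHED_CAPACITY_SCALE;

	printf("max util_sum = %llu, fits in u32: %s\n", max_util_sum,
	       max_util_sum <= 0xffffffffULL ? "yes" : "no");
	return 0;
}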


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Peter Zijlstra
On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> 
> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
> 1<<10, it is just the sum of the geometric series and the upper bound of
> util_sum?

It needs a 1024; it might just have been the 1024 ns we use as a period
instead of the scale unit though.

The LOAD_AVG_MAX is the number where adding a next element to the series
doesn't change the result anymore, so scaling it up will allow more
significant elements to the series before we bottom out, which is the _N
thing.
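
For reference, that fixed point can be reproduced outside the kernel.
A sketch, assuming the kernel's y^32 = 0.5 decay and 1024-unit periods
(the kernel computes the same bound with integer math, bottoming out at
LOAD_AVG_MAX = 47742 after LOAD_AVG_MAX_N periods):

#include <stdio.h>
#include <math.h>

int main(void)
{
	double y = pow(0.5, 1.0 / 32.0);	/* half-life of 32 periods */
	double sum = 0.0, prev = -1.0;
	int n = 0;

	/* keep adding 1024 * y^n terms until the sum stops changing */
	while (sum != prev) {
		prev = sum;
		sum += 1024.0 * pow(y, n++);
	}
	printf("series converges to ~%.0f\n", sum);
	return 0;
}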




Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-08 Thread Morten Rasmussen
On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > 
> > Something like the below..
> > 
> > Another thing to ponder; the downside of scaled_delta_w is that it's
> > fairly likely delta is small and you lose all bits, whereas the weight
> > is likely to be large and could lose a few bits without issue.
> 
> That issue applies both to load and util.
> 
> > 
> > That is, in fixed point scaling like this, you want to start with the
> > biggest numbers, not the smallest, otherwise you lose too much.
> > 
> > The flip side is of course that now you can share a multiplication.
> 
> But if we apply the scaling to the weight instead of time, we would only
> have to apply it once and not three times like it is now? So maybe we
> can end up with almost the same number of multiplications.
> 
> We might be losing bits for low priority tasks running on cpus at a low
> frequency though.

Something like the below. We should be saving one multiplication.

--- 8< ---

From: Morten Rasmussen 
Date: Tue, 8 Sep 2015 17:15:40 +0100
Subject: [PATCH] sched/fair: Scale load/util contribution rather than time

When updating load/util tracking the time delta might be very small (1)
in many cases; scaling it further down with frequency and cpu invariance
scaling might cause us to lose precision. Instead of scaling time we
can scale the weight of the task for load and the capacity for
utilization. Both weight (>=15) and capacity should be significantly
bigger in most cases. Low priority tasks might still suffer a bit, but
the worst case should be improved, as weight is at least 15 before
invariance scaling.

Signed-off-by: Morten Rasmussen 
---
 kernel/sched/fair.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9301291..d5ee72a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2519,8 +2519,6 @@ static u32 __compute_runnable_contrib(u64 n)
 #error "load tracking assumes 2^10 as unit"
 #endif
 
-#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
-
 /*
  * We can represent the historical contribution to runnable average as the
  * coefficients of a geometric series.  To do this we sub-divide our runnable
@@ -2553,10 +2551,10 @@ static __always_inline int
 __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
-   u64 delta, scaled_delta, periods;
+   u64 delta, periods;
u32 contrib;
-   unsigned int delta_w, scaled_delta_w, decayed = 0;
-   unsigned long scale_freq, scale_cpu;
+   unsigned int delta_w, decayed = 0;
+   unsigned long scaled_weight = 0, scale_freq, scale_freq_cpu = 0;
 
delta = now - sa->last_update_time;
/*
@@ -2577,8 +2575,13 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
return 0;
sa->last_update_time = now;
 
-   scale_freq = arch_scale_freq_capacity(NULL, cpu);
-   scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+   if (weight || running)
+   scale_freq = arch_scale_freq_capacity(NULL, cpu);
+   if (weight)
+   scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
+   if (running)
+   scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
+   >> SCHED_CAPACITY_SHIFT;
 
/* delta_w is the amount already accumulated against our next period */
delta_w = sa->period_contrib;
@@ -2594,16 +2597,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 * period and accrue it.
 */
delta_w = 1024 - delta_w;
-   scaled_delta_w = cap_scale(delta_w, scale_freq);
if (weight) {
-   sa->load_sum += weight * scaled_delta_w;
+   sa->load_sum += scaled_weight * delta_w;
if (cfs_rq) {
cfs_rq->runnable_load_sum +=
-   weight * scaled_delta_w;
+   scaled_weight * delta_w;
}
}
if (running)
-   sa->util_sum += scaled_delta_w * scale_cpu;
+   sa->util_sum += delta_w * scale_freq_cpu;
 
delta -= delta_w;
 
@@ -2620,25 +2622,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 
/* Efficiently calculate \sum (1..n_period) 1024*y^i */
contrib = __compute_runnable_contrib(periods);
-   contrib = cap_scale(contrib, scale_freq);
if (weight) {
-   sa->load_sum += weight * contrib;
+   sa->load_sum += scaled_weight * contrib;
if (cfs_rq)
- 

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-09 Thread Peter Zijlstra
On Tue, Sep 08, 2015 at 05:53:31PM +0100, Morten Rasmussen wrote:
> On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:

> > On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > But if we apply the scaling to the weight instead of time, we would only
> > have to apply it once and not three times like it is now? So maybe we
> > can end up with almost the same number of multiplications.
> > 
> > We might be losing bits for low priority tasks running on cpus at a low
> > frequency though.
> 
> Something like the below. We should be saving one multiplication.

> @@ -2577,8 +2575,13 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>   return 0;
>   sa->last_update_time = now;
>  
> - scale_freq = arch_scale_freq_capacity(NULL, cpu);
> - scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> + if (weight || running)
> + scale_freq = arch_scale_freq_capacity(NULL, cpu);
> + if (weight)
> + scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
> + if (running)
> + scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
> + >> SCHED_CAPACITY_SHIFT;
>  
>   /* delta_w is the amount already accumulated against our next period */
>   delta_w = sa->period_contrib;
> @@ -2594,16 +2597,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>* period and accrue it.
>*/
>   delta_w = 1024 - delta_w;
> - scaled_delta_w = cap_scale(delta_w, scale_freq);
>   if (weight) {
> - sa->load_sum += weight * scaled_delta_w;
> + sa->load_sum += scaled_weight * delta_w;
>   if (cfs_rq) {
>   cfs_rq->runnable_load_sum +=
> - weight * scaled_delta_w;
> + scaled_weight * delta_w;
>   }
>   }
>   if (running)
> - sa->util_sum += scaled_delta_w * scale_cpu;
> + sa->util_sum += delta_w * scale_freq_cpu;
>  
>   delta -= delta_w;
>  

Sadly that makes the code worse; I get 14 mul instructions where
previously I had 11.

What happens is that GCC gets confused and cannot constant propagate the
new variables, so what used to be shifts now end up being actual
multiplications.
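
The general effect is easy to reproduce in isolation. A sketch, assuming
arch_scale_freq_capacity() resolves to the constant 1024 (as it does on
architectures that don't override the hook):

static inline unsigned long arch_scale_freq_capacity_const(void)
{
	return 1024;	/* stand-in for the constant default */
}

unsigned long scale_const(unsigned long w)
{
	/* constant propagates: (w * 1024) >> 10 folds away, no mul needed */
	return w * arch_scale_freq_capacity_const() >> 10;
}

unsigned long scale_var(unsigned long w, unsigned long scale_freq)
{
	/* through a runtime variable the compiler must emit a real mul */
	return w * scale_freq >> 10;
}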

With this, I get back to 11. Can you see what happens on ARM where you
have both functions defined to non-constants?

---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2551,10 +2551,10 @@ static __always_inline int
 __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
+   unsigned long scaled_weight, scale_freq, scale_freq_cpu;
+   unsigned int delta_w, decayed = 0;
u64 delta, periods;
u32 contrib;
-   unsigned int delta_w, decayed = 0;
-   unsigned long scaled_weight = 0, scale_freq, scale_freq_cpu = 0;
 
delta = now - sa->last_update_time;
/*
@@ -2575,13 +2575,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
return 0;
sa->last_update_time = now;
 
-   if (weight || running)
-   scale_freq = arch_scale_freq_capacity(NULL, cpu);
-   if (weight)
-   scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
-   if (running)
-   scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
-   >> SCHED_CAPACITY_SHIFT;
+   scale_freq = arch_scale_freq_capacity(NULL, cpu);
+
+   scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
+   scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu) >> SCHED_CAPACITY_SHIFT;
 
/* delta_w is the amount already accumulated against our next period */
delta_w = sa->period_contrib;


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-09 Thread Peter Zijlstra
On Wed, Sep 09, 2015 at 11:43:05AM +0200, Peter Zijlstra wrote:
> Sadly that makes the code worse; I get 14 mul instructions where
> previously I had 11.

FWIW I count like:

objdump -d defconfig-build/kernel/sched/fair.o |
  awk '/<[^>]*>:/ { p=0 }
       /<__update_load_avg[^>]*>:/ { p=1 }
       { if (p) print $0 }' |
  cut -d\: -f2- | grep mul | wc -l


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-09 Thread Morten Rasmussen
On Wed, Sep 09, 2015 at 11:43:05AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 08, 2015 at 05:53:31PM +0100, Morten Rasmussen wrote:
> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> 
> > > On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > > But if we apply the scaling to the weight instead of time, we would only
> > > have to apply it once and not three times like it is now? So maybe we
> > > can end up with almost the same number of multiplications.
> > > 
> > > We might be losing bits for low priority tasks running on cpus at a low
> > > frequency though.
> > 
> > Something like the below. We should be saving one multiplication.
> 
> > @@ -2577,8 +2575,13 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > return 0;
> > sa->last_update_time = now;
> >  
> > -   scale_freq = arch_scale_freq_capacity(NULL, cpu);
> > -   scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > +   if (weight || running)
> > +   scale_freq = arch_scale_freq_capacity(NULL, cpu);
> > +   if (weight)
> > +   scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
> > +   if (running)
> > +   scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
> > +   >> SCHED_CAPACITY_SHIFT;
> >  
> > /* delta_w is the amount already accumulated against our next period */
> > delta_w = sa->period_contrib;
> > @@ -2594,16 +2597,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >  * period and accrue it.
> >  */
> > delta_w = 1024 - delta_w;
> > -   scaled_delta_w = cap_scale(delta_w, scale_freq);
> > if (weight) {
> > -   sa->load_sum += weight * scaled_delta_w;
> > +   sa->load_sum += scaled_weight * delta_w;
> > if (cfs_rq) {
> > cfs_rq->runnable_load_sum +=
> > -   weight * scaled_delta_w;
> > +   scaled_weight * delta_w;
> > }
> > }
> > if (running)
> > -   sa->util_sum += scaled_delta_w * scale_cpu;
> > +   sa->util_sum += delta_w * scale_freq_cpu;
> >  
> > delta -= delta_w;
> >  
> 
> Sadly that makes the code worse; I get 14 mul instructions where
> previously I had 11.
> 
> What happens is that GCC gets confused and cannot constant propagate the
> new variables, so what used to be shifts now end up being actual
> multiplications.
> 
> With this, I get back to 11. Can you see what happens on ARM where you
> have both functions defined to non-constants?

We repeated the experiment on arm and arm64, but still with the
functions defined to constants to compare with your results. The mul
instruction count seems to be somewhat compiler version dependent, but
consistently shows no effect from the patch:

arm     before  after
gcc4.9  12      12
gcc4.8  10      10

arm64   before  after
gcc4.9  11      11

I will get numbers with the arch-functions implemented as well and do
hackbench runs to see what happens in terms of performance.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-09 Thread bsegall
Peter Zijlstra  writes:

> On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
>> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
>> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>> 
>> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
>> 1<<10, it is just the sum of the geometric series and the upper bound of
>> util_sum?
>
> It needs a 1024; it might just have been the 1024 ns we use as a period
> instead of the scale unit though.
>
> The LOAD_AVG_MAX is the number where adding a next element to the series
> doesn't change the result anymore, so scaling it up will allow more
> significant elements to the series before we bottom out, which is the _N
> thing.
>

Yes, as the comments say, the 1024ns unit is arbitrary (and is an
average of not-quite-microseconds instead of just nanoseconds to allow
more bits to load.weight when we multiply load.weight by this number).
In fact there are two arbitrary 1024 units here, which are technically
unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
operate on units of almost-microseconds and we also do decays every
almost-millisecond.

There appears to be a bunch of confusion in the current code around
util_sum/util_avg, which appears to be using SCHED_LOAD_SCALE
for a fixed-point percentage or something, which is at least reasonable,
but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
results in either initializing as 100% or .1% depending on RESOLUTION.
This'll get clobbered on first update, but if it needs to be
initialized, it should either get initialized to something sane or at
least consistent.
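
To make the 100% vs .1% point concrete, a sketch using the definitions
as they stand today (not a proposed change):

#define SCHED_LOAD_RESOLUTION	0	/* or 10 for high resolution */
#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
#define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
#define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)

/*
 * RESOLUTION == 0:  scale_load_down(SCHED_LOAD_SCALE) == 1024, which is
 *                   100% of the 1 << 10 scale;
 * RESOLUTION == 10: scale_load_down(SCHED_LOAD_SCALE) == 1024 as well,
 *                   but the scale is now 1 << 20, i.e. ~0.1%.
 */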

load_sum/load_avg appear to be scale_load_down()ed properly, and appear
to be used as such at a quick glance.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-09 Thread Yuyang Du
On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
>  
> +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> +#error "load tracking assumes 2^10 as unit"
> +#endif
> +

Sorry for the late response. I might have already missed something.

But I got a bit lost here, with:

#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
#define SCHED_CAPACITY_SHIFT	10

the #if is certainly false.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-09 Thread Yuyang Du
On Tue, Sep 08, 2015 at 01:50:38PM +0100, Dietmar Eggemann wrote:
> > It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
> > SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
> > scale the value in the right range. In the case of cpu_usage which
> > returns sa->util_avg , it's the capacity range not the load range.
> 
> Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> CAPACITY have no unit.

To be more accurate, probably, LOAD can be thought of as having a unit,
but UTIL has no unit.
 
Anyway, those are my definitions:

1) unit, only for LOAD, and SCHED_LOAD_X is the unit (but
   SCHED_LOAD_RESOLUTION makes it also some 2, see below)
2) range, aka resolution or fixed-point percentage (as Ben said)
3) timing ratio, LOAD_AVG_MAX etc., unrelated to SCHED_LOAD_X

> >> I always thought that scale_load_down() takes care of that.
> > 
> > AFAIU, scale_load_down() is a way to increase the resolution of the
> > load, not to move from load to capacity
> 
> I tried to figure out why we have this issue when comparing UTIL w/
> CAPACITY and not LOAD w/ CAPACITY:
> 
> Both are initialized like that:
> 
>  sa->load_avg = scale_load_down(se->load.weight);
>  sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>  sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>  sa->util_sum = LOAD_AVG_MAX;
> 
> and we use 'se->on_rq * scale_load_down(se->load.weight)' as 'unsigned
> long weight' argument to call __update_load_avg() making sure the
> scaling differences between LOAD and CAPACITY are respected while
> updating sa->load_sum (and sa->load_avg).

Yes, because we used SCHED_LOAD_X as both unit and range for LOAD.
 
> OTOH, we don't apply scale_load_down() for sa->util_[sum/avg], only a
> '(<< SCHED_LOAD_SHIFT) / LOAD_AVG_MAX' on sa->util_avg.
> So changing '<< SCHED_LOAD_SHIFT' to '*
> scale_load_down(SCHED_LOAD_SCALE)' would be the logical thing to do.

Actually, for UTIL we only need range, so let's not conflate it with
LOAD. What about getting all of this clarified by redefining
SCHED_LOAD_RESOLUTION as the generic resolution/range macro for LOAD,
UTIL, and CAPACITY:

#define SCHED_RESOLUTION_SHIFT  10
#define SCHED_RESOLUTION_SCALE  (1L << SCHED_RESOLUTION_SHIFT)

#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
# define scale_load(w)  ((w) << SCHED_RESOLUTION_SHIFT)
# define scale_load_down(w) ((w) >> SCHED_RESOLUTION_SHIFT)
# define SCHED_LOAD_SHIFT   (10 + SCHED_RESOLUTION_SHIFT)
#else
# define scale_load(w)  (w)
# define scale_load_down(w) (w)
# define SCHED_LOAD_SHIFT   (10)
#endif

#define SCHED_LOAD_SCALE(1L << SCHED_LOAD_SHIFT)

For UTIL, e.g., it will be initialized as:
sa->util_avg = SCHED_RESOLUTION_SCALE;

And for capacity, we just use SCHED_RESOLUTION_SHIFT
(so SCHED_CAPACITY_SHIFT is not needed).
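
Under that proposal, the init and the final scaling in
__update_load_avg() would presumably pair up as follows (a sketch
extrapolated from the names above, not tested code):

	sa->util_avg = SCHED_RESOLUTION_SCALE;
	...
	sa->util_avg = (sa->util_sum << SCHED_RESOLUTION_SHIFT) / LOAD_AVG_MAX;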

Thanks,
Yuyang


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-10 Thread Peter Zijlstra
On Thu, Sep 10, 2015 at 03:07:48AM +0800, Yuyang Du wrote:
> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> >  
> > +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> > +#error "load tracking assumes 2^10 as unit"
> > +#endif
> > +
> 
> Sorry for the late response. I might have already missed something.
> 
> But I got a bit lost here, with:
> 
> #define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
> #define SCHED_CAPACITY_SHIFT	10
> 
> the #if is certainly false.

That is intended, triggering #error would be 'bad'.

The reason for this bit is to raise a stink if someone 'accidentally'
changes one of these values and expects things to just work.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-10 Thread Peter Zijlstra
On Thu, Sep 10, 2015 at 04:15:20AM +0800, Yuyang Du wrote:
> On Tue, Sep 08, 2015 at 01:50:38PM +0100, Dietmar Eggemann wrote:
> > > It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
> > > SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
> > > scale the value in the right range. In the case of cpu_usage which
> > > returns sa->util_avg , it's the capacity range not the load range.
> > 
> > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> > CAPACITY have no unit.
> 
> To be more accurate, probably, LOAD can be thought of as having a unit,
> but UTIL has no unit.

But I'm thinking that is wrong; it should have one, esp. if we go scale
the thing. Giving it the same fixed point unit as load simplifies the
code.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-03 Thread Steve Muckle
Hi Morten, Dietmar,

On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
...
> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
> + * recent utilization of currently non-runnable tasks on a CPU. It represents
> + * the amount of utilization of a CPU in the range [0..capacity_orig] where

I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
__update_load_avg(). If there is now an assumption that util_avg may be
used directly as a capacity value, should it be changed to
SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
always be or if they can be combined.

> + * capacity_orig is the cpu_capacity available at * the highest frequency

spurious *

thanks,
Steve



Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-21 Thread bsegall
Yuyang Du  writes:

> On Thu, Sep 17, 2015 at 12:38:25PM +0200, Peter Zijlstra wrote:
>> On Fri, Sep 11, 2015 at 06:22:47PM +0100, Morten Rasmussen wrote:
>> 
>> > While at it, should I include Yuyang's patch redefining the SCALE/SHIFT
>> > mess?
>> 
>> I suspect his patch will fail to compile on ARM which uses
>> SCHED_CAPACITY_* outside of kernel/sched/*.
>> 
>> But if you all (Ben, Yuyang, you) can agree on a patch simplifying these
>> things I'm not opposed to it.
>
> Yes, indeed. So SCHED_RESOLUTION_SHIFT has to be defined in 
> include/linux/sched.h.
>
> With this, I think the codes still need some cleanup, and importantly
> documentation.
>
> But first, I think load_sum and load_avg can afford NICE_0_LOAD with
> either high or low resolution. So we have no reason to have low
> resolution (10 bits) load_avg when NICE_0_LOAD has high resolution
> (20 bits), because load_avg = runnable% * load, as opposed to now where
> we have load_avg = runnable% * scale_load_down(load).
>
> Can we get rid of all scale_load_down() for runnable load average?

Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, ie barely within a
32-bit unsigned long, but in fact LOAD_AVG_MAX * MAX_SHARES is already
going to give errors on 32-bit (even with the old code in fact). This
should probably be fixed... somehow (dividing by 4 for load_sum on
32-bit would work, though be ugly. Reducing MAX_SHARES by 2 bits on
32-bit might have made sense but would be a weird difference between 32
and 64, and could break userspace anyway, so it's presumably too late
for that).

64-bit has ~30 bits free, so this would be fine so long as SLR is 0 on
32-bit.
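
The arithmetic behind those numbers, as a standalone sketch (88761 is
prio_to_weight[0] and MAX_SHARES is 1UL << 18 in the kernel):

#include <stdio.h>

#define LOAD_AVG_MAX	47742ULL
#define NICE_M20_LOAD	88761ULL	/* prio_to_weight[0] */
#define MAX_SHARES	(1ULL << 18)

int main(void)
{
	printf("LOAD_AVG_MAX * prio_to_weight[0] = %llu (2^32 = %llu)\n",
	       LOAD_AVG_MAX * NICE_M20_LOAD, 1ULL << 32);
	printf("LOAD_AVG_MAX * MAX_SHARES        = %llu, overflows 32-bit\n",
	       LOAD_AVG_MAX * MAX_SHARES);
	return 0;
}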

>
> --
>
> Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
>  definition
>
> The metric needs certain resolution to determine how much detail we
> can look into (or not losing detail by integer rounding), which also
> determines the range of the metrics.
>
> For instance, to increase the resolution of [0, 1] (two levels), one
> can multiply 1024 and get [0, 1024] (1025 levels).
>
> In sched/fair, a few metrics depend on the resolution: load/load_avg,
> util_avg, and capacity (frequency adjustment). In order to reduce the
> risk of making mistakes relating to resolution/range, we therefore
> generalize the resolution by defining a basic resolution constant
> number, and then formalize all metrics by depending on the basic
> resolution. The basic resolution is 1024 or (1 << 10). Further, one
> can recursively apply the basic resolution to increase the final
> resolution.
>
> As pointed out by Ben Segall, NICE_0's weight (visible to user) and load
> have independent resolutions, but they must be well calibrated.
>
> Signed-off-by: Yuyang Du 
> ---
>  include/linux/sched.h |  9 ++---
>  kernel/sched/fair.c   |  4 
>  kernel/sched/sched.h  | 15 ++-
>  3 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index bd38b3e..9b86f79 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -909,10 +909,13 @@ enum cpu_idle_type {
>   CPU_MAX_IDLE_TYPES
>  };
>  
> +# define SCHED_RESOLUTION_SHIFT  10
> +# define SCHED_RESOLUTION_SCALE  (1L << SCHED_RESOLUTION_SHIFT)
> +
>  /*
>   * Increase resolution of cpu_capacity calculations
>   */
> -#define SCHED_CAPACITY_SHIFT 10
> +#define SCHED_CAPACITY_SHIFT SCHED_RESOLUTION_SHIFT
>  #define SCHED_CAPACITY_SCALE (1L << SCHED_CAPACITY_SHIFT)
>  
>  /*
> @@ -1180,8 +1183,8 @@ struct load_weight {
>   * 1) load_avg factors frequency scaling into the amount of time that a
>   * sched_entity is runnable on a rq into its weight. For cfs_rq, it is the
>   * aggregated such weights of all runnable and blocked sched_entities.
> - * 2) util_avg factors frequency and cpu scaling into the amount of time
> - * that a sched_entity is running on a CPU, in the range [0..SCHED_LOAD_SCALE].
> + * 2) util_avg factors frequency and cpu capacity scaling into the amount of time
> + * that a sched_entity is running on a CPU, in the range [0..SCHED_CAPACITY_SCALE].
>   * For cfs_rq, it is the aggregated such times of all runnable and
>   * blocked sched_entities.
>   * The 64 bit load_sum can:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..c61fd8e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2522,10 +2522,6 @@ static u32 __compute_runnable_contrib(u64 n)
>   return contrib + runnable_avg_yN_sum[n];
>  }
>  
> -#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> -#error "load tracking assumes 2^10 as unit"
> -#endif
> -
>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>  
>  /*
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3845a71..31b4022 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -53,18 +53,23 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>   * increased costs.
>   */
>  #if 

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-22 Thread Yuyang Du
On Mon, Sep 21, 2015 at 10:30:04AM -0700, bseg...@google.com wrote:
> > But first, I think load_sum and load_avg can afford NICE_0_LOAD with
> > either high or low resolution. So we have no reason to have low
> > resolution (10 bits) load_avg when NICE_0_LOAD has high resolution
> > (20 bits), because load_avg = runnable% * load, as opposed to now where
> > we have load_avg = runnable% * scale_load_down(load).
> >
> > Can we get rid of all scale_load_down() for runnable load average?
> 
> Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, ie barely within a
> 32-bit unsigned long, but in fact LOAD_AVG_MAX * MAX_SHARES is already
> going to give errors on 32-bit (even with the old code in fact). This
> should probably be fixed... somehow (dividing by 4 for load_sum on
> 32-bit would work, though be ugly. Reducing MAX_SHARES by 2 bits on
> 32-bit might have made sense but would be a weird difference between 32
> and 64, and could break userspace anyway, so it's presumably too late
> for that).
> 
> 64-bit has ~30 bits free, so this would be fine so long as SLR is 0 on
> 32-bit.
> 

load_avg has no LOAD_AVG_MAX term in it; it is runnable% * load, IOW
load_avg <= load. So, on 32bit, cfs_rq's load_avg can host
2^32/prio_to_weight[0]/1024 = 47 entities with 20 bits of load
resolution. This is ok, because struct load_weight's load is also
unsigned long; if that overflows, cfs_rq->load.weight will have
overflowed in the first place.

However, on second thought this is not quite right, because load_avg is
not necessarily no greater than load: load_avg has blocked load in it.
Although load_avg is still at the same level as load (converging to be
<= load), we may not want to risk overflow on 32bit.

> > +/*
> > + * NICE_0's weight (visible to user) and its load (invisible to user) have
> > + * independent resolution, but they should be well calibrated. We use
> > + * scale_load() and scale_load_down(w) to convert between them; the
> > + * following must be true:
> > + * scale_load(prio_to_weight[20]) == NICE_0_LOAD
> > + */
> >  #define NICE_0_LOAD	SCHED_LOAD_SCALE
> >  #define NICE_0_SHIFT	SCHED_LOAD_SHIFT
> 
> I still think tying the scale_load shift to be the same as the
> SCHED_CAPACITY/etc shift is silly, and tying the NICE_0_LOAD/SHIFT in is
> worse. Honestly if I was going to change anything it would be to define
> NICE_0_LOAD/SHIFT entirely separately from SCHED_LOAD_SCALE/SHIFT.

If NICE_0_LOAD is nice-0's load, and if SCHED_LOAD_SHIFT is to say how to get 
nice-0's load, I don't understand why you want to separate them.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-22 Thread bsegall
Yuyang Du  writes:

> On Mon, Sep 21, 2015 at 10:30:04AM -0700, bseg...@google.com wrote:
>> > But first, I think load_sum and load_avg can afford NICE_0_LOAD with
>> > either high or low resolution. So we have no reason to have low
>> > resolution (10 bits) load_avg when NICE_0_LOAD has high resolution
>> > (20 bits), because load_avg = runnable% * load, as opposed to now where
>> > we have load_avg = runnable% * scale_load_down(load).
>> >
>> > Can we get rid of all scale_load_down() for runnable load average?
>> 
>> Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, ie barely within a
>> 32-bit unsigned long, but in fact LOAD_AVG_MAX * MAX_SHARES is already
>> going to give errors on 32-bit (even with the old code in fact). This
>> should probably be fixed... somehow (dividing by 4 for load_sum on
>> 32-bit would work, though be ugly. Reducing MAX_SHARES by 2 bits on
>> 32-bit might have made sense but would be a weird difference between 32
>> and 64, and could break userspace anyway, so it's presumably too late
>> for that).
>> 
>> 64-bit has ~30 bits free, so this would be fine so long as SLR is 0 on
>> 32-bit.
>> 
>
> load_avg has no LOAD_AVG_MAX term in it; it is runnable% * load, IOW
> load_avg <= load. So, on 32bit, cfs_rq's load_avg can host
> 2^32/prio_to_weight[0]/1024 = 47 entities with 20 bits of load
> resolution. This is ok, because struct load_weight's load is also
> unsigned long; if that overflows, cfs_rq->load.weight will have
> overflowed in the first place.
>
> However, on second thought this is not quite right, because load_avg is
> not necessarily no greater than load: load_avg has blocked load in it.
> Although load_avg is still at the same level as load (converging to be
> <= load), we may not want to risk overflow on 32bit.

Yeah, I missed that load_sum was u64 and only load_avg was long. This
means we're fine on 32-bit with no SLR (or more precisely, cfs_rq
runnable_load_avg can overflow, but only when cfs_rq load.weight does,
so whatever). On 64-bit you can currently have 2^36 cgroups or 2^37
tasks before load.weight overflows, and ~2^31 tasks before
runnable_load_avg does, which is obviously fine (and in fact you'd hit
PID_MAX_LIMIT first, even if you had the cpu/ram/etc to not fall over).

Now, applying SLR to runnable_load_avg would cut this down to ~2^21
tasks running at once or 2^20 with cgroups, which is technically
allowed, though it seems utterly implausible (especially since this
would have to all be on one cpu). If SLR was increased as peterz asked
about, this could be an issue though.

All that said, using SLR on load_sum/load_avg as opposed to cfs_rq
runnable_load_avg would be fine, as they're limited to only one
task/cgroup's weight. Having it SLRed and cfs_rq not would be a
little odd, but not impossible.
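
Where the ~2^31 and ~2^21 figures come from, as a sketch (88761 =
prio_to_weight[0], 47742 = LOAD_AVG_MAX):

/*
 * runnable_load_sum is a u64 accumulating up to weight * LOAD_AVG_MAX
 * per entity:
 *
 *   without SLR:  2^64 / (88761 * 47742)          ~= 2^32 entities
 *   with SLR=10:  2^64 / ((88761 << 10) * 47742)  ~= 2^22 entities
 *
 * staying somewhat under these bounds for headroom gives the ~2^31 and
 * ~2^21 figures above.
 */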

>
>> > +/*
>> > + * NICE_0's weight (visible to user) and its load (invisible to user) have
>> > + * independent resolution, but they should be well calibrated. We use
>> > + * scale_load() and scale_load_down(w) to convert between them; the
>> > + * following must be true:
>> > + * scale_load(prio_to_weight[20]) == NICE_0_LOAD
>> > + */
>> >  #define NICE_0_LOAD	SCHED_LOAD_SCALE
>> >  #define NICE_0_SHIFT	SCHED_LOAD_SHIFT
>> 
>> I still think tying the scale_load shift to be the same as the
>> SCHED_CAPACITY/etc shift is silly, and tying the NICE_0_LOAD/SHIFT in is
>> worse. Honestly if I was going to change anything it would be to define
>> NICE_0_LOAD/SHIFT entirely separately from SCHED_LOAD_SCALE/SHIFT.
>
> If NICE_0_LOAD is nice-0's load, and if SCHED_LOAD_SHIFT is to say how to get 
> nice-0's load, I don't understand why you want to separate them.

SCHED_LOAD_SHIFT is not how to get nice-0's load, it just happens to
have the same value as NICE_0_SHIFT. (I think anyway, SCHED_LOAD_* is
used in precisely one place other than the newish util_avg, and as I
mentioned it's not remotely clear what compute_imbalance is doing there)


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-23 Thread Yuyang Du
On Tue, Sep 22, 2015 at 10:18:30AM -0700, bseg...@google.com wrote:
> Yuyang Du  writes:
> 
> > On Mon, Sep 21, 2015 at 10:30:04AM -0700, bseg...@google.com wrote:
> >> > But first, I think load_sum and load_avg can afford NICE_0_LOAD with
> >> > either high or low resolution. So we have no reason to have low
> >> > resolution (10 bits) load_avg when NICE_0_LOAD has high resolution
> >> > (20 bits), because load_avg = runnable% * load, as opposed to now where
> >> > we have load_avg = runnable% * scale_load_down(load).
> >> >
> >> > Can we get rid of all scale_load_down() for runnable load average?
> >> 
> >> Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, ie barely within a
> >> 32-bit unsigned long, but in fact LOAD_AVG_MAX * MAX_SHARES is already
> >> going to give errors on 32-bit (even with the old code in fact). This
> >> should probably be fixed... somehow (dividing by 4 for load_sum on
> >> 32-bit would work, though be ugly. Reducing MAX_SHARES by 2 bits on
> >> 32-bit might have made sense but would be a weird difference between 32
> >> and 64, and could break userspace anyway, so it's presumably too late
> >> for that).
> >> 
> >> 64-bit has ~30 bits free, so this would be fine so long as SLR is 0 on
> >> 32-bit.
> >> 
> >
> > load_avg has no LOAD_AVG_MAX term in it; it is runnable% * load, IOW
> > load_avg <= load. So, on 32bit, cfs_rq's load_avg can host
> > 2^32/prio_to_weight[0]/1024 = 47 entities with 20 bits of load
> > resolution. This is ok, because struct load_weight's load is also
> > unsigned long; if that overflows, cfs_rq->load.weight will have
> > overflowed in the first place.
> >
> > However, on second thought this is not quite right, because load_avg is
> > not necessarily no greater than load: load_avg has blocked load in it.
> > Although load_avg is still at the same level as load (converging to be
> > <= load), we may not want to risk overflow on 32bit.
 
That second thought was a mistake (what was wrong with me?). load_avg
is for sure no greater than load, with or without blocked load.

With that said, it really does not matter what the following numbers
are, on a 32bit or 64bit machine. What matters is that
cfs_rq->load.weight is the one that needs to worry about overflow, not
load_avg. It is as simple as that.

With that, I think we can and should get rid of the scale_load_down()
for load_avg.

> Yeah, I missed that load_sum was u64 and only load_avg was long. This
> means we're fine on 32-bit with no SLR (or more precisely, cfs_rq
> runnable_load_avg can overflow, but only when cfs_rq load.weight does,
> so whatever). On 64-bit you can currently have 2^36 cgroups or 2^37
> tasks before load.weight overflows, and ~2^31 tasks before
> runnable_load_avg does, which is obviously fine (and in fact you'd hit
> PID_MAX_LIMIT first, even if you had the cpu/ram/etc to not fall over).
> 
> Now, applying SLR to runnable_load_avg would cut this down to ~2^21
> tasks running at once or 2^20 with cgroups, which is technically
> allowed, though it seems utterly implausible (especially since this
> would have to all be on one cpu). If SLR was increased as peterz asked
> about, this could be an issue though.
> 
> All that said, using SLR on load_sum/load_avg as opposed to cfs_rq
> runnable_load_avg would be fine, as they're limited to only one
> task/cgroup's weight. Having it SLRed and cfs_rq not would be a
> little odd, but not impossible.
 

> > If NICE_0_LOAD is nice-0's load, and if SCHED_LOAD_SHIFT is to say how to 
> > get 
> > nice-0's load, I don't understand why you want to separate them.
> 
> SCHED_LOAD_SHIFT is not how to get nice-0's load, it just happens to
> have the same value as NICE_0_SHIFT. (I think anyway, SCHED_LOAD_* is
> used in precisely one place other than the newish util_avg, and as I
> mentioned it's not remotely clear what compute_imbalance is doing there)

Yes, it is not clear to me either.

With the above proposal to get rid of scale_load_down() for load_avg, I
think we can now remove SCHED_LOAD_*, rename scale_load() to
user_to_kernel_load(), and rename scale_load_down() to
kernel_to_user_load().

Hmm?


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-30 Thread Peter Zijlstra
On Tue, Sep 22, 2015 at 10:18:30AM -0700, bseg...@google.com wrote:
> If SLR was increased as peterz asked
> about

Right, so I was under the impression that you (Google) run with it
increased, and in mainline it's currently dead code.

So if it's valuable to you guys we should fix it in mainline.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-10 Thread Morten Rasmussen
On Wed, Sep 09, 2015 at 03:23:43PM -0700, bseg...@google.com wrote:
> Peter Zijlstra  writes:
> 
> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> >> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> >> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> >> 
> >> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
> >> 1<<10, it is just the sum of the geometric series and the upper bound of
> >> util_sum?
> >
> > It needs a 1024; it might just have been the 1024 ns we use as a period
> > instead of the scale unit though.
> >
> > The LOAD_AVG_MAX is the number where adding a next element to the series
> > doesn't change the result anymore, so scaling it up will allow more
> > significant elements to the series before we bottom out, which is the _N
> > thing.
> >
> 
> Yes, as the comments say, the 1024ns unit is arbitrary (and is an
> average of not-quite-microseconds instead of just nanoseconds to allow
> more bits to load.weight when we multiply load.weight by this number).
> In fact there are two arbitrary 1024 units here, which are technically
> unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
> operate on units of almost-microseconds and we also do decays every
> almost-millisecond.
> 
> There appears to be a bunch of confusion in the current code around
> util_sum/util_avg, which appears to be using SCHED_LOAD_SCALE
> for a fixed-point percentage or something, which is at least reasonable,
> but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
> results in either initializing as 100% or .1% depending on RESOLUTION.
> This'll get clobbered on first update, but if it needs to be
> initialized, it should either get initialized to something sane or at
> least consistent.

This is what I thought too. The whole geometric series math is completely
independent of the scale used for priority in load_avg and the fixed
point shifting used for util_avg.

> load_sum/load_avg appear to be scale_load_down()ed properly, and appear
> to be used as such at a quick glance.

I don't think shifting by SCHED_LOAD_SHIFT in __update_load_avg() is
right:

sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;

util_avg is initialized to low resolution (>> SCHED_LOAD_RESOLUTION):

sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);

so it appear to be intended to be using low resolution like load_avg
(weight is scaled down before it is passed into __update_load_avg()),
but util_avg is shifted up to high resolution. It should be:

sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;

to be consistent.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-10 Thread Vincent Guittot
On 10 September 2015 at 13:06, Morten Rasmussen
 wrote:
> On Wed, Sep 09, 2015 at 03:23:43PM -0700, bseg...@google.com wrote:
>> Peter Zijlstra  writes:
>>
>> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
>> >> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
>> >> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>> >>
>> >> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
>> >> 1<<10, it is just the sum of the geometric series and the upper bound of
>> >> util_sum?
>> >
>> > It needs a 1024; it might just have been the 1024 ns we use as a period
>> > instead of the scale unit though.
>> >
>> > The LOAD_AVG_MAX is the number where adding a next element to the series
>> > doesn't change the result anymore, so scaling it up will allow more
>> > significant elements to the series before we bottom out, which is the _N
>> > thing.
>> >
>>
>> Yes, as the comments say, the 1024ns unit is arbitrary (and is an
>> average of not-quite-microseconds instead of just nanoseconds to allow
>> more bits to load.weight when we multiply load.weight by this number).
>> In fact there are two arbitrary 1024 units here, which are technically
>> unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
>> operate on units of almost-microseconds and we also do decays every
>> almost-millisecond.
>>
>> There appears to be a bunch of confusion in the current code around
>> util_sum/util_avg, which appears to be using SCHED_LOAD_SCALE
>> for a fixed-point percentage or something, which is at least reasonable,
>> but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
>> results in either initializing as 100% or .1% depending on RESOLUTION.
>> This'll get clobbered on first update, but if it needs to be
>> initialized, it should either get initialized to something sane or at
>> least consistent.
>
> This is what I thought too. The whole geometric series math is completely
> independent of the scale used for priority in load_avg and the fixed
> point shifting used for util_avg.
>
>> load_sum/load_avg appear to be scale_load_down()ed properly, and appear
>> to be used as such at a quick glance.
>
> I don't think shifting by SCHED_LOAD_SHIFT in __update_load_avg() is
> right:
>
> sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>
> util_avg is initialized to low resolution (>> SCHED_LOAD_RESOLUTION):
>
> sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>
> so it appear to be intended to be using low resolution like load_avg
> (weight is scaled down before it is passed into __update_load_avg()),
> but util_avg is shifted up to high resolution. It should be:
>
> sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
> SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;

you probably mean (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION)

The goal of this patchset is to be able to scale util_avg in the range
of cpu capacity, so why don't we directly initialize it with
sa->util_avg = SCHED_CAPACITY_SCALE;

and then use

 sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;

so we don't have to take care of high and low load resolution.

Regards,

>
> to be consistent.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-10 Thread Morten Rasmussen
On Thu, Sep 10, 2015 at 01:11:01PM +0200, Vincent Guittot wrote:
> On 10 September 2015 at 13:06, Morten Rasmussen
>  wrote:
> > On Wed, Sep 09, 2015 at 03:23:43PM -0700, bseg...@google.com wrote:
> >> Peter Zijlstra  writes:
> >>
> >> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> >> >> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> >> >> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> >> >>
> >> >> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
> >> >> 1<<10, it is just the sum of the geometric series and the upper bound of
> >> >> util_sum?
> >> >
> >> > It needs a 1024; it might just have been the 1024 ns we use as a period
> >> > instead of the scale unit though.
> >> >
> >> > The LOAD_AVG_MAX is the number where adding a next element to the series
> >> > doesn't change the result anymore, so scaling it up will allow more
> >> > significant elements to the series before we bottom out, which is the _N
> >> > thing.
> >> >
> >>
> >> Yes, as the comments say, the 1024ns unit is arbitrary (and is an
> >> average of not-quite-microseconds instead of just nanoseconds to allow
> >> more bits to load.weight when we multiply load.weight by this number).
> >> In fact there are two arbitrary 1024 units here, which are technically
> >> unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
> >> operate on units of almost-microseconds and we also do decays every
> >> almost-millisecond.
> >>
> >> There appears to be a bunch of confusion in the current code around
> >> util_sum/util_avg, which appears to be using SCHED_LOAD_SCALE
> >> for a fixed-point percentage or something, which is at least reasonable,
> >> but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
> >> results in either initializing as 100% or .1% depending on RESOLUTION.
> >> This'll get clobbered on first update, but if it needs to be
> >> initialized, it should either get initialized to something sane or at
> >> least consistent.
> >
> > This is what I thought too. The whole geometric series math is completely
> > independent of the scale used for priority in load_avg and the fixed
> > point shifting used for util_avg.
> >
> >> load_sum/load_avg appear to be scale_load_down()ed properly, and appear
> >> to be used as such at a quick glance.
> >
> > I don't think shifting by SCHED_LOAD_SHIFT in __update_load_avg() is
> > right:
> >
> > sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> >
> > util_avg is initialized to low resolution (>> SCHED_LOAD_RESOLUTION):
> >
> > sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> >
> > so it appear to be intended to be using low resolution like load_avg
> > (weight is scaled down before it is passed into __update_load_avg()),
> > but util_avg is shifted up to high resolution. It should be:
> >
> > sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
> > SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;
> 
> you probably mean (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION)

Yes. Thanks for providing the right expression. There seems to be enough
confusion in this thread already :)

> The goal of this patchset is to be able to scale util_avg in the range
> of cpu capacity so why don't we directly initialize it with
> sa->util_avg = SCHED_CAPACITY_SCALE;
> 
> and then use
> 
>  sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> 
> so we don't have to take care of high and low load resolution

That works for me, except that the left-shift is gone with PeterZ's
optimization patch posted earlier in this thread. It changes util_sum
to be scaled by capacity instead of being the pure geometric series,
which is what requires the left shift at the end when we divide by
LOAD_AVG_MAX. So it should be equivalent to what you are proposing if we
change the initialization to your proposal too.
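
For reference, a side-by-side sketch of the two formulations (scale
factors assumed to be in [0..1024]):

/*
 * pure geometric series:
 *   util_sum accumulates plain 1024-unit segments, max = LOAD_AVG_MAX
 *   util_avg = (util_sum << 10) / LOAD_AVG_MAX       -> [0..1024]
 *
 * capacity-scaled sum (PeterZ's patch earlier in the thread):
 *   util_sum accumulates segments already scaled by freq/cpu capacity,
 *   max = LOAD_AVG_MAX * 1024
 *   util_avg = util_sum / LOAD_AVG_MAX               -> [0..1024]
 *
 * i.e. the << 10 moves from the final divide into the accumulation, so
 * the explicit left shift disappears.
 */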


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-10 Thread bsegall
Morten Rasmussen  writes:

> On Wed, Sep 09, 2015 at 03:23:43PM -0700, bseg...@google.com wrote:
>> Peter Zijlstra  writes:
>> 
>> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
>> >> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
>> >> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>> >> 
>> >> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
>> >> 1<<10, it is just the sum of the geometric series and the upper bound of
>> >> util_sum?
>> >
>> > It needs a 1024; it might just have been the 1024 ns we use as a period
>> > instead of the scale unit though.
>> >
>> > The LOAD_AVG_MAX is the number where adding a next element to the series
>> > doesn't change the result anymore, so scaling it up will allow more
>> > significant elements to the series before we bottom out, which is the _N
>> > thing.
>> >
>> 
>> Yes, as the comments say, the 1024ns unit is arbitrary (and is an
>> average of not-quite-microseconds instead of just nanoseconds to allow
>> more bits to load.weight when we multiply load.weight by this number).
>> In fact there are two arbitrary 1024 units here, which are technically
>> unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
>> operate on units of almost-microseconds and we also do decays every
>> almost-millisecond.
>> 
>> There appears to be a bunch of confusion in the current code around
>> util_sum/util_avg, which appears to be using SCHED_LOAD_SCALE
>> for a fixed-point percentage or something, which is at least reasonable,
>> but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
>> results in either initializing as 100% or .1% depending on RESOLUTION.
>> This'll get clobbered on first update, but if it needs to be
>> initialized, it should either get initialized to something sane or at
>> least consistent.
>
> This is what I thought too. The whole geometric series math is completely
> independent of the scale used for priority in load_avg and the fixed
> point shifting used for util_avg.
>
>> load_sum/load_avg appear to be scale_load_down()ed properly, and appear
>> to be used as such at a quick glance.
>
> I don't think shifting by SCHED_LOAD_SHIFT in __update_load_avg() is
> right:
>
>   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>
> util_avg is initialized to low resolution (>> SCHED_LOAD_RESOLUTION):
>
> sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>
> so it appear to be intended to be using low resolution like load_avg
> (weight is scaled down before it is passed into __update_load_avg()),
> but util_avg is shifted up to high resolution. It should be:
>
>   sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
>   SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;
>
> to be consistent.

Yeah, util_avg was/is screwed up in terms of either the initialization
or which shift to use there. The load ones however appear to be fine.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Leo Yan
Hi Morten,

On Tue, Sep 08, 2015 at 05:53:31PM +0100, Morten Rasmussen wrote:
> On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> > On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > > 
> > > Something like the below..
> > > 
> > > Another thing to ponder; the downside of scaled_delta_w is that it's
> > > fairly likely delta is small and you lose all bits, whereas the weight
> > > is likely to be large and could lose a few bits without issue.
> > 
> > That issue applies both to load and util.
> > 
> > > 
> > > That is, in fixed point scaling like this, you want to start with the
> > > biggest numbers, not the smallest, otherwise you lose too much.
> > > 
> > > The flip side is of course that now you can share a multiplication.
> > 
> > But if we apply the scaling to the weight instead of time, we would only
> > have to apply it once and not three times like it is now? So maybe we
> > can end up with almost the same number of multiplications.
> > 
> > We might be losing bits for low priority tasks running on cpus at a low
> > frequency though.
> 
> Something like the below. We should be saving one multiplication.
> 
> --- 8< ---
> 
> From: Morten Rasmussen 
> Date: Tue, 8 Sep 2015 17:15:40 +0100
> Subject: [PATCH] sched/fair: Scale load/util contribution rather than time
> 
> When updating load/util tracking the time delta might be very small (1)
> in many cases; scaling it further down with frequency and cpu invariance
> scaling might cause us to lose precision. Instead of scaling time we
> can scale the weight of the task for load and the capacity for
> utilization. Both weight (>=15) and capacity should be significantly
> bigger in most cases. Low priority tasks might still suffer a bit, but
> the worst case should be improved, as weight is at least 15 before
> invariance scaling.
> 
> Signed-off-by: Morten Rasmussen 
> ---
>  kernel/sched/fair.c | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9301291..d5ee72a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2519,8 +2519,6 @@ static u32 __compute_runnable_contrib(u64 n)
>  #error "load tracking assumes 2^10 as unit"
>  #endif
>  
> -#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
> -
>  /*
>   * We can represent the historical contribution to runnable average as the
>   * coefficients of a geometric series.  To do this we sub-divide our runnable
> @@ -2553,10 +2551,10 @@ static __always_inline int
>  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> unsigned long weight, int running, struct cfs_rq *cfs_rq)
>  {
> - u64 delta, scaled_delta, periods;
> + u64 delta, periods;
>   u32 contrib;
> - unsigned int delta_w, scaled_delta_w, decayed = 0;
> - unsigned long scale_freq, scale_cpu;
> + unsigned int delta_w, decayed = 0;
> + unsigned long scaled_weight = 0, scale_freq, scale_freq_cpu = 0;
>  
>   delta = now - sa->last_update_time;
>   /*
> @@ -2577,8 +2575,13 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>   return 0;
>   sa->last_update_time = now;
>  
> - scale_freq = arch_scale_freq_capacity(NULL, cpu);
> - scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> + if (weight || running)
> + scale_freq = arch_scale_freq_capacity(NULL, cpu);
> + if (weight)
> + scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
> + if (running)
> + scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
> + >> SCHED_CAPACITY_SHIFT;

maybe the question below is stupid :)

Why not calculate scaled_weight based on the cpu's capacity as well?
Something like: scaled_weight = weight * scale_freq_cpu.

>   /* delta_w is the amount already accumulated against our next period */
>   delta_w = sa->period_contrib;
> @@ -2594,16 +2597,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>* period and accrue it.
>*/
>   delta_w = 1024 - delta_w;
> - scaled_delta_w = cap_scale(delta_w, scale_freq);
>   if (weight) {
> - sa->load_sum += weight * scaled_delta_w;
> + sa->load_sum += scaled_weight * delta_w;
>   if (cfs_rq) {
>   cfs_rq->runnable_load_sum +=
> - weight * scaled_delta_w;
> + scaled_weight * delta_w;
>   }
>   }
>   if (running)
> - sa->util_sum += scaled_delta_w * scale_cpu;
> + sa->util_sum += delta_w * scale_freq_cpu;
>  
>   delta -= delta_w;
>  
> @@ -2620,25 +2622,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  
>   

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Yuyang Du
On Thu, Sep 10, 2015 at 12:07:27PM +0200, Peter Zijlstra wrote:
> > > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> > > CAPACITY have no unit.
> > 
> > To be more accurate, probably, LOAD can be thought of as having unit,
> > but UTIL has no unit.
> 
> But I'm thinking that is wrong; it should have one, esp. if we go scale
> the thing. Giving it the same fixed point unit as load simplifies the
> code.

I think we probably are saying the same thing with different terms. Anyway,
let me reiterate what I said and make it a little more formalized.

UTIL has no unit because it is a pure ratio, the cpu_running%, which is in
the range [0, 100%]. We increase the resolution because we don't want to
lose detail (due to integer rounding), so we multiply by a number (say
1024), and the range becomes [0, 1024].
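
As a toy example of what the multiply buys (a userspace sketch, nothing
kernel specific):

#include <stdio.h>

int main(void)
{
	unsigned long running = 300, period = 1024;	/* ~29% cpu_running */

	/* plain integer ratio: anything below 100% rounds down to 0 */
	printf("%lu\n", running / period);		/* 0 */

	/* scaled by 1024 first: range [0, 1024], detail preserved */
	printf("%lu\n", (running * 1024) / period);	/* 300 */
	return 0;
}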

CAPACITY is also a ratio, ACTUAL_PERF/MAX_PERF, from (0, 1]. Even LOAD
is the same, a ratio of NICE_X/NICE_0, from [15/1024=0.015, 88761/1024=86.68],
as it only has relative meaning (i.e., when comparing to each other).
When I said it has a unit, it was in the sense that it looks like currency
(for instance, Yuan) that you can use to buy CPU fair share. But that is
just one way to look at it, and there are certainly many others.

So, I still propose to generalize all of these with the following patch, in
the belief that this makes things simple, clear, and error-reducing.

--

Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
 definition

An integer metric needs a certain resolution to determine how much detail
we can look into (not losing detail to integer rounding), which also
determines the range of the metric.

For instance, to increase the resolution of [0, 1] (two levels), one
can multiply 1024 and get [0, 1024] (1025 levels).

In sched/fair, a few metrics depend on the resolution: load/load_avg,
util_avg, and capacity (frequency adjustment). In order to reduce the
risks of making mistakes relating to resolution/range, we therefore
generalize the resolution by defining a basic resolution constant
number, and then formalize all metrics to depend on the basic
resolution. The basic resolution is 1024 or (1 << 10). Further, one
can recursively apply another basic resolution to increase the final
resolution (e.g., 1048576 = 1 << 20).

Signed-off-by: Yuyang Du 
---
 include/linux/sched.h |  2 +-
 kernel/sched/sched.h  | 12 +++-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 119823d..55a7b93 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -912,7 +912,7 @@ enum cpu_idle_type {
 /*
  * Increase resolution of cpu_capacity calculations
  */
-#define SCHED_CAPACITY_SHIFT   10
+#define SCHED_CAPACITY_SHIFT   SCHED_RESOLUTION_SHIFT
 #define SCHED_CAPACITY_SCALE   (1L << SCHED_CAPACITY_SHIFT)
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 68cda11..d27cdd8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
  */
 #define NS_TO_JIFFIES(TIME)       ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
 
+# define SCHED_RESOLUTION_SHIFT   10
+# define SCHED_RESOLUTION_SCALE   (1L << SCHED_RESOLUTION_SHIFT)
+
 /*
  * Increase resolution of nice-level calculations for 64-bit architectures.
  * The extra resolution improves shares distribution and load balancing of
@@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq 
*this_rq) { }
  * increased costs.
  */
 #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
-# define SCHED_LOAD_RESOLUTION    10
-# define scale_load(w)            ((w) << SCHED_LOAD_RESOLUTION)
-# define scale_load_down(w)       ((w) >> SCHED_LOAD_RESOLUTION)
+# define SCHED_LOAD_SHIFT         (SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
+# define scale_load(w)            ((w) << SCHED_RESOLUTION_SHIFT)
+# define scale_load_down(w)       ((w) >> SCHED_RESOLUTION_SHIFT)
 #else
-# define SCHED_LOAD_RESOLUTION    0
+# define SCHED_LOAD_SHIFT         (SCHED_RESOLUTION_SHIFT)
 # define scale_load(w)            (w)
 # define scale_load_down(w)       (w)
 #endif
 
-#define SCHED_LOAD_SHIFT          (10 + SCHED_LOAD_RESOLUTION)
 #define SCHED_LOAD_SCALE          (1L << SCHED_LOAD_SHIFT)
 
 #define NICE_0_LOAD               SCHED_LOAD_SCALE
-- 
1.9.1
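
For illustration, this is how the proposed macros would compose if the
currently-disabled high-resolution branch were enabled; just a userspace
check, not kernel code:

#include <stdio.h>

#define SCHED_RESOLUTION_SHIFT	10

/* mirror of the proposal, high-resolution branch taken: */
#define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
#define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
#define NICE_0_LOAD		SCHED_LOAD_SCALE
#define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
#define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)

int main(void)
{
	/* user-visible nice-0 weight is 1024; kernel-internal is 1 << 20 */
	printf("%ld %ld %ld\n", NICE_0_LOAD, scale_load(1024L),
	       scale_load_down(NICE_0_LOAD));	/* 1048576 1048576 1024 */
	return 0;
}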



Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Yuyang Du
On Thu, Sep 10, 2015 at 01:10:19PM +0100, Morten Rasmussen wrote:
> > > so it appear to be intended to be using low resolution like load_avg
> > > (weight is scaled down before it is passed into __update_load_avg()),
> > > but util_avg is shifted up to high resolution. It should be:
> > >
> > > sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
> > > SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;
> > 
> > you probably mean (SCHED_LOAD_SHIFT -  SCHED_LOAD_RESOLUTION)
> 
> Yes. Thanks for providing the right expression. There seems to be enough
> confusion in this thread already :)
 
And yes, it is my bad in the first place, sorry, I did not think it through :)

> > The goal of this patchset is to be able to scale util_avg in the range
> > of cpu capacity so why don't we directly initialize it with
> > sa->util_avg = SCHED_CAPACITY_SCALE;

Yes, we should, and specifically, it is because we can combine the
resolutions of util% and capacity%, so we only need to apply the
resolution once.

> > and then use
> > 
> >  sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> > 
> > so we don't have to take care of high and low load resolution
> 
> That works for me, except that the left-shift is gone with PeterZ's
> optimization patch posted earlier in this thread. It changes
> util_sum to be scaled by capacity instead of being the pure geometric
> series, which required the left shift at the end when we divide by
> LOAD_AVG_MAX. So it should be equivalent to what you are proposing if we
> change the initialization to your proposal too.

I previously initialized the util_sum as:

sa->util_sum = LOAD_AVG_MAX;

it is because without capacity adjustment this can save some multiplications
in __update_load_avg(), but if we do capacity adjustment we must
multiply anyway, so it is better to initialize it as:

sa->util_sum = sa->util_avg * LOAD_AVG_MAX;

Anyway, with the patch I posted in the other email in this thread, we
can fix all of this cleanly, I hope; see the sketch below. I did not post
a fix patch because the solutions are already there; it is just a matter
of making it look better, and you can provide it in your new version.
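
A consistent initialization pair would then be (sketch only, assuming
util_sum is kept scaled by capacity as in PeterZ's patch):

	/* init: start out assuming 100% utilization */
	sa->util_avg = SCHED_CAPACITY_SCALE;
	sa->util_sum = sa->util_avg * LOAD_AVG_MAX;

	/* update: no extra shift needed anymore */
	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;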

Thanks,
Yuyang


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Morten Rasmussen
On Fri, Sep 11, 2015 at 03:46:51PM +0800, Leo Yan wrote:
> Hi Morten,
> 
> On Tue, Sep 08, 2015 at 05:53:31PM +0100, Morten Rasmussen wrote:
> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> > > On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > Something like the below..
> > > > 
> > > > Another thing to ponder; the downside of scaled_delta_w is that it's
> > > > fairly likely delta is small and you lose all bits, whereas the weight
> > > > is likely to be large and could lose a few bits without issue.
> > > 
> > > That issue applies both to load and util.
> > > 
> > > > 
> > > > That is, in fixed point scaling like this, you want to start with the
> > > > biggest numbers, not the smallest, otherwise you lose too much.
> > > > 
> > > > The flip side is of course that now you can share a multiplication.
> > > 
> > > But if we apply the scaling to the weight instead of time, we would only
> > > have to apply it once and not three times like it is now? So maybe we
> > > can end up with almost the same number of multiplications.
> > > 
> > > We might be losing bits for a low priority task running on a cpu at a low
> > > frequency though.
> > 
> > Something like the below. We should be saving one multiplication.
> > 
> > --- 8< ---
> > 
> > From: Morten Rasmussen 
> > Date: Tue, 8 Sep 2015 17:15:40 +0100
> > Subject: [PATCH] sched/fair: Scale load/util contribution rather than time
> > 
> > When updating load/util tracking, the time delta might be very small (1)
> > in many cases; scaling it further down with frequency and cpu invariance
> > scaling might cause us to lose precision. Instead of scaling time we
> > can scale the weight of the task for load and the capacity for
> > utilization. Both weight (>=15) and capacity should be significantly
> > bigger in most cases. Low priority tasks might still suffer a bit but
> > the worst case should be improved, as weight is at least 15 before
> > invariance scaling.
> > 
> > Signed-off-by: Morten Rasmussen 
> > ---
> >  kernel/sched/fair.c | 38 +++---
> >  1 file changed, 19 insertions(+), 19 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 9301291..d5ee72a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2519,8 +2519,6 @@ static u32 __compute_runnable_contrib(u64 n)
> >  #error "load tracking assumes 2^10 as unit"
> >  #endif
> >  
> > -#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
> > -
> >  /*
> >   * We can represent the historical contribution to runnable average as the
> >   * coefficients of a geometric series.  To do this we sub-divide our 
> > runnable
> > @@ -2553,10 +2551,10 @@ static __always_inline int
> >  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >   unsigned long weight, int running, struct cfs_rq *cfs_rq)
> >  {
> > -   u64 delta, scaled_delta, periods;
> > +   u64 delta, periods;
> > u32 contrib;
> > -   unsigned int delta_w, scaled_delta_w, decayed = 0;
> > -   unsigned long scale_freq, scale_cpu;
> > +   unsigned int delta_w, decayed = 0;
> > +   unsigned long scaled_weight = 0, scale_freq, scale_freq_cpu = 0;
> >  
> > delta = now - sa->last_update_time;
> > /*
> > @@ -2577,8 +2575,13 @@ __update_load_avg(u64 now, int cpu, struct sched_avg 
> > *sa,
> > return 0;
> > sa->last_update_time = now;
> >  
> > -   scale_freq = arch_scale_freq_capacity(NULL, cpu);
> > -   scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > +   if (weight || running)
> > +   scale_freq = arch_scale_freq_capacity(NULL, cpu);
> > +   if (weight)
> > +   scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
> > +   if (running)
> > +   scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
> > +   >> SCHED_CAPACITY_SHIFT;
> 
> maybe the question below is stupid :)
> 
> Why not calculate scaled_weight based on the cpu's capacity as well?
> Something like: scaled_weight = weight * scale_freq_cpu.

IMHO, we should not scale load by cpu capacity since load isn't really
comparable to capacity. It is runnable time based (not running time like
utilization) and the idea is to use it for balancing when the
system is fully utilized. When the system is fully utilized we can't say
anything about the true compute demands of a task; it may get exactly
the cpu time it needs or it may need much more. Hence it doesn't really
make sense to scale the demand by the capacity of the cpu. Two busy
loops on cpus with different cpu capacities should have the same load as
they have the same compute demands.

I mentioned this briefly in the commit message of patch 3 in this
series.

Makes sense?
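
A toy userspace sketch, with made-up capacity numbers, to make the
contrast explicit:

#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10

int main(void)
{
	unsigned long weight = 1024;		/* nice-0 task */
	unsigned long cap[2] = { 1024, 430 };	/* big, little (made up) */
	unsigned long ratio = 1024;		/* 100% busy loop */

	for (int i = 0; i < 2; i++) {
		/* utilization: running time, scaled by cpu capacity */
		unsigned long util = (ratio * cap[i]) >> SCHED_CAPACITY_SHIFT;
		/* load: runnable ratio * weight, not capacity scaled */
		unsigned long load = (ratio * weight) >> SCHED_CAPACITY_SHIFT;

		printf("cpu%d: util=%4lu load=%4lu\n", i, util, load);
	}
	/* cpu0: util=1024 load=1024
	 * cpu1: util= 430 load=1024	<- same demand, same load */
	return 0;
}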

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Morten Rasmussen
On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
> On Thu, Sep 10, 2015 at 12:07:27PM +0200, Peter Zijlstra wrote:
> > > > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> > > > CAPACITY have no unit.
> > > 
> > > To be more accurate, probably, LOAD can be thought of as having unit,
> > > but UTIL has no unit.
> > 
> > But I'm thinking that is wrong; it should have one, esp. if we go scale
> > the thing. Giving it the same fixed point unit as load simplifies the
> > code.
> 
> I think we probably are saying the same thing with different terms. Anyway,
> let me reiterate what I said and make it a little more formalized.
> 
> UTIL has no unit because it is a pure ratio, the cpu_running%, which is in
> the range [0, 100%]. We increase the resolution because we don't want to
> lose detail (due to integer rounding), so we multiply by a number (say
> 1024), and the range becomes [0, 1024].

Fully agree, and with frequency invariance we basically scale running
time to take into account that the cpu might be running slower than it
is capable of at the highest frequency. With cpu invariance we also scale
by any difference there might be in max frequency and/or cpu
micro-architecture so utilization becomes comparable between cpus. One
can also see it as slowing down or speeding up time depending on the
current compute capacity of the cpu relative to the max capacity.
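
In code terms, the effect is roughly (a sketch, not the actual patch):

/* Sketch: both scale factors are in [0..1024]. Running time on a cpu at
 * half frequency and half max capacity only accrues a quarter of the
 * wall-clock delta into util_sum. */
static u64 invariant_delta(u64 delta, unsigned long scale_freq,
			   unsigned long scale_cpu)
{
	delta = (delta * scale_freq) >> SCHED_CAPACITY_SHIFT;	/* freq inv. */
	delta = (delta * scale_cpu) >> SCHED_CAPACITY_SHIFT;	/* cpu inv. */
	return delta;
}
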

> CAPACITY is also a ratio, ACTUAL_PERF/MAX_PERF, from (0, 1]. Even LOAD
> is the same, a ratio of NICE_X/NICE_0, from [15/1024=0.015, 88761/1024=86.68],
> as it only has relative meaning (i.e., when comparing to each other).

Fully agree. Though 'LOAD' is a somewhat overloaded term in the
scheduler. Just to be clear, you refer to load.weight; load_avg is the
multiplication of load.weight and the task's runnable time ratio.

> When I said it has a unit, it was in the sense that it looks like currency
> (for instance, Yuan) that you can use to buy CPU fair share. But that is
> just one way to look at it, and there are certainly many others.
> 
> So, I still propose to generalize all of these with the following patch, in
> the belief that this makes things simple, clear, and error-reducing.
> 
> --
> 
> Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
>  definition
> 
> An integer metric needs a certain resolution to determine how much detail
> we can look into (not losing detail to integer rounding), which also
> determines the range of the metric.
> 
> For instance, to increase the resolution of [0, 1] (two levels), one
> can multiply 1024 and get [0, 1024] (1025 levels).
> 
> In sched/fair, a few metrics depend on the resolution: load/load_avg,
> util_avg, and capacity (frequency adjustment). In order to reduce the
> risks of making mistakes relating to resolution/range, we therefore
> generalize the resolution by defining a basic resolution constant
> number, and then formalize all metrics to depend on the basic
> resolution. The basic resolution is 1024 or (1 << 10). Further, one
> can recursively apply another basic resolution to increase the final
> resolution (e.g., 1048576 = 1 << 20).
> 
> Signed-off-by: Yuyang Du 
> ---
>  include/linux/sched.h |  2 +-
>  kernel/sched/sched.h  | 12 +++-
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 119823d..55a7b93 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -912,7 +912,7 @@ enum cpu_idle_type {
>  /*
>   * Increase resolution of cpu_capacity calculations
>   */
> -#define SCHED_CAPACITY_SHIFT 10
> +#define SCHED_CAPACITY_SHIFT SCHED_RESOLUTION_SHIFT
>  #define SCHED_CAPACITY_SCALE (1L << SCHED_CAPACITY_SHIFT)
>  
>  /*
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 68cda11..d27cdd8 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq 
> *this_rq) { }
>   */
>  #define NS_TO_JIFFIES(TIME)  ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>  
> +# define SCHED_RESOLUTION_SHIFT  10
> +# define SCHED_RESOLUTION_SCALE  (1L << SCHED_RESOLUTION_SHIFT)
> +
>  /*
>   * Increase resolution of nice-level calculations for 64-bit architectures.
>   * The extra resolution improves shares distribution and load balancing of
> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq 
> *this_rq) { }
>   * increased costs.
>   */
>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage 
> under light load  */
> -# define SCHED_LOAD_RESOLUTION   10
> -# define scale_load(w)   ((w) << SCHED_LOAD_RESOLUTION)
> -# define scale_load_down(w)  ((w) >> SCHED_LOAD_RESOLUTION)
> +# define SCHED_LOAD_SHIFT(SCHED_RESOLUTION_SHIFT + 
> SCHED_RESOLUTION_SHIFT)
> +# define scale_load(w)   ((w) << SCHED_RESOLUTION_SHIFT)
> +# define scale_load_down(w)  ((w) >> SCHED_RESOLUTION_SHIFT)
>  #else
> -# define SCHED_LOAD_RESOLUTION

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Leo Yan
On Fri, Sep 11, 2015 at 11:02:33AM +0100, Morten Rasmussen wrote:
> On Fri, Sep 11, 2015 at 03:46:51PM +0800, Leo Yan wrote:
> > Hi Morten,
> > 
> > On Tue, Sep 08, 2015 at 05:53:31PM +0100, Morten Rasmussen wrote:
> > > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> > > > On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > > > > 
> > > > > Something like the below..
> > > > > 
> > > > > Another thing to ponder; the downside of scaled_delta_w is that it's
> > > > > fairly likely delta is small and you lose all bits, whereas the weight
> > > > > is likely to be large and could lose a few bits without issue.
> > > > 
> > > > That issue applies both to load and util.
> > > > 
> > > > > 
> > > > > That is, in fixed point scaling like this, you want to start with the
> > > > > biggest numbers, not the smallest, otherwise you lose too much.
> > > > > 
> > > > > The flip side is of course that now you can share a multiplication.
> > > > 
> > > > But if we apply the scaling to the weight instead of time, we would only
> > > > have to apply it once and not three times like it is now? So maybe we
> > > > can end up with almost the same number of multiplications.
> > > > 
> > > > We might be losing bits for a low priority task running on a cpu at a low
> > > > frequency though.
> > > 
> > > Something like the below. We should be saving one multiplication.
> > > 
> > > --- 8< ---
> > > 
> > > From: Morten Rasmussen 
> > > Date: Tue, 8 Sep 2015 17:15:40 +0100
> > > Subject: [PATCH] sched/fair: Scale load/util contribution rather than time
> > > 
> > > When updating load/util tracking, the time delta might be very small (1)
> > > in many cases; scaling it further down with frequency and cpu invariance
> > > scaling might cause us to lose precision. Instead of scaling time we
> > > can scale the weight of the task for load and the capacity for
> > > utilization. Both weight (>=15) and capacity should be significantly
> > > bigger in most cases. Low priority tasks might still suffer a bit but
> > > the worst case should be improved, as weight is at least 15 before
> > > invariance scaling.
> > > 
> > > Signed-off-by: Morten Rasmussen 
> > > ---
> > >  kernel/sched/fair.c | 38 +++---
> > >  1 file changed, 19 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 9301291..d5ee72a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -2519,8 +2519,6 @@ static u32 __compute_runnable_contrib(u64 n)
> > >  #error "load tracking assumes 2^10 as unit"
> > >  #endif
> > >  
> > > -#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
> > > -
> > >  /*
> > >   * We can represent the historical contribution to runnable average as 
> > > the
> > >   * coefficients of a geometric series.  To do this we sub-divide our 
> > > runnable
> > > @@ -2553,10 +2551,10 @@ static __always_inline int
> > >  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > > unsigned long weight, int running, struct cfs_rq *cfs_rq)
> > >  {
> > > - u64 delta, scaled_delta, periods;
> > > + u64 delta, periods;
> > >   u32 contrib;
> > > - unsigned int delta_w, scaled_delta_w, decayed = 0;
> > > - unsigned long scale_freq, scale_cpu;
> > > + unsigned int delta_w, decayed = 0;
> > > + unsigned long scaled_weight = 0, scale_freq, scale_freq_cpu = 0;
> > >  
> > >   delta = now - sa->last_update_time;
> > >   /*
> > > @@ -2577,8 +2575,13 @@ __update_load_avg(u64 now, int cpu, struct 
> > > sched_avg *sa,
> > >   return 0;
> > >   sa->last_update_time = now;
> > >  
> > > - scale_freq = arch_scale_freq_capacity(NULL, cpu);
> > > - scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > > + if (weight || running)
> > > + scale_freq = arch_scale_freq_capacity(NULL, cpu);
> > > + if (weight)
> > > + scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
> > > + if (running)
> > > + scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
> > > + >> SCHED_CAPACITY_SHIFT;
> > 
> > maybe the question below is stupid :)
> > 
> > Why not calculate scaled_weight based on the cpu's capacity as well?
> > Something like: scaled_weight = weight * scale_freq_cpu.
> 
> IMHO, we should not scale load by cpu capacity since load isn't really
> comparable to capacity. It is runnable time based (not running time like
> utilization) and the idea is to use it for balancing when the
> system is fully utilized. When the system is fully utilized we can't say
> anything about the true compute demands of a task; it may get exactly
> the cpu time it needs or it may need much more. Hence it doesn't really
> make sense to scale the demand by the capacity of the cpu. Two busy
> loops on cpus with different cpu capacities should have the same load as
> they have the same compute demands.
> 
> I mentioned this briefly in the commit message of patch 3 in this series.

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread bsegall
Morten Rasmussen  writes:

> On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
>> On Thu, Sep 10, 2015 at 12:07:27PM +0200, Peter Zijlstra wrote:
>> > > > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
>> > > > CAPACITY have no unit.
>> > > 
>> > > To be more accurate, probably, LOAD can be thought of as having unit,
>> > > but UTIL has no unit.
>> > 
>> > But I'm thinking that is wrong; it should have one, esp. if we go scale
>> > the thing. Giving it the same fixed point unit as load simplifies the
>> > code.
>> 
>> I think we probably are saying the same thing with different terms. Anyway,
>> let me reiterate what I said and make it a little more formalized.
>> 
>> UTIL has no unit because it is a pure ratio, the cpu_running%, which is in
>> the range [0, 100%]. We increase the resolution because we don't want to
>> lose detail (due to integer rounding), so we multiply by a number (say
>> 1024), and the range becomes [0, 1024].
>
> Fully agree, and with frequency invariance we basically scale running
> time to take into account that the cpu might be running slower than it
> is capable of at the highest frequency. With cpu invariance we also scale
> by any difference there might be in max frequency and/or cpu
> micro-architecture so utilization becomes comparable between cpus. One
> can also see it as slowing down or speeding up time depending on the
> current compute capacity of the cpu relative to the max capacity.
>
>> CAPACITY is also a ratio, ACTUAL_PERF/MAX_PERF, from (0, 1]. Even LOAD
>> is the same, a ratio of NICE_X/NICE_0, from [15/1024=0.015, 88761/1024=86.68],
>> as it only has relative meaning (i.e., when comparing to each other).
>
> Fully agree. Though 'LOAD' is a somewhat overloaded term in the
> scheduler. Just to be clear, you refer to load.weight; load_avg is the
> multiplication of load.weight and the task's runnable time ratio.
>
>> When I said it has a unit, it was in the sense that it looks like currency
>> (for instance, Yuan) that you can use to buy CPU fair share. But that is
>> just one way to look at it, and there are certainly many others.
>> 
>> So, I still propose to generalize all of these with the following patch, in
>> the belief that this makes things simple, clear, and error-reducing.
>> 
>> --
>> 
>> Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
>>  definition
>> 
>> An integer metric needs a certain resolution to determine how much detail
>> we can look into (not losing detail to integer rounding), which also
>> determines the range of the metric.
>> 
>> For instance, to increase the resolution of [0, 1] (two levels), one
>> can multiply 1024 and get [0, 1024] (1025 levels).
>> 
>> In sched/fair, a few metrics depend on the resolution: load/load_avg,
>> util_avg, and capacity (frequency adjustment). In order to reduce the
>> risks of making mistakes relating to resolution/range, we therefore
>> generalize the resolution by defining a basic resolution constant
>> number, and then formalize all metrics to depend on the basic
>> resolution. The basic resolution is 1024 or (1 << 10). Further, one
>> can recursively apply another basic resolution to increase the final
>> resolution (e.g., 1048576 = 1 << 20).
>> 
>> Signed-off-by: Yuyang Du 
>> ---
>>  include/linux/sched.h |  2 +-
>>  kernel/sched/sched.h  | 12 +++-
>>  2 files changed, 8 insertions(+), 6 deletions(-)
>> 
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 119823d..55a7b93 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -912,7 +912,7 @@ enum cpu_idle_type {
>>  /*
>>   * Increase resolution of cpu_capacity calculations
>>   */
>> -#define SCHED_CAPACITY_SHIFT10
>> +#define SCHED_CAPACITY_SHIFTSCHED_RESOLUTION_SHIFT
>>  #define SCHED_CAPACITY_SCALE(1L << SCHED_CAPACITY_SHIFT)
>>  
>>  /*
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 68cda11..d27cdd8 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq 
>> *this_rq) { }
>>   */
>>  #define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>>  
>> +# define SCHED_RESOLUTION_SHIFT 10
>> +# define SCHED_RESOLUTION_SCALE (1L << SCHED_RESOLUTION_SHIFT)
>> +
>>  /*
>>   * Increase resolution of nice-level calculations for 64-bit architectures.
>>   * The extra resolution improves shares distribution and load balancing of
>> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq 
>> *this_rq) { }
>>   * increased costs.
>>   */
>>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage 
>> under light load  */
>> -# define SCHED_LOAD_RESOLUTION  10
>> -# define scale_load(w)  ((w) << SCHED_LOAD_RESOLUTION)
>> -# define scale_load_down(w) ((w) >> SCHED_LOAD_RESOLUTION)
>> +# define SCHED_LOAD_SHIFT   (SCHED_RESOLUTION_SHIFT + 
>> SCHED_RESOLUTION_SHIFT)
>> +# define s

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Morten Rasmussen
On Wed, Sep 09, 2015 at 12:13:10PM +0100, Morten Rasmussen wrote:
> On Wed, Sep 09, 2015 at 11:43:05AM +0200, Peter Zijlstra wrote:
> > Sadly that makes the code worse; I get 14 mul instructions where
> > previously I had 11.
> > 
> > What happens is that GCC gets confused and cannot constant propagate the
> > new variables, so what used to be shifts now end up being actual
> > multiplications.
> > 
> > With this, I get back to 11. Can you see what happens on ARM where you
> > have both functions defined to non constants?
> 
> We repeated the experiment on arm and arm64, but still with the functions
> defined to constants to compare with your results. The mul instruction
> count seems to be somewhat compiler version dependent, but consistently
> shows no effect of the patch:
> 
> arm     before  after
> gcc4.9  12      12
> gcc4.8  10      10
> 
> arm64   before  after
> gcc4.9  11      11
> 
> I will get numbers with the arch-functions implemented as well and do
> hackbench runs to see what happens in terms of performance.

I have done some runs with the proposed fixes added:

1. PeterZ's util_sum shift fix (change util_sum).
2. Morten's scaling of weight instead of time (reduce bit loss).
3. PeterZ's unconditional calls to arch*() functions (compiler opt).

To be clear: 2 includes 1, and 3 includes 1 and 2.

Runs were done with the default (#define) implementation of the
arch-functions and with arch specific implementation for ARM.

I realized that just looking for 'mul' instructions in
update_blocked_averages() is probably not a fair comparison on ARM as it
turned out that it has quite a few multiply-accumulate instructions. So
I have included the total count including those too.


Test platforms:

ARM TC2 (A7x3 only)
perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 200
#mul: grep -e mul (in update_blocked_averages())
#mul_all: grep -e mul -e mla -e mls -e mia (in update_blocked_averages())
gcc: 4.9.3

Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 15000
#mul: grep -e mul (in update_blocked_averages())
gcc: 4.9.2


Results:

perf numbers are average of three (x10) runs. Raw data is available
further down.

ARM TC2          #mul         #mul_all     perf bench
arch*()          default arm  default arm  default arm

1 shift_fix      10      16   22      36   13.401  13.288
2 scaled_weight  12      14   30      32   13.282  13.238
3 unconditional  12      14   26      32   13.296  13.427

Intel E5-2690    #mul     perf bench
arch*()          default  default

1 shift_fix      13       14.786
2 scaled_weight  18       15.078
3 unconditional  14       15.195


Overall it appears that fewer 'mul' instructions don't necessarily
mean a better perf bench score. For ARM, 2 seems the best choice overall,
while 1 is better for Intel. If we want to try to avoid the bit loss by
scaling weight instead of time, 2 is best for both. However, all that
said, looking at the raw numbers there is a significant difference
between runs of perf --repeat, so we can't really draw any strong
conclusions. It all appears to be in the noise.

I suggest that I spin a v2 of this series and go with scaled_weight to
reduce bit loss. Any objections?

While at it, should I include Yuyang's patch redefining the SCALE/SHIFT
mess?


Raw numbers:

ARM TC2

shift_fix   default_arch
gcc4.9.3
#mul 10
#mul+mla+mls+mia 22
13.384416727 seconds time elapsed ( +-  0.17% )
13.431014702 seconds time elapsed ( +-  0.18% )
13.387434890 seconds time elapsed ( +-  0.15% )

shift_fix   arm_arch
gcc4.9.3
#mul 16
#mul+mla+mls+mia 36
13.271044081 seconds time elapsed ( +-  0.11% )
13.310189123 seconds time elapsed ( +-  0.19% )
13.283594740 seconds time elapsed ( +-  0.12% )

scaled_weight   default_arch
gcc4.9.3
#mul 12
#mul+mla+mls+mia 30
13.295649553 seconds time elapsed ( +-  0.20% )
13.271634654 seconds time elapsed ( +-  0.19% )
13.280081329 seconds time elapsed ( +-  0.14% )

scaled_weight   arm_arch
gcc4.9.3
#mul 14
#mul+mla+mls+mia 32
13.230659223 seconds time elapsed ( +-  0.15% )
13.76527 seconds time elapsed ( +-  0.15% )
13.260275081 seconds time elapsed ( +-  0.21% )

unconditional   default_arch
gcc4.9.3
#mul 12
#mul+mla+mls+mia 26
13.274904460 seconds time elapsed ( +-  0.13% )
13.307853511 seconds time elapsed ( +-  0.15% )
13.304084844 seconds time elapsed ( +-  0.22% )

unconditional   arm_arch
gcc4.9.3
#mul 14
#mul+mla+mls+mia 32
13.432878577 seconds time elapsed ( +-  0.13% )
13.417950552 seconds time elapsed ( +-  0.12% )
13.431682719 seconds time elapsed ( +-  0.18% )


Intel

shift_fix   default_arch
gcc4.9.2
#mul 13
14.905815416 seconds time elapsed ( +-  0.61% )
14.83694 seconds time elapsed ( +-  0.84% )
14.639739309 seconds time elapsed ( +-  0.76% )

scaled_weight   default_arch
gcc4.9.2
#mul 18

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Yuyang Du
On Fri, Sep 11, 2015 at 10:05:53AM -0700, bseg...@google.com wrote:
> 
> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> required to be the same value and should not be conflated.
 
> In particular, since cgroups are on the same timeline as tasks and their
> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> an internal value to the kernel.
>
> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.

I have not fully looked into the concerns, but the new SCHED_RESOLUTION_SHIFT
is intended to formalize all the integer metrics that need better resolution.
It is not special to any metric, so it actually de-conflates whatever is
conflated.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-14 Thread Morten Rasmussen
On Fri, Sep 11, 2015 at 10:05:53AM -0700, bseg...@google.com wrote:
> Morten Rasmussen  writes:
> 
> > On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 119823d..55a7b93 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -912,7 +912,7 @@ enum cpu_idle_type {
> >>  /*
> >>   * Increase resolution of cpu_capacity calculations
> >>   */
> >> -#define SCHED_CAPACITY_SHIFT  10
> >> +#define SCHED_CAPACITY_SHIFT  SCHED_RESOLUTION_SHIFT
> >>  #define SCHED_CAPACITY_SCALE  (1L << SCHED_CAPACITY_SHIFT)
> >>  
> >>  /*
> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> index 68cda11..d27cdd8 100644
> >> --- a/kernel/sched/sched.h
> >> +++ b/kernel/sched/sched.h
> >> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq 
> >> *this_rq) { }
> >>   */
> >>  #define NS_TO_JIFFIES(TIME)   ((unsigned long)(TIME) / (NSEC_PER_SEC 
> >> / HZ))
> >>  
> >> +# define SCHED_RESOLUTION_SHIFT   10
> >> +# define SCHED_RESOLUTION_SCALE   (1L << SCHED_RESOLUTION_SHIFT)
> >> +
> >>  /*
> >>   * Increase resolution of nice-level calculations for 64-bit 
> >> architectures.
> >>   * The extra resolution improves shares distribution and load balancing of
> >> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq 
> >> *this_rq) { }
> >>   * increased costs.
> >>   */
> >>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage 
> >> under light load  */
> >> -# define SCHED_LOAD_RESOLUTION10
> >> -# define scale_load(w)((w) << SCHED_LOAD_RESOLUTION)
> >> -# define scale_load_down(w)   ((w) >> SCHED_LOAD_RESOLUTION)
> >> +# define SCHED_LOAD_SHIFT (SCHED_RESOLUTION_SHIFT + 
> >> SCHED_RESOLUTION_SHIFT)
> >> +# define scale_load(w)((w) << SCHED_RESOLUTION_SHIFT)
> >> +# define scale_load_down(w)   ((w) >> SCHED_RESOLUTION_SHIFT)
> >>  #else
> >> -# define SCHED_LOAD_RESOLUTION0
> >> +# define SCHED_LOAD_SHIFT (SCHED_RESOLUTION_SHIFT)
> >>  # define scale_load(w)(w)
> >>  # define scale_load_down(w)   (w)
> >>  #endif
> >>  
> >> -#define SCHED_LOAD_SHIFT  (10 + SCHED_LOAD_RESOLUTION)
> >>  #define SCHED_LOAD_SCALE  (1L << SCHED_LOAD_SHIFT)
> >>  
> >>  #define NICE_0_LOAD   SCHED_LOAD_SCALE
> >
> > I think this is pretty much the required relationship between all the
> > SHIFTs and SCALEs that Peter checked for in his #if-#error thing
> > earlier, so no disagreements from my side :-)
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> required to be the same value and should not be conflated.
> 
> In particular, since cgroups are on the same timeline as tasks and their
> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> an internal value to the kernel.
> 
> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.

I think I follow, but doesn't that mean that the current code is broken
too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:

#define SCHED_LOAD_SHIFT    (10 + SCHED_LOAD_RESOLUTION)
#define SCHED_LOAD_SCALE    (1L << SCHED_LOAD_SHIFT)

#define NICE_0_LOAD         SCHED_LOAD_SCALE
#define NICE_0_SHIFT        SCHED_LOAD_SHIFT

To me it sounds like we need to define it the other way around:

#define NICE_0_SHIFT        10
#define NICE_0_LOAD         (1L << NICE_0_SHIFT)

and then add any additional resolution bits from there to ensure that
NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
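
Roughly like this (illustrative sketch only; the config condition is
hypothetical, not a concrete proposal):

#define NICE_0_SHIFT		10
#define NICE_0_LOAD		(1L << NICE_0_SHIFT)	/* always 1024 */

#ifdef CONFIG_64BIT				/* hypothetical condition */
# define SCHED_LOAD_RESOLUTION	10
#else
# define SCHED_LOAD_RESOLUTION	0
#endif

#define SCHED_LOAD_SHIFT	(NICE_0_SHIFT + SCHED_LOAD_RESOLUTION)
#define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)

/* NICE_0_LOAD and the prio_to_weight/wmult tables are now unaffected
 * by SCHED_LOAD_RESOLUTION. */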


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-17 Thread Peter Zijlstra
On Fri, Sep 11, 2015 at 06:22:47PM +0100, Morten Rasmussen wrote:
> I have done some runs with the proposed fixes added:
> 
> 1. PeterZ's util_sum shift fix (change util_sum).
> 2. Morten's scaling of weight instead of time (reduce bit loss).
> 3. PeterZ's unconditional calls to arch*() functions (compiler opt).
> 
> To be clear: 2 includes 1, and 3 includes 1 and 2.
> 
> Runs were done with the default (#define) implementation of the
> arch-functions and with arch specific implementation for ARM.


> Results:
> 
> perf numbers are average of three (x10) runs. Raw data is available
> further down.
> 
> ARM TC2          #mul         #mul_all     perf bench
> arch*()          default arm  default arm  default arm
> 
> 1 shift_fix      10      16   22      36   13.401  13.288
> 2 scaled_weight  12      14   30      32   13.282  13.238
> 3 unconditional  12      14   26      32   13.296  13.427
> 
> Intel E5-2690    #mul     perf bench
> arch*()          default  default
> 
> 1 shift_fix      13       14.786
> 2 scaled_weight  18       15.078
> 3 unconditional  14       15.195
> 
> 
> Overall it appears that fewer 'mul' instructions don't necessarily
> mean a better perf bench score. For ARM, 2 seems the best choice overall.

I suspect you're paying for having to do an actual load which can miss
there. So that makes sense.

> While 1 is better for Intel.

Right, because GCC shits itself with those conditionals. Weirdly though,
the below version does not seem so affected.

> I suggest that I spin a v2 of this series and go with scaled_weight to
> reduce bit loss. Any objections?

Just playing devil's advocate to myself; how about cgroups? Will not a
per-cpu share of the cgroup weight often be very small?


So I had a little play, and I'm not at all convinced we want to do this
(I've not actually run any numbers on it, but I can well imagine the
extra condition hurting on a branch mispredict), but it does show GCC
need not always get confused.
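
For intuition on the ordering issue (and on how small a per-cpu share can
get), a toy userspace example; the numbers are made up:

#include <stdio.h>

int main(void)
{
	unsigned long delta = 1;	/* very small update delta */
	unsigned long weight = 1024;	/* nice-0 */
	unsigned long scale_freq = 512;	/* cpu at half max frequency */

	/* scale the small delta first: all bits are lost */
	unsigned long a = ((delta * scale_freq) >> 10) * weight;	/* 0 */

	/* scale the big weight first: the contribution survives */
	unsigned long b = ((weight * scale_freq) >> 10) * delta;	/* 512 */

	printf("%lu %lu\n", a, b);

	/* a per-cpu share of a cgroup can be as small as delta here,
	 * e.g. 1024 shares spread across 64 cpus is only 16 per cpu */
	return 0;
}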

---
 kernel/sched/fair.c | 58 +++--
 1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9176f7c588a8..1b60fbe3b86c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2519,7 +2519,25 @@ static u32 __compute_runnable_contrib(u64 n)
 #error "load tracking assumes 2^10 as unit"
 #endif
 
-#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+static __always_inline unsigned long fp_mult2(unsigned long x, unsigned long y)
+{
+   y *= x;
+   y >>= 10;
+
+   return y;
+}
+
+static __always_inline unsigned long fp_mult3(unsigned long x, unsigned long y, unsigned long z)
+{
+   if (x > y)
+   swap(x,y);
+
+   z *= y;
+   z >>= 10;
+   z *= x;
+
+   return z;
+}
 
 /*
  * We can represent the historical contribution to runnable average as the
@@ -2553,9 +2571,9 @@ static __always_inline int
 __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
-   u64 delta, scaled_delta, periods;
+   u64 delta, periods;
u32 contrib;
-   unsigned int delta_w, scaled_delta_w, decayed = 0;
+   unsigned int delta_w, decayed = 0;
unsigned long scale_freq, scale_cpu;
 
delta = now - sa->last_update_time;
@@ -2577,8 +2595,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
return 0;
sa->last_update_time = now;
 
-   scale_freq = arch_scale_freq_capacity(NULL, cpu);
-   scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+   if (weight)
+   scale_freq = arch_scale_freq_capacity(NULL, cpu);
+   if (running)
+   scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
 
/* delta_w is the amount already accumulated against our next period */
delta_w = sa->period_contrib;
@@ -2594,16 +2614,14 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 * period and accrue it.
 */
delta_w = 1024 - delta_w;
-   scaled_delta_w = cap_scale(delta_w, scale_freq);
if (weight) {
-   sa->load_sum += weight * scaled_delta_w;
-   if (cfs_rq) {
-   cfs_rq->runnable_load_sum +=
-   weight * scaled_delta_w;
-   }
+   unsigned long t = fp_mult3(delta_w, weight, scale_freq);
+   sa->load_sum += t;
+   if (cfs_rq)
+   cfs_rq->runnable_load_sum += t;
}
if (running)
-   sa->util_sum += scaled_delta_w * scale_cpu;
+  

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-17 Thread Yuyang Du
On Wed, Sep 16, 2015 at 10:06:24AM -0700, bseg...@google.com wrote:
> > The point really is, several metrics (not just NICE_0_LOAD) need
> > resolution. You can choose to either hardcode a number, like
> > SCHED_CAPACITY_SHIFT now, or use SCHED_RESOLUTION_SHIFT, which also
> > serves as a simple marker for which value is meant (the scaled one with
> > better resolution vs. the original one). I guess this is to say we now
> > have a (no-big-deal) resolution system.
> 
> Yes they were chosen for similar reasons, but they are not conceptually
> related, and you couldn't decide to just bump up all the resolutions by
> changing SCHED_RESOLUTION_SHIFT, so doing this would just be misleading.

Yes, it does make them appear conceptually related. But that probably
isn't a concern if one knows it is just a scaled integer metric.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-17 Thread Peter Zijlstra
On Fri, Sep 11, 2015 at 06:22:47PM +0100, Morten Rasmussen wrote:

> While at it, should I include Yuyang's patch redefining the SCALE/SHIFT
> mess?

I suspect his patch will fail to compile on ARM which uses
SCHED_CAPACITY_* outside of kernel/sched/*.

But if you all (Ben, Yuyang, you) can agree on a patch simplifying these
things I'm not opposed to it.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-21 Thread Yuyang Du
On Thu, Sep 17, 2015 at 12:38:25PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 11, 2015 at 06:22:47PM +0100, Morten Rasmussen wrote:
> 
> > While at it, should I include Yuyang's patch redefining the SCALE/SHIFT
> > mess?
> 
> I suspect his patch will fail to compile on ARM which uses
> SCHED_CAPACITY_* outside of kernel/sched/*.
> 
> But if you all (Ben, Yuyang, you) can agree on a patch simplifying these
> things I'm not opposed to it.

Yes, indeed. So SCHED_RESOLUTION_SHIFT has to be defined in
include/linux/sched.h.

With this, I think the code still needs some cleanup and, importantly,
documentation.

But first, I think load_sum and load_avg can afford NICE_0_LOAD with either
high or low resolution, so we have no reason to use low resolution (10 bits)
for load_avg when NICE_0_LOAD has high resolution (20 bits), because
load_avg = runnable% * load, as opposed to the current
load_avg = runnable% * scale_load_down(load).

Shall we get rid of all scale_load_down() for the runnable load average?

--

Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
 definition

A metric needs a certain resolution to determine how much detail we
can look into (not losing detail to integer rounding), which also
determines the range of the metric.

For instance, to increase the resolution of [0, 1] (two levels), one
can multiply 1024 and get [0, 1024] (1025 levels).

In sched/fair, a few metrics depend on the resolution: load/load_avg,
util_avg, and capacity (frequency adjustment). In order to reduce the
risks of making mistakes relating to resolution/range, we therefore
generalize the resolution by defining a basic resolution constant
number, and then formalize all metrics to depend on the basic
resolution. The basic resolution is 1024 or (1 << 10). Further, one
can recursively apply the basic resolution to increase the final
resolution.

As pointed out by Ben Segall, NICE_0's weight (visible to users) and load
have independent resolutions, but they must be well calibrated.

Signed-off-by: Yuyang Du 
---
 include/linux/sched.h |  9 ++---
 kernel/sched/fair.c   |  4 
 kernel/sched/sched.h  | 15 ++-
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index bd38b3e..9b86f79 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -909,10 +909,13 @@ enum cpu_idle_type {
CPU_MAX_IDLE_TYPES
 };
 
+# define SCHED_RESOLUTION_SHIFT    10
+# define SCHED_RESOLUTION_SCALE    (1L << SCHED_RESOLUTION_SHIFT)
+
 /*
  * Increase resolution of cpu_capacity calculations
  */
-#define SCHED_CAPACITY_SHIFT   10
+#define SCHED_CAPACITY_SHIFT   SCHED_RESOLUTION_SHIFT
 #define SCHED_CAPACITY_SCALE   (1L << SCHED_CAPACITY_SHIFT)
 
 /*
@@ -1180,8 +1183,8 @@ struct load_weight {
  * 1) load_avg factors frequency scaling into the amount of time that a
  * sched_entity is runnable on a rq into its weight. For cfs_rq, it is the
  * aggregated such weights of all runnable and blocked sched_entities.
- * 2) util_avg factors frequency and cpu scaling into the amount of time
- * that a sched_entity is running on a CPU, in the range [0..SCHED_LOAD_SCALE].
+ * 2) util_avg factors frequency and cpu capacity scaling into the amount of time
+ * that a sched_entity is running on a CPU, in the range [0..SCHED_CAPACITY_SCALE].
  * For cfs_rq, it is the aggregated such times of all runnable and
  * blocked sched_entities.
  * The 64 bit load_sum can:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..c61fd8e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2522,10 +2522,6 @@ static u32 __compute_runnable_contrib(u64 n)
return contrib + runnable_avg_yN_sum[n];
 }
 
-#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
-#error "load tracking assumes 2^10 as unit"
-#endif
-
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3845a71..31b4022 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -53,18 +53,23 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
  * increased costs.
  */
 #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
-# define SCHED_LOAD_RESOLUTION    10
-# define scale_load(w)            ((w) << SCHED_LOAD_RESOLUTION)
-# define scale_load_down(w)       ((w) >> SCHED_LOAD_RESOLUTION)
+# define SCHED_LOAD_SHIFT         (SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
+# define scale_load(w)            ((w) << SCHED_RESOLUTION_SHIFT)
+# define scale_load_down(w)       ((w) >> SCHED_RESOLUTION_SHIFT)
 #else
-# define SCHED_LOAD_RESOLUTION    0
+# define SCHED_LOAD_SHIFT         (SCHED_RESOLUTION_SHIFT)
 # define scale_load(w)            (w)
 # define scale_load_down(w)       (w)
 #endif
 
-#define SCHED_LOAD_SHIFT          (10 + SCHED_LOAD_RESOLUTION)
 #define SCHED_LOAD_SCALE          (1L << SCHED_LOAD_SHIFT)
 
+/*
+ * NICE_0's weight (visible to users) and load have independent resolutions,

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-23 Thread bsegall
Yuyang Du  writes:

> On Tue, Sep 22, 2015 at 10:18:30AM -0700, bseg...@google.com wrote:
>> Yuyang Du  writes:
>> 
>> > On Mon, Sep 21, 2015 at 10:30:04AM -0700, bseg...@google.com wrote:
>> >> > But first, I think load_sum and load_avg can afford NICE_0_LOAD with
>> >> > either high or low resolution, so we have no reason to use low
>> >> > resolution (10 bits) for load_avg when NICE_0_LOAD has high resolution
>> >> > (20 bits), because load_avg = runnable% * load, as opposed to the
>> >> > current load_avg = runnable% * scale_load_down(load).
>> >> >
>> >> > Shall we get rid of all scale_load_down() for the runnable load average?
>> >> 
>> >> Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, ie barely within a
>> >> 32-bit unsigned long, but in fact LOAD_AVG_MAX * MAX_SHARES is already
>> >> going to give errors on 32-bit (even with the old code in fact). This
>> >> should probably be fixed... somehow (dividing by 4 for load_sum on
>> >> 32-bit would work, though be ugly. Reducing MAX_SHARES by 2 bits on
>> >> 32-bit might have made sense but would be a weird difference between 32
>> >> and 64, and could break userspace anyway, so it's presumably too late
>> >> for that).
>> >> 
>> >> 64-bit has ~30 bits free, so this would be fine so long as SLR is 0 on
>> >> 32-bit.
>> >> 
>> >
>> > load_avg has no LOAD_AVG_MAX term in it; it is runnable% * load, IOW,
>> > load_avg <= load. So, on 32-bit, cfs_rq's load_avg can host
>> > 2^32/prio_to_weight[0]/1024 = 47 tasks with 20 bits of load resolution.
>> > This is ok, because struct load_weight's load is also unsigned long;
>> > if it overflows, cfs_rq->load.weight will have overflown in the first
>> > place.
>> >
>> > However, on second thought, this is not quite right, because load_avg is
>> > not necessarily no greater than load, since load_avg has blocked load in
>> > it. Although load_avg is still at the same level as load (converging to
>> > be <= load), we may not want to risk overflow on 32-bit.
>  
> This second thought was a mistake (what was wrong with me?). load_avg is
> for sure no greater than load, with or without blocked load.
>
> With that said, it really does not matter what the following numbers are,
> on a 32-bit or 64-bit machine. What matters is that cfs_rq->load.weight is
> the one that needs to worry about overflow, not load_avg. It is as simple
> as that.
>
> With that, I think we can and should get rid of the scale_load_down()
> for load_avg.

load_avg, yes, is bounded by load.weight, but on 64-bit load_sum is only
bounded by load.weight * LOAD_AVG_MAX and is the same size as
load.weight (as I said below). There's still space for anything
reasonable though with 10 bits of SLR (see the quick check below).
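
A quick userspace back-of-the-envelope check of those bounds:

#include <stdio.h>

int main(void)
{
	unsigned long long load_avg_max = 47742;	/* LOAD_AVG_MAX */
	unsigned long long w = 88761;			/* prio_to_weight[0] */

	/* barely inside 2^32 = 4294967296: */
	printf("%llu\n", load_avg_max * w);		/* 4237627662 */

	/* with 10 extra resolution bits, still far below 2^64: */
	printf("%llu\n", (load_avg_max * w) << 10);	/* ~4.3e12 */
	return 0;
}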

>
>> Yeah, I missed that load_sum was u64 and only load_avg was long. This
>> means we're fine on 32-bit with no SLR (or more precisely, cfs_rq
>> runnable_load_avg can overflow, but only when cfs_rq load.weight does,
>> so whatever). On 64-bit you can currently have 2^36 cgroups or 2^37
>> tasks before load.weight overflows, and ~2^31 tasks before
>> runnable_load_avg does, which is obviously fine (and in fact you'd hit
>> PID_MAX_LIMIT first, even if you had the cpu/ram/etc. to not fall over).
>> 
>> Now, applying SLR to runnable_load_avg would cut this down to ~2^21
>> tasks running at once or 2^20 with cgroups, which is technically
>> allowed, though it seems utterly implausible (especially since this
>> would have to all be on one cpu). If SLR was increased as peterz asked
>> about, this could be an issue though.
>> 
>> All that said, using SLR on load_sum/load_avg as opposed to cfs_rq
>> runnable_load_avg would be fine, as they're limited to only one
>> task/cgroup's weight. Having it SLRed and cfs_rq not would be a
>> little odd, but not impossible.
>  
>
>> > If NICE_0_LOAD is nice-0's load, and if SCHED_LOAD_SHIFT is to say how to 
>> > get 
>> > nice-0's load, I don't understand why you want to separate them.
>> 
>> SCHED_LOAD_SHIFT is not how to get nice-0's load, it just happens to
>> have the same value as NICE_0_SHIFT. (I think anyway, SCHED_LOAD_* is
>> used in precisely one place other than the newish util_avg, and as I
>> mentioned it's not remotely clear what compute_imbalance is doing there)
>
> Yes, it is not clear to me either.
>
> With the above proposal to get rid of scale_load_down() for load_avg, I
> think we can now remove SCHED_LOAD_*, rename scale_load() to
> user_to_kernel_load(), and rename scale_load_down() to
> kernel_to_user_load().
>
> Hmm?

I have no opinion on renaming the scale_load functions, it's certainly
reasonable, but the scale_load names seem fine too.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-24 Thread Yuyang Du
On Wed, Sep 23, 2015 at 09:54:08AM -0700, bseg...@google.com wrote:
> > This second thought made a mistake (what was wrong with me). load_avg
> > is for sure no greater than load, with or without blocked load.
> >
> > With that said, it really does not matter what the following numbers
> > are, 32bit or 64bit machine. What matters is that cfs_rq->load.weight
> > is the one that needs to worry about overflow, not the load_avg. It is
> > as simple as that.
> >
> > With that, I think we can and should get rid of the scale_load_down()
> > for load_avg.
> 
> load_avg, yes, is bounded by load.weight, but on 64-bit load_sum is only
> bounded by load.weight * LOAD_AVG_MAX and is the same size as
> load.weight (as I said below). There's still space for anything
> reasonable though with 10 bits of SLR.
 
You are absolutely right.

> >> > If NICE_0_LOAD is nice-0's load, and if SCHED_LOAD_SHIFT is to say
> >> > how to get nice-0's load, I don't understand why you want to
> >> > separate them.
> >> 
> >> SCHED_LOAD_SHIFT is not how to get nice-0's load, it just happens to
> >> have the same value as NICE_0_SHIFT. (I think anyway; SCHED_LOAD_* is
> >> used in precisely one place other than the newish util_avg, and as I
> >> mentioned it's not remotely clear what compute_imbalance is doing
> >> there.)
> >
> > Yes, it is not clear to me either.
> >
> > With the above proposal to get rid of scale_load_down() for load_avg,
> > I think we can now remove SCHED_LOAD_*, rename scale_load() to
> > user_to_kernel_load(), and rename scale_load_down() to
> > kernel_to_user_load().
> >
> > Hmm?
> 
> I have no opinion on renaming the scale_load functions, it's certainly
> reasonable, but the scale_load names seem fine too.

Without scale_load_down() in load_avg, it seems they are only used when
reading/writing load between user and kernel. I will ponder more, but
let's see whether others have an opinion.
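
If we went that way, the renamed helpers might look like this (a sketch
of the proposal only, keeping the current scale_load()/scale_load_down()
semantics):

#if 0 /* BITS_PER_LONG > 32 */
# define user_to_kernel_load(w)	((w) << SCHED_LOAD_RESOLUTION)
# define kernel_to_user_load(w)	((w) >> SCHED_LOAD_RESOLUTION)
#else
# define user_to_kernel_load(w)	(w)
# define kernel_to_user_load(w)	(w)
#endif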


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-14 Thread bsegall
Morten Rasmussen  writes:

> On Fri, Sep 11, 2015 at 10:05:53AM -0700, bseg...@google.com wrote:
>> Morten Rasmussen  writes:
>> 
>> > On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
>> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> >> index 119823d..55a7b93 100644
>> >> --- a/include/linux/sched.h
>> >> +++ b/include/linux/sched.h
>> >> @@ -912,7 +912,7 @@ enum cpu_idle_type {
>> >>  /*
>> >>   * Increase resolution of cpu_capacity calculations
>> >>   */
>> >> -#define SCHED_CAPACITY_SHIFT	10
>> >> +#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
>> >>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
>> >> 
>> >>  /*
>> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> >> index 68cda11..d27cdd8 100644
>> >> --- a/kernel/sched/sched.h
>> >> +++ b/kernel/sched/sched.h
>> >> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>> >>   */
>> >>  #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>> >> 
>> >> +# define SCHED_RESOLUTION_SHIFT	10
>> >> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
>> >> +
>> >>  /*
>> >>   * Increase resolution of nice-level calculations for 64-bit architectures.
>> >>   * The extra resolution improves shares distribution and load balancing of
>> >> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>> >>   * increased costs.
>> >>   */
>> >>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
>> >> -# define SCHED_LOAD_RESOLUTION	10
>> >> -# define scale_load(w)	((w) << SCHED_LOAD_RESOLUTION)
>> >> -# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
>> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
>> >> +# define scale_load(w)	((w) << SCHED_RESOLUTION_SHIFT)
>> >> +# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
>> >>  #else
>> >> -# define SCHED_LOAD_RESOLUTION	0
>> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
>> >>  # define scale_load(w)	(w)
>> >>  # define scale_load_down(w)	(w)
>> >>  #endif
>> >> 
>> >> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
>> >>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
>> >> 
>> >>  #define NICE_0_LOAD	SCHED_LOAD_SCALE
>> >
>> > I think this is pretty much the required relationship between all the
>> > SHIFTs and SCALEs that Peter checked for in his #if-#error thing
>> > earlier, so no disagreements from my side :-)
>> 
>> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
>> required to be the same value and should not be conflated.
>> 
>> In particular, since cgroups are on the same timeline as tasks and their
>> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
>> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
>> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
>> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
>> an internal value to the kernel.
>> 
>> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
>> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
>
> I think I follow, but doesn't that mean that the current code is broken
> too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:
>
> #define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
> #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
>
> #define NICE_0_LOAD	SCHED_LOAD_SCALE
> #define NICE_0_SHIFT	SCHED_LOAD_SHIFT
>
> To me it sounds like we need to define it the other way around:
>
> #define NICE_0_SHIFT	10
> #define NICE_0_LOAD	(1L << NICE_0_SHIFT)
>
> and then add any additional resolution bits from there to ensure that
> NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.

No, NICE_0_LOAD is supposed to be scale_load(prio_to_weight[nice_0]),
ie including SLR. It has never been clear to me what
SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,
and the new utilization uses of it are entirely unlinked from 1024 ==
NICE_0.
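
Expanded with prio_to_weight[20] = 1024 (a worked example, nothing more):

	NICE_0_LOAD = scale_load(prio_to_weight[20])
	            = 1024 << SCHED_LOAD_RESOLUTION
	            = 1 << (10 + SCHED_LOAD_RESOLUTION)
	            = SCHED_LOAD_SCALE

i.e. NICE_0_LOAD carries the SLR bits and only coincides with
SCHED_LOAD_SCALE because of how SCHED_LOAD_SHIFT is currently defined.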


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-14 Thread bsegall
Yuyang Du  writes:

> On Fri, Sep 11, 2015 at 10:05:53AM -0700, bseg...@google.com wrote:
>> 
>> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
>> required to be the same value and should not be conflated.
>  
>> In particular, since cgroups are on the same timeline as tasks and their
>> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
>> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
>> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
>> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
>> an internal value to the kernel.
>>
>> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
>> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
>
> I haven't fully looked into the concerns, but the new
> SCHED_RESOLUTION_SHIFT is intended to formalize all the integer metrics
> that need better resolution. It is not special to any metric, so it
> actually de-conflates whatever is conflated.

It conflates the userspace-invisible SCHED_LOAD_RESOLUTION with the
userspace-visible value of scale_load_down(NICE_0_LOAD). Increasing
SCHED_LOAD_RESOLUTION must not change scale_load_down(NICE_0_LOAD).
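
That invariant is easy to check by expansion (a worked example): with
NICE_0_LOAD = 1 << (10 + SLR),

	scale_load_down(NICE_0_LOAD) = (1 << (10 + SLR)) >> SLR
	                             = 1 << 10
	                             = 1024

for any SLR, so the userspace-visible value stays 1024 whatever internal
resolution is chosen.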


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-14 Thread Yuyang Du
On Mon, Sep 14, 2015 at 10:34:00AM -0700, bseg...@google.com wrote:
> >> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> >> required to be the same value and should not be conflated.
> >> 
> >> In particular, since cgroups are on the same timeline as tasks and their
> >> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> >> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> >> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> >> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> >> an internal value to the kernel.
> >> 
> >> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> >> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
> >
> > I think I follow, but doesn't that mean that the current code is broken
> > too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:
> >
> > #define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
> > #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
> >
> > #define NICE_0_LOAD	SCHED_LOAD_SCALE
> > #define NICE_0_SHIFT	SCHED_LOAD_SHIFT
> >
> > To me it sounds like we need to define it the other way around:
> >
> > #define NICE_0_SHIFT	10
> > #define NICE_0_LOAD	(1L << NICE_0_SHIFT)
> >
> > and then add any additional resolution bits from there to ensure that
> > NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
> 
> No, NICE_0_LOAD is supposed to be scale_load(prio_to_weight[nice_0]),
> ie including SLR. It has never been clear to me what
> SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,
> and the new utilization uses of it are entirely unlinked from 1024 ==
> NICE_0.

Presuming your SLR means SCHED_LOAD_RESOLUTION:

1) The introduction of (not redefinition of) SCHED_RESOLUTION_SHIFT does
not change anything after macro expansion.

2) The constants in prio_to_weight[] and prio_to_wmult[] are tied to a
resolution of 10 bits for NICE_0, i.e., 1024. I guess that is the user
visible part you mentioned, and so is the cgroup share.

To me, it is all ok. With SCHED_RESOLUTION_SHIFT as the basic resolution
unit, it just lets us state clearly that NICE_0's weight has a fixed
resolution of SCHED_RESOLUTION_SHIFT bits, or even add this:

#if prio_to_weight[20] != 1 << SCHED_RESOLUTION_SHIFT
#error "NICE_0 weight not calibrated"
#endif
/* I can learn, Peter */

I guess you are saying we are conflating NICE_0 with NICE_0_LOAD. But to
me, they are just integer metrics, each needing a resolution. That is it.
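
As written that #if cannot work, since the preprocessor cannot look
inside an array; a check along those lines would have to happen at boot
time instead. A sketch, with hypothetical placement in sched_init():

	/* the preprocessor cannot evaluate prio_to_weight[20], so
	 * sanity-check the nice-0 calibration at boot instead */
	BUG_ON(prio_to_weight[20] != 1 << SCHED_RESOLUTION_SHIFT);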


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-15 Thread Morten Rasmussen
On Mon, Sep 14, 2015 at 10:34:00AM -0700, bseg...@google.com wrote:
> Morten Rasmussen  writes:
> 
> > On Fri, Sep 11, 2015 at 10:05:53AM -0700, bseg...@google.com wrote:
> >> Morten Rasmussen  writes:
> >> 
> >> > On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
> >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> >> index 119823d..55a7b93 100644
> >> >> --- a/include/linux/sched.h
> >> >> +++ b/include/linux/sched.h
> >> >> @@ -912,7 +912,7 @@ enum cpu_idle_type {
> >> >>  /*
> >> >>   * Increase resolution of cpu_capacity calculations
> >> >>   */
> >> >> -#define SCHED_CAPACITY_SHIFT	10
> >> >> +#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
> >> >>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
> >> >> 
> >> >>  /*
> >> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> >> index 68cda11..d27cdd8 100644
> >> >> --- a/kernel/sched/sched.h
> >> >> +++ b/kernel/sched/sched.h
> >> >> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
> >> >>   */
> >> >>  #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
> >> >> 
> >> >> +# define SCHED_RESOLUTION_SHIFT	10
> >> >> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
> >> >> +
> >> >>  /*
> >> >>   * Increase resolution of nice-level calculations for 64-bit architectures.
> >> >>   * The extra resolution improves shares distribution and load balancing of
> >> >> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
> >> >>   * increased costs.
> >> >>   */
> >> >>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
> >> >> -# define SCHED_LOAD_RESOLUTION	10
> >> >> -# define scale_load(w)	((w) << SCHED_LOAD_RESOLUTION)
> >> >> -# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
> >> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
> >> >> +# define scale_load(w)	((w) << SCHED_RESOLUTION_SHIFT)
> >> >> +# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
> >> >>  #else
> >> >> -# define SCHED_LOAD_RESOLUTION	0
> >> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
> >> >>  # define scale_load(w)	(w)
> >> >>  # define scale_load_down(w)	(w)
> >> >>  #endif
> >> >> 
> >> >> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
> >> >>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
> >> >> 
> >> >>  #define NICE_0_LOAD	SCHED_LOAD_SCALE
> >> >
> >> > I think this is pretty much the required relationship between all the
> >> > SHIFTs and SCALEs that Peter checked for in his #if-#error thing
> >> > earlier, so no disagreements from my side :-)
> >> 
> >> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> >> required to be the same value and should not be conflated.
> >> 
> >> In particular, since cgroups are on the same timeline as tasks and their
> >> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> >> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> >> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> >> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> >> an internal value to the kernel.
> >> 
> >> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> >> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
> >
> > I think I follow, but doesn't that mean that the current code is broken
> > too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:
> >
> > #define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
> > #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
> >
> > #define NICE_0_LOAD	SCHED_LOAD_SCALE
> > #define NICE_0_SHIFT	SCHED_LOAD_SHIFT
> >
> > To me it sounds like we need to define it the other way around:
> >
> > #define NICE_0_SHIFT	10
> > #define NICE_0_LOAD	(1L << NICE_0_SHIFT)
> >
> > and then add any additional resolution bits from there to ensure that
> > NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
> 
> No, NICE_0_LOAD is supposed to be scale_load(prio_to_weight[nice_0]),
> ie including SLR. It has never been clear to me what
> SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,

I see, I wasn't sure if NICE_0_LOAD is being used in the code somewhere
with the assumption that NICE_0_LOAD = load.weight = 1024. The
scale_load(_down)() conversion between base load (nice-0 = 1024) and
hi-res load makes sense.
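
For instance, with SCHED_LOAD_RESOLUTION = 10 (a worked example):

	scale_load(1024)         = 1024 << 10 = 1048576	/* kernel-internal */
	scale_load_down(1048576) = 1048576 >> 10 = 1024	/* user-visible */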

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-15 Thread bsegall
Yuyang Du  writes:

> On Mon, Sep 14, 2015 at 10:34:00AM -0700, bseg...@google.com wrote:
>> >> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
>> >> required to be the same value and should not be conflated.
>> >> 
>> >> In particular, since cgroups are on the same timeline as tasks and their
>> >> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
>> >> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
>> >> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
>> >> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
>> >> an internal value to the kernel.
>> >> 
>> >> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
>> >> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
>> >
>> > I think I follow, but doesn't that mean that the current code is broken
>> > too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:
>> >
>> > #define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
>> > #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
>> >
>> > #define NICE_0_LOAD	SCHED_LOAD_SCALE
>> > #define NICE_0_SHIFT	SCHED_LOAD_SHIFT
>> >
>> > To me it sounds like we need to define it the other way around:
>> >
>> > #define NICE_0_SHIFT	10
>> > #define NICE_0_LOAD	(1L << NICE_0_SHIFT)
>> >
>> > and then add any additional resolution bits from there to ensure that
>> > NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
>> 
>> No, NICE_0_LOAD is supposed to be scale_load(prio_to_weight[nice_0]),
>> ie including SLR. It has never been clear to me what
>> SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,
>> and the new utilization uses of it are entirely unlinked from 1024 ==
>> NICE_0.
>
> Presuming your SLR means SCHED_LOAD_RESOLUTION:
>
> 1) The introduction of (not redefinition of) SCHED_RESOLUTION_SHIFT does
> not change anything after macro expansion.
>
> 2) The constants in prio_to_weight[] and prio_to_wmult[] are tied to a
> resolution of 10 bits for NICE_0, i.e., 1024. I guess that is the user
> visible part you mentioned, and so is the cgroup share.
>
> To me, it is all ok. With SCHED_RESOLUTION_SHIFT as the basic resolution
> unit, it just lets us state clearly that NICE_0's weight has a fixed
> resolution of SCHED_RESOLUTION_SHIFT bits, or even add this:
>
> #if prio_to_weight[20] != 1 << SCHED_RESOLUTION_SHIFT
> #error "NICE_0 weight not calibrated"
> #endif
> /* I can learn, Peter */
>
> I guess you are saying we are conflating NICE_0 with NICE_0_LOAD. But to
> me, they are just integer metrics, each needing a resolution. That is it.

Yes this would change nothing at the moment post-expansion, that's not
the point. SLR being 10 bits and the nice-0 weight being 1024 are
completely and utterly unrelated and the headers should not pretend they
need to be the same value, any more than there should be a #define that
is shared with every other use of 1024 in the kernel.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-15 Thread Yuyang Du
On Tue, Sep 15, 2015 at 10:11:41AM -0700, bseg...@google.com wrote:
> >
> > I guess you are saying we are conflating NICE_0 with NICE_0_LOAD. But
> > to me, they are just integer metrics, each needing a resolution. That
> > is it.
> 
> Yes this would change nothing at the moment post-expansion, that's not
> the point. SLR being 10 bits and the nice-0 weight being 1024 are
> completely and utterly unrelated and the headers should not pretend they
> need to be the same value,

I never said they are related; why should they be related? And whether
they need to be the same value or not, fine.

However, the SLR has to be some value. Because it might be 10 or 20
(LOAD), I make SCHED_RESOLUTION_SHIFT 10 (kind of a denominator), not the
other way around.

We can define SCHED_RESOLUTION_SHIFT 1, and then define SLR = x *
SCHED_RESOLUTION_SHIFT with x being a random number, if you must.

And by the way, with SCHED_RESOLUTION_SHIFT there will not be SLR anymore;
we only need SCHED_LOAD_SHIFT, which has either a low resolution of
1*SCHED_RESOLUTION_SHIFT or a high one of 2*SCHED_RESOLUTION_SHIFT. The
scale_load*() macros are the conversion between the resolutions of NICE_0
and NICE_0_LOAD.

> any more than there should be a #define that is shared
> with every other use of 1024 in the kernel.

The point really is that metrics (if not many) need resolution, not just
NICE_0_LOAD. You can choose to either hardcode a number, like
SCHED_CAPACITY_SHIFT now, or use SCHED_RESOLUTION_SHIFT, which also
serves as a simple marker of what the define is (the scaled one with
better resolution vs. the original one). I guess this is to say we now
have a (no-big-deal) resolution system.


Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-16 Thread Peter Zijlstra
On Mon, Sep 14, 2015 at 10:34:00AM -0700, bseg...@google.com wrote:

> It has never been clear to me what
> SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,

SCHED_LOAD_SCALE/SHIFT are the fixed point mult/shift, and NICE_0_LOAD
is the load of a nice-0 task. They happen to be the same by the choice
that nice-0 has a load of 1 (the only natural choice given proportional
weight and hyperbolas etc.).

But for the fixed point math we use SCHED_LOAD_* and only when we want
to explicitly use the unit load we use NICE_0_LOAD (its only used 4-5
times or so).
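
Illustratively (a sketch of the convention, not code from the tree), the
fixed point usage looks like:

	/* a and b are both in units of SCHED_LOAD_SCALE; rescale the
	 * product back into the same fixed point unit */
	static inline unsigned long fp_mul(unsigned long a, unsigned long b)
	{
		return (a * b) >> SCHED_LOAD_SHIFT;
	}

whereas NICE_0_LOAD is used where the code literally means "the weight
of one nice-0 task".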




Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-16 Thread bsegall
Yuyang Du  writes:

> On Tue, Sep 15, 2015 at 10:11:41AM -0700, bseg...@google.com wrote:
>> >
>> > I guess you are saying we are conflating NICE_0 with NICE_0_LOAD. But
>> > to me, they are just integer metrics, each needing a resolution. That
>> > is it.
>> 
>> Yes this would change nothing at the moment post-expansion, that's not
>> the point. SLR being 10 bits and the nice-0 weight being 1024 are
>> completely and utterly unrelated and the headers should not pretend
>> they need to be the same value,
>
> I never said they are related; why should they be related? And whether
> they need to be the same value or not, fine.
>
> However, the SLR has to be some value. Because it might be 10 or 20
> (LOAD), I make SCHED_RESOLUTION_SHIFT 10 (kind of a denominator), not
> the other way around.
>
> We can define SCHED_RESOLUTION_SHIFT 1, and then define SLR = x *
> SCHED_RESOLUTION_SHIFT with x being a random number, if you must.

That's sorta the point - you could do this and it would be just as
(non-)sensical.

>
>> any more than there should be a #define that is shared
>> with every other use of 1024 in the kernel.
>
> The point really is that metrics (if not many) need resolution, not
> just NICE_0_LOAD. You can choose to either hardcode a number, like
> SCHED_CAPACITY_SHIFT now, or use SCHED_RESOLUTION_SHIFT, which also
> serves as a simple marker of what the define is (the scaled one with
> better resolution vs. the original one). I guess this is to say we now
> have a (no-big-deal) resolution system.

Yes they were chosen for similar reasons, but they are not conceptually
related, and you couldn't decide to just bump up all the resolutions by
changing SCHED_RESOLUTION_SHIFT, so doing this would just be misleading.
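
Concretely (an illustration of the point, not a proposed change): with

	#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
	#define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)

bumping SCHED_RESOLUTION_SHIFT from 10 to 11 would double the
user-visible capacity scale and the nice-0 weight at the same time as any
internal load resolution, even though those values are unrelated.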