Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-09-07 Thread Dietmar Eggemann
On 07/09/15 13:42, Peter Zijlstra wrote:
> On Mon, Aug 31, 2015 at 11:24:49AM +0200, Peter Zijlstra wrote:
> 
>> A quick run here gives:
>>
>> IVB-EP (2*20*2):
> 
> As noted by someone; that should be 2*10*2, for a total of 40 cpus in
> this machine.
> 
>>
>> perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000
>>
>> Before:  After:
>> 5.484170711 ( +-  0.74% )    5.590001145 ( +-  0.45% )
>>
>> Which is an almost 2% slowdown :/
>>
>> I've yet to look at what happens.
> 
> OK, so it appears this is link order nonsense. When I compared profiles
> between the series, the one function that had significant change was
> skb_release_data(), which doesn't make much sense.
> 
> If I do a 'make clean' in front of each build, I get a repeatable
> improvement with this patch set (although how much of that is due to the
> patches itself or just because of code movement is as yet undetermined).
> 
> I'm of a mind to apply these patches; with two patches on top, which
> I'll post shortly.
> 

-- >8 --

From: Dietmar Eggemann 
Date: Mon, 7 Sep 2015 14:57:22 +0100
Subject: [PATCH] sched/fair: Defer calling scaling functions

Do not call the scaling functions when time goes backwards or when the
last update of the sched_avg structure happened less than 1024 ns ago.

Signed-off-by: Dietmar Eggemann 
---
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d6ca8d987a63..3445d2fb38f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2552,8 +2552,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
u64 delta, scaled_delta, periods;
u32 contrib;
unsigned int delta_w, scaled_delta_w, decayed = 0;
-   unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
-   unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+   unsigned long scale_freq, scale_cpu;
 
delta = now - sa->last_update_time;
/*
@@ -2574,6 +2573,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
return 0;
sa->last_update_time = now;
 
+   scale_freq = arch_scale_freq_capacity(NULL, cpu);
+   scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
/* delta_w is the amount already accumulated against our next period */
delta_w = sa->period_contrib;
if (delta + delta_w >= 1024) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-09-07 Thread Peter Zijlstra
On Mon, Sep 07, 2015 at 02:42:20PM +0200, Peter Zijlstra wrote:
> I'm of a mind to apply these patches; with two patches on top, which
> I'll post shortly.

---
Subject: sched: Optimize __update_load_avg()
From: Peter Zijlstra 
Date: Mon Sep  7 15:09:15 CEST 2015

Prior to this patch, the line:

scaled_delta_w = (delta_w * 1024) >> 10;

which is the result of the default arch_scale_freq_capacity()
function, turns into:

1b03:   49 89 d1                mov    %rdx,%r9
1b06:   49 c1 e1 0a             shl    $0xa,%r9
1b0a:   49 c1 e9 0a             shr    $0xa,%r9

Which is silly; when made unsigned int, GCC recognises these as
pointless operations and does not emit them (confirmed on 4.9.3 and
5.1.1).

Furthermore, AFAICT unsigned is actually the correct type for these
fields anyway, as we've explicitly ruled out negative deltas earlier
in this function.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/sched/fair.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2551,7 +2551,7 @@ __update_load_avg(u64 now, int cpu, stru
 {
u64 delta, scaled_delta, periods;
u32 contrib;
-   int delta_w, scaled_delta_w, decayed = 0;
+   unsigned int delta_w, scaled_delta_w, decayed = 0;
unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
 


Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-09-07 Thread Peter Zijlstra
On Mon, Sep 07, 2015 at 02:42:20PM +0200, Peter Zijlstra wrote:
> I'm of a mind to apply these patches; with two patches on top, which
> I'll post shortly.

---
Subject: sched: Rename scale()
From: Peter Zijlstra 
Date: Mon Sep 7 15:05:42 CEST 2015

Rename scale() to cap_scale() to better reflect its purpose; it is,
after all, not a general-purpose scaling function: it has
SCHED_CAPACITY_SHIFT hardcoded in it.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/sched/fair.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2515,7 +2515,7 @@ static u32 __compute_runnable_contrib(u6
return contrib + runnable_avg_yN_sum[n];
 }
 
-#define scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
 /*
  * We can represent the historical contribution to runnable average as the
@@ -2588,7 +2588,7 @@ __update_load_avg(u64 now, int cpu, stru
 * period and accrue it.
 */
delta_w = 1024 - delta_w;
-   scaled_delta_w = scale(delta_w, scale_freq);
+   scaled_delta_w = cap_scale(delta_w, scale_freq);
if (weight) {
sa->load_sum += weight * scaled_delta_w;
if (cfs_rq) {
@@ -2597,7 +2597,7 @@ __update_load_avg(u64 now, int cpu, stru
}
}
if (running)
-   sa->util_sum += scale(scaled_delta_w, scale_cpu);
+   sa->util_sum += cap_scale(scaled_delta_w, scale_cpu);
 
delta -= delta_w;
 
@@ -2614,25 +2614,25 @@ __update_load_avg(u64 now, int cpu, stru
 
/* Efficiently calculate \sum (1..n_period) 1024*y^i */
contrib = __compute_runnable_contrib(periods);
-   contrib = scale(contrib, scale_freq);
+   contrib = cap_scale(contrib, scale_freq);
if (weight) {
sa->load_sum += weight * contrib;
if (cfs_rq)
cfs_rq->runnable_load_sum += weight * contrib;
}
if (running)
-   sa->util_sum += scale(contrib, scale_cpu);
+   sa->util_sum += cap_scale(contrib, scale_cpu);
}
 
/* Remainder of delta accrued against u_0` */
-   scaled_delta = scale(delta, scale_freq);
+   scaled_delta = cap_scale(delta, scale_freq);
if (weight) {
sa->load_sum += weight * scaled_delta;
if (cfs_rq)
cfs_rq->runnable_load_sum += weight * scaled_delta;
}
if (running)
-   sa->util_sum += scale(scaled_delta, scale_cpu);
+   sa->util_sum += cap_scale(scaled_delta, scale_cpu);
 
sa->period_contrib += delta;
 


Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-09-07 Thread Peter Zijlstra
On Mon, Aug 31, 2015 at 11:24:49AM +0200, Peter Zijlstra wrote:

> A quick run here gives:
> 
> IVB-EP (2*20*2):

As noted by someone, that should be 2*10*2, for a total of 40 cpus in
this machine.

> 
> perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000
> 
> Before:   After:
> 5.484170711 ( +-  0.74% ) 5.590001145 ( +-  0.45% )
> 
> Which is an almost 2% slowdown :/
> 
> I've yet to look at what happens.

OK, so it appears this is link order nonsense. When I compared profiles
between the series, the one function that had significant change was
skb_release_data(), which doesn't make much sense.

If I do a 'make clean' before each build, I get a repeatable
improvement with this patch set (although how much of that is due to
the patches themselves and how much to mere code movement is as yet
undetermined).

I'm of a mind to apply these patches; with two patches on top, which
I'll post shortly.


Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-09-02 Thread Dietmar Eggemann

On 08/31/2015 11:24 AM, Peter Zijlstra wrote:
> On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
>
>> Target: ARM TC2 A7-only (x3)
>> Test: hackbench -g 25 --threads -l 1
>>
>> Before  After
>> 315.545 313.408 -0.68%
>>
>> Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
>> Test: hackbench -g 25 --threads -l 1000 (avg of 10)
>>
>> Before  After
>> 6.4643  6.395   -1.07%
>
> A quick run here gives:
>
> IVB-EP (2*20*2):
>
> perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000
>
> Before: After:
> 5.484170711 ( +-  0.74% )   5.590001145 ( +-  0.45% )
>
> Which is an almost 2% slowdown :/
>
> I've yet to look at what happens.



I tested the patch set on top of tip:

ff277d4250fe - sched/deadline: Fix comment in enqueue_task_dl()

on a 2-cluster IVB-EP (2 clusters * 10 cores * 2 HW threads = 40
logical cpus) with SMT, MC and NUMA sched domains.


model name : Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz

perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000

Before: After:
5.049361160 ( +- 1.26% )    5.014980654 ( +- 1.20% )

Even running this test multiple times, I never saw anything like a 2%
slowdown.


It's a vanilla Ubuntu 15.04 system, which might explain the slightly
higher stddev.


We could optimize the changes we made in __update_load_avg() by only
calculating the additional scaled values [scaled_delta_w, contrib,
scaled_delta] when the function is called with 'weight != 0 || running
!= 0'. The same applies to the initialization of scale_freq and
scale_cpu.




Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-08-31 Thread Peter Zijlstra
On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
> Target: ARM TC2 A7-only (x3)
> Test: hackbench -g 25 --threads -l 1
> 
> BeforeAfter
> 315.545   313.408 -0.68%
> 
> Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
> Test: hackbench -g 25 --threads -l 1000 (avg of 10)
> 
> BeforeAfter
> 6.46436.395   -1.07%
> 

A quick run here gives:

IVB-EP (2*20*2):

perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000

Before: After:
5.484170711 ( +-  0.74% )   5.590001145 ( +-  0.45% )

Which is an almost 2% slowdown :/

I've yet to look at what happens.


Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-08-17 Thread Peter Zijlstra
On Mon, Aug 17, 2015 at 12:29:51PM +0100, Morten Rasmussen wrote:
> On Sun, Aug 16, 2015 at 10:46:05PM +0200, Peter Zijlstra wrote:
> > On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
> > > Target: ARM TC2 A7-only (x3)
> > > Test: hackbench -g 25 --threads -l 1
> > > 
> > > BeforeAfter
> > > 315.545   313.408 -0.68%
> > > 
> > > Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
> > > Test: hackbench -g 25 --threads -l 1000 (avg of 10)
> > > 
> > > BeforeAfter
> > > 6.46436.395   -1.07%
> > 
> > Yeah, so that is a problem.
> 
> Maybe I'm totally wrong, but doesn't hackbench report execution time, so
> less is better? In that case -1.07% means we are doing better with the
> patches applied (after time < before time). In any case, I should have
> indicated whether the change is good or bad for performance.
> 
> > I'm taking it some of the new scaling stuff doesn't compile away, can we
> > look at fixing that?
> 
> I will double-check that the stuff goes away as expected. I'm pretty
> sure it does on ARM.

Ah, uhm.. you have a point there ;-) I'll run the numbers when I'm back
home again.


Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-08-17 Thread Morten Rasmussen
On Sun, Aug 16, 2015 at 10:46:05PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
> > Target: ARM TC2 A7-only (x3)
> > Test: hackbench -g 25 --threads -l 1
> > 
> > Before  After
> > 315.545 313.408 -0.68%
> > 
> > Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
> > Test: hackbench -g 25 --threads -l 1000 (avg of 10)
> > 
> > Before  After
> > 6.4643  6.395   -1.07%
> 
> Yeah, so that is a problem.

Maybe I'm totally wrong, but doesn't hackbench report execution time, so
less is better? In that case -1.07% means we are doing better with the
patches applied (after time < before time). In any case, I should have
indicated whether the change is good or bad for performance.

> I'm taking it some of the new scaling stuff doesn't compile away, can we
> look at fixing that?

I will double-check that the stuff goes away as expected. I'm pretty
sure it does on ARM.


Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-08-16 Thread Peter Zijlstra
On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
> Target: ARM TC2 A7-only (x3)
> Test: hackbench -g 25 --threads -l 1
> 
> BeforeAfter
> 315.545   313.408 -0.68%
> 
> Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
> Test: hackbench -g 25 --threads -l 1000 (avg of 10)
> 
> BeforeAfter
> 6.46436.395   -1.07%

Yeah, so that is a problem.

I'm taking it some of the new scaling stuff doesn't compile away, can we
look at fixing that?


[PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking

2015-08-14 Thread Morten Rasmussen
Per-entity load tracking currently compensates for frequency scaling
only in utilization tracking. This patch set extends that compensation
to load as well, and adds compute-capacity invariance (covering
different microarchitectures and/or max frequencies/P-states) to
utilization. The former prevents suboptimal load-balancing decisions
when cpus run at different frequencies, while the latter ensures that
utilization (sched_avg.util_avg) can be compared across cpus, and that
utilization can be compared directly to cpu capacity to determine
whether a cpu is overloaded.

Note that this posting only contains the scheduler patches; the
architecture-specific implementations of arch_scale_{freq,cpu}_capacity()
will be posted separately later.

The patches have been posted several times before, most recently as
part of the energy-model-driven scheduling RFCv5 patch set [1]
(patches #2,4,6,8-12). That RFC also contains patches for the
architecture-specific side. In this posting the commit messages have
been updated, and the patches have been rebased on a more recent
tip/sched/core that includes Yuyang's rewrite, which made some of the
previously posted patches redundant.

Target: ARM TC2 A7-only (x3)
Test: hackbench -g 25 --threads -l 1

Before  After
315.545 313.408 -0.68%

Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
Test: hackbench -g 25 --threads -l 1000 (avg of 10)

Before  After
6.4643  6.395   -1.07%

[1] http://www.kernelhub.org/?p=2&msg=787634

Dietmar Eggemann (4):
  sched/fair: Make load tracking frequency scale-invariant
  sched/fair: Make utilization tracking cpu scale-invariant
  sched/fair: Name utilization related data and functions consistently
  sched/fair: Get rid of scaling utilization by capacity_orig

Morten Rasmussen (2):
  sched/fair: Convert arch_scale_cpu_capacity() from weak function to
#define
  sched/fair: Initialize task load and utilization before placing task
on rq

 include/linux/sched.h   |   8 ++--
 kernel/sched/core.c |   4 +-
 kernel/sched/fair.c | 109 +++-
 kernel/sched/features.h |   5 ---
 kernel/sched/sched.h|  11 +
 5 files changed, 69 insertions(+), 68 deletions(-)

-- 
1.9.1


