Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On 07/09/15 13:42, Peter Zijlstra wrote:
> On Mon, Aug 31, 2015 at 11:24:49AM +0200, Peter Zijlstra wrote:
>
>> A quick run here gives:
>>
>> IVB-EP (2*20*2):
>
> As noted by someone; that should be 2*10*2, for a total of 40 cpus in
> this machine.
>
>> perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000
>>
>> Before:                  After:
>> 5.484170711 ( +- 0.74% ) 5.590001145 ( +- 0.45% )
>>
>> Which is an almost 2% slowdown :/
>>
>> I've yet to look at what happens.
>
> OK, so it appears this is link order nonsense. When I compared profiles
> between the series, the one function that had significant change was
> skb_release_data(), which doesn't make much sense.
>
> If I do a 'make clean' in front of each build, I get a repeatable
> improvement with this patch set (although how much of that is due to the
> patches themselves or just to code movement is as yet undetermined).
>
> I'm of a mind to apply these patches; with two patches on top, which
> I'll post shortly.

-- >8 --
From: Dietmar Eggemann
Date: Mon, 7 Sep 2015 14:57:22 +0100
Subject: [PATCH] sched/fair: Defer calling scaling functions

Do not call the scaling functions in case time goes backwards or the
last update of the sched_avg structure has happened less than 1024ns
ago.

Signed-off-by: Dietmar Eggemann
---
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d6ca8d987a63..3445d2fb38f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2552,8 +2552,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	u64 delta, scaled_delta, periods;
 	u32 contrib;
 	unsigned int delta_w, scaled_delta_w, decayed = 0;
-	unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
-	unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+	unsigned long scale_freq, scale_cpu;
 
 	delta = now - sa->last_update_time;
 	/*
@@ -2574,6 +2573,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		return 0;
 	sa->last_update_time = now;
 
+	scale_freq = arch_scale_freq_capacity(NULL, cpu);
+	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
 	/* delta_w is the amount already accumulated against our next period */
 	delta_w = sa->period_contrib;
 	if (delta + delta_w >= 1024) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On Mon, Sep 07, 2015 at 02:42:20PM +0200, Peter Zijlstra wrote:
> I'm of a mind to apply these patches; with two patches on top, which
> I'll post shortly.

---
Subject: sched: Optimize __update_load_avg()
From: Peter Zijlstra
Date: Mon Sep  7 15:09:15 CEST 2015

Prior to this patch, the line:

	scaled_delta_w = (delta_w * 1024) >> 10;

which is the result of the default arch_scale_freq_capacity() function,
turns into:

	1b03:	49 89 d1	mov	%rdx,%r9
	1b06:	49 c1 e1 0a	shl	$0xa,%r9
	1b0a:	49 c1 e9 0a	shr	$0xa,%r9

Which is silly; when made unsigned int, GCC recognises these as pointless
ops and does not emit them (confirmed on 4.9.3 and 5.1.1).

Furthermore, afaict unsigned is actually the correct type for these
fields anyway, as we've explicitly ruled out negative deltas earlier in
this function.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2551,7 +2551,7 @@ __update_load_avg(u64 now, int cpu, stru
 {
 	u64 delta, scaled_delta, periods;
 	u32 contrib;
-	int delta_w, scaled_delta_w, decayed = 0;
+	unsigned int delta_w, scaled_delta_w, decayed = 0;
 	unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
 	unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On Mon, Sep 07, 2015 at 02:42:20PM +0200, Peter Zijlstra wrote:
> I'm of a mind to apply these patches; with two patches on top, which
> I'll post shortly.

---
Subject: sched: Rename scale()
From: Peter Zijlstra
Date: Mon Sep  7 15:05:42 CEST 2015

Rename scale() to cap_scale() to better reflect its purpose; it is after
all not a general-purpose scale function, it has SCHED_CAPACITY_SHIFT
hardcoded in it.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2515,7 +2515,7 @@ static u32 __compute_runnable_contrib(u6
 	return contrib + runnable_avg_yN_sum[n];
 }
 
-#define scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
 /*
  * We can represent the historical contribution to runnable average as the
@@ -2588,7 +2588,7 @@ __update_load_avg(u64 now, int cpu, stru
 	 * period and accrue it.
 	 */
 	delta_w = 1024 - delta_w;
-	scaled_delta_w = scale(delta_w, scale_freq);
+	scaled_delta_w = cap_scale(delta_w, scale_freq);
 	if (weight) {
 		sa->load_sum += weight * scaled_delta_w;
 		if (cfs_rq) {
@@ -2597,7 +2597,7 @@ __update_load_avg(u64 now, int cpu, stru
 		}
 	}
 	if (running)
-		sa->util_sum += scale(scaled_delta_w, scale_cpu);
+		sa->util_sum += cap_scale(scaled_delta_w, scale_cpu);
 
 	delta -= delta_w;
 
@@ -2614,25 +2614,25 @@ __update_load_avg(u64 now, int cpu, stru
 
 		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
 		contrib = __compute_runnable_contrib(periods);
-		contrib = scale(contrib, scale_freq);
+		contrib = cap_scale(contrib, scale_freq);
 		if (weight) {
 			sa->load_sum += weight * contrib;
 			if (cfs_rq)
 				cfs_rq->runnable_load_sum += weight * contrib;
 		}
 		if (running)
-			sa->util_sum += scale(contrib, scale_cpu);
+			sa->util_sum += cap_scale(contrib, scale_cpu);
 	}
 
 	/* Remainder of delta accrued against u_0` */
-	scaled_delta = scale(delta, scale_freq);
+	scaled_delta = cap_scale(delta, scale_freq);
 	if (weight) {
 		sa->load_sum += weight * scaled_delta;
 		if (cfs_rq)
 			cfs_rq->runnable_load_sum += weight * scaled_delta;
 	}
 	if (running)
-		sa->util_sum += scale(scaled_delta, scale_cpu);
+		sa->util_sum += cap_scale(scaled_delta, scale_cpu);
 
 	sa->period_contrib += delta;
Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On Mon, Aug 31, 2015 at 11:24:49AM +0200, Peter Zijlstra wrote:
> A quick run here gives:
>
> IVB-EP (2*20*2):

As noted by someone; that should be 2*10*2, for a total of 40 cpus in
this machine.

> perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000
>
> Before:                  After:
> 5.484170711 ( +- 0.74% ) 5.590001145 ( +- 0.45% )
>
> Which is an almost 2% slowdown :/
>
> I've yet to look at what happens.

OK, so it appears this is link order nonsense. When I compared profiles
between the series, the one function that had significant change was
skb_release_data(), which doesn't make much sense.

If I do a 'make clean' in front of each build, I get a repeatable
improvement with this patch set (although how much of that is due to the
patches themselves or just to code movement is as yet undetermined).

I'm of a mind to apply these patches; with two patches on top, which
I'll post shortly.
Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On 08/31/2015 11:24 AM, Peter Zijlstra wrote:
> On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
>> Target: ARM TC2 A7-only (x3)
>> Test: hackbench -g 25 --threads -l 1
>>
>> Before   After
>> 315.545  313.408  -0.68%
>>
>> Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
>> Test: hackbench -g 25 --threads -l 1000 (avg of 10)
>>
>> Before   After
>> 6.4643   6.395    -1.07%
>
> A quick run here gives:
>
> IVB-EP (2*20*2):
>
> perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000
>
> Before:                  After:
> 5.484170711 ( +- 0.74% ) 5.590001145 ( +- 0.45% )
>
> Which is an almost 2% slowdown :/
>
> I've yet to look at what happens.

I tested the patch set on top of tip:

  ff277d4250fe - sched/deadline: Fix comment in enqueue_task_dl()

on a 2-cluster IVB-EP (2 clusters * 10 cores * 2 HW threads = 40 logical
cpus) w/ SMT, MC and NUMA sd's.

model name : Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz

perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000

Before:                  After:
5.049361160 ( +- 1.26% ) 5.014980654 ( +- 1.20% )

Even running this test multiple times I never saw anything like a 2%
slowdown. It's a vanilla Ubuntu 15.04 system, which might explain the
slightly higher stddev.

We could optimize the changes we did in __update_load_avg() by only
calculating the additional scaled values [scaled_delta_w, contrib,
scaled_delta] in case the function is called w/ 'weight != 0 &&
running != 0'. This is also true for the initialization of scale_freq
and scale_cpu.
Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
> Target: ARM TC2 A7-only (x3)
> Test: hackbench -g 25 --threads -l 1
>
> Before   After
> 315.545  313.408  -0.68%
>
> Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
> Test: hackbench -g 25 --threads -l 1000 (avg of 10)
>
> Before   After
> 6.4643   6.395    -1.07%

A quick run here gives:

IVB-EP (2*20*2):

perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 5000

Before:                  After:
5.484170711 ( +- 0.74% ) 5.590001145 ( +- 0.45% )

Which is an almost 2% slowdown :/

I've yet to look at what happens.
Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On Mon, Aug 17, 2015 at 12:29:51PM +0100, Morten Rasmussen wrote:
> On Sun, Aug 16, 2015 at 10:46:05PM +0200, Peter Zijlstra wrote:
>> On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
>>> Target: ARM TC2 A7-only (x3)
>>> Test: hackbench -g 25 --threads -l 1
>>>
>>> Before   After
>>> 315.545  313.408  -0.68%
>>>
>>> Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
>>> Test: hackbench -g 25 --threads -l 1000 (avg of 10)
>>>
>>> Before   After
>>> 6.4643   6.395    -1.07%
>>
>> Yeah, so that is a problem.
>
> Maybe I'm totally wrong, but doesn't hackbench report execution time, so
> less is better? In that case -1.07% means we are doing better with the
> patches applied (after time < before time). In any case, I should have
> indicated whether the change is good or bad for performance.
>
>> I'm taking it some of the new scaling stuff doesn't compile away, can we
>> look at fixing that?
>
> I will double-check that the stuff goes away as expected. I'm pretty
> sure it does on ARM.

Ah, uhm.. you have a point there ;-)

I'll run the numbers when I'm back home again.
Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On Sun, Aug 16, 2015 at 10:46:05PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
>> Target: ARM TC2 A7-only (x3)
>> Test: hackbench -g 25 --threads -l 1
>>
>> Before   After
>> 315.545  313.408  -0.68%
>>
>> Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
>> Test: hackbench -g 25 --threads -l 1000 (avg of 10)
>>
>> Before   After
>> 6.4643   6.395    -1.07%
>
> Yeah, so that is a problem.

Maybe I'm totally wrong, but doesn't hackbench report execution time, so
less is better? In that case -1.07% means we are doing better with the
patches applied (after time < before time). In any case, I should have
indicated whether the change is good or bad for performance.

> I'm taking it some of the new scaling stuff doesn't compile away, can we
> look at fixing that?

I will double-check that the stuff goes away as expected. I'm pretty
sure it does on ARM.
Re: [PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
On Fri, Aug 14, 2015 at 05:23:08PM +0100, Morten Rasmussen wrote:
> Target: ARM TC2 A7-only (x3)
> Test: hackbench -g 25 --threads -l 1
>
> Before   After
> 315.545  313.408  -0.68%
>
> Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
> Test: hackbench -g 25 --threads -l 1000 (avg of 10)
>
> Before   After
> 6.4643   6.395    -1.07%

Yeah, so that is a problem.

I'm taking it some of the new scaling stuff doesn't compile away, can we
look at fixing that?
[PATCH 0/6] sched/fair: Compute capacity invariant load/utilization tracking
Per-entity load-tracking currently only compensates for frequency scaling for utilization tracking. This patch set extends this compensation to load as well, and adds compute capacity (different microarchitectures and/or max frequency/P-state) invariance to utilization. The former prevents suboptimal load-balancing decisions when cpus run at different frequencies, while the latter ensures that utilization (sched_avg.util_avg) can be compared across cpus and that utilization can be compared directly to cpu capacity to determine if the cpu is overloaded.

Note that this posting only contains the scheduler patches; the architecture-specific implementations of arch_scale_{freq,cpu}_capacity() will be posted separately later.

The patches have been posted several times before, most recently as part of the energy-model driven scheduling RFCv5 patch set [1] (patches #2,4,6,8-12). That RFC also contains patches for the architecture-specific side. In this posting the commit messages have been updated and the patches have been rebased on a more recent tip/sched/core that includes Yuyang's rewrite, which made some of the previously posted patches redundant.
Target: ARM TC2 A7-only (x3)
Test: hackbench -g 25 --threads -l 1

Before   After
315.545  313.408  -0.68%

Target: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
Test: hackbench -g 25 --threads -l 1000 (avg of 10)

Before   After
6.4643   6.395    -1.07%

[1] http://www.kernelhub.org/?p=2=787634

Dietmar Eggemann (4):
  sched/fair: Make load tracking frequency scale-invariant
  sched/fair: Make utilization tracking cpu scale-invariant
  sched/fair: Name utilization related data and functions consistently
  sched/fair: Get rid of scaling utilization by capacity_orig

Morten Rasmussen (2):
  sched/fair: Convert arch_scale_cpu_capacity() from weak function to #define
  sched/fair: Initialize task load and utilization before placing task on rq

 include/linux/sched.h   |   8 ++--
 kernel/sched/core.c     |   4 +-
 kernel/sched/fair.c     | 109 +++++++++++++++++++++---------------------
 kernel/sched/features.h |   5 ---
 kernel/sched/sched.h    |  11 +++++
 5 files changed, 69 insertions(+), 68 deletions(-)

-- 
1.9.1