Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-08 Thread Rik van Riel
On Wed, 2021-04-07 at 12:19 +0200, Peter Zijlstra wrote:
> On Wed, Apr 07, 2021 at 11:54:37AM +0200, Peter Zijlstra wrote:
> 
> > Let me have another poke at it.
> 
> Pretty much what you did, except I also did s/smt/has_idle_core/ and
> fixed that @sd thing.
> 
> Like so then?

Looks good to me. Thank you.

-- 
All Rights Reversed.




Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Peter Zijlstra
On Wed, Apr 07, 2021 at 11:47:17AM +0100, Mel Gorman wrote:

> Ok, cpusets do split domains. I can't imagine the logic of splitting SMT
> siblings across cpusets but if it's possible, it has to be checked and
> protecting that with cpusets_enabled() would be a little overkill and
> possibly miss some other corner case :(

Quite. The logic is that some people can't be arsed to look at how the
topology is enumerated and simply split logical CPUs by a random number.

Imagine we have 4 cores, with 2 threads each, for 8 logical CPUs.

Suppose you want a partition of 6 'cpus' and 2 spares. Hoping for a 3:1
core split.

If things are enumerated like: core0{smt0, smt1}, core1{smt0, smt1} ...
then 0-5 will get you 3 whole cores.

If OTOH things are enumerated like: smt0{core0, core1, core2, core3},
smt1{...} then 0-5 will get you the loonie side of the moon.

Funny thing is that x86 can iterate either way depending on how the BIOS
monkey filled out the tables.
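
To make it concrete, here is a throwaway userspace sketch (illustrative only,
nothing kernel-specific; the 4-core/2-thread constants just model the example
above) showing how logical CPUs 0-5 map to cores under the two orders:

	/* toy model of the two enumeration orders discussed above */
	#include <stdio.h>

	#define NR_CORES	4
	#define NR_THREADS	2

	int main(void)
	{
		int cpu;

		/* core-major: core0{smt0,smt1}, core1{smt0,smt1}, ... */
		printf("core-major:   ");
		for (cpu = 0; cpu < 6; cpu++)
			printf("cpu%d->core%d ", cpu, cpu / NR_THREADS);
		printf("\n");

		/* thread-major: smt0{core0..core3}, smt1{core0..core3} */
		printf("thread-major: ");
		for (cpu = 0; cpu < 6; cpu++)
			printf("cpu%d->core%d ", cpu, cpu % NR_CORES);
		printf("\n");

		return 0;
	}

With the core-major order, cpus 0-5 cover cores 0-2 as whole cores; with the
thread-major order they land on cores 0,1,2,3,0,1, so cores 2 and 3 each end
up with one thread inside the partition and one outside it.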


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Mel Gorman
On Wed, Apr 07, 2021 at 12:15:13PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 07, 2021 at 10:41:06AM +0100, Mel Gorman wrote:
> 
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
> > >   return -1;
> > >  }
> > >  
> > > +/*
> > > + * Scan the local SMT mask for idle CPUs.
> > > + */
> > > +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> > > +{
> > > + int cpu;
> > > +
> > > + if (!static_branch_likely(&sched_smt_present))
> > > + return -1;
> > > +
> > > + for_each_cpu(cpu, cpu_smt_mask(target)) {
> > > + if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> > > + !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> > > + continue;
> > 
> > While I know that !cpumask_test_cpu(cpu, sched_domain_span(sd)) was
> > done previously, I found it hard to believe that the test matters. If
> > target/prev share the LLC domain, why would the SMT siblings *not*
> > share an LLC?
> 
> I think the reason for it is that a cpuset might have split the siblings
> apart and disabled load-balancing across them or something.
> 
> Then the affinity mask can still cross the partition, but we shouldn't
> ever move into it through balancing.

Ok, cpusets do split domains. I can't imagine the logic of splitting SMT
siblings across cpusets but if it's possible, it has to be checked and
protecting that with cpusets_enabled() would be a little overkill and
possibly miss some other corner case :(

Thanks.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Vincent Guittot
On Wed, 7 Apr 2021 at 12:19, Peter Zijlstra  wrote:
>
> On Wed, Apr 07, 2021 at 11:54:37AM +0200, Peter Zijlstra wrote:
>
> > Let me have another poke at it.
>
> Pretty much what you did, except I also did s/smt/has_idle_core/ and
> fixed that @sd thing.
>
> Like so then?

Yes. Looks good to me

>
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6038,11 +6038,9 @@ static inline bool test_idle_cores(int c
>  {
> struct sched_domain_shared *sds;
>
> -   if (static_branch_likely(&sched_smt_present)) {
> -   sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> -   if (sds)
> -   return READ_ONCE(sds->has_idle_cores);
> -   }
> +   sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> +   if (sds)
> +   return READ_ONCE(sds->has_idle_cores);
>
> return def;
>  }
> @@ -6112,6 +6110,24 @@ static int select_idle_core(struct task_
> return -1;
>  }
>
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +   int cpu;
> +
> +   for_each_cpu(cpu, cpu_smt_mask(target)) {
> +   if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> +   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> +   continue;
> +   if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> +   return cpu;
> +   }
> +
> +   return -1;
> +}
> +
>  #else /* CONFIG_SCHED_SMT */
>
>  static inline void set_idle_cores(int cpu, int val)
> @@ -6128,6 +6144,11 @@ static inline int select_idle_core(struc
> return __select_idle_cpu(core);
>  }
>
> +static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +   return -1;
> +}
> +
>  #endif /* CONFIG_SCHED_SMT */
>
>  /*
> @@ -6135,11 +6156,10 @@ static inline int select_idle_core(struc
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>   * average idle time for this rq (as found in rq->avg_idle).
>   */
> -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool has_idle_core, int target)
>  {
> struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> int i, cpu, idle_cpu = -1, nr = INT_MAX;
> -   bool smt = test_idle_cores(target, false);
> int this = smp_processor_id();
> struct sched_domain *this_sd;
> u64 time;
> @@ -6150,7 +6170,7 @@ static int select_idle_cpu(struct task_s
>
> cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>
> -   if (sched_feat(SIS_PROP) && !smt) {
> +   if (sched_feat(SIS_PROP) && !has_idle_core) {
> u64 avg_cost, avg_idle, span_avg;
>
> /*
> @@ -6170,7 +6190,7 @@ static int select_idle_cpu(struct task_s
> }
>
> for_each_cpu_wrap(cpu, cpus, target) {
> -   if (smt) {
> +   if (has_idle_core) {
> i = select_idle_core(p, cpu, cpus, &idle_cpu);
> if ((unsigned int)i < nr_cpumask_bits)
> return i;
> @@ -6184,10 +6204,10 @@ static int select_idle_cpu(struct task_s
> }
> }
>
> -   if (smt)
> +   if (has_idle_core)
> set_idle_cores(this, false);
>
> -   if (sched_feat(SIS_PROP) && !smt) {
> +   if (sched_feat(SIS_PROP) && !has_idle_core) {
> time = cpu_clock(this) - time;
> update_avg(&this_sd->avg_scan_cost, time);
> }
> @@ -6242,6 +6262,7 @@ static inline bool asym_fits_capacity(in
>   */
>  static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  {
> +   bool has_idle_core = false;
> struct sched_domain *sd;
> unsigned long task_util;
> int i, recent_used_cpu;
> @@ -6321,7 +6342,17 @@ static int select_idle_sibling(struct ta
> if (!sd)
> return target;
>
> -   i = select_idle_cpu(p, sd, target);
> +   if (static_branch_likely(&sched_smt_present)) {
> +   has_idle_core = test_idle_cores(target, false);
> +
> +   if (!has_idle_core && cpus_share_cache(prev, target)) {
> +   i = select_idle_smt(p, sd, prev);
> +   if ((unsigned int)i < nr_cpumask_bits)
> +   return i;
> +   }
> +   }
> +
> +   i = select_idle_cpu(p, sd, has_idle_core, target);
> if ((unsigned)i < nr_cpumask_bits)
> return i;
>


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Peter Zijlstra
On Wed, Apr 07, 2021 at 11:54:37AM +0200, Peter Zijlstra wrote:

> Let me have another poke at it.

Pretty much what you did, except I also did s/smt/has_idle_core/ and
fixed that @sd thing.

Like so then?

---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,11 +6038,9 @@ static inline bool test_idle_cores(int c
 {
struct sched_domain_shared *sds;
 
-   if (static_branch_likely(&sched_smt_present)) {
-   sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
-   if (sds)
-   return READ_ONCE(sds->has_idle_cores);
-   }
+   sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+   if (sds)
+   return READ_ONCE(sds->has_idle_cores);
 
return def;
 }
@@ -6112,6 +6110,24 @@ static int select_idle_core(struct task_
return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+   int cpu;
+
+   for_each_cpu(cpu, cpu_smt_mask(target)) {
+   if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+   continue;
+   if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+   return cpu;
+   }
+
+   return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6128,6 +6144,11 @@ static inline int select_idle_core(struc
return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+   return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6135,11 +6156,10 @@ static inline int select_idle_core(struc
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool has_idle_core, int target)
 {
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
int i, cpu, idle_cpu = -1, nr = INT_MAX;
-   bool smt = test_idle_cores(target, false);
int this = smp_processor_id();
struct sched_domain *this_sd;
u64 time;
@@ -6150,7 +6170,7 @@ static int select_idle_cpu(struct task_s
 
cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-   if (sched_feat(SIS_PROP) && !smt) {
+   if (sched_feat(SIS_PROP) && !has_idle_core) {
u64 avg_cost, avg_idle, span_avg;
 
/*
@@ -6170,7 +6190,7 @@ static int select_idle_cpu(struct task_s
}
 
for_each_cpu_wrap(cpu, cpus, target) {
-   if (smt) {
+   if (has_idle_core) {
i = select_idle_core(p, cpu, cpus, &idle_cpu);
if ((unsigned int)i < nr_cpumask_bits)
return i;
@@ -6184,10 +6204,10 @@ static int select_idle_cpu(struct task_s
}
}
 
-   if (smt)
+   if (has_idle_core)
set_idle_cores(this, false);
 
-   if (sched_feat(SIS_PROP) && !smt) {
+   if (sched_feat(SIS_PROP) && !has_idle_core) {
time = cpu_clock(this) - time;
update_avg(&this_sd->avg_scan_cost, time);
}
@@ -6242,6 +6262,7 @@ static inline bool asym_fits_capacity(in
  */
 static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
+   bool has_idle_core = false;
struct sched_domain *sd;
unsigned long task_util;
int i, recent_used_cpu;
@@ -6321,7 +6342,17 @@ static int select_idle_sibling(struct ta
if (!sd)
return target;
 
-   i = select_idle_cpu(p, sd, target);
+   if (static_branch_likely(&sched_smt_present)) {
+   has_idle_core = test_idle_cores(target, false);
+
+   if (!has_idle_core && cpus_share_cache(prev, target)) {
+   i = select_idle_smt(p, sd, prev);
+   if ((unsigned int)i < nr_cpumask_bits)
+   return i;
+   }
+   }
+
+   i = select_idle_cpu(p, sd, has_idle_core, target);
if ((unsigned)i < nr_cpumask_bits)
return i;
 


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Peter Zijlstra
On Wed, Apr 07, 2021 at 10:41:06AM +0100, Mel Gorman wrote:

> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
> > return -1;
> >  }
> >  
> > +/*
> > + * Scan the local SMT mask for idle CPUs.
> > + */
> > +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> > +{
> > +   int cpu;
> > +
> > +   if (!static_branch_likely(&sched_smt_present))
> > +   return -1;
> > +
> > +   for_each_cpu(cpu, cpu_smt_mask(target)) {
> > +   if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> > +   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> > +   continue;
> 
> While I know that !cpumask_test_cpu(cpu, sched_domain_span(sd)) was
> done previously, I found it hard to believe that the test matters. If
> target/prev share the LLC domain, why would the SMT siblings *not*
> share an LLC?

I think the reason for it is that a cpuset might have split the siblings
apart and disabled load-balancing across them or something.

Then the affinity mask can still cross the partition, but we shouldn't
ever move into it through balancing.
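
Concretely, that is what the two cpumask tests in select_idle_smt() encode;
this is the same loop as in the patch, only with the intent spelled out in
comments:

	for_each_cpu(cpu, cpu_smt_mask(target)) {
		/* The task's affinity mask may well include the sibling... */
		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
		    /*
		     * ...but if a cpuset partition split the SMT siblings,
		     * the sibling lies outside this sched domain and we must
		     * not place the task there.
		     */
		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
			continue;
		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
			return cpu;
	}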


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Vincent Guittot
On Wed, 7 Apr 2021 at 11:55, Peter Zijlstra  wrote:
>
> On Wed, Apr 07, 2021 at 11:42:17AM +0200, Vincent Guittot wrote:
> > I would really prefer to keep that out of select_idle_cpu which aims to
> > merge in one single loop the walk through sd_llc. In the case of
> > select_idle_smt, this is done outside the loop:
>
> Fair enough.
>
> > @@ -6317,11 +6339,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >   }
> >   }
> >
> > + if (static_branch_likely(&sched_smt_present)) {
> > + smt = test_idle_cores(target, false);
> > + if (!smt && cpus_share_cache(prev, target)) {
> > + /* No idle core. Check if prev has an idle sibling. */
> > + i = select_idle_smt(p, sd, prev);
> > + if ((unsigned int)i < nr_cpumask_bits)
> > + return i;
> > + }
> > + }
> > +
> >   sd = rcu_dereference(per_cpu(sd_llc, target));
> >   if (!sd)
> >   return target;
>
> It needs to be here, otherwise you're using @sd uninitialized.

argh yes...

>
> > - i = select_idle_cpu(p, sd, target);
> > + i = select_idle_cpu(p, sd, smt, target);
> >   if ((unsigned)i < nr_cpumask_bits)
> >   return i;
>
> Let me have another poke at it.


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Peter Zijlstra
On Wed, Apr 07, 2021 at 11:42:17AM +0200, Vincent Guittot wrote:
> I would really prefer to keep that out of select_idle_cpu which aims to
> merge in one single loop the walk through sd_llc. In the case of
> select_idle_smt, this is done outside the loop:

Fair enough.

> @@ -6317,11 +6339,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>   }
>   }
>  
> + if (static_branch_likely(&sched_smt_present)) {
> + smt = test_idle_cores(target, false);
> + if (!smt && cpus_share_cache(prev, target)) {
> + /* No idle core. Check if prev has an idle sibling. */
> + i = select_idle_smt(p, sd, prev);
> + if ((unsigned int)i < nr_cpumask_bits)
> + return i;
> + }
> + }
> +
>   sd = rcu_dereference(per_cpu(sd_llc, target));
>   if (!sd)
>   return target;

It needs to be here, otherwise you're using @sd uninitialized.

> - i = select_idle_cpu(p, sd, target);
> + i = select_idle_cpu(p, sd, smt, target);
>   if ((unsigned)i < nr_cpumask_bits)
>   return i;

Let me have another poke at it.


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Vincent Guittot
On Wednesday, 7 April 2021 at 09:17:18 (+0200), Peter Zijlstra wrote:
> On Tue, Apr 06, 2021 at 11:26:37AM -0400, Rik van Riel wrote:
> > I would be happy to pull the static branch out of select_idle_smt()
> > and place it into this if condition, though. You are right that
> > would save some overhead on non-smt systems.
> > 
> > Peter, would you prefer a follow-up patch for that or a version 4
> > of the patch?
> 
> Sorry, I was side-tracked with that core scheduling crap.. Something
> like the below then?
> 
> (Also fixed that stray line-wrap)
> 
> ---
> Subject: sched/fair: Bring back select_idle_smt(), but differently
> From: Rik van Riel 
> Date: Fri, 26 Mar 2021 15:19:32 -0400
> 
> From: Rik van Riel 
> 
> Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
> select_idle_core/cpu()"), resulting in the kernel being more efficient
> at finding an idle CPU, and in tasks spending less time waiting to be
> run, both according to the schedstats run_delay numbers, and according
> to measured application latencies. Yay.
> 
> The flip side of this is that we see more task migrations (about 30%
> more), higher cache misses, higher memory bandwidth utilization, and
> higher CPU use, for the same number of requests/second.
> 
> This is most pronounced on a memcache type workload, which saw a
> consistent 1-3% increase in total CPU use on the system, due to those
> increased task migrations leading to higher L2 cache miss numbers, and
> higher memory utilization. The exclusive L3 cache on Skylake does us
> no favors there.
> 
> On our web serving workload, that effect is usually negligible.
> 
> It appears that the increased number of CPU migrations is generally a
> good thing, since it leads to lower cpu_delay numbers, reflecting the
> fact that tasks get to run faster. However, the reduced locality and
> the corresponding increase in L2 cache misses hurts a little.
> 
> The patch below appears to fix the regression, while keeping the
> benefit of the lower cpu_delay numbers, by reintroducing
> select_idle_smt with a twist: when a socket has no idle cores, check
> to see if the sibling of "prev" is idle, before searching all the
> other CPUs.
> 
> This fixes both the occasional 9% regression on the web serving
> workload, and the continuous 2% CPU use regression on the memcache
> type workload.
> 
> With Mel's patches and this patch together, task migrations are still
> high, but L2 cache misses, memory bandwidth, and CPU time used are
> back down to what they were before. The p95 and p99 response times for
> the memcache type application improve by about 10% over what they were
> before Mel's patches got merged.
> 
> Signed-off-by: Rik van Riel 
> Signed-off-by: Peter Zijlstra (Intel) 
> Link: https://lkml.kernel.org/r/20210326151932.2c187...@imladris.surriel.com
> ---
>  kernel/sched/fair.c |   39 +--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
>   return -1;
>  }
>  
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> + int cpu;
> +
> + if (!static_branch_likely(&sched_smt_present))
> + return -1;
> +
> + for_each_cpu(cpu, cpu_smt_mask(target)) {
> + if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> + !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> + continue;
> + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> + return cpu;
> + }
> +
> + return -1;
> +}
> +
>  #else /* CONFIG_SCHED_SMT */
>  
>  static inline void set_idle_cores(int cpu, int val)
> @@ -6128,6 +6149,11 @@ static inline int select_idle_core(struc
>   return __select_idle_cpu(core);
>  }
>  
> +static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> + return -1;
> +}
> +
>  #endif /* CONFIG_SCHED_SMT */
>  
>  /*
> @@ -6135,7 +6161,7 @@ static inline int select_idle_core(struc
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>   * average idle time for this rq (as found in rq->avg_idle).
>   */
> -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
>  {
>   struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>   int i, cpu, idle_cpu = -1, nr = INT_MAX;
> @@ -6148,6 +6174,15 @@ static int select_idle_cpu(struct task_s
>   if (!this_sd)
>   return -1;
>  
> + /* If we have SMT but there are no idle cores */
> + if (static_branch_likely(&sched_smt_present) && !smt) {
> + if (cpus_share_cache(prev, target)) {
> + i = select_idle_smt(p, sd, prev);
> +   

Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Mel Gorman
On Wed, Apr 07, 2021 at 09:17:18AM +0200, Peter Zijlstra wrote:
> Subject: sched/fair: Bring back select_idle_smt(), but differently
> From: Rik van Riel 
> Date: Fri, 26 Mar 2021 15:19:32 -0400
> 
> From: Rik van Riel 
> 
> Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
> select_idle_core/cpu()"), resulting in the kernel being more efficient
> at finding an idle CPU, and in tasks spending less time waiting to be
> run, both according to the schedstats run_delay numbers, and according
> to measured application latencies. Yay.
> 
> The flip side of this is that we see more task migrations (about 30%
> more), higher cache misses, higher memory bandwidth utilization, and
> higher CPU use, for the same number of requests/second.
> 
> This is most pronounced on a memcache type workload, which saw a
> consistent 1-3% increase in total CPU use on the system, due to those
> increased task migrations leading to higher L2 cache miss numbers, and
> higher memory utilization. The exclusive L3 cache on Skylake does us
> no favors there.
> 
> On our web serving workload, that effect is usually negligible.
> 
> It appears that the increased number of CPU migrations is generally a
> good thing, since it leads to lower cpu_delay numbers, reflecting the
> fact that tasks get to run faster. However, the reduced locality and
> the corresponding increase in L2 cache misses hurts a little.
> 
> The patch below appears to fix the regression, while keeping the
> benefit of the lower cpu_delay numbers, by reintroducing
> select_idle_smt with a twist: when a socket has no idle cores, check
> to see if the sibling of "prev" is idle, before searching all the
> other CPUs.
> 
> This fixes both the occasional 9% regression on the web serving
> workload, and the continuous 2% CPU use regression on the memcache
> type workload.
> 
> With Mel's patches and this patch together, task migrations are still
> high, but L2 cache misses, memory bandwidth, and CPU time used are
> back down to what they were before. The p95 and p99 response times for
> the memcache type application improve by about 10% over what they were
> before Mel's patches got merged.
> 
> Signed-off-by: Rik van Riel 
> Signed-off-by: Peter Zijlstra (Intel) 
> Link: https://lkml.kernel.org/r/20210326151932.2c187...@imladris.surriel.com

I think this is still ok and should not invalidate the previous tests on
v3. While test_idle_cores() was checked on target, as long as target/prev
share cache the test should be equivalent, apart from a minor race, so

Reviewed-by: Mel Gorman 

One minor question below though which previously confused me.

> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
>   return -1;
>  }
>  
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> + int cpu;
> +
> + if (!static_branch_likely(&sched_smt_present))
> + return -1;
> +
> + for_each_cpu(cpu, cpu_smt_mask(target)) {
> + if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> + !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> + continue;

While I know that !cpumask_test_cpu(cpu, sched_domain_span(sd)) was
done previously, I found it hard to believe that the test matters. If
target/prev share the LLC domain, why would the SMT siblings *not*
share an LLC?

-- 
Mel Gorman
SUSE Labs


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-07 Thread Peter Zijlstra
On Tue, Apr 06, 2021 at 11:26:37AM -0400, Rik van Riel wrote:
> I would be happy to pull the static branch out of select_idle_smt()
> and place it into this if condition, though. You are right that
> would save some overhead on non-smt systems.
> 
> Peter, would you prefer a follow-up patch for that or a version 4
> of the patch?

Sorry, I was side-tracked with that core scheduling crap.. Something
like the below then?

(Also fixed that stray line-wrap)

---
Subject: sched/fair: Bring back select_idle_smt(), but differently
From: Rik van Riel 
Date: Fri, 26 Mar 2021 15:19:32 -0400

From: Rik van Riel 

Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
select_idle_core/cpu()"), resulting in the kernel being more efficient
at finding an idle CPU, and in tasks spending less time waiting to be
run, both according to the schedstats run_delay numbers, and according
to measured application latencies. Yay.

The flip side of this is that we see more task migrations (about 30%
more), higher cache misses, higher memory bandwidth utilization, and
higher CPU use, for the same number of requests/second.

This is most pronounced on a memcache type workload, which saw a
consistent 1-3% increase in total CPU use on the system, due to those
increased task migrations leading to higher L2 cache miss numbers, and
higher memory utilization. The exclusive L3 cache on Skylake does us
no favors there.

On our web serving workload, that effect is usually negligible.

It appears that the increased number of CPU migrations is generally a
good thing, since it leads to lower cpu_delay numbers, reflecting the
fact that tasks get to run faster. However, the reduced locality and
the corresponding increase in L2 cache misses hurts a little.

The patch below appears to fix the regression, while keeping the
benefit of the lower cpu_delay numbers, by reintroducing
select_idle_smt with a twist: when a socket has no idle cores, check
to see if the sibling of "prev" is idle, before searching all the
other CPUs.

This fixes both the occasional 9% regression on the web serving
workload, and the continuous 2% CPU use regression on the memcache
type workload.

With Mel's patches and this patch together, task migrations are still
high, but L2 cache misses, memory bandwidth, and CPU time used are
back down to what they were before. The p95 and p99 response times for
the memcache type application improve by about 10% over what they were
before Mel's patches got merged.

Signed-off-by: Rik van Riel 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20210326151932.2c187...@imladris.surriel.com
---
 kernel/sched/fair.c |   39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6112,6 +6112,27 @@ static int select_idle_core(struct task_
return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+   int cpu;
+
+   if (!static_branch_likely(&sched_smt_present))
+   return -1;
+
+   for_each_cpu(cpu, cpu_smt_mask(target)) {
+   if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+   continue;
+   if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+   return cpu;
+   }
+
+   return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6128,6 +6149,11 @@ static inline int select_idle_core(struc
return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+   return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6135,7 +6161,7 @@ static inline int select_idle_core(struc
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
 {
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
int i, cpu, idle_cpu = -1, nr = INT_MAX;
@@ -6148,6 +6174,15 @@ static int select_idle_cpu(struct task_s
if (!this_sd)
return -1;
 
+   /* If we have SMT but there are no idle cores */
+   if (static_branch_likely(&sched_smt_present) && !smt) {
+   if (cpus_share_cache(prev, target)) {
+   i = select_idle_smt(p, sd, prev);
+   if ((unsigned int)i < nr_cpumask_bits)
+   return i;
+   }
+   }
+
cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
if (sched_feat(SIS_PROP) && !smt) {
@@ -6321,7 +6356,7 @@ static int 

Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-06 Thread Vincent Guittot
On Tue, 6 Apr 2021 at 17:55, Rik van Riel  wrote:
>
> On Tue, 2021-04-06 at 17:31 +0200, Vincent Guittot wrote:
> > On Tue, 6 Apr 2021 at 17:26, Rik van Riel  wrote:
> > > On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> > > > > On Fri, 26 Mar 2021 at 20:19, Rik van Riel  wrote:
> > > >
> > > > > -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> > > > > +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
> > > > >  {
> > > > > struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> > > > > int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > > > > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > > > >
> > > > > cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > > >
> > > > > -   if (sched_feat(SIS_PROP) && !smt) {
> > > > > -   u64 avg_cost, avg_idle, span_avg;
> > > > > +   if (!smt) {
> > > > > +   if (cpus_share_cache(prev, target)) {
> > > >
> > > > Have you checked the impact on non-SMT systems? It would be worth a
> > > > static branch.
> > > >
> > > > Also, this doesn't need to be in select_idle_cpu(), which aims to loop
> > > > over the sched_domain, because you only compare target and prev. So you
> > > > can move this call to select_idle_smt() in select_idle_sibling()
> > >
> > > After Mel's rewrite, there no longer are calls to
> > > select_idle_core() or select_idle_smt() in select_idle_sibling().
> >
> > select_idle_smt() had even disappeared; that's why it was not in
> > select_idle_sibling()
> >
> > > Everything got folded into one single loop in select_idle_cpu()
> >
> > but this is done completely out of the loop so we don't need to
> > complify the function with unrelated stuff
>
> Not entirely. The call to select_idle_smt() is still
> conditional on test_idle_cores() returning false.
>
> We only look for the other sibling if there is no idle core in the LLC.
> If there is an idle core, we prefer that.
>
> Pulling the select_idle_smt() call out of select_idle_cpu()
> would mean having to test_idle_cores() twice.

In this case, pass the result of test_idle_cores() as a parameter instead of prev.
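
i.e. do the test once in select_idle_sibling() and hand the result down; a
rough, untested sketch, reusing only names already used in this thread:

	bool smt = false;

	if (static_branch_likely(&sched_smt_present)) {
		smt = test_idle_cores(target, false);

		/* No idle core anywhere: check prev's idle sibling first. */
		if (!smt && cpus_share_cache(prev, target)) {
			i = select_idle_smt(p, sd, prev);
			if ((unsigned int)i < nr_cpumask_bits)
				return i;
		}
	}

	i = select_idle_cpu(p, sd, smt, target);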

>
> --
> All Rights Reversed.


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-06 Thread Rik van Riel
On Tue, 2021-04-06 at 17:31 +0200, Vincent Guittot wrote:
> On Tue, 6 Apr 2021 at 17:26, Rik van Riel  wrote:
> > On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> > > On Fri, 26 Mar 2021 at 20:19, Rik van Riel  wrote:
> > > 
> > > > -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> > > > +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
> > > >  {
> > > > struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> > > > int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > > > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > > >
> > > > cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > > 
> > > > -   if (sched_feat(SIS_PROP) && !smt) {
> > > > -   u64 avg_cost, avg_idle, span_avg;
> > > > +   if (!smt) {
> > > > +   if (cpus_share_cache(prev, target)) {
> > > 
> > > Have you checked the impact on non-SMT systems? It would be worth a
> > > static branch.
> > >
> > > Also, this doesn't need to be in select_idle_cpu(), which aims to loop
> > > over the sched_domain, because you only compare target and prev. So you
> > > can move this call to select_idle_smt() in select_idle_sibling()
> > 
> > After Mel's rewrite, there no longer are calls to
> > select_idle_core() or select_idle_smt() in select_idle_sibling().
> 
> select_idle_smt() had even disappeared; that's why it was not in
> select_idle_sibling()
> 
> > Everything got folded into one single loop in select_idle_cpu()
> 
> but this is done completely out of the loop so we don't need to
> complify the function with unrelated stuff

Not entirely. The call to select_idle_smt() is still
conditional on test_idle_cores() returning false.

We only look for the other sibling if there is no idle core in the LLC.
If there is an idle core, we prefer that.

Pulling the select_idle_smt() call out of select_idle_cpu()
would mean having to test_idle_cores() twice.

-- 
All Rights Reversed.




Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-06 Thread Vincent Guittot
On Tue, 6 Apr 2021 at 17:31, Vincent Guittot  wrote:
>
> On Tue, 6 Apr 2021 at 17:26, Rik van Riel  wrote:
> >
> > On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> > > On Fri, 26 Mar 2021 at 20:19, Rik van Riel  wrote:
> > >
> > > > -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> > > > +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
> > > >  {
> > > > struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> > > > int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > > > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > > >
> > > > cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > >
> > > > -   if (sched_feat(SIS_PROP) && !smt) {
> > > > -   u64 avg_cost, avg_idle, span_avg;
> > > > +   if (!smt) {
> > > > +   if (cpus_share_cache(prev, target)) {
> > >
> > > Have you checked the impact on non-SMT systems? It would be worth a static
> > > branch.
> > >
> > > Also, this doesn't need to be in select_idle_cpu(), which aims to loop over
> > > the sched_domain, because you only compare target and prev. So you can
> > > move this call to select_idle_smt() in select_idle_sibling()
> >
> > After Mel's rewrite, there no longer are calls to
> > select_idle_core() or select_idle_smt() in select_idle_sibling().
>
> select_idle_smt() had even disappeared; that's why it was not in
> select_idle_sibling()
>
> >
> > Everything got folded into one single loop in select_idle_cpu()
>
> but this is done completely out of the loop so we don't need to
> complify the function with unrelated stuff


s/complify/complexify/

>
> >
> > I would be happy to pull the static branch out of select_idle_smt()
> > and place it into this if condition, though. You are right that
> > would save some overhead on non-smt systems.
> >
> > Peter, would you prefer a follow-up patch for that or a version 4
> > of the patch?
> >
> > --
> > All Rights Reversed.


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-06 Thread Vincent Guittot
On Tue, 6 Apr 2021 at 17:26, Rik van Riel  wrote:
>
> On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> > On Fri, 26 Mar 2021 at 20:19, Rik van Riel  wrote:
> >
> > > -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> > > +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
> > >  {
> > > struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> > > int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > >
> > > cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > >
> > > -   if (sched_feat(SIS_PROP) && !smt) {
> > > -   u64 avg_cost, avg_idle, span_avg;
> > > +   if (!smt) {
> > > +   if (cpus_share_cache(prev, target)) {
> >
> > Have you checked the impact on non-SMT systems? It would be worth a static
> > branch.
> >
> > Also, this doesn't need to be in select_idle_cpu(), which aims to loop over
> > the sched_domain, because you only compare target and prev. So you can
> > move this call to select_idle_smt() in select_idle_sibling()
>
> After Mel's rewrite, there no longer are calls to
> select_idle_core() or select_idle_smt() in select_idle_sibling().

select_idle_smt() had even disappeared; that's why it was not in
select_idle_sibling()

>
> Everything got folded into one single loop in select_idle_cpu()

but this is done completely out of the loop so we don't need to
complify the function with unrelated stuff

>
> I would be happy to pull the static branch out of select_idle_smt()
> and place it into this if condition, though. You are right that
> would save some overhead on non-smt systems.
>
> Peter, would you prefer a follow-up patch for that or a version 4
> of the patch?
>
> --
> All Rights Reversed.


Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-06 Thread Rik van Riel
On Tue, 2021-04-06 at 17:10 +0200, Vincent Guittot wrote:
> On Fri, 26 Mar 2021 at 20:19, Rik van Riel  wrote:
> 
> > -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> > +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
> >  {
> > struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> > int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > @@ -6136,23 +6163,32 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >
> > cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > 
> > -   if (sched_feat(SIS_PROP) && !smt) {
> > -   u64 avg_cost, avg_idle, span_avg;
> > +   if (!smt) {
> > +   if (cpus_share_cache(prev, target)) {
> 
> Have you checked the impact on non-SMT systems? It would be worth a static
> branch.
>
> Also, this doesn't need to be in select_idle_cpu(), which aims to loop over
> the sched_domain, because you only compare target and prev. So you can
> move this call to select_idle_smt() in select_idle_sibling()

After Mel's rewrite, there no longer are calls to
select_idle_core() or select_idle_smt() in select_idle_sibling().

Everything got folded into one single loop in select_idle_cpu()

I would be happy to pull the static branch out of select_idle_smt()
and place it into this if condition, though. You are right that
would save some overhead on non-smt systems.

Peter, would you prefer a follow-up patch for that or a version 4
of the patch?

-- 
All Rights Reversed.




Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-04-06 Thread Vincent Guittot
On Fri, 26 Mar 2021 at 20:19, Rik van Riel  wrote:
>
> On Mon, 22 Mar 2021 11:03:06 +
> Mel Gorman  wrote:
>
>
> > Second, select_idle_smt() does not use the cpus mask so consider moving
> > the cpus initialisation after select_idle_smt() has been called.
> > Specifically this initialisation
> >
> >   cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> >
> > Alternatively, clear the bits in the SMT sibling scan to avoid checking
> > the siblings twice. It's a tradeoff because initialising and clearing
> > bits is not free and the cost is wasted if a sibling is free.
>
> I tried a number of different variations on moving the CPU mask
> initialization, and clearing CPUs from the mask, and failed to
> get any clear results from those in testing, even in workloads
> with lots of context switches.
>
> Below is a simple version that seems to perform identically to
> more complicated versions :)
>
> ---8<---
> sched,fair: bring back select_idle_smt, but differently
>
> Mel Gorman did some nice work in 9fe1f127b913
> ("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
> being more efficient at finding an idle CPU, and in tasks spending less
> time waiting to be run, both according to the schedstats run_delay
> numbers, and according to measured application latencies. Yay.
>
> The flip side of this is that we see more task migrations (about
> 30% more), higher cache misses, higher memory bandwidth utilization,
> and higher CPU use, for the same number of requests/second.
>
> This is most pronounced on a memcache type workload, which saw
> a consistent 1-3% increase in total CPU use on the system, due
> to those increased task migrations leading to higher L2 cache
> miss numbers, and higher memory utilization. The exclusive L3
> cache on Skylake does us no favors there.
>
> On our web serving workload, that effect is usually negligible.
>
> It appears that the increased number of CPU migrations is generally
> a good thing, since it leads to lower cpu_delay numbers, reflecting
> the fact that tasks get to run faster. However, the reduced locality
> and the corresponding increase in L2 cache misses hurts a little.
>
> The patch below appears to fix the regression, while keeping the
> benefit of the lower cpu_delay numbers, by reintroducing select_idle_smt
> with a twist: when a socket has no idle cores, check to see if the
> sibling of "prev" is idle, before searching all the other CPUs.
>
> This fixes both the occasional 9% regression on the web serving
> workload, and the continuous 2% CPU use regression on the memcache
> type workload.
>
> With Mel's patches and this patch together, task migrations are still
> high, but L2 cache misses, memory bandwidth, and CPU time used are back
> down to what they were before. The p95 and p99 response times for the
> memcache type application improve by about 10% over what they were
> before Mel's patches got merged.
>
> Signed-off-by: Rik van Riel 
> ---
>  kernel/sched/fair.c | 68 ++---
>  1 file changed, 52 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 794c2cb945f8..69680158963f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6098,6 +6098,28 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
> return -1;
>  }
>
> +/*
> + * Scan the local SMT mask for idle CPUs.
> + */
> +static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int
> +target)
> +{
> +   int cpu;
> +
> +   if (!static_branch_likely(&sched_smt_present))
> +   return -1;
> +
> +   for_each_cpu(cpu, cpu_smt_mask(target)) {
> +   if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> +   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> +   continue;
> +   if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> +   return cpu;
> +   }
> +
> +   return -1;
> +}
> +
>  #else /* CONFIG_SCHED_SMT */
>
>  static inline void set_idle_cores(int cpu, int val)
> @@ -6114,6 +6136,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
> return __select_idle_cpu(core);
>  }
>
> +static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +   return -1;
> +}
> +
>  #endif /* CONFIG_SCHED_SMT */
>
>  /*
> @@ -6121,7 +6148,7 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpuma
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>   * average idle time for this rq (as found in rq->avg_idle).
>   */
> -static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> +static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
>  {
> struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> 

Re: [PATCH v3] sched/fair: bring back select_idle_smt, but differently

2021-03-28 Thread Mel Gorman
On Fri, Mar 26, 2021 at 03:19:32PM -0400, Rik van Riel wrote:
> ---8<---
> sched,fair: bring back select_idle_smt, but differently
> 
> Mel Gorman did some nice work in 9fe1f127b913
> ("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
> being more efficient at finding an idle CPU, and in tasks spending less
> time waiting to be run, both according to the schedstats run_delay
> numbers, and according to measured application latencies. Yay.
> 
> The flip side of this is that we see more task migrations (about
> 30% more), higher cache misses, higher memory bandwidth utilization,
> and higher CPU use, for the same number of requests/second.
> 
> This is most pronounced on a memcache type workload, which saw
> a consistent 1-3% increase in total CPU use on the system, due
> to those increased task migrations leading to higher L2 cache
> miss numbers, and higher memory utilization. The exclusive L3
> cache on Skylake does us no favors there.
> 
> On our web serving workload, that effect is usually negligible.
> 
> It appears that the increased number of CPU migrations is generally
> a good thing, since it leads to lower cpu_delay numbers, reflecting
> the fact that tasks get to run faster. However, the reduced locality
> and the corresponding increase in L2 cache misses hurts a little.
> 
> The patch below appears to fix the regression, while keeping the
> benefit of the lower cpu_delay numbers, by reintroducing select_idle_smt
> with a twist: when a socket has no idle cores, check to see if the
> sibling of "prev" is idle, before searching all the other CPUs.
> 
> This fixes both the occasional 9% regression on the web serving
> workload, and the continuous 2% CPU use regression on the memcache
> type workload.
> 
> With Mel's patches and this patch together, task migrations are still
> high, but L2 cache misses, memory bandwidth, and CPU time used are back
> down to what they were before. The p95 and p99 response times for the
> memcache type application improve by about 10% over what they were
> before Mel's patches got merged.
> 
> Signed-off-by: Rik van Riel 

FWIW, v3 appears to have performed faster than v2 on the few tests I ran
and the patch looks fine.

Reviewed-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs