Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-09-11 Thread Morten Rasmussen
On Mon, Sep 10, 2018 at 10:21:11AM +0200, Ingo Molnar wrote:
> 
> * Morten Rasmussen  wrote:
> 
> > The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the
> > sched_domain in the hierarchy where all cpu capacities are visible for
> > any cpu's point of view on asymmetric cpu capacity systems. The
> 
> >  /*
> > + * Find the sched_domain_topology_level where all cpu capacities are visible
> > + * for all cpus.
> > + */
> 
> > +   /*
> > +    * Examine topology from all cpu's point of views to detect the lowest
> > +    * sched_domain_topology_level where a highest capacity cpu is visible
> > +    * to everyone.
> > +    */
> 
> >  #define SD_WAKE_AFFINE   0x0020  /* Wake task to waking CPU */
> > -#define SD_ASYM_CPUCAPACITY  0x0040  /* Groups have different max cpu capacities */
> > +#define SD_ASYM_CPUCAPACITY  0x0040  /* Domain members have different cpu capacities */
> 
> For future reference: *please* capitalize 'CPU' and 'CPUs' in future patches
> like the rest of the scheduler does.
> 
> You can see it spelled right above the new definition: 'waking CPU' ;-)
> 
> (I fixed this up in this patch.)

Noted. Thanks for fixing up the patch.

Morten


Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-09-10 Thread Ingo Molnar


* Morten Rasmussen  wrote:

> The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the
> sched_domain in the hierarchy where all cpu capacities are visible for
> any cpu's point of view on asymmetric cpu capacity systems. The

>  /*
> + * Find the sched_domain_topology_level where all cpu capacities are visible
> + * for all cpus.
> + */

> + /*
> +  * Examine topology from all cpu's point of views to detect the lowest
> +  * sched_domain_topology_level where a highest capacity cpu is visible
> +  * to everyone.
> +  */

>  #define SD_WAKE_AFFINE   0x0020  /* Wake task to waking CPU */
> -#define SD_ASYM_CPUCAPACITY  0x0040  /* Groups have different max cpu capacities */
> +#define SD_ASYM_CPUCAPACITY  0x0040  /* Domain members have different cpu capacities */

For future reference: *please* capitalize 'CPU' and 'CPUs' in future patches
like the rest of the scheduler does.

You can see it spelled right above the new definition: 'waking CPU' ;-)

(I fixed this up in this patch.)

Thanks!

Ingo


Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-24 Thread Qais Yousef

On 24/07/18 09:37, Morten Rasmussen wrote:

On Mon, Jul 23, 2018 at 05:07:50PM +0100, Qais Yousef wrote:

On 23/07/18 16:27, Morten Rasmussen wrote:

It does increase the cost of things like hotplug slightly and
repartitioning of root_domains slightly but I don't see how we can
avoid it if we want generic code to set this flag. If the costs are not
acceptable I think the only option is to make the detection architecture
specific.

I think hotplug is already expensive and this overhead would be small in
comparison. But this could be called when frequency changes if I understood
correctly - this is the one I wasn't sure how 'hot' it could be. I wouldn't
expect frequency changes at a very high rate because it's relatively
expensive too..

A frequency change shouldn't lead to a flag change or a rebuild of the
sched_domain hierarchy. The situations where the hierarchy should be
rebuilt to update the flag are during boot, as we only know the amount of
asymmetry once cpufreq has been initialized, when cpus are hotplugged
in/out, and when root_domains change due to cpuset reconfiguration. So
it should be a relatively rare event.



Ah OK I misunderstood that part then.

The series LGTM then.

--
Qais Yousef



Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-24 Thread Morten Rasmussen
On Mon, Jul 23, 2018 at 05:07:50PM +0100, Qais Yousef wrote:
> On 23/07/18 16:27, Morten Rasmussen wrote:
> >It does increase the cost of things like hotplug slightly and
> >repartitioning of root_domains slightly but I don't see how we can
> >avoid it if we want generic code to set this flag. If the costs are not
> >acceptable I think the only option is to make the detection architecture
> >specific.
> 
> I think hotplug is already expensive and this overhead would be small in
> comparison. But this could be called when frequency changes if I understood
> correctly - this is the one I wasn't sure how 'hot' it could be. I wouldn't
> expect frequency changes at a very high rate because it's relatively
> expensive too..

A frequency change shouldn't lead to a flag change or a rebuild of the
sched_domain hierarchy. The situations where the hierarchy should be
rebuilt to update the flag are during boot, as we only know the amount of
asymmetry once cpufreq has been initialized, when cpus are hotplugged
in/out, and when root_domains change due to cpuset reconfiguration. So
it should be a relatively rare event.
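
For reference, the arch-side contract described above is small. A minimal
sketch, assuming the arch uses the generic arch_topology helpers
(topology_set_cpu_scale() and rebuild_sched_domains() are existing kernel
APIs; compute_capacity() is a hypothetical placeholder for however the
arch derives final capacities once cpufreq is up):

	static void __init arch_update_cpu_capacities(void)
	{
		int cpu;

		/* Record the final, cpufreq-aware capacity of each cpu */
		for_each_possible_cpu(cpu)
			topology_set_cpu_scale(cpu, compute_capacity(cpu));

		/*
		 * Force a full sched_domain rebuild so the generic code
		 * re-runs the SD_ASYM_CPUCAPACITY detection against the
		 * final capacities.
		 */
		rebuild_sched_domains();
	}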


Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-23 Thread Qais Yousef

On 23/07/18 16:27, Morten Rasmussen wrote:

[...]


+	/*
+	 * Examine topology from all cpu's point of views to detect the lowest
+	 * sched_domain_topology_level where a highest capacity cpu is visible
+	 * to everyone.
+	 */
+	for_each_cpu(i, cpu_map) {
+		unsigned long max_capacity = arch_scale_cpu_capacity(NULL, i);
+		int tl_id = 0;
+
+		for_each_sd_topology(tl) {
+			if (tl_id < asym_level)
+				goto next_level;
+

I think if you increment and then continue here you might save the extra
branch. I didn't look at any disassembly though to verify the generated
code.

I wonder if we can introduce for_each_sd_topology_from(tl, starting_level)
so that you can start searching from a provided level - which will make this
skipping logic unnecessary? So the code will look like

            for_each_sd_topology_from(tl, asym_level) {
                ...
            }

Both options would work. Increment+continue instead of goto would be
slightly less readable I think since we would still have the increment
at the end of the loop, but easy to do. Introducing
for_each_sd_topology_from() would improve things too, but I wonder if it is
worth it.


I don't mind the current form to be honest. I agree it's not worth it if
it is called infrequently enough.
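
For reference, such a helper would only be a one-line variant of the
existing for_each_sd_topology() macro in kernel/sched/topology.c. A
sketch, not part of the posted patch:

	/* Walk the topology levels starting at @level instead of level 0 */
	#define for_each_sd_topology_from(tl, level)	\
		for (tl = sched_domain_topology + (level); tl->mask; tl++)

This keeps the termination condition (tl->mask) unchanged and would remove
the need for the tl_id skip at the top of the inner loop.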



@@ -1647,18 +1707,27 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr
	struct s_data d;
	struct rq *rq = NULL;
	int i, ret = -ENOMEM;
+	struct sched_domain_topology_level *tl_asym;

	alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
	if (alloc_state != sa_rootdomain)
		goto error;

+	tl_asym = asym_cpu_capacity_level(cpu_map);
+

Or maybe this is not a hot path and we don't care that much about optimizing
the search since you call it unconditionally here even for systems that
don't care?

It does increase the cost of things like hotplug slightly and
repartitioning of root_domains slightly but I don't see how we can
avoid it if we want generic code to set this flag. If the costs are not
acceptable I think the only option is to make the detection architecture
specific.


I think hotplug is already expensive and this overhead would be small in 
comparison. But this could be called when frequency changes if I 
understood correctly - this is the one I wasn't sure how 'hot' it could 
be. I wouldn't expect frequency changes at a very high rate because it's 
relatively expensive too..



In any case, AFAIK rebuilding the sched_domain hierarchy shouldn't be a
normal and common thing to do. If checking for the flag is not
acceptable on SMP-only architectures, I can move it under arch/arm[,64]
although it is not as clean.



I like the approach and I think it's nice and clean. If it actually 
appears in some profiles I think we have room to optimize it.


--
Qais Yousef



Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-23 Thread Morten Rasmussen
On Mon, Jul 23, 2018 at 02:25:34PM +0100, Qais Yousef wrote:
> Hi Morten
> 
> On 20/07/18 14:32, Morten Rasmussen wrote:
> >The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the
> >sched_domain in the hierarchy where all cpu capacities are visible for
> >any cpu's point of view on asymmetric cpu capacity systems. The
> >scheduler can then take to take capacity asymmetry into account when
> 
> Did you mean "s/take to take/try to take/"?

Yes.


[...]

> >+	/*
> >+	 * Examine topology from all cpu's point of views to detect the lowest
> >+	 * sched_domain_topology_level where a highest capacity cpu is visible
> >+	 * to everyone.
> >+	 */
> >+	for_each_cpu(i, cpu_map) {
> >+		unsigned long max_capacity = arch_scale_cpu_capacity(NULL, i);
> >+		int tl_id = 0;
> >+
> >+		for_each_sd_topology(tl) {
> >+			if (tl_id < asym_level)
> >+				goto next_level;
> >+
> 
> I think if you increment and then continue here you might save the extra
> branch. I didn't look at any disassembly though to verify the generated
> code.
> 
> I wonder if we can introduce for_each_sd_topology_from(tl, starting_level)
> so that you can start searching from a provided level - which will make this
> skipping logic unnecessary? So the code will look like
> 
>             for_each_sd_topology_from(tl, asym_level) {
>                 ...
>             }

Both options would work. Increment+continue instead of goto would be
slightly less readable I think since we would still have the increment
at the end of the loop, but easy to do. Introducing
for_each_sd_topology_from() would improve things too, but I wonder if it is
worth it.

> >@@ -1647,18 +1707,27 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr
> > 	struct s_data d;
> > 	struct rq *rq = NULL;
> > 	int i, ret = -ENOMEM;
> >+	struct sched_domain_topology_level *tl_asym;
> > 	alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
> > 	if (alloc_state != sa_rootdomain)
> > 		goto error;
> >+	tl_asym = asym_cpu_capacity_level(cpu_map);
> >+
> 
> Or maybe this is not a hot path and we don't care that much about optimizing
> the search since you call it unconditionally here even for systems that
> don't care?

It does increase the cost of things like hotplug slightly and
repartitioning of root_domains slightly but I don't see how we can
avoid it if we want generic code to set this flag. If the costs are not
acceptable I think the only option is to make the detection architecture
specific.

In any case, AFAIK rebuilding the sched_domain hierarchy shouldn't be a
normal and common thing to do. If checking for the flag is not
acceptable on SMP-only architectures, I can move it under arch/arm[,64]
although it is not as clean.

Morten


Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-23 Thread Qais Yousef

Hi Morten

On 20/07/18 14:32, Morten Rasmussen wrote:

The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the
sched_domain in the hierarchy where all cpu capacities are visible for
any cpu's point of view on asymmetric cpu capacity systems. The
scheduler can then take to take capacity asymmetry into account when


Did you mean "s/take to take/try to take/"?


balancing at this level. It also serves as an indicator for how wide
task placement heuristics have to search to consider all available cpu
capacities as asymmetric systems might often appear symmetric at
smallest level(s) of the sched_domain hierarchy.

The flag has been around for a while but so far has only been set by
out-of-tree code in Android kernels. One solution is to let each
architecture provide the flag through a custom sched_domain topology
array and associated mask and flag functions. However,
SD_ASYM_CPUCAPACITY is special in the sense that it depends on the
capacity and presence of all cpus in the system, i.e. when hotplugging
all cpus out except those with one particular cpu capacity the flag
should disappear even if the sched_domains don't collapse. Similarly,
the flag is affected by cpusets where load-balancing is turned off.
Detecting when the flags should be set therefore depends not only on
topology information but also the cpuset configuration and hotplug
state. The arch code doesn't have easy access to the cpuset
configuration.

Instead, this patch implements the flag detection in generic code where
cpusets and hotplug state are already taken care of. All the arch is
responsible for is to implement arch_scale_cpu_capacity() and force a
full rebuild of the sched_domain hierarchy if capacities are updated,
e.g. later in the boot process when cpufreq has initialized.

cc: Ingo Molnar 
cc: Peter Zijlstra 

Signed-off-by: Morten Rasmussen 
---
  include/linux/sched/topology.h |  2 +-
  kernel/sched/topology.c| 81 ++
  2 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 26347741ba50..4fe2e49ab13b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -23,7 +23,7 @@
  #define SD_BALANCE_FORK   0x0008  /* Balance on fork, clone */
  #define SD_BALANCE_WAKE   0x0010  /* Balance on wakeup */
  #define SD_WAKE_AFFINE0x0020  /* Wake task to waking CPU */
-#define SD_ASYM_CPUCAPACITY	0x0040  /* Groups have different max cpu capacities */
+#define SD_ASYM_CPUCAPACITY	0x0040  /* Domain members have different cpu capacities */
  #define SD_SHARE_CPUCAPACITY	0x0080  /* Domain members share cpu capacity */
  #define SD_SHARE_POWERDOMAIN	0x0100  /* Domain members share power domain */
  #define SD_SHARE_PKG_RESOURCES	0x0200  /* Domain members share cpu pkg resources */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05a831427bc7..b8f41d557612 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1061,7 +1061,6 @@ static struct cpumask ***sched_domains_numa_masks;
   *   SD_SHARE_PKG_RESOURCES - describes shared caches
   *   SD_NUMA                - describes NUMA topologies
   *   SD_SHARE_POWERDOMAIN   - describes shared power domain
 - *   SD_ASYM_CPUCAPACITY    - describes mixed capacity topologies
   *
   * Odd one out, which beside describing the topology has a quirk also
   * prescribes the desired behaviour that goes along with it:
@@ -1073,13 +1072,12 @@ static struct cpumask ***sched_domains_numa_masks;
 	 SD_SHARE_PKG_RESOURCES |	\
 	 SD_NUMA		|	\
 	 SD_ASYM_PACKING	|	\
-	 SD_ASYM_CPUCAPACITY	|	\
 	 SD_SHARE_POWERDOMAIN)
  
  static struct sched_domain *
  sd_init(struct sched_domain_topology_level *tl,
const struct cpumask *cpu_map,
-   struct sched_domain *child, int cpu)
+   struct sched_domain *child, int dflags, int cpu)
  {
struct sd_data *sdd = &tl->data;
struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
@@ -1100,6 +1098,9 @@ sd_init(struct sched_domain_topology_level *tl,
"wrong sd_flags in topology description\n"))
sd_flags &= ~TOPOLOGY_SD_FLAGS;
  
+	/* Apply detected topology flags */
+	sd_flags |= dflags;
+
*sd = (struct sched_domain){
.min_interval   = sd_weight,
.max_interval   = 2*sd_weight,
@@ -1607,9 +1608,9 @@ static void __sdt_free(const struct cpumask *cpu_map)
  
  static struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
		const struct cpumask *cpu_map, struct sched_domain_attr *attr,
-   struct sched_domain *child, int cpu)
+   struct sched_domain *child, int dflags, int cpu)
  {
-   struct sched_domain *sd = sd_init(tl, cpu_map, child, cpu);
+   struct sched_domain *sd = sd_init(tl, cpu_map, child, dflags, cpu);


[PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-20 Thread Morten Rasmussen
The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the
sched_domain in the hierarchy where all cpu capacities are visible for
any cpu's point of view on asymmetric cpu capacity systems. The
scheduler can then take to take capacity asymmetry into account when
balancing at this level. It also serves as an indicator for how wide
task placement heuristics have to search to consider all available cpu
capacities as asymmetric systems might often appear symmetric at
smallest level(s) of the sched_domain hierarchy.

The flag has been around for a while but so far has only been set by
out-of-tree code in Android kernels. One solution is to let each
architecture provide the flag through a custom sched_domain topology
array and associated mask and flag functions. However,
SD_ASYM_CPUCAPACITY is special in the sense that it depends on the
capacity and presence of all cpus in the system, i.e. when hotplugging
all cpus out except those with one particular cpu capacity the flag
should disappear even if the sched_domains don't collapse. Similarly,
the flag is affected by cpusets where load-balancing is turned off.
Detecting when the flags should be set therefore depends not only on
topology information but also the cpuset configuration and hotplug
state. The arch code doesn't have easy access to the cpuset
configuration.

Instead, this patch implements the flag detection in generic code where
cpusets and hotplug state are already taken care of. All the arch is
responsible for is to implement arch_scale_cpu_capacity() and force a
full rebuild of the sched_domain hierarchy if capacities are updated,
e.g. later in the boot process when cpufreq has initialized.

cc: Ingo Molnar 
cc: Peter Zijlstra 

Signed-off-by: Morten Rasmussen 
---
 include/linux/sched/topology.h |  2 +-
 kernel/sched/topology.c| 81 ++
 2 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 26347741ba50..4fe2e49ab13b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -23,7 +23,7 @@
 #define SD_BALANCE_FORK0x0008  /* Balance on fork, clone */
 #define SD_BALANCE_WAKE0x0010  /* Balance on wakeup */
 #define SD_WAKE_AFFINE 0x0020  /* Wake task to waking CPU */
-#define SD_ASYM_CPUCAPACITY	0x0040  /* Groups have different max cpu capacities */
+#define SD_ASYM_CPUCAPACITY	0x0040  /* Domain members have different cpu capacities */
 #define SD_SHARE_CPUCAPACITY   0x0080  /* Domain members share cpu capacity */
 #define SD_SHARE_POWERDOMAIN   0x0100  /* Domain members share power domain */
 #define SD_SHARE_PKG_RESOURCES 0x0200  /* Domain members share cpu pkg resources */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05a831427bc7..b8f41d557612 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1061,7 +1061,6 @@ static struct cpumask ***sched_domains_numa_masks;
  *   SD_SHARE_PKG_RESOURCES - describes shared caches
  *   SD_NUMA                - describes NUMA topologies
  *   SD_SHARE_POWERDOMAIN   - describes shared power domain
- *   SD_ASYM_CPUCAPACITY    - describes mixed capacity topologies
  *
  * Odd one out, which beside describing the topology has a quirk also
  * prescribes the desired behaviour that goes along with it:
@@ -1073,13 +1072,12 @@ static struct cpumask ***sched_domains_numa_masks;
 	 SD_SHARE_PKG_RESOURCES |	\
 	 SD_NUMA		|	\
 	 SD_ASYM_PACKING	|	\
-	 SD_ASYM_CPUCAPACITY	|	\
 	 SD_SHARE_POWERDOMAIN)
 
 static struct sched_domain *
 sd_init(struct sched_domain_topology_level *tl,
const struct cpumask *cpu_map,
-   struct sched_domain *child, int cpu)
+   struct sched_domain *child, int dflags, int cpu)
 {
struct sd_data *sdd = &tl->data;
struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
@@ -1100,6 +1098,9 @@ sd_init(struct sched_domain_topology_level *tl,
"wrong sd_flags in topology description\n"))
sd_flags &= ~TOPOLOGY_SD_FLAGS;
 
+   /* Apply detected topology flags */
+   sd_flags |= dflags;
+
*sd = (struct sched_domain){
.min_interval   = sd_weight,
.max_interval   = 2*sd_weight,
@@ -1607,9 +1608,9 @@ static void __sdt_free(const struct cpumask *cpu_map)
 
 static struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
const struct cpumask *cpu_map, struct sched_domain_attr *attr,
-   struct sched_domain *child, int cpu)
+   struct sched_domain *child, int dflags, int cpu)
 {
-   struct sched_domain *sd = sd_init(tl, cpu_map, child, cpu);
+   struct sched_domain *sd = sd_init(tl, cpu_map, child, dflags, cpu);
 
if (child) {
sd->level = child->level + 1;
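
The quoted diff is cut off above, but the core helper can be pieced
together from the commit message and the fragments quoted in the review
thread. A reconstructed sketch of asym_cpu_capacity_level() along those
lines; details may differ from the patch as applied:

	static struct sched_domain_topology_level
	*asym_cpu_capacity_level(const struct cpumask *cpu_map)
	{
		int i, j, asym_level = 0;
		bool asym = false;
		struct sched_domain_topology_level *tl, *asym_tl = NULL;
		unsigned long cap;

		/* Is there any asymmetry at all? */
		cap = arch_scale_cpu_capacity(NULL, cpumask_first(cpu_map));

		for_each_cpu(i, cpu_map) {
			if (arch_scale_cpu_capacity(NULL, i) != cap) {
				asym = true;
				break;
			}
		}

		if (!asym)
			return NULL;

		/*
		 * Examine topology from all cpu's point of views to detect
		 * the lowest sched_domain_topology_level where a highest
		 * capacity cpu is visible to everyone.
		 */
		for_each_cpu(i, cpu_map) {
			unsigned long max_capacity = arch_scale_cpu_capacity(NULL, i);
			int tl_id = 0;

			for_each_sd_topology(tl) {
				if (tl_id < asym_level)
					goto next_level;

				/* Any higher-capacity cpu visible at this level? */
				for_each_cpu_and(j, tl->mask(i), cpu_map) {
					unsigned long capacity;

					capacity = arch_scale_cpu_capacity(NULL, j);

					if (capacity <= max_capacity)
						continue;

					max_capacity = capacity;
					asym_level = tl_id;
					asym_tl = tl;
				}
	next_level:
				tl_id++;
			}
		}

		return asym_tl;
	}

build_sched_domains() then presumably passes SD_ASYM_CPUCAPACITY in dflags
for the level returned here (tl == tl_asym), which is what the dflags
plumbing through build_sched_domain() and sd_init() above is for.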
