Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking

2012-10-09 Thread Morten Rasmussen
Hi Viresh,

On Thu, Oct 04, 2012 at 07:02:03AM +0100, Viresh Kumar wrote:
 Hi Morten,
 
 On 22 September 2012 00:02,  morten.rasmus...@arm.com wrote:
  From: Morten Rasmussen morten.rasmus...@arm.com
 
  This patch introduces the basic SCHED_HMP infrastructure. Each class of
  cpus is represented by a hmp_domain and tasks will only be moved between
  these domains when their load profiles suggest it is beneficial.
 
  SCHED_HMP relies heavily on the task load-tracking introduced in Paul
  Turner's fair group scheduling patch set:
 
  https://lkml.org/lkml/2012/8/23/267
 
  SCHED_HMP requires that the platform implements arch_get_hmp_domains()
  which should set up the platform specific list of hmp_domains. It is
  also assumed that the platform disables SD_LOAD_BALANCE for the
  appropriate sched_domains.
 
 An explanation of this requirement would be helpful here.
 

Yes. This is to prevent the load-balancer from moving tasks between
hmp_domains. This will be done exclusively by SCHED_HMP instead to
implement a strict task migration policy and avoid changing the
load-balancer behaviour. The load-balancer will take care of
load-balancing within each hmp_domain.
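To make the division of labour concrete, here is a hypothetical userspace sketch (not from the patch; the two-domain layout, mask values and function names are invented) of the rule that the regular load-balancer may only move tasks within a single hmp_domain, while cross-domain moves are reserved for SCHED_HMP:

```c
#include <stdint.h>

typedef uint64_t cpumask_t;                /* one bit per cpu */

static const cpumask_t fast_cpus = 0x3;    /* cpus 0-1: the fast hmp_domain */
static const cpumask_t slow_cpus = 0xc;    /* cpus 2-3: the slow hmp_domain */

/* Return 1 if both cpus belong to the same hmp_domain. */
static int same_hmp_domain(int cpu_a, int cpu_b)
{
	cpumask_t a = 1ULL << cpu_a;
	cpumask_t b = 1ULL << cpu_b;

	return ((fast_cpus & a) && (fast_cpus & b)) ||
	       ((slow_cpus & a) && (slow_cpus & b));
}

/*
 * With SD_LOAD_BALANCE cleared at the CPU sched_domain level, the
 * load-balancer only ever moves tasks that satisfy this predicate;
 * anything crossing domains is SCHED_HMP's job.
 */
static int balancer_may_move(int src_cpu, int dst_cpu)
{
	return same_hmp_domain(src_cpu, dst_cpu);
}
```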

  Task placement takes place every time a task is to be inserted into
  a runqueue based on its load history. The task placement decision is
  based on load thresholds.
 
  There are no restrictions on the number of hmp_domains, however,
  multiple (>2) has not been tested and the up/down migration policy is
  rather simple.
 
  Signed-off-by: Morten Rasmussen morten.rasmus...@arm.com
  ---
   arch/arm/Kconfig  |   17 +
   include/linux/sched.h |6 ++
   kernel/sched/fair.c   |  168 
  +
   kernel/sched/sched.h  |6 ++
   4 files changed, 197 insertions(+)
 
  diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
  index f4a5d58..5b09684 100644
  --- a/arch/arm/Kconfig
  +++ b/arch/arm/Kconfig
  @@ -1554,6 +1554,23 @@ config SCHED_SMT
MultiThreading at a cost of slightly increased overhead in some
places. If unsure say N here.
 
  +config DISABLE_CPU_SCHED_DOMAIN_BALANCE
  +   bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
  +   help
  + Disables scheduler load-balancing at CPU sched domain level.
 
 Shouldn't this depend on EXPERIMENTAL?
 

It should. The ongoing discussion about CONFIG_EXPERIMENTAL that Amit is
referring to hasn't come to a conclusion yet.

  +config SCHED_HMP
  +   bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling"
 
 ditto.
 
  +   depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && FAIR_GROUP_SCHED && !SCHED_AUTOGROUP
  +   help
  + Experimental scheduler optimizations for heterogeneous platforms.
  + Attempts to introspectively select task affinity to optimize power
  + and performance. Basic support for multiple (>2) cpu types is in place,
  + but it has only been tested with two types of cpus.
  + There is currently no support for migration of task groups, hence
  + !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
  + between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
  +
   config HAVE_ARM_SCU
  bool
  help
  diff --git a/include/linux/sched.h b/include/linux/sched.h
  index 81e4e82..df971a3 100644
  --- a/include/linux/sched.h
  +++ b/include/linux/sched.h
  @@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct 
  sched_domain *sd, int cpu);
 
   bool cpus_share_cache(int this_cpu, int that_cpu);
 
  +#ifdef CONFIG_SCHED_HMP
  +struct hmp_domain {
  +   struct cpumask cpus;
  +   struct list_head hmp_domains;
 
 Probably need a better name here. domain_list?
 

Yes. hmp_domain_list would be better and stick with the hmp_* naming
convention.

  +};
  +#endif /* CONFIG_SCHED_HMP */
   #else /* CONFIG_SMP */
 
   struct sched_domain_attr;
  diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
  index 3e17dd5..d80de46 100644
  --- a/kernel/sched/fair.c
  +++ b/kernel/sched/fair.c
  @@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct 
  *p, int target)
  return target;
   }
 
  +#ifdef CONFIG_SCHED_HMP
  +/*
  + * Heterogenous multiprocessor (HMP) optimizations
  + *
  + * The cpu types are distinguished using a list of hmp_domains
  + * which each represent one cpu type using a cpumask.
  + * The list is assumed ordered by compute capacity with the
  + * fastest domain first.
  + */
  +DEFINE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
  +
  +extern void __init arch_get_hmp_domains(struct list_head 
  *hmp_domains_list);
  +
  +/* Setup hmp_domains */
  +static int __init hmp_cpu_mask_setup(void)
 
 How should we interpret its return value? Can you mention what 0 and 1 mean
 here?
 

Returns 0 if domain setup failed, i.e. the domain list is empty, and 1
otherwise.

  +{
  +   char buf[64];
  +   struct hmp_domain *domain;

Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking

2012-10-09 Thread Viresh Kumar
On 9 October 2012 21:26, Morten Rasmussen morten.rasmus...@arm.com wrote:
 On Thu, Oct 04, 2012 at 07:02:03AM +0100, Viresh Kumar wrote:
 On 22 September 2012 00:02,  morten.rasmus...@arm.com wrote:

  SCHED_HMP requires that the platform implements arch_get_hmp_domains()
  which should set up the platform specific list of hmp_domains. It is
  also assumed that the platform disables SD_LOAD_BALANCE for the
  appropriate sched_domains.

 An explanation of this requirement would be helpful here.

 Yes. This is to prevent the load-balancer from moving tasks between
 hmp_domains. This will be done exclusively by SCHED_HMP instead to
 implement a strict task migration policy and avoid changing the
 load-balancer behaviour. The load-balancer will take care of
 load-balancing within each hmp_domain.

Honestly speaking, I understood this point only now; earlier it wasn't clear
to me :)

What would be ideal is to put this information in the comment just before
we re-define other SCHED_*** domains where we disable balancing.
And keep it in the commit log too.

  +struct hmp_domain {
  +   struct cpumask cpus;
  +   struct list_head hmp_domains;

 Probably need a better name here. domain_list?

 Yes. hmp_domain_list would be better and stick with the hmp_* naming
 convention.

IMHO hmp_ would be better for global names, but the names of variables
enclosed within another hmp_* data type don't actually need the hmp_ prefix,
as this is implicit.

i.e.
struct hmp_domain {
   struct cpumask cpus;
   struct list_head domain_list;
}

would be better than

   struct list_head hmp_domain_list;

as the parent structure already has the hmp_ prefix. So whatever is inside the
struct is obviously hmp specific.

  +/* Setup hmp_domains */
  +static int __init hmp_cpu_mask_setup(void)

 How should we interpret its return value? Can you mention what 0 and 1 mean
 here?


 Returns 0 if domain setup failed, i.e. the domain list is empty, and 1
 otherwise.

Helpful. Please mention this in function comment in your next revision.

  +{
  +   char buf[64];
  +   struct hmp_domain *domain;
  +   struct list_head *pos;
  +   int dc, cpu;

  +   /* Print hmp_domains */
  +   dc = 0;

 Should be done during definition of dc.

You missed this?

  +   for_each_cpu_mask(cpu, domain->cpus) {
  +   per_cpu(hmp_cpu_domain, cpu) = domain;
  +   }

 Should use hmp_cpu_domain(cpu) here. Also no need of {} for single
 line loop.

??

  +   dc++;

 You aren't using it... Only for testing? Should we remove it from mainline
 patchset and keep it locally?


 I'm using it in the pr_debug line a little earlier. It is used for
 enumerating the hmp_domains.

My mistake :(

  +/* Check if cpu is in fastest hmp_domain */
  +static inline unsigned int hmp_cpu_is_fastest(int cpu)
  +{
  +   struct list_head *pos;
  +
  +   pos = &hmp_cpu_domain(cpu)->hmp_domains;
  +   return pos == hmp_domains.next;

 better create list_is_first() for this.

 I had the same thought, but I see that as a separate patch that should
 be submitted separately.

Correct. So better send it now, so that it is included before you send your
next version. :)
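For reference, a sketch of the list_is_first() helper suggested above, written against a minimal userspace copy of the kernel's struct list_head so it compiles standalone; a real submission would add the helper to <linux/list.h> instead:

```c
/* Minimal stand-in for the kernel's circular doubly-linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Return 1 if @entry is the first element of the list headed by @head. */
static int list_is_first(const struct list_head *entry,
			 const struct list_head *head)
{
	return entry == head->next;
}
```

hmp_cpu_is_fastest() could then read `return list_is_first(&hmp_cpu_domain(cpu)->hmp_domains, &hmp_domains);` instead of open-coding the comparison.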

--
viresh

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking

2012-10-04 Thread Viresh Kumar
Hi Morten,

On 22 September 2012 00:02,  morten.rasmus...@arm.com wrote:
 From: Morten Rasmussen morten.rasmus...@arm.com

 This patch introduces the basic SCHED_HMP infrastructure. Each class of
 cpus is represented by a hmp_domain and tasks will only be moved between
 these domains when their load profiles suggest it is beneficial.

 SCHED_HMP relies heavily on the task load-tracking introduced in Paul
 Turner's fair group scheduling patch set:

 https://lkml.org/lkml/2012/8/23/267

 SCHED_HMP requires that the platform implements arch_get_hmp_domains()
 which should set up the platform specific list of hmp_domains. It is
 also assumed that the platform disables SD_LOAD_BALANCE for the
 appropriate sched_domains.

An explanation of this requirement would be helpful here.

 Task placement takes place every time a task is to be inserted into
 a runqueue based on its load history. The task placement decision is
 based on load thresholds.

 There are no restrictions on the number of hmp_domains, however,
 multiple (>2) has not been tested and the up/down migration policy is
 rather simple.

 Signed-off-by: Morten Rasmussen morten.rasmus...@arm.com
 ---
  arch/arm/Kconfig  |   17 +
  include/linux/sched.h |6 ++
  kernel/sched/fair.c   |  168 
 +
  kernel/sched/sched.h  |6 ++
  4 files changed, 197 insertions(+)

 diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
 index f4a5d58..5b09684 100644
 --- a/arch/arm/Kconfig
 +++ b/arch/arm/Kconfig
 @@ -1554,6 +1554,23 @@ config SCHED_SMT
   MultiThreading at a cost of slightly increased overhead in some
   places. If unsure say N here.

 +config DISABLE_CPU_SCHED_DOMAIN_BALANCE
 +   bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
 +   help
 + Disables scheduler load-balancing at CPU sched domain level.

Shouldn't this depend on EXPERIMENTAL?

 +config SCHED_HMP
 +   bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling"

ditto.

 +   depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && FAIR_GROUP_SCHED && !SCHED_AUTOGROUP
 +   help
 + Experimental scheduler optimizations for heterogeneous platforms.
 + Attempts to introspectively select task affinity to optimize power
 + and performance. Basic support for multiple (>2) cpu types is in place,
 + but it has only been tested with two types of cpus.
 + There is currently no support for migration of task groups, hence
 + !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
 + between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
 +
  config HAVE_ARM_SCU
 bool
 help
 diff --git a/include/linux/sched.h b/include/linux/sched.h
 index 81e4e82..df971a3 100644
 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct 
 sched_domain *sd, int cpu);

  bool cpus_share_cache(int this_cpu, int that_cpu);

 +#ifdef CONFIG_SCHED_HMP
 +struct hmp_domain {
 +   struct cpumask cpus;
 +   struct list_head hmp_domains;

Probably need a better name here. domain_list?

 +};
 +#endif /* CONFIG_SCHED_HMP */
  #else /* CONFIG_SMP */

  struct sched_domain_attr;
 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index 3e17dd5..d80de46 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct *p, 
 int target)
 return target;
  }

 +#ifdef CONFIG_SCHED_HMP
 +/*
 + * Heterogenous multiprocessor (HMP) optimizations
 + *
 + * The cpu types are distinguished using a list of hmp_domains
 + * which each represent one cpu type using a cpumask.
 + * The list is assumed ordered by compute capacity with the
 + * fastest domain first.
 + */
 +DEFINE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
 +
 +extern void __init arch_get_hmp_domains(struct list_head *hmp_domains_list);
 +
 +/* Setup hmp_domains */
 +static int __init hmp_cpu_mask_setup(void)

How should we interpret its return value? Can you mention what 0 and 1 mean
here?

 +{
 +   char buf[64];
 +   struct hmp_domain *domain;
 +   struct list_head *pos;
 +   int dc, cpu;
 +
 +   pr_debug("Initializing HMP scheduler:\n");
 +
 +   /* Initialize hmp_domains using platform code */
 +   arch_get_hmp_domains(&hmp_domains);
 +   if (list_empty(&hmp_domains)) {
 +   pr_debug("HMP domain list is empty!\n");
 +   return 0;
 +   }
 +
 +   /* Print hmp_domains */
 +   dc = 0;

Should be done during definition of dc.

 +   list_for_each(pos, &hmp_domains) {
 +   domain = list_entry(pos, struct hmp_domain, hmp_domains);
 +   cpulist_scnprintf(buf, 64, &domain->cpus);
 +   pr_debug("  HMP domain %d: %s\n", dc, buf);

Spaces before HMP are intentional?

 +
 +   for_each_cpu_mask(cpu, domain->cpus) {
 +

[RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen morten.rasmus...@arm.com

This patch introduces the basic SCHED_HMP infrastructure. Each class of
cpus is represented by a hmp_domain and tasks will only be moved between
these domains when their load profiles suggest it is beneficial.

SCHED_HMP relies heavily on the task load-tracking introduced in Paul
Turner's fair group scheduling patch set:

https://lkml.org/lkml/2012/8/23/267

SCHED_HMP requires that the platform implements arch_get_hmp_domains()
which should set up the platform specific list of hmp_domains. It is
also assumed that the platform disables SD_LOAD_BALANCE for the
appropriate sched_domains.

Task placement takes place every time a task is to be inserted into
a runqueue based on its load history. The task placement decision is
based on load thresholds.

There are no restrictions on the number of hmp_domains, however,
multiple (>2) has not been tested and the up/down migration policy is
rather simple.

Signed-off-by: Morten Rasmussen morten.rasmus...@arm.com
---
 arch/arm/Kconfig  |   17 +
 include/linux/sched.h |6 ++
 kernel/sched/fair.c   |  168 +
 kernel/sched/sched.h  |6 ++
 4 files changed, 197 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f4a5d58..5b09684 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1554,6 +1554,23 @@ config SCHED_SMT
  MultiThreading at a cost of slightly increased overhead in some
  places. If unsure say N here.
 
+config DISABLE_CPU_SCHED_DOMAIN_BALANCE
+   bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
+   help
+ Disables scheduler load-balancing at CPU sched domain level.
+
+config SCHED_HMP
+   bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling"
+   depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && FAIR_GROUP_SCHED && !SCHED_AUTOGROUP
+   help
+ Experimental scheduler optimizations for heterogeneous platforms.
+ Attempts to introspectively select task affinity to optimize power
+ and performance. Basic support for multiple (>2) cpu types is in place,
+ but it has only been tested with two types of cpus.
+ There is currently no support for migration of task groups, hence
+ !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
+ between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
+
 config HAVE_ARM_SCU
bool
help
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81e4e82..df971a3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct sched_domain *sd, int cpu);
 
 bool cpus_share_cache(int this_cpu, int that_cpu);
 
+#ifdef CONFIG_SCHED_HMP
+struct hmp_domain {
+   struct cpumask cpus;
+   struct list_head hmp_domains;
+};
+#endif /* CONFIG_SCHED_HMP */
 #else /* CONFIG_SMP */
 
 struct sched_domain_attr;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3e17dd5..d80de46 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct *p, int target)
return target;
 }
 
+#ifdef CONFIG_SCHED_HMP
+/*
+ * Heterogenous multiprocessor (HMP) optimizations
+ *
+ * The cpu types are distinguished using a list of hmp_domains
+ * which each represent one cpu type using a cpumask.
+ * The list is assumed ordered by compute capacity with the
+ * fastest domain first.
+ */
+DEFINE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
+
+extern void __init arch_get_hmp_domains(struct list_head *hmp_domains_list);
+
+/* Setup hmp_domains */
+static int __init hmp_cpu_mask_setup(void)
+{
+   char buf[64];
+   struct hmp_domain *domain;
+   struct list_head *pos;
+   int dc, cpu;
+
+   pr_debug("Initializing HMP scheduler:\n");
+
+   /* Initialize hmp_domains using platform code */
+   arch_get_hmp_domains(&hmp_domains);
+   if (list_empty(&hmp_domains)) {
+   pr_debug("HMP domain list is empty!\n");
+   return 0;
+   }
+
+   /* Print hmp_domains */
+   dc = 0;
+   list_for_each(pos, &hmp_domains) {
+   domain = list_entry(pos, struct hmp_domain, hmp_domains);
+   cpulist_scnprintf(buf, 64, &domain->cpus);
+   pr_debug("  HMP domain %d: %s\n", dc, buf);
+
+   for_each_cpu_mask(cpu, domain->cpus) {
+   per_cpu(hmp_cpu_domain, cpu) = domain;
+   }
+   dc++;
+   }
+
+   return 1;
+}
+
+/*
+ * Migration thresholds should be in the range [0..1023]
+ * hmp_up_threshold: min. load required for migrating tasks to a faster cpu
+ * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
+ * The default values (512, 256) offer good responsiveness, but may need
+ * tweaking to suit particular needs.
+ */
+unsigned int hmp_up_threshold = 512;
+unsigned int hmp_down_threshold = 256;
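The threshold comment above can be illustrated with a hypothetical userspace sketch of the up/down decision. The threshold values match the patch; the function names and the exact comparison operators are assumptions, since the decision code itself falls outside this hunk:

```c
/* Migration thresholds from the patch, in the [0..1023] load scale. */
static unsigned int hmp_up_threshold = 512;
static unsigned int hmp_down_threshold = 256;

/* @load: the task's tracked load average, scaled to [0..1023] */
static int task_wants_faster_cpu(unsigned int load)
{
	return load >= hmp_up_threshold;    /* heavy enough: migrate up */
}

static int task_fits_slower_cpu(unsigned int load)
{
	return load < hmp_down_threshold;   /* light enough: migrate down */
}
```

With the defaults there is a band (256..511) where a task is neither migrated up nor down, which gives the policy some hysteresis.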