Re: [PATCH 2/5] sched: add asymmetric packing option for sibling domain

2010-04-14 Thread Michael Neuling
In message 1271161767.4807.1281.ca...@twins you wrote:
> On Fri, 2010-04-09 at 16:21 +1000, Michael Neuling wrote:
> > Peter: Since this is based mainly off your initial patch, it should
> > have your signed-off-by too, but I didn't want to add without your
> > permission.  Can I add it?
> 
> Of course! :-)
> 
> This thing does need a better changelog though, and maybe a larger
> comment with check_asym_packing(), explaining why and what we're doing
> and what we're assuming (that lower cpu number also means lower thread
> number).


OK, updated patch below...

Mikey


[PATCH 2/5] sched: add asymmetric group packing option for sibling domain

Check to see if the group is packed in a sched domain.

This is primarily intended to be used at the sibling level.  Some
cores, like POWER7, prefer to use lower numbered SMT threads.  In the
case of POWER7, the core can move to lower SMT modes only when the
higher numbered threads are idle.  When in a lower SMT mode, the
threads perform better since they share fewer core resources.  Hence
when we have idle threads, we want them to be the higher numbered
ones.

This adds a hook into f_b_g() called check_asym_packing() to check
the packing.  This packing function is run on idle threads.  It
checks to see if the busiest CPU in this domain (the core in the
POWER7 case) has a higher CPU number than the CPU the packing
function is being run on.  If it does, the imbalance is calculated
and the busier, higher numbered thread group is returned to f_b_g()
as the busiest group.  Here we are assuming a lower CPU number is
equivalent to a lower SMT thread number.
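
In other words, the helper ends up looking roughly like the sketch
below.  The imbalance expression shown (the group's load scaled by
cpu_power) is indicative rather than definitive:

static int check_asym_packing(struct sched_domain *sd,
			      struct sd_lb_stats *sds,
			      int this_cpu, unsigned long *imbalance)
{
	int busiest_cpu;

	/* Nothing to do if this domain isn't using asymmetric packing */
	if (!(sd->flags & SD_ASYM_PACKING))
		return 0;

	/* No busiest group was found, so there is nothing to pack */
	if (!sds->busiest)
		return 0;

	/*
	 * We only pull load towards lower numbered CPUs.  If the busiest
	 * group's first CPU is already below this one, leave it alone.
	 */
	busiest_cpu = group_first_cpu(sds->busiest);
	if (this_cpu > busiest_cpu)
		return 0;

	/* Ask f_b_g() to move the busiest group's load down to this CPU */
	*imbalance = DIV_ROUND_CLOSEST(sds->max_load * sds->busiest->cpu_power,
				       SCHED_LOAD_SCALE);
	return 1;
}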

It also creates a new SD_ASYM_PACKING flag to enable this feature at
any scheduler domain level.

It also creates an arch hook to enable this feature at the sibling
level.  The default function doesn't enable this feature.
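
For example, an architecture that wants this behaviour overrides the
weak default.  A sketch of what a powerpc override might look like
(the CPU_FTR_ASYM_SMT feature bit is an assumption about a later
patch in this series):

/* arch/powerpc/kernel/smp.c (sketch, not part of this patch) */
int arch_sd_sibiling_asym_packing(void)
{
	/* Only CPUs that prefer low numbered threads want packing */
	if (cpu_has_feature(CPU_FTR_ASYM_SMT))
		return SD_ASYM_PACKING;
	return 0;
}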

Based heavily on a patch from Peter Zijlstra.

Signed-off-by: Michael Neuling mi...@neuling.org
Signed-off-by: Peter Zijlstra pet...@infradead.org
---
 include/linux/sched.h    |    4 +-
 include/linux/topology.h |    1 +
 kernel/sched_fair.c      |   93 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 94 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/include/linux/sched.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/sched.h
+++ linux-2.6-ozlabs/include/linux/sched.h
@@ -799,7 +799,7 @@ enum cpu_idle_type {
 #define SD_POWERSAVINGS_BALANCE	0x0100	/* Balance for power savings */
 #define SD_SHARE_PKG_RESOURCES	0x0200	/* Domain members share cpu pkg resources */
 #define SD_SERIALIZE		0x0400	/* Only a single load balancing instance */
-
+#define SD_ASYM_PACKING		0x0800	/* Place busy groups earlier in the domain */
 #define SD_PREFER_SIBLING	0x1000	/* Prefer to place tasks in a sibling domain */
 
 enum powersavings_balance_level {
@@ -834,6 +834,8 @@ static inline int sd_balance_for_package
return SD_PREFER_SIBLING;
 }
 
+extern int __weak arch_sd_sibiling_asym_packing(void);
+
 /*
  * Optimise SD flags for power savings:
  * SD_BALANCE_NEWIDLE helps agressive task consolidation and power savings.
Index: linux-2.6-ozlabs/include/linux/topology.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/topology.h
+++ linux-2.6-ozlabs/include/linux/topology.h
@@ -102,6 +102,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_SHARE_PKG_RESOURCES		\
 				| 0*SD_SERIALIZE			\
 				| 0*SD_PREFER_SIBLING			\
+				| arch_sd_sibiling_asym_packing()	\
 				,					\
 	.last_balance		= jiffies,				\
 	.balance_interval	= 1,					\
Index: linux-2.6-ozlabs/kernel/sched_fair.c
===================================================================
--- linux-2.6-ozlabs.orig/kernel/sched_fair.c
+++ linux-2.6-ozlabs/kernel/sched_fair.c
@@ -2493,6 +2493,39 @@ static inline void update_sg_lb_stats(st
 }
 
 /**
+ * update_sd_pick_busiest - return 1 on busiest group
+ * @sd: sched_domain whose statistics are to be checked
+ * @sds: sched_domain statistics
+ * @sg: sched_group candidate to be checked for being the busiest
+ * @sgs: sched_group statistics
+ *
+ * This returns 1 for the busiest group. If asymmetric packing is
+ * enabled and we already have a busiest, but this candidate group has
+ * a higher cpu number than the current busiest, pick this sg.
+ */
+static int update_sd_pick_busiest(struct sched_domain *sd,
+				  struct sd_lb_stats *sds,
+				  struct sched_group *sg,
+				  struct sg_lb_stats *sgs)
+{
+	if (sgs->sum_nr_running > sgs->group_capacity)
+		return 1;
+
+	if (sgs->group_imb

Re: [PATCH 2/5] sched: add asymmetric packing option for sibling domain

2010-04-13 Thread Peter Zijlstra
On Fri, 2010-04-09 at 16:21 +1000, Michael Neuling wrote:
> Peter: Since this is based mainly off your initial patch, it should
> have your signed-off-by too, but I didn't want to add without your
> permission.  Can I add it?

Of course! :-)

This thing does need a better changelog though, and maybe a larger
comment with check_asym_packing(), explaining why and what we're doing
and what we're assuming (that lower cpu number also means lower thread
number).
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 2/5] sched: add asymmetric packing option for sibling domain

2010-04-09 Thread Michael Neuling
Some CPUs perform better when tasks are run on lower thread numbers.
In the case of POWER7, when higher threads are idled, the core can run
in lower SMT modes and hence perform better.

This creates a new sd flag to prefer lower threads. 

Based heavily on a patch from Peter Zijlstra.

Signed-off-by: Michael Neuling mi...@neuling.org
---
Peter: Since this is based mainly off your initial patch, it should
have your signed-off-by too, but I didn't want to add without your
permission.  Can I add it?

---

 include/linux/sched.h    |    4 ++
 include/linux/topology.h |    1 +
 kernel/sched_fair.c      |   64 ++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 65 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/include/linux/sched.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/sched.h
+++ linux-2.6-ozlabs/include/linux/sched.h
@@ -799,7 +799,7 @@ enum cpu_idle_type {
 #define SD_POWERSAVINGS_BALANCE	0x0100	/* Balance for power savings */
 #define SD_SHARE_PKG_RESOURCES	0x0200	/* Domain members share cpu pkg resources */
 #define SD_SERIALIZE		0x0400	/* Only a single load balancing instance */
-
+#define SD_ASYM_PACKING		0x0800	/* Place busy groups earlier in the domain */
 #define SD_PREFER_SIBLING	0x1000	/* Prefer to place tasks in a sibling domain */
 
 enum powersavings_balance_level {
@@ -834,6 +834,8 @@ static inline int sd_balance_for_package
return SD_PREFER_SIBLING;
 }
 
+extern int __weak arch_sd_sibiling_asym_packing(void);
+
 /*
  * Optimise SD flags for power savings:
  * SD_BALANCE_NEWIDLE helps agressive task consolidation and power savings.
Index: linux-2.6-ozlabs/include/linux/topology.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/topology.h
+++ linux-2.6-ozlabs/include/linux/topology.h
@@ -102,6 +102,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_SHARE_PKG_RESOURCES		\
 				| 0*SD_SERIALIZE			\
 				| 0*SD_PREFER_SIBLING			\
+				| arch_sd_sibiling_asym_packing()	\
 				,					\
 	.last_balance		= jiffies,				\
 	.balance_interval	= 1,					\
Index: linux-2.6-ozlabs/kernel/sched_fair.c
===================================================================
--- linux-2.6-ozlabs.orig/kernel/sched_fair.c
+++ linux-2.6-ozlabs/kernel/sched_fair.c
@@ -2493,6 +2493,31 @@ static inline void update_sg_lb_stats(st
 }
 
 /**
+ * update_sd_pick_busiest - return 1 on busiest
+ */
+static int update_sd_pick_busiest(struct sched_domain *sd,
+				  struct sd_lb_stats *sds,
+				  struct sched_group *sg,
+				  struct sg_lb_stats *sgs)
+{
+	if (sgs->sum_nr_running > sgs->group_capacity)
+		return 1;
+
+	if (sgs->group_imb)
+		return 1;
+
+	if ((sd->flags & SD_ASYM_PACKING) && sgs->sum_nr_running) {
+		if (!sds->busiest)
+			return 1;
+
+		if (group_first_cpu(sds->busiest) > group_first_cpu(sg))
+			return 1;
+	}
+
+	return 0;
+}
+
+/**
  * update_sd_lb_stats - Update sched_group's statistics for load balancing.
  * @sd: sched_domain whose statistics are to be updated.
  * @this_cpu: Cpu for which load balance is currently performed.
@@ -2546,9 +2571,8 @@ static inline void update_sd_lb_stats(st
 		sds->this = group;
 		sds->this_nr_running = sgs.sum_nr_running;
 		sds->this_load_per_task = sgs.sum_weighted_load;
-	} else if (sgs.avg_load > sds->max_load &&
-		   (sgs.sum_nr_running > sgs.group_capacity ||
-		    sgs.group_imb)) {
+	} else if (sgs.avg_load >= sds->max_load &&
+		   update_sd_pick_busiest(sd, sds, group, &sgs)) {
 		sds->max_load = sgs.avg_load;
 		sds->busiest = group;
 		sds->busiest_nr_running = sgs.sum_nr_running;
@@ -2562,6 +2586,36 @@ static inline void update_sd_lb_stats(st
 	} while (group != sd->groups);
 }
 
+int __weak arch_sd_sibiling_asym_packing(void)
+{
+   return 0*SD_ASYM_PACKING;
+}
+
+/**
+ * check_asym_packing - Check to see if the group is packed into
+ * the sched domain
+ */
+static int check_asym_packing(struct sched_domain *sd,
+			      struct sd_lb_stats *sds,
+			      int this_cpu, unsigned long *imbalance)
+{
+	int busiest_cpu;
+
+	if (!(sd->flags & SD_ASYM_PACKING))
+		return 0;
+
+