Re: HMP patches v2

2013-01-02 Thread Morten Rasmussen

On 02/01/13 10:29, Vincent Guittot wrote:

On 2 January 2013 06:28, Viresh Kumar  wrote:

On 20 December 2012 13:41, Vincent Guittot  wrote:

On 19 December 2012 11:57, Morten Rasmussen  wrote:

If I understand the new version of "sched: secure access to other CPU
statistics" correctly, the effect of the patch is:

Without the patch the cpu will appear to be busy if sum/period are not
coherent (sum>period). The same is true with the patch except in the
case where nr_running is 0. In this particular case the cpu will appear
not to be busy. I assume there is good reason why this particular case
is important?


Sorry for this late reply.

It's not really more important than the others, but it's one case we can
safely detect to prevent spurious spreading of tasks.
In addition, the incoherency only occurs if both values are close, so
nr_running == 0 was the only condition left to be tested.
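
For illustration, a minimal sketch of the kind of check being discussed
(not the actual patch; the function and field names, e.g. is_buddy_busy(),
runnable_avg_sum and runnable_avg_period, are assumptions):

/*
 * Sketch only: classify a remote cpu from a lockless snapshot of its
 * load statistics. The snapshot can be incoherent (sum > period)
 * because the remote cpu may update the two fields while we read them.
 */
static bool is_buddy_busy(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	u32 sum = rq->avg.runnable_avg_sum;
	u32 period = rq->avg.runnable_avg_period;

	/*
	 * Incoherent snapshot: assume busy, except when nothing is
	 * runnable, which is the one case that is safe to classify.
	 */
	if (sum > period)
		return rq->nr_running != 0;

	/* Coherent snapshot: illustrative >50% busy threshold */
	return sum > period / 2;
}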



In any case the patch is fine by me.


Hmm... I am still confused :(

We have two patches from ARM, do let me know if I can drop these:


I think you can drop them as they don't apply anymore for V2.
Morten, do you confirm ?


Confirmed. I don't see any problems with the v2 patch. The overhead of
the check should be minimal.

Morten



Vincent



commit 3f1dff11ac95eda2772bef577e368bc124bfe087
Author: Morten Rasmussen 
Date:   Fri Nov 16 18:32:40 2012 +

 ARM: TC2: Re-enable SD_SHARE_POWERLINE

 Re-enable SD_SHARE_POWERLINE to reflect the power domains of TC2.

  arch/arm/kernel/topology.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

commit e8cceacd3913e3a3e955614bacc1bc81866bc243
Author: Liviu Dudau 
Date:   Fri Nov 16 18:32:38 2012 +

 Revert "sched: secure access to other CPU statistics"

 This reverts commit 2aa14d0379cc54bc0ec44adb7a2e0ad02ae293d0.

 The way this functionality is implemented is under review and the
 current implementation is considered not safe.

 Signed-off-by: Liviu Dudau 

  kernel/sched/fair.c | 19 ++-
  1 file changed, 2 insertions(+), 17 deletions(-)









Re: HMP patches v2

2012-12-19 Thread Morten Rasmussen

On 19/12/12 09:34, Viresh Kumar wrote:

On 19 December 2012 14:53, Vincent Guittot  wrote:

On 19 December 2012 07:34, "Viresh Kumar"  wrote:

Can we resolve this issue now? I don't want anything during the release
period
this time.


The new version of the patchset should solve the concerns of everybody


Morten,

Can you confirm or cross-check that? Branch is: sched-pack-small-tasks-v2



If I understand the new version of "sched: secure access to other CPU
statistics" correctly, the effect of the patch is:

Without the patch the cpu will appear to be busy if sum/period are not
coherent (sum>period). The same is true with the patch except in the
case where nr_running is 0. In this particular case the cpu will appear
not to be busy. I assume there is good reason why this particular case
is important?

In any case the patch is fine by me.

Morten





Re: [HMP][PATCH 0/1] Global balance

2012-12-07 Thread Morten Rasmussen

On 07/12/12 14:54, Viresh Kumar wrote:

On 7 December 2012 18:43, Morten Rasmussen  wrote:

I should have included the numbers in the cover letter. Here are
numbers for TC2.

sysbench (normalized execution time, lower is better)
threads      2     4     8
HMP       1.00  1.00  1.00
HMP+GB    1.00  0.67  0.58

coremark (normalized iterations per second, higher is better)
threads      2     4     8
HMP       1.00  1.00  1.00
HMP+GB    1.00  1.39  1.73

So there is clear benefit of utilizing the A7s. It actually saves
energy too as the whole benchmark completes faster.


Hi Morten,

I have applied your patch now and pushed v13. Please cross-check v13
to see if everything is correct.



It looks right to me.

Morten





Re: [HMP][PATCH 0/1] Global balance

2012-12-07 Thread Morten Rasmussen
Hi Amit,

I should have included the numbers in the cover letter. Here are
numbers for TC2.

sysbench (normalized execution time, lower is better)
threads      2     4     8
HMP       1.00  1.00  1.00
HMP+GB    1.00  0.67  0.58

coremark (normalized iterations per second, higher is better)
threads      2     4     8
HMP       1.00  1.00  1.00
HMP+GB    1.00  1.39  1.73

So there is clear benefit of utilizing the A7s. It actually saves
energy too as the whole benchmark completes faster.

Regards,
Morten

On Fri, Dec 7, 2012 at 12:14 PM, Amit Kucheria  wrote:
>
> On Fri, Dec 7, 2012 at 5:33 PM, Morten Rasmussen
>  wrote:
> > Hi Viresh,
> >
> > Here is a patch that introduces global load balancing on top of the 
> > existing HMP
> > patch set. It depends on the HMP patches already present in your 
> > task-placement-v2
> > branch. It can be applied on top of the HMP sysfs patches if needed. The 
> > fix should
> > be trivial.
> >
> > Could you include it in the MP branch for the 12.12 release? Testing with
> > sysbench and coremark shows significant performance improvements for parallel
> > workloads as all cpus can now be used for cpu intensive tasks.
>
> Morten,
>
> Can you share some performance number improvements and/or
> kernelshark-type graphs with and without this patch? It'd be very
> interesting to see the changes.
>
> Monday is the deadline to get this merged into the MP tree to make it
> to the release. It is end of week now. Not sure how much testing and
> review can be done before Monday. Your numbers might make a compelling
> argument.
>
> Regards,
> Amit
>
> > Thanks,
> > Morten
> >
> > Morten Rasmussen (1):
> >   sched: Basic global balancing support for HMP
> >
> >  kernel/sched/fair.c |  101 
> > +--
> >  1 file changed, 97 insertions(+), 4 deletions(-)
> >
> > --
> > 1.7.9.5
> >
> >
> >


[HMP][PATCH 1/1] sched: Basic global balancing support for HMP

2012-12-07 Thread Morten Rasmussen
This patch introduces an extra check at task up-migration to
prevent overloading the cpus in the faster hmp_domain while the
slower hmp_domain is not fully utilized. The patch also introduces
a periodic balance check that can down-migrate tasks if the faster
domain is oversubscribed and the slower is under-utilized.

Signed-off-by: Morten Rasmussen 
---
 kernel/sched/fair.c |  101 +--
 1 file changed, 97 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1cfe112..7ac47c9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3249,6 +3249,80 @@ static inline void hmp_next_down_delay(struct 
sched_entity *se, int cpu)
se->avg.hmp_last_down_migration = cfs_rq_clock_task(cfs_rq);
se->avg.hmp_last_up_migration = 0;
 }
+
+static inline unsigned int hmp_domain_min_load(struct hmp_domain *hmpd,
+   int *min_cpu)
+{
+   int cpu;
+   int min_load = INT_MAX;
+   int min_cpu_temp = NR_CPUS;
+
+   for_each_cpu_mask(cpu, hmpd->cpus) {
+   if (cpu_rq(cpu)->cfs.tg_load_contrib < min_load) {
+   min_load = cpu_rq(cpu)->cfs.tg_load_contrib;
+   min_cpu_temp = cpu;
+   }
+   }
+
+   if (min_cpu)
+   *min_cpu = min_cpu_temp;
+
+   return min_load;
+}
+
+/*
+ * Calculate the task starvation
+ * This is the ratio of actually running time vs. runnable time.
+ * If the two are equal the task is getting the cpu time it needs or
+ * it is alone on the cpu and the cpu is fully utilized.
+ */
+static inline unsigned int hmp_task_starvation(struct sched_entity *se)
+{
+   u32 starvation;
+
+   starvation = se->avg.usage_avg_sum * scale_load_down(NICE_0_LOAD);
+   starvation /= (se->avg.runnable_avg_sum + 1);
+
+   return scale_load(starvation);
+}
+
+static inline unsigned int hmp_offload_down(int cpu, struct sched_entity *se)
+{
+   int min_usage;
+   int dest_cpu = NR_CPUS;
+
+   if (hmp_cpu_is_slowest(cpu))
+   return NR_CPUS;
+
+   /* Is the current domain fully loaded? */
+   /* load < ~94% */
+   min_usage = hmp_domain_min_load(hmp_cpu_domain(cpu), NULL);
+   if (min_usage < NICE_0_LOAD-64)
+   return NR_CPUS;
+
+   /* Is the cpu oversubscribed? */
+   /* load < ~194% */
+   if (cpu_rq(cpu)->cfs.tg_load_contrib < 2*NICE_0_LOAD-64)
+   return NR_CPUS;
+
+   /* Is the task alone on the cpu? */
+   if (cpu_rq(cpu)->cfs.nr_running < 2)
+   return NR_CPUS;
+
+   /* Is the task actually starving? */
+   if (hmp_task_starvation(se) > 768) /* <25% waiting */
+   return NR_CPUS;
+
+   /* Does the slower domain have spare cycles? */
+   min_usage = hmp_domain_min_load(hmp_slower_domain(cpu), &dest_cpu);
+   /* load > 50% */
+   if (min_usage > NICE_0_LOAD/2)
+   return NR_CPUS;
+
+   if (cpumask_test_cpu(dest_cpu, &hmp_slower_domain(cpu)->cpus))
+   return dest_cpu;
+   return NR_CPUS;
+}
 #endif /* CONFIG_SCHED_HMP */
 
 /*
@@ -5643,10 +5717,14 @@ static unsigned int hmp_up_migration(int cpu, struct 
sched_entity *se)
< hmp_next_up_threshold)
return 0;
 
-   if (se->avg.load_avg_ratio > hmp_up_threshold &&
-   cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
-   tsk_cpus_allowed(p))) {
-   return 1;
+   if (se->avg.load_avg_ratio > hmp_up_threshold) {
+   /* Target domain load < ~94% */
+   if (hmp_domain_min_load(hmp_faster_domain(cpu), NULL)
+   > NICE_0_LOAD-64)
+   return 0;
+   if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
+   tsk_cpus_allowed(p)))
+   return 1;
}
return 0;
 }
@@ -5868,6 +5946,21 @@ static void hmp_force_up_migration(int this_cpu)
hmp_next_up_delay(&p->se, target->push_cpu);
}
}
+   if (!force && !target->active_balance) {
+   /*
+* For now we just check the currently running task.
+* Selecting the lightest task for offloading will
+* require extensive book keeping.
+*/
+   target->push_cpu = hmp_offload_down(cpu, curr);
+   if (target->push_cpu < NR_CPUS) {
+   target->active_balance = 1;
+   tar
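
For readers skimming the patch, the thresholds used in hmp_offload_down()
above can be restated as a standalone predicate (an illustration only,
assuming NICE_0_LOAD == 1024; the helper and its parameter names are made up):

static bool should_offload_down(unsigned int fast_domain_min_load,
				unsigned int this_cpu_load,
				unsigned int this_cpu_nr_running,
				unsigned int task_starvation,
				unsigned int slow_domain_min_load)
{
	if (fast_domain_min_load < 1024 - 64)	/* fast domain not ~94% full */
		return false;
	if (this_cpu_load < 2 * 1024 - 64)	/* this cpu not ~194% loaded */
		return false;
	if (this_cpu_nr_running < 2)		/* the task is alone on the cpu */
		return false;
	if (task_starvation > 768)		/* task runs >75% of its runnable time */
		return false;
	if (slow_domain_min_load > 1024 / 2)	/* slow domain already >50% loaded */
		return false;
	return true;				/* offload the current task down */
}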

[HMP][PATCH 0/1] Global balance

2012-12-07 Thread Morten Rasmussen
Hi Viresh,

Here is a patch that introduces global load balancing on top of the existing
HMP patch set. It depends on the HMP patches already present in your
task-placement-v2 branch. It can be applied on top of the HMP sysfs patches
if needed. The fix should be trivial.

Could you include it in the MP branch for the 12.12 release? Testing with
sysbench and coremark shows significant performance improvements for parallel
workloads as all cpus can now be used for cpu intensive tasks.

Thanks,
Morten

Morten Rasmussen (1):
  sched: Basic global balancing support for HMP

 kernel/sched/fair.c |  101 +--
 1 file changed, 97 insertions(+), 4 deletions(-)

-- 
1.7.9.5





Re: HMP patches v2

2012-12-05 Thread Morten Rasmussen

On 05/12/12 11:35, Viresh Kumar wrote:

On 5 December 2012 16:58, Morten Rasmussen  wrote:

I tested Vincent's fix ("sched: pack small tasks: fix update packing
domain") for the buddy selection some weeks ago and confirmed that it
works. So my quick fixes are no longer necessary.

The issues around the reverted "sched: secure access to other CPU
statistics" have not yet been resolved. I don't think that we should
re-enable it until we are clear about what it is doing.


There are four patches I am carrying from ARM

4a29297 ARM: TC2: Re-enable SD_SHARE_POWERLINE
a1924a4 sched: SD_SHARE_POWERLINE buddy selection fix
39b0e77 Revert "sched: secure access to other CPU statistics"
eed72c8 Revert "sched: pack small tasks: fix update packing domain"

You want me to drop eed72c8 and a1924a4? Correct?


Yes.

Morten



--
viresh








Re: HMP patches v2

2012-12-05 Thread Morten Rasmussen

On 05/12/12 11:01, Viresh Kumar wrote:

On 5 December 2012 16:28, Liviu Dudau  wrote:

The revert request came at Morten's suggestion. He has comments on the code and 
technical reasons
why he believes that the approach is not the best one as well as some scenarios 
where possible race
conditions can occur.

Morten, what is the latest update in this area. I'm not sure I have followed 
your discussion with
Vincent on the subject.


Just to make it more clear: there are two reverts now. Please look
at the latest tree/branches. Vincent has provided another fixup patch
after which he commented we no longer need Morten's fix.

I have reverted that too, for the moment to keep things same as the
last release. Can Morten test with latest patches from Vincent (from his
branch) ? And provide fixups again ?



Hi,

I tested Vincent's fix ("sched: pack small tasks: fix update packing
domain") for the buddy selection some weeks ago and confirmed that it
works. So my quick fixes are no longer necessary.

The issues around the reverted "sched: secure access to other CPU
statistics" have not yet been resolved. I don't think that we should
re-enable it until we are clear about what it is doing.

Morten





Re: [RFC 3/6] sched: pack small tasks

2012-11-20 Thread Morten Rasmussen
Hi Vincent,

On Mon, Nov 12, 2012 at 01:51:00PM +, Vincent Guittot wrote:
> On 9 November 2012 18:13, Morten Rasmussen  wrote:
> > Hi Vincent,
> >
> > I have experienced suboptimal buddy selection on a dual cluster setup
> > (ARM TC2) if SD_SHARE_POWERLINE is enabled at MC level and disabled at
> > CPU level. This seems to be the correct flag settings for a system with
> > only cluster level power gating.
> >
> > To me it looks like update_packing_domain() is not doing the right
> > thing. See inline comments below.
> 
> Hi Morten,
> 
> Thanks for testing the patches.
> 
> It seems that I have over-optimized the loop and removed some use cases.
> 
> >
> > On Sun, Oct 07, 2012 at 08:43:55AM +0100, Vincent Guittot wrote:
> >> During sched_domain creation, we define a pack buddy CPU if available.
> >>
> >> On a system that share the powerline at all level, the buddy is set to -1
> >>
> >> On a dual clusters / dual cores system which can powergate each core and
> >> cluster independantly, the buddy configuration will be :
> >>   | CPU0 | CPU1 | CPU2 | CPU3 |
> >> ---
> >> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
> >>
> >> Small tasks tend to slip out of the periodic load balance.
> >> The best place to choose to migrate them is at their wake up.
> >>
> >> Signed-off-by: Vincent Guittot 
> >> ---
> >>  kernel/sched/core.c  |1 +
> >>  kernel/sched/fair.c  |  109 
> >> ++
> >>  kernel/sched/sched.h |1 +
> >>  3 files changed, 111 insertions(+)
> >>
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index dab7908..70cadbe 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct 
> >> root_domain *rd, int cpu)
> >>   rcu_assign_pointer(rq->sd, sd);
> >>   destroy_sched_domains(tmp, cpu);
> >>
> >> + update_packing_domain(cpu);
> >>   update_top_cache_domain(cpu);
> >>  }
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 4f4a4f6..8c9d3ed 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -157,6 +157,63 @@ void sched_init_granularity(void)
> >>   update_sysctl();
> >>  }
> >>
> >> +
> >> +/*
> >> + * Save the id of the optimal CPU that should be used to pack small tasks
> >> + * The value -1 is used when no buddy has been found
> >> + */
> >> +DEFINE_PER_CPU(int, sd_pack_buddy);
> >> +
> >> +/* Look for the best buddy CPU that can be used to pack small tasks
> >> + * We make the assumption that it doesn't wort to pack on CPU that share 
> >> the
> >> + * same powerline. We looks for the 1st sched_domain without the
> >> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the 
> >> lowest
> >> + * power per core based on the assumption that their power efficiency is
> >> + * better */
> >> +void update_packing_domain(int cpu)
> >> +{
> >> + struct sched_domain *sd;
> >> + int id = -1;
> >> +
> >> + sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE);
> >> + if (!sd)
> >> + sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
> >> + else
> >> + sd = sd->parent;
> > sd is the highest level where SD_SHARE_POWERLINE is enabled so the sched
> > groups of the parent level would represent the power domains. If I get it
> > right, we want to pack inside the cluster first and only let first cpu
> 
> You probably wanted to use sched_group instead of cluster because
> cluster is only a special use case, didn't you ?
> 
> > of the cluster do packing on another cluster. So all cpus - except the
> > first one - in the current sched domain should find its buddy within the
> > domain and only the first one should go to the parent sched domain to
> > find its buddy.
> 
> We don't want to pack in the current sched_domain because it shares
> power domain. We want to pack at the parent level
> 

Yes. I think we mean the same thing. The packing takes place at the
parent sched_domain but the sched_group that we are looking at only
contains the cpus of the level below.

> >
> > I propose the following fix:
> >
> > -

Re: [HMP tunables v2][PATCH 3/7] ARM: TC2: Re-enable SD_SHARE_POWERLINE

2012-11-19 Thread Morten Rasmussen

On 19/11/12 14:09, Vincent Guittot wrote:

On 19 November 2012 14:36, Morten Rasmussen  wrote:

On 19/11/12 12:23, Vincent Guittot wrote:


On 19 November 2012 13:08, Morten Rasmussen 
wrote:


Hi Vincent,


On 19/11/12 09:20, Vincent Guittot wrote:



Hi,

On 16 November 2012 19:32, Liviu Dudau  wrote:



From: Morten Rasmussen 

Re-enable SD_SHARE_POWERLINE to reflect the power domains of TC2.
---
arch/arm/kernel/topology.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 317dac6..4d34e0e 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -228,7 +228,7 @@ struct cputopo_arm cpu_topology[NR_CPUS];

int arch_sd_share_power_line(void)
{
-   return 0*SD_SHARE_POWERLINE;
+   return 1*SD_SHARE_POWERLINE;




I'm not sure I catch your goal. With this modification, the power
line (or power domain) is shared at all levels, which should disable the
packing mechanism. But in a previous patch you fix the update packing
loop so I assume that you want to use it. Which kind of configuration
you would like to have among the proposal below ?

cpu   : CPU0 | CPU1 | CPU2 | CPU3 | CPU4
buddy conf 1 : CPU2 | CPU0 | CPU2 | CPU2 | CPU2
buddy conf 2 : CPU2 | CPU2 | CPU2 | CPU2 | CPU2
buddy conf 3 :   -1 |   -1 |   -1 |   -1 |   -1

When we look at the  git://git.linaro.org/arm/big.LITTLE/mp.git
big-LITTLE-MP-master-v12, we can see that you have defined a custom
sched_domain which hasn't been updated with SD_SHARE_POWERLINE flag so
the flag is cleared at CPU level. Based on this, I would say that you
want buddy conf 2 ? but I would say that buddy conf 1 should give
better result. Have you tried both ?



My goal with this fix is to set up the SD_SHARE_POWERLINE flags as they
really are on TC2. It could have been done more elegantly. Since the HMP
patches override the sched_domain flags at CPU level, the
SD_SHARE_POWERLINE flag is not being set by arch_sd_share_power_line().
With this fix we will get SD_SHARE_POWERLINE at MC level and no
SD_SHARE_POWERLINE at CPU level, which I believe is the correct setup
for TC2.

For the buddy configuration the goal is to get configuration 1 in your
list above. You should get that when using the other patch to fix the
buddy selection algorithm.
I'm not sure if conf 1 or 2 is best. I think it depends on the
power/performance trade-off of the specific platform. Conf 1 may lead to
CPU1->CPU0->CPU2 migrations which may be undesirable. If your cpus are
very leaky it might make sense to not do packing at all inside a high
performance cluster and always do packing directly on another low power
cluster like conf 2. I think this needs further investigation.

I have only tested with conf 1 on TC2.



Hi Morten,

Conf1 is the default configuration for ARM platform because
SD_SHARE_POWERLINE is cleared at all levels for this architecture.

Conf2 should be used if you can't powergate the cores independently, but
several tests have demonstrated that even if you can't powergate each
core independently, it is worth packing small tasks on a few CPUs in a
cluster, so it's worth using conf1 on TC2 as well.

Based on your explanation, we should use the original configuration of
SD_SHARE_POWERLINE (cleared at all level for ARM platform)



I agree that the result is the same, but I don't like disabling
SD_SHARE_POWERLINE at all levels when the cpus in each cluster actually are
in the same power domain, as is the case on TC2. The name SHARE_POWERLINE
implies a clear relation to the actual hardware design, thus setting the
flags differently than the actual hardware design is misleading in my
opinion. If the buddy selection algorithm doesn't select appropriate buddies
when flags are set to reflect the actual hardware design I would suggest
changing the buddy selection algorithm instead of changing the sched_domain
flags.

If it is chosen to not have a direct relation between the flags and the
hardware design, I think that the flag should be renamed so it doesn't give
the wrong impression.


There is a direct link between the powergating and SHARE_POWERLINE,
and if you want the buddy selection to strictly reflect your HW
configuration, you must use conf2 and not conf1.


I just want the buddy selection to be reasonable when the 
SHARE_POWERLINE flags are reflecting the true hardware configuration. I 
haven't tested whether conf 1 or 2 is best yet. As long as I am getting 
one them it is definitely an improvement over not having task packing at 
all :)



Now, beside the packing small task patch and the TC2 configuration, it
has been proven that packing small tasks on an ARM platform (dual
cortex-A9) which can only powergate the cluster, improves the power
consumption of some low cpu load use cases like the MP3 playback (we
had used cpu hotplug at that time). This assumption has been proven
only for ARM platform and that's why

Re: [HMP tunables v2][PATCH 3/7] ARM: TC2: Re-enable SD_SHARE_POWERLINE

2012-11-19 Thread Morten Rasmussen

On 19/11/12 12:23, Vincent Guittot wrote:

On 19 November 2012 13:08, Morten Rasmussen  wrote:

Hi Vincent,


On 19/11/12 09:20, Vincent Guittot wrote:


Hi,

On 16 November 2012 19:32, Liviu Dudau  wrote:


From: Morten Rasmussen 

Re-enable SD_SHARE_POWERLINE to reflect the power domains of TC2.
---
   arch/arm/kernel/topology.c |2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 317dac6..4d34e0e 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -228,7 +228,7 @@ struct cputopo_arm cpu_topology[NR_CPUS];

   int arch_sd_share_power_line(void)
   {
-   return 0*SD_SHARE_POWERLINE;
+   return 1*SD_SHARE_POWERLINE;



I'm not sure I catch your goal. With this modification, the power
line (or power domain) is shared at all levels, which should disable the
packing mechanism. But in a previous patch you fix the update packing
loop so I assume that you want to use it. Which kind of configuration
you would like to have among the proposal below ?

cpu   : CPU0 | CPU1 | CPU2 | CPU3 | CPU4
buddy conf 1 : CPU2 | CPU0 | CPU2 | CPU2 | CPU2
buddy conf 2 : CPU2 | CPU2 | CPU2 | CPU2 | CPU2
buddy conf 3 :   -1 |   -1 |   -1 |   -1 |   -1

When we look at the  git://git.linaro.org/arm/big.LITTLE/mp.git
big-LITTLE-MP-master-v12, we can see that you have defined a custom
sched_domain which hasn't been updated with SD_SHARE_POWERLINE flag so
the flag is cleared at CPU level. Based on this, I would say that you
want buddy conf 2 ? but I would say that buddy conf 1 should give
better result. Have you tried both ?



My goal with this fix is to set up the SD_SHARE_POWERLINE flags as they
really are on TC2. It could have been done more elegantly. Since the HMP
patches override the sched_domain flags at CPU level, the SD_SHARE_POWERLINE
is not being set by arch_sd_share_power_line(). With this fix we will get
SD_SHARE_POWERLINE at MC level and no SD_SHARE_POWERLINE at CPU level, which
I believe is the correct set up for TC2.

For the buddy configuration the goal is to get configuration 1 in your list
above. You should get that when using the other patch to fix the buddy
selection algorithm.
I'm not sure if conf 1 or 2 is best. I think it depends on the
power/performance trade-off of the specific platform. conf 1 may lead to
CPU1->CPU0->CPU2 migrations which may be undesirable. If your cpus are very
leaky it might make sense to not do packing at all inside a high performance
cluster and always do packing directly on another low power cluster like
conf 2. I think this needs further investigation.

I have only tested with conf 1 on TC2.


Hi Morten,

Conf1 is the default configuration for ARM platform because
SD_SHARE_POWERLINE is cleared at all levels for this architecture.

Conf2 should be used if you can't powergate the cores independently, but
several tests have demonstrated that even if you can't powergate each
core independently, it is worth packing small tasks on a few CPUs in a
cluster, so it's worth using conf1 on TC2 as well.

Based on your explanation, we should use the original configuration of
SD_SHARE_POWERLINE (cleared at all level for ARM platform)


I agree that the result is the same, but I don't like disabling 
SD_SHARE_POWERLINE at all levels when the cpus in each cluster actually 
are in the same power domain, as is the case on TC2. The name 
SHARE_POWERLINE implies a clear relation to the actual hardware design, 
thus setting the flags differently than the actual hardware design is 
misleading in my opinion. If the buddy selection algorithm doesn't 
select appropriate buddies when flags are set to reflect the actual 
hardware design I would suggest changing the buddy selection algorithm 
instead of changing the sched_domain flags.


If it is chosen to not have a direct relation between the flags and the 
hardware design, I think that the flag should be renamed so it doesn't 
give the wrong impression.


Morten



Regards
Vincent




Regards,
Morten



Regards,
Vincent


   }

   const struct cpumask *cpu_coregroup_mask(int cpu)
--
1.7.9.5





Re: [HMP tunables v2][PATCH 3/7] ARM: TC2: Re-enable SD_SHARE_POWERLINE

2012-11-19 Thread Morten Rasmussen

Hi Vincent,

On 19/11/12 09:20, Vincent Guittot wrote:

Hi,

On 16 November 2012 19:32, Liviu Dudau  wrote:

From: Morten Rasmussen 

Re-enable SD_SHARE_POWERLINE to reflect the power domains of TC2.
---
  arch/arm/kernel/topology.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 317dac6..4d34e0e 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -228,7 +228,7 @@ struct cputopo_arm cpu_topology[NR_CPUS];

  int arch_sd_share_power_line(void)
  {
-   return 0*SD_SHARE_POWERLINE;
+   return 1*SD_SHARE_POWERLINE;


I'm not sure I catch your goal. With this modification, the power
line (or power domain) is shared at all levels, which should disable the
packing mechanism. But in a previous patch you fix the update packing
loop so I assume that you want to use it. Which kind of configuration
you would like to have among the proposal below ?

cpu   : CPU0 | CPU1 | CPU2 | CPU3 | CPU4
buddy conf 1 : CPU2 | CPU0 | CPU2 | CPU2 | CPU2
buddy conf 2 : CPU2 | CPU2 | CPU2 | CPU2 | CPU2
buddy conf 3 :   -1 |   -1 |   -1 |   -1 |   -1

When we look at the  git://git.linaro.org/arm/big.LITTLE/mp.git
big-LITTLE-MP-master-v12, we can see that you have defined a custom
sched_domain which hasn't been updated with SD_SHARE_POWERLINE flag so
the flag is cleared at CPU level. Based on this, I would say that you
want buddy conf 2 ? but I would say that buddy conf 1 should give
better result. Have you tried both ?



My goal with this fix is to set up the SD_SHARE_POWERLINE flags as they 
really are on TC2. It could have been done more elegantly. Since the HMP 
patches override the sched_domain flags at CPU level, the 
SD_SHARE_POWERLINE is not being set by arch_sd_share_power_line(). With 
this fix we will get SD_SHARE_POWERLINE at MC level and no 
SD_SHARE_POWERLINE at CPU level, which I believe is the correct set up 
for TC2.


For the buddy configuration the goal is to get configuration 1 in your 
list above. You should get that when using the other patch to fix the 
buddy selection algorithm.
I'm not sure if conf 1 or 2 is best. I think it depends on the 
power/performance trade-off of the specific platform. conf 1 may lead to 
CPU1->CPU0->CPU2 migrations which may be undesirable. If your cpus are 
very leaky it might make sense to not do packing at all inside a high 
performance cluster and always do packing directly on another low 
power cluster like conf 2. I think this needs further investigation.


I have only tested with conf 1 on TC2.

Regards,
Morten


Regards,
Vincent


  }

  const struct cpumask *cpu_coregroup_mask(int cpu)
--
1.7.9.5





Re: [RFC 3/6] sched: pack small tasks

2012-11-09 Thread Morten Rasmussen
Hi Vincent,

I have experienced suboptimal buddy selection on a dual cluster setup
(ARM TC2) if SD_SHARE_POWERLINE is enabled at MC level and disabled at
CPU level. This seems to be the correct flag settings for a system with
only cluster level power gating.

To me it looks like update_packing_domain() is not doing the right
thing. See inline comments below.

On Sun, Oct 07, 2012 at 08:43:55AM +0100, Vincent Guittot wrote:
> During sched_domain creation, we define a pack buddy CPU if available.
> 
> On a system that share the powerline at all level, the buddy is set to -1
> 
> On a dual clusters / dual cores system which can powergate each core and
> cluster independantly, the buddy configuration will be :
>   | CPU0 | CPU1 | CPU2 | CPU3 |
> ---
> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
> 
> Small tasks tend to slip out of the periodic load balance.
> The best place to choose to migrate them is at their wake up.
> 
> Signed-off-by: Vincent Guittot 
> ---
>  kernel/sched/core.c  |1 +
>  kernel/sched/fair.c  |  109 
> ++
>  kernel/sched/sched.h |1 +
>  3 files changed, 111 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dab7908..70cadbe 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct 
> root_domain *rd, int cpu)
>   rcu_assign_pointer(rq->sd, sd);
>   destroy_sched_domains(tmp, cpu);
>  
> + update_packing_domain(cpu);
>   update_top_cache_domain(cpu);
>  }
>  
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4f4a4f6..8c9d3ed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -157,6 +157,63 @@ void sched_init_granularity(void)
>   update_sysctl();
>  }
>  
> +
> +/*
> + * Save the id of the optimal CPU that should be used to pack small tasks
> + * The value -1 is used when no buddy has been found
> + */
> +DEFINE_PER_CPU(int, sd_pack_buddy);
> +
> +/* Look for the best buddy CPU that can be used to pack small tasks
> + * We make the assumption that it doesn't wort to pack on CPU that share the
> + * same powerline. We looks for the 1st sched_domain without the
> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the lowest
> + * power per core based on the assumption that their power efficiency is
> + * better */
> +void update_packing_domain(int cpu)
> +{
> + struct sched_domain *sd;
> + int id = -1;
> +
> + sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE);
> + if (!sd)
> + sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
> + else
> + sd = sd->parent;
sd is the highest level where SD_SHARE_POWERLINE is enabled so the sched
groups of the parent level would represent the power domains. If I get it
right, we want to pack inside the cluster first and only let first cpu
of the cluster do packing on another cluster. So all cpus - except the
first one - in the current sched domain should find its buddy within the
domain and only the first one should go to the parent sched domain to
find its buddy.

I propose the following fix:

-   sd = sd->parent;
+   if (cpumask_first(sched_domain_span(sd)) == cpu
+   || !sd->parent)
+   sd = sd->parent;


> +
> + while (sd) {
> + struct sched_group *sg = sd->groups;
> + struct sched_group *pack = sg;
> + struct sched_group *tmp = sg->next;
> +
> + /* 1st CPU of the sched domain is a good candidate */
> + if (id == -1)
> + id = cpumask_first(sched_domain_span(sd));

There is no guarantee that id is in the sched group pointed to by
sd->groups, which is implicitly assumed later in the search loop. We
need to find the sched group that contains id and point sg to that
instead. I haven't found an elegant way to find that group, but the fix
below should at least give the right result.

+   /* Find sched group of candidate */
+   tmp = sd->groups;
+   do {
+   if (cpumask_test_cpu(id, sched_group_cpus(tmp)))
+   {
+   sg = tmp;
+   break;
+   }
+   } while (tmp = tmp->next, tmp != sd->groups);
+
+   pack = sg;
+   tmp = sg->next;

Regards,
Morten

> +
> + /* loop the sched groups to find the best one */
> + while (tmp != sg) {
> + if (tmp->sgp->power * sg->group_weight <
> + sg->sgp->power * tmp->group_weight)
> + pack = tmp;
> + tmp = tmp->next;
> + }
> +
> + /* we have found a better group */
> + if (pack != sg)
> + id = cpumask_first(sched_group_

Re: [RFC 3/6] sched: pack small tasks

2012-11-09 Thread Morten Rasmussen
On Fri, Nov 02, 2012 at 10:53:47AM +, Santosh Shilimkar wrote:
> On Monday 29 October 2012 06:42 PM, Vincent Guittot wrote:
> > On 24 October 2012 17:20, Santosh Shilimkar  
> > wrote:
> >> Vincent,
> >>
> >> Few comments/questions.
> >>
> >>
> >> On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote:
> >>>
> >>> During sched_domain creation, we define a pack buddy CPU if available.
> >>>
> >>> On a system that share the powerline at all level, the buddy is set to -1
> >>>
> >>> On a dual clusters / dual cores system which can powergate each core and
> >>> cluster independantly, the buddy configuration will be :
> >>> | CPU0 | CPU1 | CPU2 | CPU3 |
> >>> ---
> >>> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
> >>
> >>  ^
> >> Is that a typo ? Should it be CPU2 instead of
> >> CPU0 ?
> >
> > No it's not a typo.
> > The system packs at each scheduling level. It starts to pack in
> > cluster because each core can power gate independently so CPU1 tries
> > to pack its tasks in CPU0 and CPU3 in CPU2. Then, it packs at CPU
> > level so CPU2 tries to pack in the cluster of CPU0 and CPU0 packs in
> > itself
> >
> I get it. Though in above example a task may migrate from say
> CPU3->CPU2->CPU0 as part of packing. I was just thinking whether
> moving such task from say CPU3 to CPU0 might be best instead.

To me it seems suboptimal to pack the task twice, but the alternative is
not good either. If you try to move the task directly to CPU0 you may
miss packing opportunities if CPU0 is already busy, while CPU2 might
have enough capacity to take it. It would probably be better to check
the busyness of CPU0 and then back off and try CPU2 if CPU0 is busy. This
would require a buddy list for each CPU rather than just a single buddy and
thus might become expensive.
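
(For illustration, a rough sketch of what such a per-cpu buddy list could
look like; this is not part of the posted patches and the names, the list
size and the busy test are made-up assumptions.)

/*
 * Sketch only: an ordered list of pack buddies per cpu, so packing can
 * fall back to the next-best buddy when the preferred one is busy.
 */
#define MAX_PACK_BUDDIES	2

DEFINE_PER_CPU(int, sd_pack_buddies[MAX_PACK_BUDDIES]);

static int select_pack_cpu(int cpu)
{
	int i, buddy;

	for (i = 0; i < MAX_PACK_BUDDIES; i++) {
		buddy = per_cpu(sd_pack_buddies, cpu)[i];
		if (buddy == -1)
			break;
		/* accept the first buddy that is not already busy */
		if (!cpu_rq(buddy)->nr_running)
			return buddy;
	}
	return cpu;	/* no idle buddy found, stay where we are */
}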

> 
> >>
> >>> Small tasks tend to slip out of the periodic load balance.
> >>> The best place to choose to migrate them is at their wake up.
> >>>
> >> I have tried this series since I was looking at some of these packing
> >> bits. On Mobile workloads like OSIdle with Screen ON, MP3, gallary,
> >> I did see some additional filtering of threads with this series
> >> but its not making much difference in power. More on this below.
> >
> > Can I ask you which configuration you have used ? how many cores and
> > cluster ?  Can they be power gated independently ?
> >
> I have been trying with couple of setups. Dual Core ARM machine and
> Quad core X86 box with single package thought most of the mobile
> workload analysis I was doing on ARM machine. On both setups
> CPUs can be gated independently.
> 
> >>
> >>
> >>> Signed-off-by: Vincent Guittot 
> >>> ---
> >>>kernel/sched/core.c  |1 +
> >>>kernel/sched/fair.c  |  109
> >>> ++
> >>>kernel/sched/sched.h |1 +
> >>>3 files changed, 111 insertions(+)
> >>>
> >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >>> index dab7908..70cadbe 100644
> >>> --- a/kernel/sched/core.c
> >>> +++ b/kernel/sched/core.c
> >>> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct
> >>> root_domain *rd, int cpu)
> >>>  rcu_assign_pointer(rq->sd, sd);
> >>>  destroy_sched_domains(tmp, cpu);
> >>>
> >>> +   update_packing_domain(cpu);
> >>>  update_top_cache_domain(cpu);
> >>>}
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 4f4a4f6..8c9d3ed 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -157,6 +157,63 @@ void sched_init_granularity(void)
> >>>  update_sysctl();
> >>>}
> >>>
> >>> +
> >>> +/*
> >>> + * Save the id of the optimal CPU that should be used to pack small tasks
> >>> + * The value -1 is used when no buddy has been found
> >>> + */
> >>> +DEFINE_PER_CPU(int, sd_pack_buddy);
> >>> +
> >>> +/* Look for the best buddy CPU that can be used to pack small tasks
> >>> + * We make the assumption that it doesn't wort to pack on CPU that share
> >>> the
> >>
> >> s/wort/worth
> >
> > yes
> >
> >>
> >>> + * same powerline. We looks for the 1st sched_domain without the
> >>> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the
> >>> lowest
> >>> + * power per core based on the assumption that their power efficiency is
> >>> + * better */
> >>
> >> Commenting style..
> >> /*
> >>   *
> >>   */
> >>
> >
> > yes
> >
> >> Can you please expand the why the assumption is right ?
> >> "it doesn't wort to pack on CPU that share the same powerline"
> >
> > By "share the same power-line", I mean that the CPUs can't power off
> > independently. So if some CPUs can't power off independently, it's
> > worth to try to use most of them to race to idle.
> >
> In that case I suggest we use different word here. Power line can be
> treated as voltage line, power domain.
> May be SD_SHARE_CPU_POWERDOMAIN ?
> 

How about just SD_SHARE_POWERDOMAIN ?

> >>
> >> Think about a scenario 

Re: Fix for HMP scheduler crash [ Re: [GIT PULL]: big LITTLE MP v10]

2012-10-12 Thread Morten Rasmussen
On Fri, Oct 12, 2012 at 04:33:19PM +0100, Jon Medhurst (Tixy) wrote:
> On Fri, 2012-10-12 at 16:11 +0100, Morten Rasmussen wrote:
> > Hi Tixy,
> > 
> > Thanks for the patch. I think this patch is the right way to solve this
> > issue.
> > 
> > There is still a problem with the priority filter in
> > hmp_down_migration() which Viresh pointed out earlier. There is no
> > checking of whether the task is actually allowed to run on any of the
> > slower cpus. Solving that would actually also fix the issue that you are
> > observing as a side effect. I have attached a patch.
> 
> The patch looks reasonable. I've just run it on TC2 and A9 with the
> addition of a "pr_err("$");" before the "return 1;" and can see the
> occasional '$' on TC2 and none on A9, as we would expect. So I guess
> that counts as:
> 
> Reviewed-by: Jon Medhurst 
> Tested-by: Jon Medhurst 
>

Thanks for reviewing and testing.

My comments to your patch in the previous reply would count as:

Reviewed-by: Morten Rasmussen 

I have only tested it on TC2.

Morten
 
> -- 
> Tixy
> 
> 
> > I think we should apply both.
> > 
> > Thanks,
> > Morten
> > 
> > On Fri, Oct 12, 2012 at 02:33:40PM +0100, Jon Medhurst (Tixy) wrote:
> > > On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote:
> > > > The attached patch fixes the immediate problem by avoiding the empty
> > > > domain (which is probably a good thing anyway)
> > > 
> > > Oops, my last patch included some extra junk, the one attached to this
> > > mail fixes this...
> > 
> > > From 7365076675b851355d48e9b1157e223d7719e3ac Mon Sep 17 00:00:00 2001
> > > From: Jon Medhurst 
> > > Date: Fri, 12 Oct 2012 13:45:35 +0100
> > > Subject: [PATCH] ARM: sched: Avoid empty 'slow' HMP domain
> > > 
> > > On homogeneous (non-heterogeneous) systems all CPUs will be declared
> > > 'fast' and the slow cpu list will be empty. In this situation we need to
> > > avoid adding an empty slow HMP domain otherwise the scheduler code will
> > > blow up when it attempts to move a task to the slow domain.
> > > 
> > > Signed-off-by: Jon Medhurst 
> > > ---
> > >  arch/arm/kernel/topology.c |   10 ++
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> > > index 58dac7a..0b51233 100644
> > > --- a/arch/arm/kernel/topology.c
> > > +++ b/arch/arm/kernel/topology.c
> > > @@ -396,10 +396,12 @@ void __init arch_get_hmp_domains(struct list_head 
> > > *hmp_domains_list)
> > >* Must be ordered with respect to compute capacity.
> > >* Fastest domain at head of list.
> > >*/
> > > - domain = (struct hmp_domain *)
> > > - kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> > > - cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
> > > - list_add(&domain->hmp_domains, hmp_domains_list);
> > > + if(!cpumask_empty(&hmp_slow_cpu_mask)) {
> > > + domain = (struct hmp_domain *)
> > > + kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> > > + cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
> > > + list_add(&domain->hmp_domains, hmp_domains_list);
> > > + }
> > >   domain = (struct hmp_domain *)
> > >   kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> > >   cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
> > > -- 
> > > 1.7.10.4
> 
> 
> 




Re: Fix for HMP scheduler crash [ Re: [GIT PULL]: big LITTLE MP v10]

2012-10-12 Thread Morten Rasmussen
Hi Tixy,

Thanks for the patch. I think this patch is the right way to solve this
issue.

There is still a problem with the priority filter in
hmp_down_migration() which Viresh pointed out earlier. There is no
checking of whether the task is actually allowed to run on any of the
slower cpus. Solving that would actually also fix the issue that you are
observing as a side effect. I have attached a patch.

I think we should apply both.

Thanks,
Morten

On Fri, Oct 12, 2012 at 02:33:40PM +0100, Jon Medhurst (Tixy) wrote:
> On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote:
> > The attached patch fixes the immediate problem by avoiding the empty
> > domain (which is probably a good thing anyway)
> 
> Oops, my last patch included some extra junk, the one attached to this
> mail fixes this...

> From 7365076675b851355d48e9b1157e223d7719e3ac Mon Sep 17 00:00:00 2001
> From: Jon Medhurst 
> Date: Fri, 12 Oct 2012 13:45:35 +0100
> Subject: [PATCH] ARM: sched: Avoid empty 'slow' HMP domain
> 
> On homogeneous (non-heterogeneous) systems all CPUs will be declared
> 'fast' and the slow cpu list will be empty. In this situation we need to
> avoid adding an empty slow HMP domain otherwise the scheduler code will
> blow up when it attempts to move a task to the slow domain.
> 
> Signed-off-by: Jon Medhurst 
> ---
>  arch/arm/kernel/topology.c |   10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> index 58dac7a..0b51233 100644
> --- a/arch/arm/kernel/topology.c
> +++ b/arch/arm/kernel/topology.c
> @@ -396,10 +396,12 @@ void __init arch_get_hmp_domains(struct list_head 
> *hmp_domains_list)
>* Must be ordered with respect to compute capacity.
>* Fastest domain at head of list.
>*/
> - domain = (struct hmp_domain *)
> - kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> - cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
> - list_add(&domain->hmp_domains, hmp_domains_list);
> + if(!cpumask_empty(&hmp_slow_cpu_mask)) {
> + domain = (struct hmp_domain *)
> + kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> + cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
> + list_add(&domain->hmp_domains, hmp_domains_list);
> + }
>   domain = (struct hmp_domain *)
>   kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
>   cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
> -- 
> 1.7.10.4
From 9f241c37bb7316eeea56e6c93541352cf5c9b8a8 Mon Sep 17 00:00:00 2001
From: Morten Rasmussen 
Date: Fri, 12 Oct 2012 15:25:02 +0100
Subject: [PATCH] sched: Only down migrate low priority tasks if allowed by
 affinity mask

Adds an extra check intersection of the task affinity mask and the slower
hmp_domain cpumask before down migrating low priority tasks.

Signed-off-by: Morten Rasmussen 
---
 kernel/sched/fair.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 56cbda1..edcf922 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5562,8 +5562,11 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
 
 #ifdef CONFIG_SCHED_HMP_PRIO_FILTER
 	/* Filter by task priority */
-	if (p->prio >= hmp_up_prio)
+	if ((p->prio >= hmp_up_prio) &&
+		cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
+	tsk_cpus_allowed(p))) {
 		return 1;
+	}
 #endif
 
 	/* Let the task load settle before doing another down migration */
-- 
1.7.9.5


Re: [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains

2012-10-10 Thread Morten Rasmussen
On Thu, Oct 04, 2012 at 07:58:45AM +0100, Viresh Kumar wrote:
> On 22 September 2012 00:02,   wrote:
> > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> 
> > +void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
> > +{
> > +   struct cpumask hmp_fast_cpu_mask;
> > +   struct cpumask hmp_slow_cpu_mask;
> 
> can be merged to single line.
> 
> > +   struct hmp_domain *domain;
> > +
> > +   arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);
> > +
> > +   /*
> > +* Initialize hmp_domains
> > +* Must be ordered with respect to compute capacity.
> > +* Fastest domain at head of list.
> > +*/
> > +   domain = (struct hmp_domain *)
> > +   kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> 
> should be:
> 
> domain = kmalloc(sizeof(*domain), GFP_KERNEL);
> 
> > +   cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
> 
> what if kmalloc failed?
> 
> > +   list_add(&domain->hmp_domains, hmp_domains_list);
> > +   domain = (struct hmp_domain *)
> > +   kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
> 
> would be better to kmalloc only once with size 2* sizeof(*domain)
> 
> > +   cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
> > +   list_add(&domain->hmp_domains, hmp_domains_list);
> 
> Also would be better to create a macro for above two lines to remove
> code redundancy.
> 

Agree on all of the above.

Thanks,
Morten
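
For illustration, the review comments above could translate into something
like the following rework (a sketch only, not a posted patch; the
hmp_add_domain() helper macro is made up):

#define hmp_add_domain(d, mask, list)			\
	do {						\
		cpumask_copy(&(d)->cpus, (mask));	\
		list_add(&(d)->hmp_domains, (list));	\
	} while (0)

void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
{
	struct cpumask hmp_fast_cpu_mask, hmp_slow_cpu_mask;
	struct hmp_domain *domain;

	arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);

	/* single allocation for both domains, with a failure check */
	domain = kmalloc(2 * sizeof(*domain), GFP_KERNEL);
	if (!domain)
		return;

	/* slowest first so the fastest domain ends up at the head */
	hmp_add_domain(&domain[0], &hmp_slow_cpu_mask, hmp_domains_list);
	hmp_add_domain(&domain[1], &hmp_fast_cpu_mask, hmp_domains_list);
}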




Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP

2012-10-10 Thread Morten Rasmussen
Hi Tixy,

Could you have a look at my code stealing patch below? Since it is
basically a trimmed version of one of your patches I would prefer to
put you as author and have your SOB on it. What is your opinion?

Thanks,
Morten

On Fri, Sep 21, 2012 at 07:32:21PM +0100, Morten Rasmussen wrote:
> From: Morten Rasmussen 
> 
> We can't rely on Kconfig options to set the fast and slow CPU lists for
> HMP scheduling if we want a single kernel binary to support multiple
> devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
> big.LITTLE system), Fast Models, or even non big.LITTLE devices.
> 
> This patch adds the function arch_get_fast_and_slow_cpus() to generate
> the lists at run-time by parsing the CPU nodes in device-tree; it
> assumes slow cores are A7s and everything else is fast. The function
> still supports the old Kconfig options as this is useful for testing the
> HMP scheduler on devices without big.LITTLE.
> 
> This patch is reuse of a patch by Jon Medhurst  with a
> few bits left out.
> 
> Signed-off-by: Morten Rasmussen 
> ---
>  arch/arm/Kconfig   |4 ++-
>  arch/arm/kernel/topology.c |   69 
> 
>  2 files changed, 72 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index cb80846..f1271bc 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK
>   string "HMP scheduler fast CPU mask"
>   depends on SCHED_HMP
>   help
> -  Specify the cpuids of the fast CPUs in the system as a list string,
> +  Leave empty to use device tree information.
> +   Specify the cpuids of the fast CPUs in the system as a list string,
> e.g. cpuid 0+1 should be specified as 0-1.
>  
>  config HMP_SLOW_CPU_MASK
>   string "HMP scheduler slow CPU mask"
>   depends on SCHED_HMP
>   help
> +   Leave empty to use device tree information.
> Specify the cpuids of the slow CPUs in the system as a list string,
> e.g. cpuid 0+1 should be specified as 0-1.
>  
> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> index 26c12c6..7682e12 100644
> --- a/arch/arm/kernel/topology.c
> +++ b/arch/arm/kernel/topology.c
> @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid)
>   cpu_topology[cpuid].socket_id, mpidr);
>  }
>  
> +
> +#ifdef CONFIG_SCHED_HMP
> +
> +static const char * const little_cores[] = {
> + "arm,cortex-a7",
> + NULL,
> +};
> +
> +static bool is_little_cpu(struct device_node *cn)
> +{
> + const char * const *lc;
> + for (lc = little_cores; *lc; lc++)
> + if (of_device_is_compatible(cn, *lc))
> + return true;
> + return false;
> +}
> +
> +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
> + struct cpumask *slow)
> +{
> + struct device_node *cn = NULL;
> + int cpu = 0;
> +
> + cpumask_clear(fast);
> + cpumask_clear(slow);
> +
> + /*
> +  * Use the config options if they are given. This helps testing
> +  * HMP scheduling on systems without a big.LITTLE architecture.
> +  */
> + if (strlen(CONFIG_HMP_FAST_CPU_MASK) && 
> strlen(CONFIG_HMP_SLOW_CPU_MASK)) {
> + if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast))
> + WARN(1, "Failed to parse HMP fast cpu mask!\n");
> + if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow))
> + WARN(1, "Failed to parse HMP slow cpu mask!\n");
> + return;
> + }
> +
> + /*
> +  * Else, parse device tree for little cores.
> +  */
> + while ((cn = of_find_node_by_type(cn, "cpu"))) {
> +
> + if (cpu >= num_possible_cpus())
> + break;
> +
> + if (is_little_cpu(cn))
> + cpumask_set_cpu(cpu, slow);
> + else
> + cpumask_set_cpu(cpu, fast);
> +
> + cpu++;
> + }
> +
> + if (!cpumask_empty(fast) && !cpumask_empty(slow))
> + return;
> +
> + /*
> +  * We didn't find both big and little cores so let's call all cores
> +  * fast as this will keep the system running, with all cores being
> +  * treated equal.
> +  */
> + cpumask_setall(fast);
> + cpumask_clear(slow);
> +}
> +
> +#endif /* CONFIG_SCHED_HMP */
> +
> +
>  /*
>   * init_cpu_topology is called at boot when only one cpu is running
>   * which prevent simultaneous write access to cpu_topology array
> -- 
> 1.7.9.5
> 




Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP

2012-10-10 Thread Morten Rasmussen
On Thu, Oct 04, 2012 at 07:49:32AM +0100, Viresh Kumar wrote:
> On 22 September 2012 00:02,   wrote:
> > From: Morten Rasmussen 
> >
> > We can't rely on Kconfig options to set the fast and slow CPU lists for
> > HMP scheduling if we want a single kernel binary to support multiple
> > devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
> > big.LITTLE system), Fast Models, or even non big.LITTLE devices.
> >
> > This patch adds the function arch_get_fast_and_slow_cpus() to generate
> > the lists at run-time by parsing the CPU nodes in device-tree; it
> > assumes slow cores are A7s and everything else is fast. The function
> > still supports the old Kconfig options as this is useful for testing the
> > HMP scheduler on devices without big.LITTLE.
> 
> But this code is handling this case too at the end, with following logic:
> 
> > +   cpumask_setall(fast);
> > +   cpumask_clear(slow);
> 
> Am i missing something?
> 

The HMP setup can be defined using Kconfig or DT. If both fail, it will
set all cpus to be fast cpus and effectively disable SCHED_HMP. The
Kconfig option is kept to allow testing of alternative HMP setups
without having to change the DT or use DT at all, which might be handy
for non-ARM platforms. I hope that answers your question.
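
As a concrete example of the Kconfig route, a .config fragment for testing
on a hypothetical system (the cpu lists are made up; the option names come
from the patch above):

CONFIG_SCHED_HMP=y
CONFIG_HMP_FAST_CPU_MASK="0-1"
CONFIG_HMP_SLOW_CPU_MASK="2-4"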

> > This patch is reuse of a patch by Jon Medhurst  with a
> > few bits left out.
> 
> Then probably he must be the author of this commit? Also a SOB is required
> from him here.
> 

I don't know what the correct procedure is for this sort of partial
patch reuse. Since I didn't know better, I adopted Tixy's own reference
style that he used in one of his patches which is an extension of a
previous patch by me. I will of course fix it to follow normal procedure
if there is one.

> > Signed-off-by: Morten Rasmussen 
> > ---
> >  arch/arm/Kconfig   |4 ++-
> >  arch/arm/kernel/topology.c |   69 
> > 
> >  2 files changed, 72 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index cb80846..f1271bc 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK
> > string "HMP scheduler fast CPU mask"
> > depends on SCHED_HMP
> > help
> > -  Specify the cpuids of the fast CPUs in the system as a list 
> > string,
> > +  Leave empty to use device tree information.
> > + Specify the cpuids of the fast CPUs in the system as a list 
> > string,
> >   e.g. cpuid 0+1 should be specified as 0-1.
> >
> >  config HMP_SLOW_CPU_MASK
> > string "HMP scheduler slow CPU mask"
> > depends on SCHED_HMP
> > help
> > + Leave empty to use device tree information.
> >   Specify the cpuids of the slow CPUs in the system as a list 
> > string,
> >   e.g. cpuid 0+1 should be specified as 0-1.
> >
> > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> > index 26c12c6..7682e12 100644
> > --- a/arch/arm/kernel/topology.c
> > +++ b/arch/arm/kernel/topology.c
> > @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid)
> > cpu_topology[cpuid].socket_id, mpidr);
> >  }
> >
> > +
> > +#ifdef CONFIG_SCHED_HMP
> > +
> > +static const char * const little_cores[] = {
> > +   "arm,cortex-a7",
> > +   NULL,
> > +};
> > +
> > +static bool is_little_cpu(struct device_node *cn)
> > +{
> > +   const char * const *lc;
> > +   for (lc = little_cores; *lc; lc++)
> > +   if (of_device_is_compatible(cn, *lc))
> > +   return true;
> > +   return false;
> > +}
> > +
> > +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
> > +   struct cpumask *slow)
> > +{
> > +   struct device_node *cn = NULL;
> > +   int cpu = 0;
> > +
> > +   cpumask_clear(fast);
> > +   cpumask_clear(slow);
> > +
> > +   /*
> > +* Use the config options if they are given. This helps testing
> > +* HMP scheduling on systems without a big.LITTLE architecture.
> > +*/
> > +   if (strlen(CONFIG_HMP_FAST_CPU_MASK) && 
> > strlen(CONFIG_HMP_SLOW_CPU_MASK)) {
> > +   if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast))
> > + 

Re: [RFC PATCH 04/10] sched: Introduce priority-based task migration filter

2012-10-09 Thread Morten Rasmussen
On Thu, Oct 04, 2012 at 07:27:00AM +0100, Viresh Kumar wrote:
> On 22 September 2012 00:02,   wrote:
> 
> > +config SCHED_HMP_PRIO_FILTER
> > +   bool "(EXPERIMENTAL) Filter HMP migrations by task priority"
> > +   depends on SCHED_HMP
> 
> Should it depend on EXPERIMENTAL?
> 
> > +   help
> > + Enables task priority based HMP migration filter. Any task with
> > + a NICE value above the threshold will always be on low-power cpus
> > + with less compute capacity.
> > +
> > +config SCHED_HMP_PRIO_FILTER_VAL
> > +   int "NICE priority threshold"
> > +   default 5
> > +   depends on SCHED_HMP_PRIO_FILTER
> > +
> >  config HAVE_ARM_SCU
> > bool
> > help
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 490f1f0..8f0f3b9 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void)
> >   * hmp_down_threshold: max. load allowed for tasks migrating to a slower 
> > cpu
> >   * The default values (512, 256) offer good responsiveness, but may need
> >   * tweaking suit particular needs.
> > + *
> > + * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
> >   */
> >  unsigned int hmp_up_threshold = 512;
> >  unsigned int hmp_down_threshold = 256;
> > +unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);
> >
> >  static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
> >  static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
> > @@ -5491,6 +5494,12 @@ static unsigned int hmp_up_migration(int cpu, struct 
> > sched_entity *se)
> > if (hmp_cpu_is_fastest(cpu))
> > return 0;
> >
> > +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> > +   /* Filter by task priority */
> > +   if (p->prio >= hmp_up_prio)
> > +   return 0;
> > +#endif
> > +
> > if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
> > tsk_cpus_allowed(p))
> > && se->avg.load_avg_ratio > hmp_up_threshold) {
> > @@ -5507,6 +5516,12 @@ static unsigned int hmp_down_migration(int cpu, 
> > struct sched_entity *se)
> > if (hmp_cpu_is_slowest(cpu))
> > return 0;
> >
> > +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> > +   /* Filter by task priority */
> > +   if (p->prio >= hmp_up_prio)
> > +   return 1;
> > +#endif
> 
> Even if below cpumask_intersects() fails?
> 

No. Good catch :)
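
For reference, one way to order the checks so a low-priority task is only
forced down when the slower domain is actually allowed - a sketch, not
necessarily the fix that ends up in the series:

#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
	/* Only down-migrate on priority if the slower cpus are allowed */
	if (p->prio >= hmp_up_prio &&
	    cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
			       tsk_cpus_allowed(p)))
		return 1;
#endif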

> > if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
> > tsk_cpus_allowed(p))
> > && se->avg.load_avg_ratio < hmp_down_threshold) {
> 
> --
> viresh
> 

Thanks,
Morten




Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking

2012-10-09 Thread Morten Rasmussen
Hi Viresh,

On Thu, Oct 04, 2012 at 07:02:03AM +0100, Viresh Kumar wrote:
> Hi Morten,
> 
> On 22 September 2012 00:02,   wrote:
> > From: Morten Rasmussen 
> >
> > This patch introduces the basic SCHED_HMP infrastructure. Each class of
> > cpus is represented by a hmp_domain and tasks will only be moved between
> > these domains when their load profiles suggest it is beneficial.
> >
> > SCHED_HMP relies heavily on the task load-tracking introduced in Paul
> > Turners fair group scheduling patch set:
> >
> > <https://lkml.org/lkml/2012/8/23/267>
> >
> > SCHED_HMP requires that the platform implements arch_get_hmp_domains()
> > which should set up the platform specific list of hmp_domains. It is
> > also assumed that the platform disables SD_LOAD_BALANCE for the
> > appropriate sched_domains.
> 
> An explanation of this requirement would be helpful here.
> 

Yes. This is to prevent the load-balancer from moving tasks between
hmp_domains. This will be done exclusively by SCHED_HMP instead to
implement a strict task migration policy and avoid changing the
load-balancer behaviour. The load-balancer will take care of
load-balancing within each hmp_domain.
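
For reference, the mechanism used later in the series is simply to build the
CPU-level sched_domain without SD_LOAD_BALANCE. A trimmed sketch of the
SD_CPU_INIT override from the ARM HMP support patch (only the fields relevant
here are shown):

/*
 * SD_LOAD_BALANCE is left out of the flags, so the generic balancer
 * never moves tasks across hmp_domain boundaries; SCHED_HMP does that
 * explicitly instead.
 */
#define SD_CPU_INIT (struct sched_domain) {	\
	.flags	= 0*SD_LOAD_BALANCE		\
		| 1*SD_BALANCE_NEWIDLE		\
		| 1*SD_BALANCE_EXEC		\
		| 1*SD_BALANCE_FORK		\
		| 1*SD_WAKE_AFFINE,		\
	.last_balance	= jiffies,		\
	.balance_interval = 1,			\
}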

> > Tasks placement takes place every time a task is to be inserted into
> > a runqueue based on its load history. The task placement decision is
> > based on load thresholds.
> >
> > There are no restrictions on the number of hmp_domains, however,
> > multiple (>2) has not been tested and the up/down migration policy is
> > rather simple.
> >
> > Signed-off-by: Morten Rasmussen 
> > ---
> >  arch/arm/Kconfig  |   17 +
> >  include/linux/sched.h |6 ++
> >  kernel/sched/fair.c   |  168 
> > +
> >  kernel/sched/sched.h  |6 ++
> >  4 files changed, 197 insertions(+)
> >
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index f4a5d58..5b09684 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1554,6 +1554,23 @@ config SCHED_SMT
> >   MultiThreading at a cost of slightly increased overhead in some
> >   places. If unsure say N here.
> >
> > +config DISABLE_CPU_SCHED_DOMAIN_BALANCE
> > +   bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
> > +   help
> > + Disables scheduler load-balancing at CPU sched domain level.
> 
> Shouldn't this depend on EXPERIMENTAL?
> 

It should. The ongoing discussion about CONFIG_EXPERIMENTAL that Amit is
referring to hasn't come to a conclusion yet.

> > +config SCHED_HMP
> > +   bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling"
> 
> ditto.
> 
> > +   depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && 
> > FAIR_GROUP_SCHED && !SCHED_AUTOGROUP
> > +   help
> > + Experimental scheduler optimizations for heterogeneous platforms.
> > + Attempts to introspectively select task affinity to optimize power
> > + and performance. Basic support for multiple (>2) cpu types is in 
> > place,
> > + but it has only been tested with two types of cpus.
> > + There is currently no support for migration of task groups, hence
> > + !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be 
> > disabled
> > + between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
> > +
> >  config HAVE_ARM_SCU
> > bool
> > help
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 81e4e82..df971a3 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct 
> > sched_domain *sd, int cpu);
> >
> >  bool cpus_share_cache(int this_cpu, int that_cpu);
> >
> > +#ifdef CONFIG_SCHED_HMP
> > +struct hmp_domain {
> > +   struct cpumask cpus;
> > +   struct list_head hmp_domains;
> 
> Probably need a better name here. domain_list?
> 

Yes. hmp_domain_list would be better and stick with the hmp_* naming
convention.

> > +};
> > +#endif /* CONFIG_SCHED_HMP */
> >  #else /* CONFIG_SMP */
> >
> >  struct sched_domain_attr;
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3e17dd5..d80de46 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct 
> > *p, int target)
> > return target;

[RFC PATCH 01/10] sched: entity load-tracking load_avg_ratio

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

This patch adds load_avg_ratio to each task. The load_avg_ratio is a
variant of load_avg_contrib which is not scaled by the task priority. It
is calculated like this:

runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1).
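
For illustration only (not part of the patch), a small userspace sketch of
the calculation above; NICE_0_LOAD and the sample window values are
assumptions:

#include <stdio.h>

#define NICE_0_LOAD 1024UL

static unsigned long load_avg_ratio(unsigned long runnable_avg_sum,
				    unsigned long runnable_avg_period)
{
	return runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1);
}

int main(void)
{
	/* runnable for the whole window: ratio close to 1024 */
	printf("%lu\n", load_avg_ratio(47000, 47000));
	/* runnable roughly half the time: ratio around 512 */
	printf("%lu\n", load_avg_ratio(23500, 47000));
	return 0;
}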

Signed-off-by: Morten Rasmussen 
---
 include/linux/sched.h |1 +
 kernel/sched/fair.c   |3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4dc4990..81e4e82 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1151,6 +1151,7 @@ struct sched_avg {
u64 last_runnable_update;
s64 decay_count;
unsigned long load_avg_contrib;
+   unsigned long load_avg_ratio;
u32 usage_avg_sum;
 };
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095d86c..3e17dd5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1192,6 +1192,9 @@ static inline void __update_task_entity_contrib(struct 
sched_entity *se)
contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight);
contrib /= (se->avg.runnable_avg_period + 1);
se->avg.load_avg_contrib = scale_load(contrib);
+   contrib = se->avg.runnable_avg_sum * scale_load_down(NICE_0_LOAD);
+   contrib /= (se->avg.runnable_avg_period + 1);
+   se->avg.load_avg_ratio = scale_load(contrib);
 }
 
 /* Compute the current contribution to load_avg by se, return any delta */
-- 
1.7.9.5





[RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

This patch introduces the basic SCHED_HMP infrastructure. Each class of
cpus is represented by a hmp_domain and tasks will only be moved between
these domains when their load profiles suggest it is beneficial.

SCHED_HMP relies heavily on the task load-tracking introduced in Paul
Turners fair group scheduling patch set:

<https://lkml.org/lkml/2012/8/23/267>

SCHED_HMP requires that the platform implements arch_get_hmp_domains()
which should set up the platform specific list of hmp_domains. It is
also assumed that the platform disables SD_LOAD_BALANCE for the
appropriate sched_domains.
Task placement takes place every time a task is to be inserted into
a runqueue, based on its load history. The task placement decision is
based on load thresholds.

There are no restrictions on the number of hmp_domains; however,
multiple (>2) domains have not been tested and the up/down migration
policy is rather simple.
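
A compact sketch of the placement rule described above (not the patch
itself; the thresholds are the defaults used later in the series and the
helper is hypothetical):

enum hmp_placement { HMP_STAY, HMP_GO_FASTER, HMP_GO_SLOWER };

/* load is the unweighted load ratio in [0..1023] from the load tracking */
static enum hmp_placement hmp_place(unsigned int load, int on_fastest_domain,
				    int on_slowest_domain)
{
	const unsigned int hmp_up_threshold = 512;
	const unsigned int hmp_down_threshold = 256;

	if (!on_fastest_domain && load > hmp_up_threshold)
		return HMP_GO_FASTER;
	if (!on_slowest_domain && load < hmp_down_threshold)
		return HMP_GO_SLOWER;
	return HMP_STAY;
}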

Signed-off-by: Morten Rasmussen 
---
 arch/arm/Kconfig  |   17 +
 include/linux/sched.h |6 ++
 kernel/sched/fair.c   |  168 +
 kernel/sched/sched.h  |6 ++
 4 files changed, 197 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f4a5d58..5b09684 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1554,6 +1554,23 @@ config SCHED_SMT
  MultiThreading at a cost of slightly increased overhead in some
  places. If unsure say N here.
 
+config DISABLE_CPU_SCHED_DOMAIN_BALANCE
+   bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing"
+   help
+ Disables scheduler load-balancing at CPU sched domain level.
+
+config SCHED_HMP
+   bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling"
+   depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && 
FAIR_GROUP_SCHED && !SCHED_AUTOGROUP
+   help
+ Experimental scheduler optimizations for heterogeneous platforms.
+ Attempts to introspectively select task affinity to optimize power
+ and performance. Basic support for multiple (>2) cpu types is in 
place,
+ but it has only been tested with two types of cpus.
+ There is currently no support for migration of task groups, hence
+ !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
+ between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
+
 config HAVE_ARM_SCU
bool
help
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81e4e82..df971a3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct 
sched_domain *sd, int cpu);
 
 bool cpus_share_cache(int this_cpu, int that_cpu);
 
+#ifdef CONFIG_SCHED_HMP
+struct hmp_domain {
+   struct cpumask cpus;
+   struct list_head hmp_domains;
+};
+#endif /* CONFIG_SCHED_HMP */
 #else /* CONFIG_SMP */
 
 struct sched_domain_attr;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3e17dd5..d80de46 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct *p, 
int target)
return target;
 }
 
+#ifdef CONFIG_SCHED_HMP
+/*
+ * Heterogenous multiprocessor (HMP) optimizations
+ *
+ * The cpu types are distinguished using a list of hmp_domains
+ * which each represent one cpu type using a cpumask.
+ * The list is assumed ordered by compute capacity with the
+ * fastest domain first.
+ */
+DEFINE_PER_CPU(struct hmp_domain *, hmp_cpu_domain);
+
+extern void __init arch_get_hmp_domains(struct list_head *hmp_domains_list);
+
+/* Setup hmp_domains */
+static int __init hmp_cpu_mask_setup(void)
+{
+   char buf[64];
+   struct hmp_domain *domain;
+   struct list_head *pos;
+   int dc, cpu;
+
+   pr_debug("Initializing HMP scheduler:\n");
+
+   /* Initialize hmp_domains using platform code */
+   arch_get_hmp_domains(&hmp_domains);
+   if (list_empty(&hmp_domains)) {
+   pr_debug("HMP domain list is empty!\n");
+   return 0;
+   }
+
+   /* Print hmp_domains */
+   dc = 0;
+   list_for_each(pos, &hmp_domains) {
+   domain = list_entry(pos, struct hmp_domain, hmp_domains);
+   cpulist_scnprintf(buf, 64, &domain->cpus);
+   pr_debug("  HMP domain %d: %s\n", dc, buf);
+
+   for_each_cpu_mask(cpu, domain->cpus) {
+   per_cpu(hmp_cpu_domain, cpu) = domain;
+   }
+   dc++;
+   }
+
+   return 1;
+}
+
+/*
+ * Migration thresholds should be in the range [0..1023]
+ * hmp_up_threshold: min. load required for migrating tasks to a faster cpu
+ * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
+ * The default values (512, 256) offer good responsiveness, but may need
+ *

[RFC PATCH 04/10] sched: Introduce priority-based task migration filter

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

Introduces a priority threshold which prevents low priority tasks
from migrating to faster hmp_domains (cpus). This is useful for
user-space software which assigns lower task priorities to background
tasks.

Signed-off-by: Morten Rasmussen 
---
 arch/arm/Kconfig|   13 +
 kernel/sched/fair.c |   15 +++
 2 files changed, 28 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5b09684..05de193 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1571,6 +1571,19 @@ config SCHED_HMP
  !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
  between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
 
+config SCHED_HMP_PRIO_FILTER
+   bool "(EXPERIMENTAL) Filter HMP migrations by task priority"
+   depends on SCHED_HMP
+   help
+ Enables task priority based HMP migration filter. Any task with
+ a NICE value above the threshold will always be on low-power cpus
+ with less compute capacity.
+
+config SCHED_HMP_PRIO_FILTER_VAL
+   int "NICE priority threshold"
+   default 5
+   depends on SCHED_HMP_PRIO_FILTER
+
 config HAVE_ARM_SCU
bool
help
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 490f1f0..8f0f3b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void)
  * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
  * The default values (512, 256) offer good responsiveness, but may need
  * tweaking suit particular needs.
+ *
+ * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
  */
 unsigned int hmp_up_threshold = 512;
 unsigned int hmp_down_threshold = 256;
+unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);
 
 static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
 static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
@@ -5491,6 +5494,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
 if (hmp_cpu_is_fastest(cpu))
 return 0;
 
+#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
+   /* Filter by task priority */
+   if (p->prio >= hmp_up_prio)
+   return 0;
+#endif
+
if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
tsk_cpus_allowed(p))
&& se->avg.load_avg_ratio > hmp_up_threshold) {
@@ -5507,6 +5516,12 @@ static unsigned int hmp_down_migration(int cpu, struct 
sched_entity *se)
if (hmp_cpu_is_slowest(cpu))
return 0;
 
+#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
+   /* Filter by task priority */
+   if (p->prio >= hmp_up_prio)
+   return 1;
+#endif
+
if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
tsk_cpus_allowed(p))
&& se->avg.load_avg_ratio < hmp_down_threshold) {
-- 
1.7.9.5





[RFC PATCH 10/10] sched: SCHED_HMP multi-domain task migration control

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

We need a way to prevent tasks that are migrating up and down the
hmp_domains from migrating straight on through before the load has
adapted to the new compute capacity of the CPU on the new hmp_domain.
This patch adds a next up/down migration delay that prevents the task
from doing another migration in the same direction until the delay
has expired.
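
For illustration, the shape of the settle-time check this patch adds to
hmp_up_migration()/hmp_down_migration(): the timestamps come from
cfs_rq_clock_task() in nanoseconds, so shifting by 10 gives roughly
microseconds before the comparison against the threshold. Standalone
sketch with simplified types:

#include <stdint.h>

typedef uint64_t u64;

/* Has enough time passed since the last migration in this direction? */
static inline int hmp_task_settled(u64 now, u64 last_migration,
				   unsigned int threshold)
{
	return ((now - last_migration) >> 10) >= threshold;
}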

Signed-off-by: Morten Rasmussen 
---
 include/linux/sched.h |4 
 kernel/sched/core.c   |4 
 kernel/sched/fair.c   |   38 ++
 3 files changed, 46 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index df971a3..ca3890a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1158,6 +1158,10 @@ struct sched_avg {
s64 decay_count;
unsigned long load_avg_contrib;
unsigned long load_avg_ratio;
+#ifdef CONFIG_SCHED_HMP
+   u64 hmp_last_up_migration;
+   u64 hmp_last_down_migration;
+#endif
u32 usage_avg_sum;
 };
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 652b86b..a3b1ff6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1723,6 +1723,10 @@ static void __sched_fork(struct task_struct *p)
 #if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
p->se.avg.runnable_avg_period = 0;
p->se.avg.runnable_avg_sum = 0;
+#ifdef CONFIG_SCHED_HMP
+   p->se.avg.hmp_last_up_migration = 0;
+   p->se.avg.hmp_last_down_migration = 0;
+#endif
 #endif
 #ifdef CONFIG_SCHEDSTATS
memset(&p->se.statistics, 0, sizeof(p->se.statistics));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 811b2b9..56cbda1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3138,10 +3138,14 @@ static int __init hmp_cpu_mask_setup(void)
  * tweaking suit particular needs.
  *
  * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio)
  */
+static inline void hmp_next_up_delay(struct sched_entity *se, int cpu)
+{
+   struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+
+   se->avg.hmp_last_up_migration = cfs_rq_clock_task(cfs_rq);
+   se->avg.hmp_last_down_migration = 0;
+}
+
+static inline void hmp_next_down_delay(struct sched_entity *se, int cpu)
+{
+   struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+
+   se->avg.hmp_last_down_migration = cfs_rq_clock_task(cfs_rq);
+   se->avg.hmp_last_up_migration = 0;
+}
 #endif /* CONFIG_SCHED_HMP */
 
 /*
@@ -3335,11 +3354,13 @@ unlock:
 #ifdef CONFIG_SCHED_HMP
if (hmp_up_migration(prev_cpu, &p->se)) {
new_cpu = hmp_select_faster_cpu(p, prev_cpu);
+   hmp_next_up_delay(&p->se, new_cpu);
trace_sched_hmp_migrate(p, new_cpu, 0);
return new_cpu;
}
if (hmp_down_migration(prev_cpu, &p->se)) {
new_cpu = hmp_select_slower_cpu(p, prev_cpu);
+   hmp_next_down_delay(&p->se, new_cpu);
trace_sched_hmp_migrate(p, new_cpu, 0);
return new_cpu;
}
@@ -5503,6 +5524,8 @@ static void nohz_idle_balance(int this_cpu, enum 
cpu_idle_type idle) { }
 static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
 {
struct task_struct *p = task_of(se);
+   struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+   u64 now;
 
if (hmp_cpu_is_fastest(cpu))
return 0;
@@ -5513,6 +5536,12 @@ static unsigned int hmp_up_migration(int cpu, struct 
sched_entity *se)
return 0;
 #endif
 
+   /* Let the task load settle before doing another up migration */
+   now = cfs_rq_clock_task(cfs_rq);
+   if (((now - se->avg.hmp_last_up_migration) >> 10)
+   < hmp_next_up_threshold)
+   return 0;
+
if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
tsk_cpus_allowed(p))
&& se->avg.load_avg_ratio > hmp_up_threshold) {
@@ -5525,6 +5554,8 @@ static unsigned int hmp_up_migration(int cpu, struct 
sched_entity *se)
 static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
 {
struct task_struct *p = task_of(se);
+   struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+   u64 now;
 
if (hmp_cpu_is_slowest(cpu))
return 0;
@@ -5535,6 +5566,12 @@ static unsigned int hmp_down_migration(int cpu, struct 
sched_entity *se)
return 1;
 #endif
 
+   /* Let the task load settle before doing another down migration */
+   now = cfs_rq_clock_task(cfs_rq);
+   if (((now - se->avg.hmp_last_down_migration) >> 10)
+   < hmp_next_down_threshold)
+   return 0;
+
if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
tsk_cpus_allowed(p))
&& se->avg.load_avg_ratio < hmp_down_threshold) {
@@ -5725,6 +5762,7 @@ static void hmp_force_up_migration(int this_cpu)
 

[RFC PATCH 08/10] sched: Add ftrace events for entity load-tracking

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

Adds ftrace events for key variables related to the entity
load-tracking to help debugging scheduler behaviour. Allows tracing
of load contribution and runqueue residency ratio for both entities
and runqueues as well as entity CPU usage ratio.

Signed-off-by: Morten Rasmussen 
---
 include/trace/events/sched.h |  125 ++
 kernel/sched/fair.c  |7 +++
 2 files changed, 132 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 5a8671e..847eb76 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -430,6 +430,131 @@ TRACE_EVENT(sched_pi_setprio,
__entry->oldprio, __entry->newprio)
 );
 
+/*
+ * Tracepoint for showing tracked load contribution.
+ */
+TRACE_EVENT(sched_task_load_contrib,
+
+   TP_PROTO(struct task_struct *tsk, unsigned long load_contrib),
+
+   TP_ARGS(tsk, load_contrib),
+
+   TP_STRUCT__entry(
+   __array(char, comm, TASK_COMM_LEN)
+   __field(pid_t, pid)
+   __field(unsigned long, load_contrib)
+   ),
+
+   TP_fast_assign(
+   memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+   __entry->pid= tsk->pid;
+   __entry->load_contrib   = load_contrib;
+   ),
+
+   TP_printk("comm=%s pid=%d load_contrib=%lu",
+   __entry->comm, __entry->pid,
+   __entry->load_contrib)
+);
+
+/*
+ * Tracepoint for showing tracked task runnable ratio [0..1023].
+ */
+TRACE_EVENT(sched_task_runnable_ratio,
+
+   TP_PROTO(struct task_struct *tsk, unsigned long ratio),
+
+   TP_ARGS(tsk, ratio),
+
+   TP_STRUCT__entry(
+   __array(char, comm, TASK_COMM_LEN)
+   __field(pid_t, pid)
+   __field(unsigned long, ratio)
+   ),
+
+   TP_fast_assign(
+   memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+   __entry->pid   = tsk->pid;
+   __entry->ratio = ratio;
+   ),
+
+   TP_printk("comm=%s pid=%d ratio=%lu",
+   __entry->comm, __entry->pid,
+   __entry->ratio)
+);
+
+/*
+ * Tracepoint for showing tracked rq runnable ratio [0..1023].
+ */
+TRACE_EVENT(sched_rq_runnable_ratio,
+
+   TP_PROTO(int cpu, unsigned long ratio),
+
+   TP_ARGS(cpu, ratio),
+
+   TP_STRUCT__entry(
+   __field(int, cpu)
+   __field(unsigned long, ratio)
+   ),
+
+   TP_fast_assign(
+   __entry->cpu   = cpu;
+   __entry->ratio = ratio;
+   ),
+
+   TP_printk("cpu=%d ratio=%lu",
+   __entry->cpu,
+   __entry->ratio)
+);
+
+/*
+ * Tracepoint for showing tracked rq runnable load.
+ */
+TRACE_EVENT(sched_rq_runnable_load,
+
+   TP_PROTO(int cpu, u64 load),
+
+   TP_ARGS(cpu, load),
+
+   TP_STRUCT__entry(
+   __field(int, cpu)
+   __field(u64, load)
+   ),
+
+   TP_fast_assign(
+   __entry->cpu  = cpu;
+   __entry->load = load;
+   ),
+
+   TP_printk("cpu=%d load=%llu",
+   __entry->cpu,
+   __entry->load)
+);
+
+/*
+ * Tracepoint for showing tracked task cpu usage ratio [0..1023].
+ */
+TRACE_EVENT(sched_task_usage_ratio,
+
+   TP_PROTO(struct task_struct *tsk, unsigned long ratio),
+
+   TP_ARGS(tsk, ratio),
+
+   TP_STRUCT__entry(
+   __array(char, comm, TASK_COMM_LEN)
+   __field(pid_t, pid)
+   __field(unsigned long, ratio)
+   ),
+
+   TP_fast_assign(
+   memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+   __entry->pid   = tsk->pid;
+   __entry->ratio = ratio;
+   ),
+
+   TP_printk("comm=%s pid=%d ratio=%lu",
+   __entry->comm, __entry->pid,
+   __entry->ratio)
+);
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8f0f3b9..0be53be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1192,9 +1192,11 @@ static inline void __update_task_entity_contrib(struct 
sched_entity *se)
contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight);
contrib /= (se->avg.runnable_avg_period + 1);
se->avg.load_avg_contrib = scale_load(contrib);
+   trace_sched_task_load_contrib(task_of(se), se->avg.load_avg_contrib);
contrib = se->avg.runnable_avg_sum * scale_load_down(NICE_0_LOAD);
contrib /= (se->avg.runnable_avg_period + 1);
se->avg.load_avg_ratio = scale_load(contrib);
+   trace_sched_task_runnable_ratio(task_of(se), se->avg.load_avg_ratio);

[RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

Hi Paul, Paul, Peter, Suresh, linaro-sched-sig, and LKML,

As a follow-up on my Linux Plumbers Conference talk about my experiments with
scheduling on heterogeneous systems I'm posting a proof-of-concept patch set
with my modifications. The intention behind the modifications is to tweak
scheduling behaviour to only use fast (and power hungry) cores when it is
necessary and also improve performance consistency. Without the modifications
it is more or less random where tasks are scheduled and so is the execution
time.

I'm seeing good improvements on performance consistency for web browsing on
Android using Bbench <http://www.gem5.org/Bbench> on the ARM big.LITTLE TC2
chip, which has two fast cores (Cortex-A15) and three power-efficient cores
(Cortex-A7). The total execution time numbers below are for Android's
SurfaceFlinger process, which is key for page rendering performance. The average
execution time is lower with the patches enabled and the standard deviation is
much smaller. Similar improvements can be seen for the Android.Browser and
WebViewCoreThread processes.

Total execution time statistics based on 50 runs.

SurfaceFlinger    SMP kernel [s]    HMP modifications [s]
---------------------------------------------------------
Average                   14.617                   11.012
St. Dev.                   4.577                    0.902
10% Pctl.                  9.343                   10.783
90% Pctl.                 18.743                   11.695

Unfortunately, I cannot share power-efficiency numbers at this stage.

This patch set introduces proof-of-concept scheduler modifications which
attempt to improve scheduling decisions on heterogeneous multi-processor
systems (HMP) such as ARM big.LITTLE systems. The patch set relies on the
entity load-tracking re-work patch set by Paul Turner:

<https://lkml.org/lkml/2012/8/23/267>

The modifications attempt to migrate tasks between cores with different
compute capacity depending on the tracked load and priority. The aim is
to only use fast cores for tasks which really need the extra performance
and thereby improve power consumption by running everything else on the
slow cores.

The patch introduces hmp_domains to represent the different types of cores
that are available on the given platform. Multiple (>2) hmp_domains are
supported but not tested. hmp_domains must be set up by platform code and
the patch set includes patches for ARM platforms using device-tree.

The patches intentionally try to avoid modifying the existing code paths
as much as possible. The aim is to experiment with HMP scheduling and get
the overall policy right before integrating it properly with the existing
load-balancer.

Morten

Morten Rasmussen (10):
  sched: entity load-tracking load_avg_ratio
  sched: Task placement for heterogeneous systems based on task
load-tracking
  sched: Forced task migration on heterogeneous systems
  sched: Introduce priority-based task migration filter
  ARM: Add HMP scheduling support for ARM architecture
  ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  ARM: sched: Setup SCHED_HMP domains
  sched: Add ftrace events for entity load-tracking
  sched: Add HMP task migration ftrace event
  sched: SCHED_HMP multi-domain task migration control

 arch/arm/Kconfig|   46 +
 arch/arm/include/asm/topology.h |   32 +++
 arch/arm/kernel/topology.c  |   91 
 include/linux/sched.h   |   11 +
 include/trace/events/sched.h|  153 ++
 kernel/sched/core.c |4 +
 kernel/sched/fair.c |  434 ++-
 kernel/sched/sched.h|9 +
 8 files changed, 779 insertions(+), 1 deletion(-)

-- 
1.7.9.5





[RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

SCHED_HMP requires the different cpu types to be represented by an
ordered list of hmp_domains. Each hmp_domain represents all cpus of
a particular type using a cpumask.

The list is platform specific and therefore must be generated by
platform code by implementing arch_get_hmp_domains().

Signed-off-by: Morten Rasmussen 
---
 arch/arm/kernel/topology.c |   22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 7682e12..ec8ad5c 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -383,6 +383,28 @@ void __init arch_get_fast_and_slow_cpus(struct cpumask 
*fast,
cpumask_clear(slow);
 }
 
+void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
+{
+   struct cpumask hmp_fast_cpu_mask;
+   struct cpumask hmp_slow_cpu_mask;
+   struct hmp_domain *domain;
+
+   arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);
+
+   /*
+* Initialize hmp_domains
+* Must be ordered with respect to compute capacity.
+* Fastest domain at head of list.
+*/
+   domain = (struct hmp_domain *)
+   kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
+   cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
+   list_add(&domain->hmp_domains, hmp_domains_list);
+   domain = (struct hmp_domain *)
+   kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
+   cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
+   list_add(&domain->hmp_domains, hmp_domains_list);
+}
 #endif /* CONFIG_SCHED_HMP */
 
 
-- 
1.7.9.5





[RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

We can't rely on Kconfig options to set the fast and slow CPU lists for
HMP scheduling if we want a single kernel binary to support multiple
devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
big.LITTLE system), Fast Models, or even non big.LITTLE devices.

This patch adds the function arch_get_fast_and_slow_cpus() to generate
the lists at run-time by parsing the CPU nodes in device-tree; it
assumes slow cores are A7s and everything else is fast. The function
still supports the old Kconfig options as this is useful for testing the
HMP scheduler on devices without big.LITTLE.

This patch is reuse of a patch by Jon Medhurst  with a
few bits left out.

Signed-off-by: Morten Rasmussen 
---
 arch/arm/Kconfig   |4 ++-
 arch/arm/kernel/topology.c |   69 
 2 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cb80846..f1271bc 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK
string "HMP scheduler fast CPU mask"
depends on SCHED_HMP
help
-  Specify the cpuids of the fast CPUs in the system as a list string,
+  Leave empty to use device tree information.
+ Specify the cpuids of the fast CPUs in the system as a list string,
  e.g. cpuid 0+1 should be specified as 0-1.
 
 config HMP_SLOW_CPU_MASK
string "HMP scheduler slow CPU mask"
depends on SCHED_HMP
help
+ Leave empty to use device tree information.
  Specify the cpuids of the slow CPUs in the system as a list string,
  e.g. cpuid 0+1 should be specified as 0-1.
 
diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 26c12c6..7682e12 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid)
cpu_topology[cpuid].socket_id, mpidr);
 }
 
+
+#ifdef CONFIG_SCHED_HMP
+
+static const char * const little_cores[] = {
+   "arm,cortex-a7",
+   NULL,
+};
+
+static bool is_little_cpu(struct device_node *cn)
+{
+   const char * const *lc;
+   for (lc = little_cores; *lc; lc++)
+   if (of_device_is_compatible(cn, *lc))
+   return true;
+   return false;
+}
+
+void __init arch_get_fast_and_slow_cpus(struct cpumask *fast,
+   struct cpumask *slow)
+{
+   struct device_node *cn = NULL;
+   int cpu = 0;
+
+   cpumask_clear(fast);
+   cpumask_clear(slow);
+
+   /*
+* Use the config options if they are given. This helps testing
+* HMP scheduling on systems without a big.LITTLE architecture.
+*/
+   if (strlen(CONFIG_HMP_FAST_CPU_MASK) && 
strlen(CONFIG_HMP_SLOW_CPU_MASK)) {
+   if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast))
+   WARN(1, "Failed to parse HMP fast cpu mask!\n");
+   if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow))
+   WARN(1, "Failed to parse HMP slow cpu mask!\n");
+   return;
+   }
+
+   /*
+* Else, parse device tree for little cores.
+*/
+   while ((cn = of_find_node_by_type(cn, "cpu"))) {
+
+   if (cpu >= num_possible_cpus())
+   break;
+
+   if (is_little_cpu(cn))
+   cpumask_set_cpu(cpu, slow);
+   else
+   cpumask_set_cpu(cpu, fast);
+
+   cpu++;
+   }
+
+   if (!cpumask_empty(fast) && !cpumask_empty(slow))
+   return;
+
+   /*
+* We didn't find both big and little cores so let's call all cores
+* fast as this will keep the system running, with all cores being
+* treated equal.
+*/
+   cpumask_setall(fast);
+   cpumask_clear(slow);
+}
+
+#endif /* CONFIG_SCHED_HMP */
+
+
 /*
  * init_cpu_topology is called at boot when only one cpu is running
  * which prevent simultaneous write access to cpu_topology array
-- 
1.7.9.5





[RFC PATCH 03/10] sched: Forced task migration on heterogeneous systems

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

This patch introduces forced task migration for moving suitable
currently running tasks between hmp_domains. Task behaviour is likely
to change over time. Tasks running in a less capable hmp_domain may
change to become more demanding and should therefore be migrated up.
They are unlikely to go through the select_task_rq_fair() path anytime
soon and therefore need special attention.

This patch introduces a periodic check (SCHED_TICK) of the currently
running task on all runqueues and sets up a forced migration using
stop_machine_no_wait() if the task needs to be migrated.

Ideally, this should not be implemented by polling all runqueues.

Signed-off-by: Morten Rasmussen 
---
 kernel/sched/fair.c  |  196 +-
 kernel/sched/sched.h |3 +
 2 files changed, 198 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d80de46..490f1f0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3744,7 +3744,6 @@ int can_migrate_task(struct task_struct *p, struct lb_env 
*env)
 * 1) task is cache cold, or
 * 2) too many balance attempts have failed.
 */
-
tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd);
if (!tsk_cache_hot ||
env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
@@ -5516,6 +5515,199 @@ static unsigned int hmp_down_migration(int cpu, struct 
sched_entity *se)
return 0;
 }
 
+/*
+ * hmp_can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
+ * Ideally this function should be merged with can_migrate_task() to avoid
+ * redundant code.
+ */
+static int hmp_can_migrate_task(struct task_struct *p, struct lb_env *env)
+{
+   int tsk_cache_hot = 0;
+
+   /*
+* We do not migrate tasks that are:
+* 1) running (obviously), or
+* 2) cannot be migrated to this CPU due to cpus_allowed
+*/
+   if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
+   schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
+   return 0;
+   }
+   env->flags &= ~LBF_ALL_PINNED;
+
+   if (task_running(env->src_rq, p)) {
+   schedstat_inc(p, se.statistics.nr_failed_migrations_running);
+   return 0;
+   }
+
+   /*
+* Aggressive migration if:
+* 1) task is cache cold, or
+* 2) too many balance attempts have failed.
+*/
+
+   tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd);
+   if (!tsk_cache_hot ||
+   env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
+#ifdef CONFIG_SCHEDSTATS
+   if (tsk_cache_hot) {
+   schedstat_inc(env->sd, lb_hot_gained[env->idle]);
+   schedstat_inc(p, se.statistics.nr_forced_migrations);
+   }
+#endif
+   return 1;
+   }
+
+   return 1;
+}
+
+/*
+ * move_specific_task tries to move a specific task.
+ * Returns 1 if successful and 0 otherwise.
+ * Called with both runqueues locked.
+ */
+static int move_specific_task(struct lb_env *env, struct task_struct *pm)
+{
+   struct task_struct *p, *n;
+
+   list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+   if (throttled_lb_pair(task_group(p), env->src_rq->cpu,
+   env->dst_cpu))
+   continue;
+
+   if (!hmp_can_migrate_task(p, env))
+   continue;
+   /* Check if we found the right task */
+   if (p != pm)
+   continue;
+
+   move_task(p, env);
+   /*
+* Right now, this is only the third place move_task()
+* is called, so we can safely collect move_task()
+* stats here rather than inside move_task().
+*/
+   schedstat_inc(env->sd, lb_gained[env->idle]);
+   return 1;
+   }
+   return 0;
+}
+
+/*
+ * hmp_active_task_migration_cpu_stop is run by cpu stopper and used to
+ * migrate a specific task from one runqueue to another.
+ * hmp_force_up_migration uses this to push a currently running task
+ * off a runqueue.
+ * Based on active_load_balance_stop_cpu and can potentially be merged.
+ */
+static int hmp_active_task_migration_cpu_stop(void *data)
+{
+   struct rq *busiest_rq = data;
+   struct task_struct *p = busiest_rq->migrate_task;
+   int busiest_cpu = cpu_of(busiest_rq);
+   int target_cpu = busiest_rq->push_cpu;
+   struct rq *target_rq = cpu_rq(target_cpu);
+   struct sched_domain *sd;
+
+   raw_spin_lock_irq(&busiest_rq->lock);
+   /* make sure the requested cpu hasn't gone down in the meantime */
+   if (unlikely(busiest_cpu != smp_processor_id() ||
+   !

[RFC PATCH 09/10] sched: Add HMP task migration ftrace event

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

Adds ftrace event for tracing task migrations using HMP
optimized scheduling.

Signed-off-by: Morten Rasmussen 
---
 include/trace/events/sched.h |   28 
 kernel/sched/fair.c  |   15 +++
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 847eb76..501aa32 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -555,6 +555,34 @@ TRACE_EVENT(sched_task_usage_ratio,
__entry->comm, __entry->pid,
__entry->ratio)
 );
+
+/*
+ * Tracepoint for HMP (CONFIG_SCHED_HMP) task migrations.
+ */
+TRACE_EVENT(sched_hmp_migrate,
+
+   TP_PROTO(struct task_struct *tsk, int dest, int force),
+
+   TP_ARGS(tsk, dest, force),
+
+   TP_STRUCT__entry(
+   __array(char, comm, TASK_COMM_LEN)
+   __field(pid_t, pid)
+   __field(int,  dest)
+   __field(int,  force)
+   ),
+
+   TP_fast_assign(
+   memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+   __entry->pid   = tsk->pid;
+   __entry->dest  = dest;
+   __entry->force = force;
+   ),
+
+   TP_printk("comm=%s pid=%d dest=%d force=%d",
+   __entry->comm, __entry->pid,
+   __entry->dest, __entry->force)
+);
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0be53be..811b2b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -,10 +,16 @@ unlock:
rcu_read_unlock();
 
 #ifdef CONFIG_SCHED_HMP
-   if (hmp_up_migration(prev_cpu, &p->se))
-   return hmp_select_faster_cpu(p, prev_cpu);
-   if (hmp_down_migration(prev_cpu, &p->se))
-   return hmp_select_slower_cpu(p, prev_cpu);
+   if (hmp_up_migration(prev_cpu, &p->se)) {
+   new_cpu = hmp_select_faster_cpu(p, prev_cpu);
+   trace_sched_hmp_migrate(p, new_cpu, 0);
+   return new_cpu;
+   }
+   if (hmp_down_migration(prev_cpu, &p->se)) {
+   new_cpu = hmp_select_slower_cpu(p, prev_cpu);
+   trace_sched_hmp_migrate(p, new_cpu, 0);
+   return new_cpu;
+   }
/* Make sure that the task stays in its previous hmp domain */
if (!cpumask_test_cpu(new_cpu, &hmp_cpu_domain(prev_cpu)->cpus))
return prev_cpu;
@@ -5718,6 +5724,7 @@ static void hmp_force_up_migration(int this_cpu)
target->push_cpu = hmp_select_faster_cpu(p, 
cpu);
target->migrate_task = p;
force = 1;
+   trace_sched_hmp_migrate(p, target->push_cpu, 1);
}
}
raw_spin_unlock_irqrestore(&target->lock, flags);
-- 
1.7.9.5





[RFC PATCH 05/10] ARM: Add HMP scheduling support for ARM architecture

2012-09-21 Thread morten . rasmussen
From: Morten Rasmussen 

Adds Kconfig entries to enable HMP scheduling on ARM platforms.
Currently, it disables CPU level sched_domain load-balancing in order
to simplify things. This needs fixing in a later revision. HMP
scheduling will do the load-balancing at this level instead.

Signed-off-by: Morten Rasmussen 
---
 arch/arm/Kconfig|   14 ++
 arch/arm/include/asm/topology.h |   32 
 2 files changed, 46 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 05de193..cb80846 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1584,6 +1584,20 @@ config SCHED_HMP_PRIO_FILTER_VAL
default 5
depends on SCHED_HMP_PRIO_FILTER
 
+config HMP_FAST_CPU_MASK
+   string "HMP scheduler fast CPU mask"
+   depends on SCHED_HMP
+   help
+  Specify the cpuids of the fast CPUs in the system as a list string,
+ e.g. cpuid 0+1 should be specified as 0-1.
+
+config HMP_SLOW_CPU_MASK
+   string "HMP scheduler slow CPU mask"
+   depends on SCHED_HMP
+   help
+ Specify the cpuids of the slow CPUs in the system as a list string,
+ e.g. cpuid 0+1 should be specified as 0-1.
+
 config HAVE_ARM_SCU
bool
help
diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index 58b8b84..13a03de 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -27,6 +27,38 @@ void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
 
+#ifdef CONFIG_DISABLE_CPU_SCHED_DOMAIN_BALANCE
+/* Common values for CPUs */
+#ifndef SD_CPU_INIT
+#define SD_CPU_INIT (struct sched_domain) {\
+   .min_interval   = 1,\
+   .max_interval   = 4,\
+   .busy_factor= 64,   \
+   .imbalance_pct  = 125,  \
+   .cache_nice_tries   = 1,\
+   .busy_idx   = 2,\
+   .idle_idx   = 1,\
+   .newidle_idx= 0,\
+   .wake_idx   = 0,\
+   .forkexec_idx   = 0,\
+   \
+   .flags  = 0*SD_LOAD_BALANCE \
+   | 1*SD_BALANCE_NEWIDLE  \
+   | 1*SD_BALANCE_EXEC \
+   | 1*SD_BALANCE_FORK \
+   | 0*SD_BALANCE_WAKE \
+   | 1*SD_WAKE_AFFINE  \
+   | 0*SD_PREFER_LOCAL \
+   | 0*SD_SHARE_CPUPOWER   \
+   | 0*SD_SHARE_PKG_RESOURCES  \
+   | 0*SD_SERIALIZE\
+   ,   \
+   .last_balance= jiffies, \
+   .balance_interval   = 1,\
+}
+#endif
+#endif /* CONFIG_DISABLE_CPU_SCHED_DOMAIN_BALANCE */
+
 #else
 
 static inline void init_cpu_topology(void) { }
-- 
1.7.9.5





Re: [GIT PULL] big-LITTLE-MP-v7 - IMPORTANT

2012-09-05 Thread Morten Rasmussen
Hi Viresh,

On Mon, Sep 03, 2012 at 06:21:26AM +0100, Viresh Kumar wrote:
> On 28 August 2012 10:37, Viresh Kumar  wrote:
> > I have updated
> >
> > https://wiki.linaro.org/WorkingGroups/PowerManagement/Process/bigLittleMPTree
> >
> > as per our last discussion. Please see if i have missed something.
> 
> Hi Guys,
> 
> I will be sending PULL request of big-LITTLE-MP-v7 today as per schedule.
> Do let me know if you want anything to be included in it before that.
> 
> @Morten: What should i do with patch reported by Santosh:
> 
> ARM-Add-HMP-scheduling-support-for-ARM-architecture
> 
> Do i need to apply it over your branch?

The patch is already in the original patch set, so I'm not sure why it
is missing.

http://linux-arm.org/git?p=arm-bls.git;a=commit;h=1416200dd62551aa9ac4aa207b0c66651ccbff2c

It needs to be there for the HMP scheduling to work.

Regards,
Morten

