Re: HMP patches v2
On 02/01/13 10:29, Vincent Guittot wrote:
On 2 January 2013 06:28, Viresh Kumar wrote:
On 20 December 2012 13:41, Vincent Guittot wrote:
On 19 December 2012 11:57, Morten Rasmussen wrote:

If I understand the new version of "sched: secure access to other CPU statistics" correctly, the effect of the patch is: without the patch the cpu will appear to be busy if sum/period are not coherent (sum > period). The same is true with the patch, except in the case where nr_running is 0. In this particular case the cpu will appear not to be busy. I assume there is a good reason why this particular case is important?

Sorry for this late reply. It's not really more important than the others, but it's one case we can safely detect to prevent spurious spreading of tasks. In addition, the incoherency only occurs if both values are close, so nr_running == 0 was the only condition left to be tested.

In any case the patch is fine by me.

Hmm... I am still confused :( We have two patches from ARM, do let me know if I can drop these:

I think you can drop them as they don't apply anymore for V2. Morten, do you confirm?

Confirmed. I don't see any problems with the v2 patch. The overhead of the check should be minimal.

Morten

Vincent

commit 3f1dff11ac95eda2772bef577e368bc124bfe087
Author: Morten Rasmussen
Date:   Fri Nov 16 18:32:40 2012 +

    ARM: TC2: Re-enable SD_SHARE_POWERLINE

    Re-enable SD_SHARE_POWERLINE to reflect the power domains of TC2.

 arch/arm/kernel/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

commit e8cceacd3913e3a3e955614bacc1bc81866bc243
Author: Liviu Dudau
Date:   Fri Nov 16 18:32:38 2012 +

    Revert "sched: secure access to other CPU statistics"

    This reverts commit 2aa14d0379cc54bc0ec44adb7a2e0ad02ae293d0.

    The way this functionality is implemented is under review and the
    current implementation is considered not safe.

    Signed-off-by: Liviu Dudau

 kernel/sched/fair.c | 19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)
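To make the case being discussed concrete, here is a minimal sketch of the described behaviour (illustration only, not the actual patch; the helper name and the final load threshold are made up for this example):

/*
 * Illustration: if the sampled sum/period pair from another cpu is
 * incoherent (sum > period), treat that cpu as busy, except when
 * nr_running == 0, which is the one case that can safely be reported
 * as not busy.
 */
static inline bool cpu_appears_busy(u32 sum, u32 period, unsigned int nr_running)
{
	if (sum > period)
		return nr_running != 0;

	return 2 * sum > period;	/* placeholder for the real load check */
}

In other words, an incoherent snapshot is still treated as busy unless the remote runqueue is known to be empty, which avoids spuriously spreading tasks.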
Re: HMP patches v2
On 19/12/12 09:34, Viresh Kumar wrote:
On 19 December 2012 14:53, Vincent Guittot wrote:
On 19 December 2012 07:34, "Viresh Kumar" wrote:

Can we resolve this issue now? I don't want anything during the release period this time.

The new version of the patchset should solve the concerns of everybody.

Morten, can you confirm or cross-check that? Branch is: sched-pack-small-tasks-v2

If I understand the new version of "sched: secure access to other CPU statistics" correctly, the effect of the patch is: without the patch the cpu will appear to be busy if sum/period are not coherent (sum > period). The same is true with the patch, except in the case where nr_running is 0. In this particular case the cpu will appear not to be busy. I assume there is a good reason why this particular case is important?

In any case the patch is fine by me.

Morten
Re: [HMP][PATCH 0/1] Global balance
On 07/12/12 14:54, Viresh Kumar wrote:
On 7 December 2012 18:43, Morten Rasmussen wrote:

I should have included the numbers in the cover letter. Here are numbers for TC2.

sysbench (normalized execution time, lower is better)
threads    2     4     8
HMP        1.00  1.00  1.00
HMP+GB     1.00  0.67  0.58

coremark (normalized iterations per second, higher is better)
threads    2     4     8
HMP        1.00  1.00  1.00
HMP+GB     1.00  1.39  1.73

So there is clear benefit of utilizing the A7s. It actually saves energy too as the whole benchmark completes faster.

Hi Morten,

I have applied your patch now and pushed v13. Please cross-check v13 to see if everything is correct.

It looks right to me.

Morten
Re: [HMP][PATCH 0/1] Global balance
Hi Amit,

I should have included the numbers in the cover letter. Here are numbers for TC2.

sysbench (normalized execution time, lower is better)
threads    2     4     8
HMP        1.00  1.00  1.00
HMP+GB     1.00  0.67  0.58

coremark (normalized iterations per second, higher is better)
threads    2     4     8
HMP        1.00  1.00  1.00
HMP+GB     1.00  1.39  1.73

So there is clear benefit of utilizing the A7s. It actually saves energy too as the whole benchmark completes faster.

Regards,
Morten

On Fri, Dec 7, 2012 at 12:14 PM, Amit Kucheria wrote:
>
> On Fri, Dec 7, 2012 at 5:33 PM, Morten Rasmussen wrote:
> > Hi Viresh,
> >
> > Here is a patch that introduces global load balancing on top of the
> > existing HMP patch set. It depends on the HMP patches already present
> > in your task-placement-v2 branch. It can be applied on top of the HMP
> > sysfs patches if needed. The fix should be trivial.
> >
> > Could you include it in the MP branch for the 12.12 release? Testing
> > with sysbench and coremark shows significant performance improvements
> > for parallel workloads as all cpus can now be used for cpu intensive
> > tasks.
>
> Morten,
>
> Can you share some performance number improvements and/or
> kernelshark-type graphs with and without this patch? It'd be very
> interesting to see the changes.
>
> Monday is the deadline to get this merged into the MP tree to make it
> to the release. It is end of week now. Not sure how much testing and
> review can be done before Monday. Your numbers might make a compelling
> argument.
>
> Regards,
> Amit
>
> > Thanks,
> > Morten
> >
> > Morten Rasmussen (1):
> >   sched: Basic global balancing support for HMP
> >
> >  kernel/sched/fair.c | 101 +--
> >  1 file changed, 97 insertions(+), 4 deletions(-)
> >
> > --
> > 1.7.9.5
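To read the tables above: the figures are normalized to the HMP-only baseline, so the sysbench value of 0.58 at 8 threads means the run completes in roughly 58% of the baseline time (about a 1.7x speed-up), and the coremark value of 1.73 means about 73% more iterations per second.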
[HMP][PATCH 1/1] sched: Basic global balancing support for HMP
This patch introduces an extra-check at task up-migration to prevent overloading the cpus in the faster hmp_domain while the slower hmp_domain is not fully utilized. The patch also introduces a periodic balance check that can down-migrate tasks if the faster domain is oversubscribed and the slower is under-utilized. Signed-off-by: Morten Rasmussen --- kernel/sched/fair.c | 101 +-- 1 file changed, 97 insertions(+), 4 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1cfe112..7ac47c9 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3249,6 +3249,80 @@ static inline void hmp_next_down_delay(struct sched_entity *se, int cpu) se->avg.hmp_last_down_migration = cfs_rq_clock_task(cfs_rq); se->avg.hmp_last_up_migration = 0; } + +static inline unsigned int hmp_domain_min_load(struct hmp_domain *hmpd, + int *min_cpu) +{ + int cpu; + int min_load = INT_MAX; + int min_cpu_temp = NR_CPUS; + + for_each_cpu_mask(cpu, hmpd->cpus) { + if (cpu_rq(cpu)->cfs.tg_load_contrib < min_load) { + min_load = cpu_rq(cpu)->cfs.tg_load_contrib; + min_cpu_temp = cpu; + } + } + + if (min_cpu) + *min_cpu = min_cpu_temp; + + return min_load; +} + +/* + * Calculate the task starvation + * This is the ratio of actually running time vs. runnable time. + * If the two are equal the task is getting the cpu time it needs or + * it is alone on the cpu and the cpu is fully utilized. + */ +static inline unsigned int hmp_task_starvation(struct sched_entity *se) +{ + u32 starvation; + + starvation = se->avg.usage_avg_sum * scale_load_down(NICE_0_LOAD); + starvation /= (se->avg.runnable_avg_sum + 1); + + return scale_load(starvation); +} + +static inline unsigned int hmp_offload_down(int cpu, struct sched_entity *se) +{ + int min_usage; + int dest_cpu = NR_CPUS; + + if (hmp_cpu_is_slowest(cpu)) + return NR_CPUS; + + /* Is the current domain fully loaded? */ + /* load < ~94% */ + min_usage = hmp_domain_min_load(hmp_cpu_domain(cpu), NULL); + if (min_usage < NICE_0_LOAD-64) + return NR_CPUS; + + /* Is the cpu oversubscribed? */ + /* load < ~194% */ + if (cpu_rq(cpu)->cfs.tg_load_contrib < 2*NICE_0_LOAD-64) + return NR_CPUS; + + /* Is the task alone on the cpu? */ + if (cpu_rq(cpu)->cfs.nr_running < 2) + return NR_CPUS; + + /* Is the task actually starving? */ + if (hmp_task_starvation(se) > 768) /* <25% waiting */ + return NR_CPUS; + + /* Does the slower domain have spare cycles? */ + min_usage = hmp_domain_min_load(hmp_slower_domain(cpu), &dest_cpu); + /* load > 50% */ + if (min_usage > NICE_0_LOAD/2) + return NR_CPUS; + + if (cpumask_test_cpu(dest_cpu, &hmp_slower_domain(cpu)->cpus)) + return dest_cpu; + return NR_CPUS; +} #endif /* CONFIG_SCHED_HMP */ /* @@ -5643,10 +5717,14 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se) < hmp_next_up_threshold) return 0; - if (se->avg.load_avg_ratio > hmp_up_threshold && - cpumask_intersects(&hmp_faster_domain(cpu)->cpus, - tsk_cpus_allowed(p))) { - return 1; + if (se->avg.load_avg_ratio > hmp_up_threshold) { + /* Target domain load < ~94% */ + if (hmp_domain_min_load(hmp_faster_domain(cpu), NULL) + > NICE_0_LOAD-64) + return 0; + if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus, + tsk_cpus_allowed(p))) + return 1; } return 0; } @@ -5868,6 +5946,21 @@ static void hmp_force_up_migration(int this_cpu) hmp_next_up_delay(&p->se, target->push_cpu); } } + if (!force && !target->active_balance) { + /* +* For now we just check the currently running task. +* Selecting the lightest task for offloading will +* require extensive book keeping. 
+*/ + target->push_cpu = hmp_offload_down(cpu, curr); + if (target->push_cpu < NR_CPUS) { + target->active_balance = 1; + tar
[HMP][PATCH 0/1] Global balance
Hi Viresh,

Here is a patch that introduces global load balancing on top of the existing HMP patch set. It depends on the HMP patches already present in your task-placement-v2 branch. It can be applied on top of the HMP sysfs patches if needed. The fix should be trivial.

Could you include it in the MP branch for the 12.12 release? Testing with sysbench and coremark shows significant performance improvements for parallel workloads as all cpus can now be used for cpu intensive tasks.

Thanks,
Morten

Morten Rasmussen (1):
  sched: Basic global balancing support for HMP

 kernel/sched/fair.c | 101 +--
 1 file changed, 97 insertions(+), 4 deletions(-)

--
1.7.9.5
Re: HMP patches v2
On 05/12/12 11:35, Viresh Kumar wrote:
On 5 December 2012 16:58, Morten Rasmussen wrote:

I tested Vincent's fix ("sched: pack small tasks: fix update packing domain") for the buddy selection some weeks ago and confirmed that it works. So my quick fixes are no longer necessary.

The issues around the reverted "sched: secure access to other CPU statistics" have not yet been resolved. I don't think that we should re-enable it until we are clear about what it is doing.

There are four patches I am carrying from ARM:

4a29297 ARM: TC2: Re-enable SD_SHARE_POWERLINE
a1924a4 sched: SD_SHARE_POWERLINE buddy selection fix
39b0e77 Revert "sched: secure access to other CPU statistics"
eed72c8 Revert "sched: pack small tasks: fix update packing domain"

You want me to drop eed72c8 and a1924a4?

Correct. Yes.

Morten

--
viresh
Re: HMP patches v2
On 05/12/12 11:01, Viresh Kumar wrote:
On 5 December 2012 16:28, Liviu Dudau wrote:

The revert request came at Morten's suggestion. He has comments on the code and technical reasons why he believes that the approach is not the best one, as well as some scenarios where possible race conditions can occur.

Morten, what is the latest update in this area? I'm not sure I have followed your discussion with Vincent on the subject.

Just to make it more clear: there are two reverts now. Please look at the latest tree/branches. Vincent has provided another fixup patch after which he commented we no longer need Morten's fix. I have reverted that too, for the moment, to keep things the same as the last release.

Can Morten test with the latest patches from Vincent (from his branch) and provide fixups again?

Hi,

I tested Vincent's fix ("sched: pack small tasks: fix update packing domain") for the buddy selection some weeks ago and confirmed that it works. So my quick fixes are no longer necessary.

The issues around the reverted "sched: secure access to other CPU statistics" have not yet been resolved. I don't think that we should re-enable it until we are clear about what it is doing.

Morten
Re: [RFC 3/6] sched: pack small tasks
Hi Vincent, On Mon, Nov 12, 2012 at 01:51:00PM +, Vincent Guittot wrote: > On 9 November 2012 18:13, Morten Rasmussen wrote: > > Hi Vincent, > > > > I have experienced suboptimal buddy selection on a dual cluster setup > > (ARM TC2) if SD_SHARE_POWERLINE is enabled at MC level and disabled at > > CPU level. This seems to be the correct flag settings for a system with > > only cluster level power gating. > > > > To me it looks like update_packing_domain() is not doing the right > > thing. See inline comments below. > > Hi Morten, > > Thanks for testing the patches. > > It seems that I have too optimized the loop and remove some use cases. > > > > > On Sun, Oct 07, 2012 at 08:43:55AM +0100, Vincent Guittot wrote: > >> During sched_domain creation, we define a pack buddy CPU if available. > >> > >> On a system that share the powerline at all level, the buddy is set to -1 > >> > >> On a dual clusters / dual cores system which can powergate each core and > >> cluster independantly, the buddy configuration will be : > >> | CPU0 | CPU1 | CPU2 | CPU3 | > >> --- > >> buddy | CPU0 | CPU0 | CPU0 | CPU2 | > >> > >> Small tasks tend to slip out of the periodic load balance. > >> The best place to choose to migrate them is at their wake up. > >> > >> Signed-off-by: Vincent Guittot > >> --- > >> kernel/sched/core.c |1 + > >> kernel/sched/fair.c | 109 > >> ++ > >> kernel/sched/sched.h |1 + > >> 3 files changed, 111 insertions(+) > >> > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c > >> index dab7908..70cadbe 100644 > >> --- a/kernel/sched/core.c > >> +++ b/kernel/sched/core.c > >> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct > >> root_domain *rd, int cpu) > >> rcu_assign_pointer(rq->sd, sd); > >> destroy_sched_domains(tmp, cpu); > >> > >> + update_packing_domain(cpu); > >> update_top_cache_domain(cpu); > >> } > >> > >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > >> index 4f4a4f6..8c9d3ed 100644 > >> --- a/kernel/sched/fair.c > >> +++ b/kernel/sched/fair.c > >> @@ -157,6 +157,63 @@ void sched_init_granularity(void) > >> update_sysctl(); > >> } > >> > >> + > >> +/* > >> + * Save the id of the optimal CPU that should be used to pack small tasks > >> + * The value -1 is used when no buddy has been found > >> + */ > >> +DEFINE_PER_CPU(int, sd_pack_buddy); > >> + > >> +/* Look for the best buddy CPU that can be used to pack small tasks > >> + * We make the assumption that it doesn't wort to pack on CPU that share > >> the > >> + * same powerline. We looks for the 1st sched_domain without the > >> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the > >> lowest > >> + * power per core based on the assumption that their power efficiency is > >> + * better */ > >> +void update_packing_domain(int cpu) > >> +{ > >> + struct sched_domain *sd; > >> + int id = -1; > >> + > >> + sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE); > >> + if (!sd) > >> + sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); > >> + else > >> + sd = sd->parent; > > sd is the highest level where SD_SHARE_POWERLINE is enabled so the sched > > groups of the parent level would represent the power domains. If get it > > right, we want to pack inside the cluster first and only let first cpu > > You probably wanted to use sched_group instead of cluster because > cluster is only a special use case, didn't you ? > > > of the cluster do packing on another cluster. 
So all cpus - except the > > first one - in the current sched domain should find its buddy within the > > domain and only the first one should go to the parent sched domain to > > find its buddy. > > We don't want to pack in the current sched_domain because it shares > power domain. We want to pack at the parent level > Yes. I think we mean the same thing. The packing takes place at the parent sched_domain but the sched_group that we are looking at only contains the cpus of the level below. > > > > I propose the following fix: > > > > -
Re: [HMP tunables v2][PATCH 3/7] ARM: TC2: Re-enable SD_SHARE_POWERLINE
On 19/11/12 14:09, Vincent Guittot wrote: On 19 November 2012 14:36, Morten Rasmussen wrote: On 19/11/12 12:23, Vincent Guittot wrote: On 19 November 2012 13:08, Morten Rasmussen wrote: Hi Vincent, On 19/11/12 09:20, Vincent Guittot wrote: Hi, On 16 November 2012 19:32, Liviu Dudau wrote: From: Morten Rasmussen Re-enable SD_SHARE_POWERLINE to reflect the power domains of TC2. --- arch/arm/kernel/topology.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index 317dac6..4d34e0e 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c @@ -228,7 +228,7 @@ struct cputopo_arm cpu_topology[NR_CPUS]; int arch_sd_share_power_line(void) { - return 0*SD_SHARE_POWERLINE; + return 1*SD_SHARE_POWERLINE; I'm not sure to catch your goal. With this modification, the power line (or power domain) is shared at all level which should disable the packing mechanism. But in a previous patch you fix the update packing loop so I assume that you want to use it. Which kind of configuration you would like to have among the proposal below ? cpu : CPU0 | CPU1 | CPU2 | CPU3 | CPU4 buddy conf 1 : CPU2 | CPU0 | CPU2 | CPU2 | CPU2 buddy conf 2 : CPU2 | CPU2 | CPU2 | CPU2 | CPU2 buddy conf 3 : -1 | -1 | -1 | -1 | -1 When we look at the git://git.linaro.org/arm/big.LITTLE/mp.git big-LITTLE-MP-master-v12, we can see that you have defined a custom sched_domain which hasn't been updated with SD_SHARE_POWERLINE flag so the flag is cleared at CPU level. Based on this, I would say that you want buddy conf 2 ? but I would say that buddy conf 1 should give better result. Have you tried both ? My goal with this fix is to set up the SD_SHARE_POWERLINE flags as they really are on TC2. It could have been done more elegantly. Since the HMP patches overrides the sched_domain flags at CPU level the SD_SHARE_POWERLINE is not being set by arch_sd_share_power_line(). With this fix we will get SD_SHARE_POWERLINE at MC level and no SD_SHARE_POWERLINE at CPU level, which I believe is the correct set up for TC2. For the buddy configuration the goal is to get configuration 1 in your list above. You should get that when using the other patch to fix the buddy selection algorithm. I'm not sure if conf 1 or 2 is best. I think it depends on the power/performance trade-off of the specific platform. conf 1 may lead to CPU1->CPU0->CPU2 migrations which may be undesirable. If your cpus are very leaky it might make sense to not do packing at all inside a high performance cluster and always do packing directly on a another low power cluster like conf 2. I think this needs further investigation. I have only tested with conf 1 on TC2. Hi Morten, Conf1 is the default configuration for ARM platform because SD_SHARE_POWERLINE is cleared at all levels for this architecture. Conf2 should be used if you can't powergate the core independently but several tests have demonstrated that even if you can't powergate each core independently, it worth packing small task on few CPUs in a core so it's worth using conf1 on TC2 as well. Based on your explanation, we should use the original configuration of SD_SHARE_POWERLINE (cleared at all level for ARM platform) I agree that the result is the same, but I don't like disabling SD_SHARE_POWERLINE for all level when the cpus in each cluster actually are in the same power domain as it is the case on TC2. 
The name SHARE_POWERLINE implies a clear relation to the actual hardware design, thus setting the flags differently than the actual hardware design is misleading in my opinion. If the buddy selection algorithm doesn't select appropriate buddies when flags are set to reflect the actual hardware design I would suggest changing the buddy selection algorithm instead of changing the sched_domain flags. If it is chosen to not have a direct relation between the flags and the hardware design, I think that the flag should be renamed so it doesn't give the wrong impression. There is a direct link between the powergating and the SHARE_POWERLINE and if you want that the buddy selection strictly reflects your HW configuration, you must use conf2 and not conf1. I just want the buddy selection to be reasonable when the SHARE_POWERLINE flags are reflecting the true hardware configuration. I haven't tested whether conf 1 or 2 is best yet. As long as I am getting one them it is definitely an improvement over not having task packing at all :) Now, beside the packing small task patch and the TC2 configuration, it has been proven that packing small tasks on an ARM platform (dual cortex-A9) which can only powergate the cluster, improves the power consumption of some low cpu load use cases like the MP3 playback (we had used cpu hotplug at that time). This assumption has been proven only for ARM platform and that's why
Re: [HMP tunables v2][PATCH 3/7] ARM: TC2: Re-enable SD_SHARE_POWERLINE
On 19/11/12 12:23, Vincent Guittot wrote: On 19 November 2012 13:08, Morten Rasmussen wrote: Hi Vincent, On 19/11/12 09:20, Vincent Guittot wrote: Hi, On 16 November 2012 19:32, Liviu Dudau wrote: From: Morten Rasmussen Re-enable SD_SHARE_POWERLINE to reflect the power domains of TC2. --- arch/arm/kernel/topology.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index 317dac6..4d34e0e 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c @@ -228,7 +228,7 @@ struct cputopo_arm cpu_topology[NR_CPUS]; int arch_sd_share_power_line(void) { - return 0*SD_SHARE_POWERLINE; + return 1*SD_SHARE_POWERLINE; I'm not sure to catch your goal. With this modification, the power line (or power domain) is shared at all level which should disable the packing mechanism. But in a previous patch you fix the update packing loop so I assume that you want to use it. Which kind of configuration you would like to have among the proposal below ? cpu : CPU0 | CPU1 | CPU2 | CPU3 | CPU4 buddy conf 1 : CPU2 | CPU0 | CPU2 | CPU2 | CPU2 buddy conf 2 : CPU2 | CPU2 | CPU2 | CPU2 | CPU2 buddy conf 3 : -1 | -1 | -1 | -1 | -1 When we look at the git://git.linaro.org/arm/big.LITTLE/mp.git big-LITTLE-MP-master-v12, we can see that you have defined a custom sched_domain which hasn't been updated with SD_SHARE_POWERLINE flag so the flag is cleared at CPU level. Based on this, I would say that you want buddy conf 2 ? but I would say that buddy conf 1 should give better result. Have you tried both ? My goal with this fix is to set up the SD_SHARE_POWERLINE flags as they really are on TC2. It could have been done more elegantly. Since the HMP patches overrides the sched_domain flags at CPU level the SD_SHARE_POWERLINE is not being set by arch_sd_share_power_line(). With this fix we will get SD_SHARE_POWERLINE at MC level and no SD_SHARE_POWERLINE at CPU level, which I believe is the correct set up for TC2. For the buddy configuration the goal is to get configuration 1 in your list above. You should get that when using the other patch to fix the buddy selection algorithm. I'm not sure if conf 1 or 2 is best. I think it depends on the power/performance trade-off of the specific platform. conf 1 may lead to CPU1->CPU0->CPU2 migrations which may be undesirable. If your cpus are very leaky it might make sense to not do packing at all inside a high performance cluster and always do packing directly on a another low power cluster like conf 2. I think this needs further investigation. I have only tested with conf 1 on TC2. Hi Morten, Conf1 is the default configuration for ARM platform because SD_SHARE_POWERLINE is cleared at all levels for this architecture. Conf2 should be used if you can't powergate the core independently but several tests have demonstrated that even if you can't powergate each core independently, it worth packing small task on few CPUs in a core so it's worth using conf1 on TC2 as well. Based on your explanation, we should use the original configuration of SD_SHARE_POWERLINE (cleared at all level for ARM platform) I agree that the result is the same, but I don't like disabling SD_SHARE_POWERLINE for all level when the cpus in each cluster actually are in the same power domain as it is the case on TC2. The name SHARE_POWERLINE implies a clear relation to the actual hardware design, thus setting the flags differently than the actual hardware design is misleading in my opinion. 
If the buddy selection algorithm doesn't select appropriate buddies when flags are set to reflect the actual hardware design I would suggest changing the buddy selection algorithm instead of changing the sched_domain flags. If it is chosen to not have a direct relation between the flags and the hardware design, I think that the flag should be renamed so it doesn't give the wrong impression. Morten Regards Vincent Regards, Morten Regards, Vincent } const struct cpumask *cpu_coregroup_mask(int cpu) -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
Re: [HMP tunables v2][PATCH 3/7] ARM: TC2: Re-enable SD_SHARE_POWERLINE
Hi Vincent,

On 19/11/12 09:20, Vincent Guittot wrote:
Hi,
On 16 November 2012 19:32, Liviu Dudau wrote:

From: Morten Rasmussen

Re-enable SD_SHARE_POWERLINE to reflect the power domains of TC2.
---
 arch/arm/kernel/topology.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 317dac6..4d34e0e 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -228,7 +228,7 @@ struct cputopo_arm cpu_topology[NR_CPUS];
 int arch_sd_share_power_line(void)
 {
-	return 0*SD_SHARE_POWERLINE;
+	return 1*SD_SHARE_POWERLINE;

I'm not sure I catch your goal. With this modification, the power line (or power domain) is shared at all levels, which should disable the packing mechanism. But in a previous patch you fix the update packing loop, so I assume that you want to use it. Which kind of configuration would you like to have among the proposals below?

cpu          : CPU0 | CPU1 | CPU2 | CPU3 | CPU4
buddy conf 1 : CPU2 | CPU0 | CPU2 | CPU2 | CPU2
buddy conf 2 : CPU2 | CPU2 | CPU2 | CPU2 | CPU2
buddy conf 3 :  -1  |  -1  |  -1  |  -1  |  -1

When we look at git://git.linaro.org/arm/big.LITTLE/mp.git big-LITTLE-MP-master-v12, we can see that you have defined a custom sched_domain which hasn't been updated with the SD_SHARE_POWERLINE flag, so the flag is cleared at CPU level. Based on this, I would say that you want buddy conf 2? But I would say that buddy conf 1 should give a better result. Have you tried both?

My goal with this fix is to set up the SD_SHARE_POWERLINE flags as they really are on TC2. It could have been done more elegantly. Since the HMP patches override the sched_domain flags at CPU level, SD_SHARE_POWERLINE is not being set by arch_sd_share_power_line(). With this fix we will get SD_SHARE_POWERLINE at MC level and no SD_SHARE_POWERLINE at CPU level, which I believe is the correct set up for TC2.

For the buddy configuration the goal is to get configuration 1 in your list above. You should get that when using the other patch to fix the buddy selection algorithm.

I'm not sure if conf 1 or 2 is best. I think it depends on the power/performance trade-off of the specific platform. conf 1 may lead to CPU1->CPU0->CPU2 migrations which may be undesirable. If your cpus are very leaky it might make sense to not do packing at all inside a high performance cluster and always do packing directly on another low power cluster like conf 2. I think this needs further investigation. I have only tested with conf 1 on TC2.

Regards,
Morten

Regards,
Vincent

 }
 const struct cpumask *cpu_coregroup_mask(int cpu)
--
1.7.9.5
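For reference, the two candidate configurations above correspond to the following per-cpu sd_pack_buddy values (the per-cpu variable set up by the pack-small-tasks patch) for Vincent's 5-cpu example. The arrays are purely illustrative and their names are made up:

/* Illustration only: index = cpu, value = buddy cpu chosen for packing. */
static const int buddy_conf1[] = { 2, 0, 2, 2, 2 };	/* pack inside the cluster first */
static const int buddy_conf2[] = { 2, 2, 2, 2, 2 };	/* pack straight onto CPU2 */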
Re: [RFC 3/6] sched: pack small tasks
Hi Vincent, I have experienced suboptimal buddy selection on a dual cluster setup (ARM TC2) if SD_SHARE_POWERLINE is enabled at MC level and disabled at CPU level. This seems to be the correct flag settings for a system with only cluster level power gating. To me it looks like update_packing_domain() is not doing the right thing. See inline comments below. On Sun, Oct 07, 2012 at 08:43:55AM +0100, Vincent Guittot wrote: > During sched_domain creation, we define a pack buddy CPU if available. > > On a system that share the powerline at all level, the buddy is set to -1 > > On a dual clusters / dual cores system which can powergate each core and > cluster independantly, the buddy configuration will be : > | CPU0 | CPU1 | CPU2 | CPU3 | > --- > buddy | CPU0 | CPU0 | CPU0 | CPU2 | > > Small tasks tend to slip out of the periodic load balance. > The best place to choose to migrate them is at their wake up. > > Signed-off-by: Vincent Guittot > --- > kernel/sched/core.c |1 + > kernel/sched/fair.c | 109 > ++ > kernel/sched/sched.h |1 + > 3 files changed, 111 insertions(+) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index dab7908..70cadbe 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct > root_domain *rd, int cpu) > rcu_assign_pointer(rq->sd, sd); > destroy_sched_domains(tmp, cpu); > > + update_packing_domain(cpu); > update_top_cache_domain(cpu); > } > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 4f4a4f6..8c9d3ed 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -157,6 +157,63 @@ void sched_init_granularity(void) > update_sysctl(); > } > > + > +/* > + * Save the id of the optimal CPU that should be used to pack small tasks > + * The value -1 is used when no buddy has been found > + */ > +DEFINE_PER_CPU(int, sd_pack_buddy); > + > +/* Look for the best buddy CPU that can be used to pack small tasks > + * We make the assumption that it doesn't wort to pack on CPU that share the > + * same powerline. We looks for the 1st sched_domain without the > + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the lowest > + * power per core based on the assumption that their power efficiency is > + * better */ > +void update_packing_domain(int cpu) > +{ > + struct sched_domain *sd; > + int id = -1; > + > + sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE); > + if (!sd) > + sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); > + else > + sd = sd->parent; sd is the highest level where SD_SHARE_POWERLINE is enabled so the sched groups of the parent level would represent the power domains. If get it right, we want to pack inside the cluster first and only let first cpu of the cluster do packing on another cluster. So all cpus - except the first one - in the current sched domain should find its buddy within the domain and only the first one should go to the parent sched domain to find its buddy. I propose the following fix: - sd = sd->parent; + if (cpumask_first(sched_domain_span(sd)) == cpu + || !sd->parent) + sd = sd->parent; > + > + while (sd) { > + struct sched_group *sg = sd->groups; > + struct sched_group *pack = sg; > + struct sched_group *tmp = sg->next; > + > + /* 1st CPU of the sched domain is a good candidate */ > + if (id == -1) > + id = cpumask_first(sched_domain_span(sd)); There is no guarantee that id is in the sched group pointed to by sd->groups, which is implicitly assumed later in the search loop. 
We need to find the sched group that contains id and point sg to that instead. I haven't found an elegant way to find that group, but the fix below should at least give the right result. + /* Find sched group of candidate */ + tmp = sd->groups; + do { + if (cpumask_test_cpu(id, sched_group_cpus(tmp))) + { + sg = tmp; + break; + } + } while (tmp = tmp->next, tmp != sd->groups); + + pack = sg; + tmp = sg->next; Regards, Morten > + > + /* loop the sched groups to find the best one */ > + while (tmp != sg) { > + if (tmp->sgp->power * sg->group_weight < > + sg->sgp->power * tmp->group_weight) > + pack = tmp; > + tmp = tmp->next; > + } > + > + /* we have found a better group */ > + if (pack != sg) > + id = cpumask_first(sched_group_
Re: [RFC 3/6] sched: pack small tasks
On Fri, Nov 02, 2012 at 10:53:47AM +, Santosh Shilimkar wrote: > On Monday 29 October 2012 06:42 PM, Vincent Guittot wrote: > > On 24 October 2012 17:20, Santosh Shilimkar > > wrote: > >> Vincent, > >> > >> Few comments/questions. > >> > >> > >> On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: > >>> > >>> During sched_domain creation, we define a pack buddy CPU if available. > >>> > >>> On a system that share the powerline at all level, the buddy is set to -1 > >>> > >>> On a dual clusters / dual cores system which can powergate each core and > >>> cluster independantly, the buddy configuration will be : > >>> | CPU0 | CPU1 | CPU2 | CPU3 | > >>> --- > >>> buddy | CPU0 | CPU0 | CPU0 | CPU2 | > >> > >> ^ > >> Is that a typo ? Should it be CPU2 instead of > >> CPU0 ? > > > > No it's not a typo. > > The system packs at each scheduling level. It starts to pack in > > cluster because each core can power gate independently so CPU1 tries > > to pack its tasks in CPU0 and CPU3 in CPU2. Then, it packs at CPU > > level so CPU2 tries to pack in the cluster of CPU0 and CPU0 packs in > > itself > > > I get it. Though in above example a task may migrate from say > CPU3->CPU2->CPU0 as part of packing. I was just thinking whether > moving such task from say CPU3 to CPU0 might be best instead. To me it seems suboptimal to pack the task twice, but the alternative is not good either. If you try to move the task directly to CPU0 you may miss packing opportunities if CPU0 is already busy, while CPU2 might have enough capacity to take it. It would probably be better to check the business of CPU0 and then back off and try CPU2 if CP0 is busy. This would require a buddy list for each CPU rather just a single buddy and thus might become expensive. > > >> > >>> Small tasks tend to slip out of the periodic load balance. > >>> The best place to choose to migrate them is at their wake up. > >>> > >> I have tried this series since I was looking at some of these packing > >> bits. On Mobile workloads like OSIdle with Screen ON, MP3, gallary, > >> I did see some additional filtering of threads with this series > >> but its not making much difference in power. More on this below. > > > > Can I ask you which configuration you have used ? how many cores and > > cluster ? Can they be power gated independently ? > > > I have been trying with couple of setups. Dual Core ARM machine and > Quad core X86 box with single package thought most of the mobile > workload analysis I was doing on ARM machine. On both setups > CPUs can be gated independently. 
> > >> > >> > >>> Signed-off-by: Vincent Guittot > >>> --- > >>>kernel/sched/core.c |1 + > >>>kernel/sched/fair.c | 109 > >>> ++ > >>>kernel/sched/sched.h |1 + > >>>3 files changed, 111 insertions(+) > >>> > >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c > >>> index dab7908..70cadbe 100644 > >>> --- a/kernel/sched/core.c > >>> +++ b/kernel/sched/core.c > >>> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct > >>> root_domain *rd, int cpu) > >>> rcu_assign_pointer(rq->sd, sd); > >>> destroy_sched_domains(tmp, cpu); > >>> > >>> + update_packing_domain(cpu); > >>> update_top_cache_domain(cpu); > >>>} > >>> > >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > >>> index 4f4a4f6..8c9d3ed 100644 > >>> --- a/kernel/sched/fair.c > >>> +++ b/kernel/sched/fair.c > >>> @@ -157,6 +157,63 @@ void sched_init_granularity(void) > >>> update_sysctl(); > >>>} > >>> > >>> + > >>> +/* > >>> + * Save the id of the optimal CPU that should be used to pack small tasks > >>> + * The value -1 is used when no buddy has been found > >>> + */ > >>> +DEFINE_PER_CPU(int, sd_pack_buddy); > >>> + > >>> +/* Look for the best buddy CPU that can be used to pack small tasks > >>> + * We make the assumption that it doesn't wort to pack on CPU that share > >>> the > >> > >> s/wort/worth > > > > yes > > > >> > >>> + * same powerline. We looks for the 1st sched_domain without the > >>> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the > >>> lowest > >>> + * power per core based on the assumption that their power efficiency is > >>> + * better */ > >> > >> Commenting style.. > >> /* > >> * > >> */ > >> > > > > yes > > > >> Can you please expand the why the assumption is right ? > >> "it doesn't wort to pack on CPU that share the same powerline" > > > > By "share the same power-line", I mean that the CPUs can't power off > > independently. So if some CPUs can't power off independently, it's > > worth to try to use most of them to race to idle. > > > In that case I suggest we use different word here. Power line can be > treated as voltage line, power domain. > May be SD_SHARE_CPU_POWERDOMAIN ? > How about just SD_SHARE_POWERDOMAIN ? > >> > >> Think about a scenario
Re: Fix for HMP scheduler crash [ Re: [GIT PULL]: big LITTLE MP v10]
On Fri, Oct 12, 2012 at 04:33:19PM +0100, Jon Medhurst (Tixy) wrote: > On Fri, 2012-10-12 at 16:11 +0100, Morten Rasmussen wrote: > > Hi Tixy, > > > > Thanks for the patch. I think this patch is the right way to solve this > > issue. > > > > There is still a problem with the priority filter in > > hmp_down_migration() which Viresh pointed out earlier. There is no > > checking of whether the task is actually allowed to run on any of the > > slower cpus. Solving that would actually also fix the issue that you are > > observing as a side effect. I have attached a patch. > > The patch looks reasonable. I've just run it on TC2 and A9 with the > addition of a "pr_err("$");" before the "return 1;" and can see the > occosional '$' on TC2 and none on A9, as we would expect. So I guess > that counts as: > > Reviewed-by: Jon Medhurst > Tested-by: Jon Medhurst > Thanks for reviewing and testing. My comments to your patch in the previous reply would count as: Reviewed-by: Morten Rasmussen I have only tested it on TC2. Morten > -- > Tixy > > > > I think we should apply both. > > > > Thanks, > > Morten > > > > On Fri, Oct 12, 2012 at 02:33:40PM +0100, Jon Medhurst (Tixy) wrote: > > > On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote: > > > > The attached patch fixes the immediate problem by avoiding the empty > > > > domain (which is probably a good thing anyway) > > > > > > Oops, my last patch included some extra junk, the one attached to this > > > mail fixes this... > > > > > From 7365076675b851355d48e9b1157e223d7719e3ac Mon Sep 17 00:00:00 2001 > > > From: Jon Medhurst > > > Date: Fri, 12 Oct 2012 13:45:35 +0100 > > > Subject: [PATCH] ARM: sched: Avoid empty 'slow' HMP domain > > > > > > On homogeneous (non-heterogeneous) systems all CPUs will be declared > > > 'fast' and the slow cpu list will be empty. In this situation we need to > > > avoid adding an empty slow HMP domain otherwise the scheduler code will > > > blow up when it attempts to move a task to the slow domain. > > > > > > Signed-off-by: Jon Medhurst > > > --- > > > arch/arm/kernel/topology.c | 10 ++ > > > 1 file changed, 6 insertions(+), 4 deletions(-) > > > > > > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c > > > index 58dac7a..0b51233 100644 > > > --- a/arch/arm/kernel/topology.c > > > +++ b/arch/arm/kernel/topology.c > > > @@ -396,10 +396,12 @@ void __init arch_get_hmp_domains(struct list_head > > > *hmp_domains_list) > > >* Must be ordered with respect to compute capacity. > > >* Fastest domain at head of list. > > >*/ > > > - domain = (struct hmp_domain *) > > > - kmalloc(sizeof(struct hmp_domain), GFP_KERNEL); > > > - cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask); > > > - list_add(&domain->hmp_domains, hmp_domains_list); > > > + if(!cpumask_empty(&hmp_slow_cpu_mask)) { > > > + domain = (struct hmp_domain *) > > > + kmalloc(sizeof(struct hmp_domain), GFP_KERNEL); > > > + cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask); > > > + list_add(&domain->hmp_domains, hmp_domains_list); > > > + } > > > domain = (struct hmp_domain *) > > > kmalloc(sizeof(struct hmp_domain), GFP_KERNEL); > > > cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask); > > > -- > > > 1.7.10.4 > > > ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
Re: Fix for HMP scheduler crash [ Re: [GIT PULL]: big LITTLE MP v10]
Hi Tixy, Thanks for the patch. I think this patch is the right way to solve this issue. There is still a problem with the priority filter in hmp_down_migration() which Viresh pointed out earlier. There is no checking of whether the task is actually allowed to run on any of the slower cpus. Solving that would actually also fix the issue that you are observing as a side effect. I have attached a patch. I think we should apply both. Thanks, Morten On Fri, Oct 12, 2012 at 02:33:40PM +0100, Jon Medhurst (Tixy) wrote: > On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote: > > The attached patch fixes the immediate problem by avoiding the empty > > domain (which is probably a good thing anyway) > > Oops, my last patch included some extra junk, the one attached to this > mail fixes this... > From 7365076675b851355d48e9b1157e223d7719e3ac Mon Sep 17 00:00:00 2001 > From: Jon Medhurst > Date: Fri, 12 Oct 2012 13:45:35 +0100 > Subject: [PATCH] ARM: sched: Avoid empty 'slow' HMP domain > > On homogeneous (non-heterogeneous) systems all CPUs will be declared > 'fast' and the slow cpu list will be empty. In this situation we need to > avoid adding an empty slow HMP domain otherwise the scheduler code will > blow up when it attempts to move a task to the slow domain. > > Signed-off-by: Jon Medhurst > --- > arch/arm/kernel/topology.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c > index 58dac7a..0b51233 100644 > --- a/arch/arm/kernel/topology.c > +++ b/arch/arm/kernel/topology.c > @@ -396,10 +396,12 @@ void __init arch_get_hmp_domains(struct list_head > *hmp_domains_list) >* Must be ordered with respect to compute capacity. >* Fastest domain at head of list. >*/ > - domain = (struct hmp_domain *) > - kmalloc(sizeof(struct hmp_domain), GFP_KERNEL); > - cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask); > - list_add(&domain->hmp_domains, hmp_domains_list); > + if(!cpumask_empty(&hmp_slow_cpu_mask)) { > + domain = (struct hmp_domain *) > + kmalloc(sizeof(struct hmp_domain), GFP_KERNEL); > + cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask); > + list_add(&domain->hmp_domains, hmp_domains_list); > + } > domain = (struct hmp_domain *) > kmalloc(sizeof(struct hmp_domain), GFP_KERNEL); > cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask); > -- > 1.7.10.4 >From 9f241c37bb7316eeea56e6c93541352cf5c9b8a8 Mon Sep 17 00:00:00 2001 From: Morten Rasmussen Date: Fri, 12 Oct 2012 15:25:02 +0100 Subject: [PATCH] sched: Only down migrate low priority tasks if allowed by affinity mask Adds an extra check intersection of the task affinity mask and the slower hmp_domain cpumask before down migrating low priority tasks. Signed-off-by: Morten Rasmussen --- kernel/sched/fair.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 56cbda1..edcf922 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5562,8 +5562,11 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se) #ifdef CONFIG_SCHED_HMP_PRIO_FILTER /* Filter by task priority */ - if (p->prio >= hmp_up_prio) + if ((p->prio >= hmp_up_prio) && + cpumask_intersects(&hmp_slower_domain(cpu)->cpus, + tsk_cpus_allowed(p))) { return 1; + } #endif /* Let the task load settle before doing another down migration */ -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
Re: [RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains
On Thu, Oct 04, 2012 at 07:58:45AM +0100, Viresh Kumar wrote:
> On 22 September 2012 00:02, wrote:
> > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
>
> > +void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
> > +{
> > +	struct cpumask hmp_fast_cpu_mask;
> > +	struct cpumask hmp_slow_cpu_mask;
>
> can be merged to single line.
>
> > +	struct hmp_domain *domain;
> > +
> > +	arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);
> > +
> > +	/*
> > +	 * Initialize hmp_domains
> > +	 * Must be ordered with respect to compute capacity.
> > +	 * Fastest domain at head of list.
> > +	 */
> > +	domain = (struct hmp_domain *)
> > +		kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
>
> should be:
>
> domain = kmalloc(sizeof(*domain), GFP_KERNEL);
>
> > +	cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
>
> what if kmalloc failed?
>
> > +	list_add(&domain->hmp_domains, hmp_domains_list);
> > +	domain = (struct hmp_domain *)
> > +		kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
>
> would be better to kmalloc only once with size 2 * sizeof(*domain)
>
> > +	cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
> > +	list_add(&domain->hmp_domains, hmp_domains_list);
>
> Also would be better to create a macro for above two lines to remove
> code redundancy.

Agree on all of the above.

Thanks,
Morten
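Putting the review comments above together, the function could end up looking roughly like this. It is only a sketch, assuming that a single allocation covering both domains is acceptable and that an allocation failure should simply leave the list untouched; it is not the version that was merged:

void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
{
	struct cpumask hmp_fast_cpu_mask, hmp_slow_cpu_mask;
	struct hmp_domain *domain;

	arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask);

	/* One allocation for both domains, as suggested. */
	domain = kmalloc(2 * sizeof(*domain), GFP_KERNEL);
	if (WARN_ON(!domain))
		return;

	/*
	 * Must be ordered with respect to compute capacity.
	 * Fastest domain at head of list, so add the slow domain first.
	 */
	cpumask_copy(&domain[0].cpus, &hmp_slow_cpu_mask);
	list_add(&domain[0].hmp_domains, hmp_domains_list);
	cpumask_copy(&domain[1].cpus, &hmp_fast_cpu_mask);
	list_add(&domain[1].hmp_domains, hmp_domains_list);
}

Indexing into the single allocation also removes most of the duplication that the macro suggestion was aiming at.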
Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
Hi Tixy, Could you have a look at my code stealing patch below? Since it is basically a trimmed version of one of your patches I would prefer to put you as author and have your SOB on it. What is your opinion? Thanks, Morten On Fri, Sep 21, 2012 at 07:32:21PM +0100, Morten Rasmussen wrote: > From: Morten Rasmussen > > We can't rely on Kconfig options to set the fast and slow CPU lists for > HMP scheduling if we want a single kernel binary to support multiple > devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2 > big.LITTLE system), Fast Models, or even non big.LITTLE devices. > > This patch adds the function arch_get_fast_and_slow_cpus() to generate > the lists at run-time by parsing the CPU nodes in device-tree; it > assumes slow cores are A7s and everything else is fast. The function > still supports the old Kconfig options as this is useful for testing the > HMP scheduler on devices without big.LITTLE. > > This patch is reuse of a patch by Jon Medhurst with a > few bits left out. > > Signed-off-by: Morten Rasmussen > --- > arch/arm/Kconfig |4 ++- > arch/arm/kernel/topology.c | 69 > > 2 files changed, 72 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index cb80846..f1271bc 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK > string "HMP scheduler fast CPU mask" > depends on SCHED_HMP > help > - Specify the cpuids of the fast CPUs in the system as a list string, > + Leave empty to use device tree information. > + Specify the cpuids of the fast CPUs in the system as a list string, > e.g. cpuid 0+1 should be specified as 0-1. > > config HMP_SLOW_CPU_MASK > string "HMP scheduler slow CPU mask" > depends on SCHED_HMP > help > + Leave empty to use device tree information. > Specify the cpuids of the slow CPUs in the system as a list string, > e.g. cpuid 0+1 should be specified as 0-1. > > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c > index 26c12c6..7682e12 100644 > --- a/arch/arm/kernel/topology.c > +++ b/arch/arm/kernel/topology.c > @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid) > cpu_topology[cpuid].socket_id, mpidr); > } > > + > +#ifdef CONFIG_SCHED_HMP > + > +static const char * const little_cores[] = { > + "arm,cortex-a7", > + NULL, > +}; > + > +static bool is_little_cpu(struct device_node *cn) > +{ > + const char * const *lc; > + for (lc = little_cores; *lc; lc++) > + if (of_device_is_compatible(cn, *lc)) > + return true; > + return false; > +} > + > +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast, > + struct cpumask *slow) > +{ > + struct device_node *cn = NULL; > + int cpu = 0; > + > + cpumask_clear(fast); > + cpumask_clear(slow); > + > + /* > + * Use the config options if they are given. This helps testing > + * HMP scheduling on systems without a big.LITTLE architecture. > + */ > + if (strlen(CONFIG_HMP_FAST_CPU_MASK) && > strlen(CONFIG_HMP_SLOW_CPU_MASK)) { > + if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast)) > + WARN(1, "Failed to parse HMP fast cpu mask!\n"); > + if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow)) > + WARN(1, "Failed to parse HMP slow cpu mask!\n"); > + return; > + } > + > + /* > + * Else, parse device tree for little cores. 
> + */ > + while ((cn = of_find_node_by_type(cn, "cpu"))) { > + > + if (cpu >= num_possible_cpus()) > + break; > + > + if (is_little_cpu(cn)) > + cpumask_set_cpu(cpu, slow); > + else > + cpumask_set_cpu(cpu, fast); > + > + cpu++; > + } > + > + if (!cpumask_empty(fast) && !cpumask_empty(slow)) > + return; > + > + /* > + * We didn't find both big and little cores so let's call all cores > + * fast as this will keep the system running, with all cores being > + * treated equal. > + */ > + cpumask_setall(fast); > + cpumask_clear(slow); > +} > + > +#endif /* CONFIG_SCHED_HMP */ > + > + > /* > * init_cpu_topology is called at boot when only one cpu is running > * which prevent simultaneous write access to cpu_topology array > -- > 1.7.9.5 > ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
Re: [RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
On Thu, Oct 04, 2012 at 07:49:32AM +0100, Viresh Kumar wrote: > On 22 September 2012 00:02, wrote: > > From: Morten Rasmussen > > > > We can't rely on Kconfig options to set the fast and slow CPU lists for > > HMP scheduling if we want a single kernel binary to support multiple > > devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2 > > big.LITTLE system), Fast Models, or even non big.LITTLE devices. > > > > This patch adds the function arch_get_fast_and_slow_cpus() to generate > > the lists at run-time by parsing the CPU nodes in device-tree; it > > assumes slow cores are A7s and everything else is fast. The function > > still supports the old Kconfig options as this is useful for testing the > > HMP scheduler on devices without big.LITTLE. > > But this code is handling this case too at the end, with following logic: > > > + cpumask_setall(fast); > > + cpumask_clear(slow); > > Am i missing something? > The HMP setup can be defined using Kconfig or DT. If both fails, it will set all cpus to be fast cpus and effectively disable SCHED_HMP. The Kconfig option is kept to allow testing of alternative HMP setups without having to change the DT or use DT at all which might be handy for non-ARM platforms. I hope that answers you question. > > This patch is reuse of a patch by Jon Medhurst with a > > few bits left out. > > Then probably he must be the author of this commit? Also a SOB is required > from him here. > I don't know what the correct procedure is for this sort of partial patch reuse. Since I didn't know better, I adopted Tixy's own reference style that he used in one of his patches which is an extension of a previous patch by me. I will of course fix it to follow normal procedure if there is one. > > Signed-off-by: Morten Rasmussen > > --- > > arch/arm/Kconfig |4 ++- > > arch/arm/kernel/topology.c | 69 > > > > 2 files changed, 72 insertions(+), 1 deletion(-) > > > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > > index cb80846..f1271bc 100644 > > --- a/arch/arm/Kconfig > > +++ b/arch/arm/Kconfig > > @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK > > string "HMP scheduler fast CPU mask" > > depends on SCHED_HMP > > help > > - Specify the cpuids of the fast CPUs in the system as a list > > string, > > + Leave empty to use device tree information. > > + Specify the cpuids of the fast CPUs in the system as a list > > string, > > e.g. cpuid 0+1 should be specified as 0-1. > > > > config HMP_SLOW_CPU_MASK > > string "HMP scheduler slow CPU mask" > > depends on SCHED_HMP > > help > > + Leave empty to use device tree information. > > Specify the cpuids of the slow CPUs in the system as a list > > string, > > e.g. cpuid 0+1 should be specified as 0-1. 
> > > > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c > > index 26c12c6..7682e12 100644 > > --- a/arch/arm/kernel/topology.c > > +++ b/arch/arm/kernel/topology.c > > @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid) > > cpu_topology[cpuid].socket_id, mpidr); > > } > > > > + > > +#ifdef CONFIG_SCHED_HMP > > + > > +static const char * const little_cores[] = { > > + "arm,cortex-a7", > > + NULL, > > +}; > > + > > +static bool is_little_cpu(struct device_node *cn) > > +{ > > + const char * const *lc; > > + for (lc = little_cores; *lc; lc++) > > + if (of_device_is_compatible(cn, *lc)) > > + return true; > > + return false; > > +} > > + > > +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast, > > + struct cpumask *slow) > > +{ > > + struct device_node *cn = NULL; > > + int cpu = 0; > > + > > + cpumask_clear(fast); > > + cpumask_clear(slow); > > + > > + /* > > +* Use the config options if they are given. This helps testing > > +* HMP scheduling on systems without a big.LITTLE architecture. > > +*/ > > + if (strlen(CONFIG_HMP_FAST_CPU_MASK) && > > strlen(CONFIG_HMP_SLOW_CPU_MASK)) { > > + if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast)) > > +
Re: [RFC PATCH 04/10] sched: Introduce priority-based task migration filter
On Thu, Oct 04, 2012 at 07:27:00AM +0100, Viresh Kumar wrote:
> On 22 September 2012 00:02, wrote:
>
> > +config SCHED_HMP_PRIO_FILTER
> > +	bool "(EXPERIMENTAL) Filter HMP migrations by task priority"
> > +	depends on SCHED_HMP
>
> Should it depend on EXPERIMENTAL?
>
> > +	help
> > +	  Enables task priority based HMP migration filter. Any task with
> > +	  a NICE value above the threshold will always be on low-power cpus
> > +	  with less compute capacity.
> > +
> > +config SCHED_HMP_PRIO_FILTER_VAL
> > +	int "NICE priority threshold"
> > +	default 5
> > +	depends on SCHED_HMP_PRIO_FILTER
> > +
> >  config HAVE_ARM_SCU
> >  	bool
> >  	help
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 490f1f0..8f0f3b9 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void)
> >   * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu
> >   * The default values (512, 256) offer good responsiveness, but may need
> >   * tweaking suit particular needs.
> > + *
> > + * hmp_up_prio: Only up migrate task with high priority (
> >   */
> >  unsigned int hmp_up_threshold = 512;
> >  unsigned int hmp_down_threshold = 256;
> > +unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL);
> >
> >  static unsigned int hmp_up_migration(int cpu, struct sched_entity *se);
> >  static unsigned int hmp_down_migration(int cpu, struct sched_entity *se);
> > @@ -5491,6 +5494,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
> >  	if (hmp_cpu_is_fastest(cpu))
> >  		return 0;
> >
> > +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> > +	/* Filter by task priority */
> > +	if (p->prio >= hmp_up_prio)
> > +		return 0;
> > +#endif
> > +
> >  	if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus,
> >  			tsk_cpus_allowed(p))
> >  			&& se->avg.load_avg_ratio > hmp_up_threshold) {
> > @@ -5507,6 +5516,12 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se)
> >  	if (hmp_cpu_is_slowest(cpu))
> >  		return 0;
> >
> > +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
> > +	/* Filter by task priority */
> > +	if (p->prio >= hmp_up_prio)
> > +		return 1;
> > +#endif
>
> Even if below cpumask_intersects() fails?

No. Good catch :)

> >  	if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
> >  			tsk_cpus_allowed(p))
> >  			&& se->avg.load_avg_ratio < hmp_down_threshold) {
>
> --
> viresh

Thanks,
Morten
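For completeness, the corrected down-migration filter simply gates the early return on the affinity check as well; this is essentially the fix that was later posted in the "Fix for HMP scheduler crash" thread:

#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
	/* Filter by task priority: only force the down migration if the
	 * task is actually allowed to run on one of the slower cpus. */
	if ((p->prio >= hmp_up_prio) &&
	    cpumask_intersects(&hmp_slower_domain(cpu)->cpus,
			       tsk_cpus_allowed(p)))
		return 1;
#endif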
Re: [RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking
Hi Viresh, On Thu, Oct 04, 2012 at 07:02:03AM +0100, Viresh Kumar wrote: > Hi Morten, > > On 22 September 2012 00:02, wrote: > > From: Morten Rasmussen > > > > This patch introduces the basic SCHED_HMP infrastructure. Each class of > > cpus is represented by a hmp_domain and tasks will only be moved between > > these domains when their load profiles suggest it is beneficial. > > > > SCHED_HMP relies heavily on the task load-tracking introduced in Paul > > Turners fair group scheduling patch set: > > > > <https://lkml.org/lkml/2012/8/23/267> > > > > SCHED_HMP requires that the platform implements arch_get_hmp_domains() > > which should set up the platform specific list of hmp_domains. It is > > also assumed that the platform disables SD_LOAD_BALANCE for the > > appropriate sched_domains. > > An explanation of this requirement would be helpful here. > Yes. This is to prevent the load-balancer from moving tasks between hmp_domains. This will be done exclusively by SCHED_HMP instead to implement a strict task migration policy and avoid changing the load-balancer behaviour. The load-balancer will take care of load-balacing within each hmp_domain. > > Tasks placement takes place every time a task is to be inserted into > > a runqueue based on its load history. The task placement decision is > > based on load thresholds. > > > > There are no restrictions on the number of hmp_domains, however, > > multiple (>2) has not been tested and the up/down migration policy is > > rather simple. > > > > Signed-off-by: Morten Rasmussen > > --- > > arch/arm/Kconfig | 17 + > > include/linux/sched.h |6 ++ > > kernel/sched/fair.c | 168 > > + > > kernel/sched/sched.h |6 ++ > > 4 files changed, 197 insertions(+) > > > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > > index f4a5d58..5b09684 100644 > > --- a/arch/arm/Kconfig > > +++ b/arch/arm/Kconfig > > @@ -1554,6 +1554,23 @@ config SCHED_SMT > > MultiThreading at a cost of slightly increased overhead in some > > places. If unsure say N here. > > > > +config DISABLE_CPU_SCHED_DOMAIN_BALANCE > > + bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing" > > + help > > + Disables scheduler load-balancing at CPU sched domain level. > > Shouldn't this depend on EXPERIMENTAL? > It should. The ongoing discussion about CONFIG_EXPERIMENTAL that Amit is referring to hasn't come to a conclusion yet. > > +config SCHED_HMP > > + bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling" > > ditto. > > > + depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && > > FAIR_GROUP_SCHED && !SCHED_AUTOGROUP > > + help > > + Experimental scheduler optimizations for heterogeneous platforms. > > + Attempts to introspectively select task affinity to optimize power > > + and performance. Basic support for multiple (>2) cpu types is in > > place, > > + but it has only been tested with two types of cpus. > > + There is currently no support for migration of task groups, hence > > + !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be > > disabled > > + between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE). 
> > + > > config HAVE_ARM_SCU > > bool > > help > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index 81e4e82..df971a3 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct > > sched_domain *sd, int cpu); > > > > bool cpus_share_cache(int this_cpu, int that_cpu); > > > > +#ifdef CONFIG_SCHED_HMP > > +struct hmp_domain { > > + struct cpumask cpus; > > + struct list_head hmp_domains; > > Probably need a better name here. domain_list? > Yes. hmp_domain_list would be better and stick with the hmp_* naming convention. > > +}; > > +#endif /* CONFIG_SCHED_HMP */ > > #else /* CONFIG_SMP */ > > > > struct sched_domain_attr; > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 3e17dd5..d80de46 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct > > *p, int target) > > return target; > &
[RFC PATCH 01/10] sched: entity load-tracking load_avg_ratio
From: Morten Rasmussen This patch adds load_avg_ratio to each task. The load_avg_ratio is a variant of load_avg_contrib which is not scaled by the task priority. It is calculated like this: runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1). Signed-off-by: Morten Rasmussen --- include/linux/sched.h |1 + kernel/sched/fair.c |3 +++ 2 files changed, 4 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 4dc4990..81e4e82 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1151,6 +1151,7 @@ struct sched_avg { u64 last_runnable_update; s64 decay_count; unsigned long load_avg_contrib; + unsigned long load_avg_ratio; u32 usage_avg_sum; }; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 095d86c..3e17dd5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1192,6 +1192,9 @@ static inline void __update_task_entity_contrib(struct sched_entity *se) contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight); contrib /= (se->avg.runnable_avg_period + 1); se->avg.load_avg_contrib = scale_load(contrib); + contrib = se->avg.runnable_avg_sum * scale_load_down(NICE_0_LOAD); + contrib /= (se->avg.runnable_avg_period + 1); + se->avg.load_avg_ratio = scale_load(contrib); } /* Compute the current contribution to load_avg by se, return any delta */ -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
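As a quick sanity check of that formula, here is a small user-space sketch contrasting load_avg_contrib (scaled by the task's weight) with the new load_avg_ratio (always scaled by NICE_0_LOAD). The numbers are assumptions for illustration: NICE_0_LOAD = 1024, weight 335 for a nice 5 task, and scale_load()/scale_load_down() treated as no-ops as on 32-bit builds.

    #include <stdio.h>

    int main(void)
    {
            unsigned long runnable_avg_sum = 24000;     /* task runnable ~50% of the time */
            unsigned long runnable_avg_period = 47999;
            unsigned long nice0_load = 1024;            /* NICE_0_LOAD */
            unsigned long weight = 335;                 /* weight of a nice 5 task */

            unsigned long contrib = runnable_avg_sum * weight / (runnable_avg_period + 1);
            unsigned long ratio   = runnable_avg_sum * nice0_load / (runnable_avg_period + 1);

            /* contrib ~= 167, ratio ~= 512: the ratio tracks cpu utilisation
             * independently of priority, which is what the HMP thresholds need. */
            printf("load_avg_contrib=%lu load_avg_ratio=%lu\n", contrib, ratio);
            return 0;
    }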
[RFC PATCH 02/10] sched: Task placement for heterogeneous systems based on task load-tracking
From: Morten Rasmussen This patch introduces the basic SCHED_HMP infrastructure. Each class of cpus is represented by a hmp_domain and tasks will only be moved between these domains when their load profiles suggest it is beneficial. SCHED_HMP relies heavily on the task load-tracking introduced in Paul Turners fair group scheduling patch set: <https://lkml.org/lkml/2012/8/23/267> SCHED_HMP requires that the platform implements arch_get_hmp_domains() which should set up the platform specific list of hmp_domains. It is also assumed that the platform disables SD_LOAD_BALANCE for the appropriate sched_domains. Tasks placement takes place every time a task is to be inserted into a runqueue based on its load history. The task placement decision is based on load thresholds. There are no restrictions on the number of hmp_domains, however, multiple (>2) has not been tested and the up/down migration policy is rather simple. Signed-off-by: Morten Rasmussen --- arch/arm/Kconfig | 17 + include/linux/sched.h |6 ++ kernel/sched/fair.c | 168 + kernel/sched/sched.h |6 ++ 4 files changed, 197 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index f4a5d58..5b09684 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1554,6 +1554,23 @@ config SCHED_SMT MultiThreading at a cost of slightly increased overhead in some places. If unsure say N here. +config DISABLE_CPU_SCHED_DOMAIN_BALANCE + bool "(EXPERIMENTAL) Disable CPU level scheduler load-balancing" + help + Disables scheduler load-balancing at CPU sched domain level. + +config SCHED_HMP + bool "(EXPERIMENTAL) Heterogenous multiprocessor scheduling" + depends on DISABLE_CPU_SCHED_DOMAIN_BALANCE && SCHED_MC && FAIR_GROUP_SCHED && !SCHED_AUTOGROUP + help + Experimental scheduler optimizations for heterogeneous platforms. + Attempts to introspectively select task affinity to optimize power + and performance. Basic support for multiple (>2) cpu types is in place, + but it has only been tested with two types of cpus. + There is currently no support for migration of task groups, hence + !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled + between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE). + config HAVE_ARM_SCU bool help diff --git a/include/linux/sched.h b/include/linux/sched.h index 81e4e82..df971a3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1039,6 +1039,12 @@ unsigned long default_scale_smt_power(struct sched_domain *sd, int cpu); bool cpus_share_cache(int this_cpu, int that_cpu); +#ifdef CONFIG_SCHED_HMP +struct hmp_domain { + struct cpumask cpus; + struct list_head hmp_domains; +}; +#endif /* CONFIG_SCHED_HMP */ #else /* CONFIG_SMP */ struct sched_domain_attr; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3e17dd5..d80de46 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3077,6 +3077,125 @@ static int select_idle_sibling(struct task_struct *p, int target) return target; } +#ifdef CONFIG_SCHED_HMP +/* + * Heterogenous multiprocessor (HMP) optimizations + * + * The cpu types are distinguished using a list of hmp_domains + * which each represent one cpu type using a cpumask. + * The list is assumed ordered by compute capacity with the + * fastest domain first. 
+ */ +DEFINE_PER_CPU(struct hmp_domain *, hmp_cpu_domain); + +extern void __init arch_get_hmp_domains(struct list_head *hmp_domains_list); + +/* Setup hmp_domains */ +static int __init hmp_cpu_mask_setup(void) +{ + char buf[64]; + struct hmp_domain *domain; + struct list_head *pos; + int dc, cpu; + + pr_debug("Initializing HMP scheduler:\n"); + + /* Initialize hmp_domains using platform code */ + arch_get_hmp_domains(&hmp_domains); + if (list_empty(&hmp_domains)) { + pr_debug("HMP domain list is empty!\n"); + return 0; + } + + /* Print hmp_domains */ + dc = 0; + list_for_each(pos, &hmp_domains) { + domain = list_entry(pos, struct hmp_domain, hmp_domains); + cpulist_scnprintf(buf, 64, &domain->cpus); + pr_debug(" HMP domain %d: %s\n", dc, buf); + + for_each_cpu_mask(cpu, domain->cpus) { + per_cpu(hmp_cpu_domain, cpu) = domain; + } + dc++; + } + + return 1; +} + +/* + * Migration thresholds should be in the range [0..1023] + * hmp_up_threshold: min. load required for migrating tasks to a faster cpu + * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu + * The default values (512, 256) offer good responsiveness, but may need + *
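The up/down thresholds described above form a simple hysteresis band: a task has to look heavier than hmp_up_threshold to be pulled up, and lighter than hmp_down_threshold to be pushed back down, so a load hovering around a single cut-off does not bounce between domains. A minimal user-space sketch of that decision, using the default values quoted above (load scale [0..1023], up = 512, down = 256); this is a simplification, not the patch code, which also checks the task's affinity mask.

    #include <stdio.h>

    enum hmp_decision { HMP_STAY, HMP_MIGRATE_UP, HMP_MIGRATE_DOWN };

    static enum hmp_decision hmp_decide(unsigned long load_avg_ratio,
                                        int on_fastest, int on_slowest)
    {
            if (!on_fastest && load_avg_ratio > 512)
                    return HMP_MIGRATE_UP;
            if (!on_slowest && load_avg_ratio < 256)
                    return HMP_MIGRATE_DOWN;
            return HMP_STAY;
    }

    int main(void)
    {
            /* prints 1 0 2: up-migrate, stay, down-migrate */
            printf("%d %d %d\n", hmp_decide(600, 0, 1),
                   hmp_decide(300, 1, 0), hmp_decide(100, 1, 0));
            return 0;
    }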
[RFC PATCH 04/10] sched: Introduce priority-based task migration filter
From: Morten Rasmussen Introduces a priority threshold which prevents low priority task from migrating to faster hmp_domains (cpus). This is useful for user-space software which assigns lower task priority to background task. Signed-off-by: Morten Rasmussen --- arch/arm/Kconfig| 13 + kernel/sched/fair.c | 15 +++ 2 files changed, 28 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 5b09684..05de193 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1571,6 +1571,19 @@ config SCHED_HMP !SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE). +config SCHED_HMP_PRIO_FILTER + bool "(EXPERIMENTAL) Filter HMP migrations by task priority" + depends on SCHED_HMP + help + Enables task priority based HMP migration filter. Any task with + a NICE value above the threshold will always be on low-power cpus + with less compute capacity. + +config SCHED_HMP_PRIO_FILTER_VAL + int "NICE priority threshold" + default 5 + depends on SCHED_HMP_PRIO_FILTER + config HAVE_ARM_SCU bool help diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 490f1f0..8f0f3b9 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3129,9 +3129,12 @@ static int __init hmp_cpu_mask_setup(void) * hmp_down_threshold: max. load allowed for tasks migrating to a slower cpu * The default values (512, 256) offer good responsiveness, but may need * tweaking suit particular needs. + * + * hmp_up_prio: Only up migrate task with high priority (<hmp_up_prio) */ unsigned int hmp_up_threshold = 512; unsigned int hmp_down_threshold = 256; +unsigned int hmp_up_prio = NICE_TO_PRIO(CONFIG_SCHED_HMP_PRIO_FILTER_VAL); static unsigned int hmp_up_migration(int cpu, struct sched_entity *se); static unsigned int hmp_down_migration(int cpu, struct sched_entity *se); @@ -5491,6 +5494,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se) if (hmp_cpu_is_fastest(cpu)) return 0; +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER + /* Filter by task priority */ + if (p->prio >= hmp_up_prio) + return 0; +#endif + if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus, tsk_cpus_allowed(p)) && se->avg.load_avg_ratio > hmp_up_threshold) { @@ -5507,6 +5516,12 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se) if (hmp_cpu_is_slowest(cpu)) return 0; +#ifdef CONFIG_SCHED_HMP_PRIO_FILTER + /* Filter by task priority */ + if (p->prio >= hmp_up_prio) + return 1; +#endif + if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus, tsk_cpus_allowed(p)) && se->avg.load_avg_ratio < hmp_down_threshold) { -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
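Note that the threshold is a kernel priority, not a nice value. Assuming the usual mapping NICE_TO_PRIO(nice) = MAX_RT_PRIO + nice + 20 with MAX_RT_PRIO = 100, the default CONFIG_SCHED_HMP_PRIO_FILTER_VAL of 5 works out as below (a stand-alone sketch, not part of the patch):

    #include <stdio.h>

    #define MAX_RT_PRIO        100
    #define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)

    int main(void)
    {
            int hmp_up_prio = NICE_TO_PRIO(5);      /* 125 with the default threshold */

            /* A nice 0 task has prio 120 (< 125) and can still be up-migrated;
             * a nice 10 background task has prio 130 (>= 125) and is kept on,
             * or pushed back to, the low-power cpus. */
            printf("hmp_up_prio=%d nice0=%d nice10=%d\n",
                   hmp_up_prio, NICE_TO_PRIO(0), NICE_TO_PRIO(10));
            return 0;
    }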
[RFC PATCH 10/10] sched: SCHED_HMP multi-domain task migration control
From: Morten Rasmussen We need a way to prevent tasks that are migrating up and down the hmp_domains from migrating straight on through before the load has adapted to the new compute capacity of the CPU on the new hmp_domain. This patch adds a next up/down migration delay that prevents the task from doing another migration in the same direction until the delay has expired. Signed-off-by: Morten Rasmussen --- include/linux/sched.h |4 kernel/sched/core.c |4 kernel/sched/fair.c | 38 ++ 3 files changed, 46 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index df971a3..ca3890a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1158,6 +1158,10 @@ struct sched_avg { s64 decay_count; unsigned long load_avg_contrib; unsigned long load_avg_ratio; +#ifdef CONFIG_SCHED_HMP + u64 hmp_last_up_migration; + u64 hmp_last_down_migration; +#endif u32 usage_avg_sum; }; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 652b86b..a3b1ff6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1723,6 +1723,10 @@ static void __sched_fork(struct task_struct *p) #if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED) p->se.avg.runnable_avg_period = 0; p->se.avg.runnable_avg_sum = 0; +#ifdef CONFIG_SCHED_HMP + p->se.avg.hmp_last_up_migration = 0; + p->se.avg.hmp_last_down_migration = 0; +#endif #endif #ifdef CONFIG_SCHEDSTATS memset(&p->se.statistics, 0, sizeof(p->se.statistics)); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 811b2b9..56cbda1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3138,10 +3138,14 @@ static int __init hmp_cpu_mask_setup(void) * tweaking suit particular needs. * * hmp_up_prio: Only up migrate task with high priority (cfs; + + se->avg.hmp_last_up_migration = cfs_rq_clock_task(cfs_rq); + se->avg.hmp_last_down_migration = 0; +} + +static inline void hmp_next_down_delay(struct sched_entity *se, int cpu) +{ + struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs; + + se->avg.hmp_last_down_migration = cfs_rq_clock_task(cfs_rq); + se->avg.hmp_last_up_migration = 0; +} #endif /* CONFIG_SCHED_HMP */ /* @@ -3335,11 +3354,13 @@ unlock: #ifdef CONFIG_SCHED_HMP if (hmp_up_migration(prev_cpu, &p->se)) { new_cpu = hmp_select_faster_cpu(p, prev_cpu); + hmp_next_up_delay(&p->se, new_cpu); trace_sched_hmp_migrate(p, new_cpu, 0); return new_cpu; } if (hmp_down_migration(prev_cpu, &p->se)) { new_cpu = hmp_select_slower_cpu(p, prev_cpu); + hmp_next_down_delay(&p->se, new_cpu); trace_sched_hmp_migrate(p, new_cpu, 0); return new_cpu; } @@ -5503,6 +5524,8 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) { } static unsigned int hmp_up_migration(int cpu, struct sched_entity *se) { struct task_struct *p = task_of(se); + struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs; + u64 now; if (hmp_cpu_is_fastest(cpu)) return 0; @@ -5513,6 +5536,12 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se) return 0; #endif + /* Let the task load settle before doing another up migration */ + now = cfs_rq_clock_task(cfs_rq); + if (((now - se->avg.hmp_last_up_migration) >> 10) + < hmp_next_up_threshold) + return 0; + if (cpumask_intersects(&hmp_faster_domain(cpu)->cpus, tsk_cpus_allowed(p)) && se->avg.load_avg_ratio > hmp_up_threshold) { @@ -5525,6 +5554,8 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se) static unsigned int hmp_down_migration(int cpu, struct sched_entity *se) { struct task_struct *p = task_of(se); + struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs; + u64 now; if (hmp_cpu_is_slowest(cpu)) 
return 0; @@ -5535,6 +5566,12 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se) return 1; #endif + /* Let the task load settle before doing another down migration */ + now = cfs_rq_clock_task(cfs_rq); + if (((now - se->avg.hmp_last_down_migration) >> 10) + < hmp_next_down_threshold) + return 0; + if (cpumask_intersects(&hmp_slower_domain(cpu)->cpus, tsk_cpus_allowed(p)) && se->avg.load_avg_ratio < hmp_down_threshold) { @@ -5725,6 +5762,7 @@ static void hmp_force_up_migration(int this_cpu)
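The delay check compares the elapsed task-clock time, shifted right by 10 bits (i.e. divided by 1024, roughly nanoseconds to microseconds), against the next-migration threshold. A small sketch of that check in isolation; the threshold value used here (4000, about 4 ms) is only an assumed example, since the actual defaults are not visible in this hunk.

    #include <stdio.h>

    typedef unsigned long long u64;

    /* Returns 1 when enough task-clock time (in ns) has passed since the
     * last up migration for another one to be considered. */
    static int up_migration_delay_expired(u64 now, u64 last, unsigned int threshold)
    {
            return ((now - last) >> 10) >= threshold;
    }

    int main(void)
    {
            u64 last = 1000000000ULL;          /* last up migration at t = 1 s */
            unsigned int threshold = 4000;     /* assumed: ~4 ms in units of 1024 ns */

            printf("%d\n", up_migration_delay_expired(last + 2000000, last, threshold));  /* 2 ms -> 0 */
            printf("%d\n", up_migration_delay_expired(last + 8000000, last, threshold));  /* 8 ms -> 1 */
            return 0;
    }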
[RFC PATCH 08/10] sched: Add ftrace events for entity load-tracking
From: Morten Rasmussen Adds ftrace events for key variables related to the entity load-tracking to help debugging scheduler behaviour. Allows tracing of load contribution and runqueue residency ratio for both entities and runqueues as well as entity CPU usage ratio. Signed-off-by: Morten Rasmussen --- include/trace/events/sched.h | 125 ++ kernel/sched/fair.c |7 +++ 2 files changed, 132 insertions(+) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 5a8671e..847eb76 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -430,6 +430,131 @@ TRACE_EVENT(sched_pi_setprio, __entry->oldprio, __entry->newprio) ); +/* + * Tracepoint for showing tracked load contribution. + */ +TRACE_EVENT(sched_task_load_contrib, + + TP_PROTO(struct task_struct *tsk, unsigned long load_contrib), + + TP_ARGS(tsk, load_contrib), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(unsigned long, load_contrib) + ), + + TP_fast_assign( + memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); + __entry->pid= tsk->pid; + __entry->load_contrib = load_contrib; + ), + + TP_printk("comm=%s pid=%d load_contrib=%lu", + __entry->comm, __entry->pid, + __entry->load_contrib) +); + +/* + * Tracepoint for showing tracked task runnable ratio [0..1023]. + */ +TRACE_EVENT(sched_task_runnable_ratio, + + TP_PROTO(struct task_struct *tsk, unsigned long ratio), + + TP_ARGS(tsk, ratio), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(unsigned long, ratio) + ), + + TP_fast_assign( + memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); + __entry->pid = tsk->pid; + __entry->ratio = ratio; + ), + + TP_printk("comm=%s pid=%d ratio=%lu", + __entry->comm, __entry->pid, + __entry->ratio) +); + +/* + * Tracepoint for showing tracked rq runnable ratio [0..1023]. + */ +TRACE_EVENT(sched_rq_runnable_ratio, + + TP_PROTO(int cpu, unsigned long ratio), + + TP_ARGS(cpu, ratio), + + TP_STRUCT__entry( + __field(int, cpu) + __field(unsigned long, ratio) + ), + + TP_fast_assign( + __entry->cpu = cpu; + __entry->ratio = ratio; + ), + + TP_printk("cpu=%d ratio=%lu", + __entry->cpu, + __entry->ratio) +); + +/* + * Tracepoint for showing tracked rq runnable load. + */ +TRACE_EVENT(sched_rq_runnable_load, + + TP_PROTO(int cpu, u64 load), + + TP_ARGS(cpu, load), + + TP_STRUCT__entry( + __field(int, cpu) + __field(u64, load) + ), + + TP_fast_assign( + __entry->cpu = cpu; + __entry->load = load; + ), + + TP_printk("cpu=%d load=%llu", + __entry->cpu, + __entry->load) +); + +/* + * Tracepoint for showing tracked task cpu usage ratio [0..1023]. 
+ */ +TRACE_EVENT(sched_task_usage_ratio, + + TP_PROTO(struct task_struct *tsk, unsigned long ratio), + + TP_ARGS(tsk, ratio), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(unsigned long, ratio) + ), + + TP_fast_assign( + memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); + __entry->pid = tsk->pid; + __entry->ratio = ratio; + ), + + TP_printk("comm=%s pid=%d ratio=%lu", + __entry->comm, __entry->pid, + __entry->ratio) +); #endif /* _TRACE_SCHED_H */ /* This part must be outside protection */ diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8f0f3b9..0be53be 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1192,9 +1192,11 @@ static inline void __update_task_entity_contrib(struct sched_entity *se) contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight); contrib /= (se->avg.runnable_avg_period + 1); se->avg.load_avg_contrib = scale_load(contrib); + trace_sched_task_load_contrib(task_of(se), se->avg.load_avg_contrib); contrib = se->avg.runnable_avg_sum * scale_load_down(NICE_0_LOAD); contrib /= (se->avg.runnable_avg_period + 1); se->avg.load_avg_ratio = scale_load(contrib); + trace_sched_task_runnable_ratio(task_of(se), se->avg.load_avg
[RFC PATCH 00/10] sched: Task placement for heterogeneous MP systems
From: Morten Rasmussen Hi Paul, Paul, Peter, Suresh, linaro-sched-sig, and LKML, As a follow-up on my Linux Plumbers Conference talk about my experiments with scheduling on heterogeneous systems I'm posting a proof-of-concept patch set with my modifications. The intention behind the modifications is to tweak scheduling behaviour to only use fast (and power hungry) cores when it is necessary and also improve performance consistency. Without the modifications it is more or less random where tasks are scheduled and so is the execution time. I'm seeing good improvements on performance consistency for web browsing on Android using Bbench <http://www.gem5.org/Bbench> on the ARM big.LITTLE TC2 chip, which has two fast cores (Cortex-A15) and three power-efficient cores (Cortex-A7). The total execution time numbers below are for Androids SurfaceFlinger process is key for page rendering performance. The average execution time is lower with the patches enabled and the standard deviation is much smaller. Similar improvements can be seen for the Android.Browser and WebViewCoreThread processes. Total execution time statistics based on 50 runs. SurfaceFlinger SMP kernel [s] HMP modifications [s] -- Average 14.617 11.012 St. Dev. 4.577 0.902 10% Pctl.9.343 10.783 90% Pctl. 18.743 11.695 Unfortunately, I cannot share power-efficiency numbers at this stage. This patch set introduces proof-of-concept scheduler modifications which attempt to improve scheduling decisions on heterogeneous multi-processor systems (HMP) such as ARM big.LITTLE systems. The patch set relies on the entity load-tracking re-work patch set by Paul Turner: <https://lkml.org/lkml/2012/8/23/267> The modifications attempt to migrate tasks between cores with different compute capacity depending on the tracked load and priority. The aim is to only use fast cores for tasks which really need the extra performance and thereby improve power consumption by running everything else on the slow cores. The patch introduces hmp_domains to represent the different types of cores that are available on the given platform. Multiple (>2) hmp_domains is supported but not tested. hmp_domains must be set up by platform code and the patch set includes patches for ARM platforms using device-tree. The patches intentionally try to avoid modifying the existing code paths as much as possible. The aim is to experiment with HMP scheduling and get the overall policy right before integrating it properly with the existing load-balancer. Morten Morten Rasmussen (10): sched: entity load-tracking load_avg_ratio sched: Task placement for heterogeneous systems based on task load-tracking sched: Forced task migration on heterogeneous systems sched: Introduce priority-based task migration filter ARM: Add HMP scheduling support for ARM architecture ARM: sched: Use device-tree to provide fast/slow CPU list for HMP ARM: sched: Setup SCHED_HMP domains sched: Add ftrace events for entity load-tracking sched: Add HMP task migration ftrace event sched: SCHED_HMP multi-domain task migration control arch/arm/Kconfig| 46 + arch/arm/include/asm/topology.h | 32 +++ arch/arm/kernel/topology.c | 91 include/linux/sched.h | 11 + include/trace/events/sched.h| 153 ++ kernel/sched/core.c |4 + kernel/sched/fair.c | 434 ++- kernel/sched/sched.h|9 + 8 files changed, 779 insertions(+), 1 deletion(-) -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
[RFC PATCH 07/10] ARM: sched: Setup SCHED_HMP domains
From: Morten Rasmussen SCHED_HMP requires the different cpu types to be represented by an ordered list of hmp_domains. Each hmp_domain represents all cpus of a particular type using a cpumask. The list is platform specific and therefore must be generated by platform code by implementing arch_get_hmp_domains(). Signed-off-by: Morten Rasmussen --- arch/arm/kernel/topology.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index 7682e12..ec8ad5c 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c @@ -383,6 +383,28 @@ void __init arch_get_fast_and_slow_cpus(struct cpumask *fast, cpumask_clear(slow); } +void __init arch_get_hmp_domains(struct list_head *hmp_domains_list) +{ + struct cpumask hmp_fast_cpu_mask; + struct cpumask hmp_slow_cpu_mask; + struct hmp_domain *domain; + + arch_get_fast_and_slow_cpus(&hmp_fast_cpu_mask, &hmp_slow_cpu_mask); + + /* +* Initialize hmp_domains +* Must be ordered with respect to compute capacity. +* Fastest domain at head of list. +*/ + domain = (struct hmp_domain *) + kmalloc(sizeof(struct hmp_domain), GFP_KERNEL); + cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask); + list_add(&domain->hmp_domains, hmp_domains_list); + domain = (struct hmp_domain *) + kmalloc(sizeof(struct hmp_domain), GFP_KERNEL); + cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask); + list_add(&domain->hmp_domains, hmp_domains_list); +} #endif /* CONFIG_SCHED_HMP */ -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
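Note the registration order in arch_get_hmp_domains(): list_add() inserts at the head of the list, so adding the slow (A7) domain first and the fast (A15) domain second leaves the fast domain at the head, matching the "fastest domain first" ordering the core patch expects. A tiny user-space illustration of that head-insertion behaviour, using a hand-rolled list instead of the kernel's list.h:

    #include <stdio.h>

    struct node { const char *name; struct node *next; };

    /* Insert at the head, like list_add(). */
    static struct node *push(struct node *head, struct node *n)
    {
            n->next = head;
            return n;
    }

    int main(void)
    {
            struct node slow = { "slow (A7)", NULL }, fast = { "fast (A15)", NULL };
            struct node *head = NULL;

            head = push(head, &slow);   /* added first ... */
            head = push(head, &fast);   /* ... added last ends up at the head */

            for (struct node *n = head; n; n = n->next)
                    printf("%s\n", n->name);   /* prints fast (A15), then slow (A7) */
            return 0;
    }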
[RFC PATCH 06/10] ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
From: Morten Rasmussen We can't rely on Kconfig options to set the fast and slow CPU lists for HMP scheduling if we want a single kernel binary to support multiple devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2 big.LITTLE system), Fast Models, or even non big.LITTLE devices. This patch adds the function arch_get_fast_and_slow_cpus() to generate the lists at run-time by parsing the CPU nodes in device-tree; it assumes slow cores are A7s and everything else is fast. The function still supports the old Kconfig options as this is useful for testing the HMP scheduler on devices without big.LITTLE. This patch is reuse of a patch by Jon Medhurst with a few bits left out. Signed-off-by: Morten Rasmussen --- arch/arm/Kconfig |4 ++- arch/arm/kernel/topology.c | 69 2 files changed, 72 insertions(+), 1 deletion(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index cb80846..f1271bc 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1588,13 +1588,15 @@ config HMP_FAST_CPU_MASK string "HMP scheduler fast CPU mask" depends on SCHED_HMP help - Specify the cpuids of the fast CPUs in the system as a list string, + Leave empty to use device tree information. + Specify the cpuids of the fast CPUs in the system as a list string, e.g. cpuid 0+1 should be specified as 0-1. config HMP_SLOW_CPU_MASK string "HMP scheduler slow CPU mask" depends on SCHED_HMP help + Leave empty to use device tree information. Specify the cpuids of the slow CPUs in the system as a list string, e.g. cpuid 0+1 should be specified as 0-1. diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index 26c12c6..7682e12 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c @@ -317,6 +317,75 @@ void store_cpu_topology(unsigned int cpuid) cpu_topology[cpuid].socket_id, mpidr); } + +#ifdef CONFIG_SCHED_HMP + +static const char * const little_cores[] = { + "arm,cortex-a7", + NULL, +}; + +static bool is_little_cpu(struct device_node *cn) +{ + const char * const *lc; + for (lc = little_cores; *lc; lc++) + if (of_device_is_compatible(cn, *lc)) + return true; + return false; +} + +void __init arch_get_fast_and_slow_cpus(struct cpumask *fast, + struct cpumask *slow) +{ + struct device_node *cn = NULL; + int cpu = 0; + + cpumask_clear(fast); + cpumask_clear(slow); + + /* +* Use the config options if they are given. This helps testing +* HMP scheduling on systems without a big.LITTLE architecture. +*/ + if (strlen(CONFIG_HMP_FAST_CPU_MASK) && strlen(CONFIG_HMP_SLOW_CPU_MASK)) { + if (cpulist_parse(CONFIG_HMP_FAST_CPU_MASK, fast)) + WARN(1, "Failed to parse HMP fast cpu mask!\n"); + if (cpulist_parse(CONFIG_HMP_SLOW_CPU_MASK, slow)) + WARN(1, "Failed to parse HMP slow cpu mask!\n"); + return; + } + + /* +* Else, parse device tree for little cores. +*/ + while ((cn = of_find_node_by_type(cn, "cpu"))) { + + if (cpu >= num_possible_cpus()) + break; + + if (is_little_cpu(cn)) + cpumask_set_cpu(cpu, slow); + else + cpumask_set_cpu(cpu, fast); + + cpu++; + } + + if (!cpumask_empty(fast) && !cpumask_empty(slow)) + return; + + /* +* We didn't find both big and little cores so let's call all cores +* fast as this will keep the system running, with all cores being +* treated equal. 
+*/ + cpumask_setall(fast); + cpumask_clear(slow); +} + +#endif /* CONFIG_SCHED_HMP */ + + /* * init_cpu_topology is called at boot when only one cpu is running * which prevent simultaneous write access to cpu_topology array -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
[RFC PATCH 03/10] sched: Forced task migration on heterogeneous systems
From: Morten Rasmussen This patch introduces forced task migration for moving suitable currently running tasks between hmp_domains. Task behaviour is likely to change over time. Tasks running in a less capable hmp_domain may change to become more demanding and should therefore be migrated up. They are unlikely go through the select_task_rq_fair() path anytime soon and therefore need special attention. This patch introduces a period check (SCHED_TICK) of the currently running task on all runqueues and sets up a forced migration using stop_machine_no_wait() if the task needs to be migrated. Ideally, this should not be implemented by polling all runqueues. Signed-off-by: Morten Rasmussen --- kernel/sched/fair.c | 196 +- kernel/sched/sched.h |3 + 2 files changed, 198 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d80de46..490f1f0 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3744,7 +3744,6 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env) * 1) task is cache cold, or * 2) too many balance attempts have failed. */ - tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd); if (!tsk_cache_hot || env->sd->nr_balance_failed > env->sd->cache_nice_tries) { @@ -5516,6 +5515,199 @@ static unsigned int hmp_down_migration(int cpu, struct sched_entity *se) return 0; } +/* + * hmp_can_migrate_task - may task p from runqueue rq be migrated to this_cpu? + * Ideally this function should be merged with can_migrate_task() to avoid + * redundant code. + */ +static int hmp_can_migrate_task(struct task_struct *p, struct lb_env *env) +{ + int tsk_cache_hot = 0; + + /* +* We do not migrate tasks that are: +* 1) running (obviously), or +* 2) cannot be migrated to this CPU due to cpus_allowed +*/ + if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) { + schedstat_inc(p, se.statistics.nr_failed_migrations_affine); + return 0; + } + env->flags &= ~LBF_ALL_PINNED; + + if (task_running(env->src_rq, p)) { + schedstat_inc(p, se.statistics.nr_failed_migrations_running); + return 0; + } + + /* +* Aggressive migration if: +* 1) task is cache cold, or +* 2) too many balance attempts have failed. +*/ + + tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd); + if (!tsk_cache_hot || + env->sd->nr_balance_failed > env->sd->cache_nice_tries) { +#ifdef CONFIG_SCHEDSTATS + if (tsk_cache_hot) { + schedstat_inc(env->sd, lb_hot_gained[env->idle]); + schedstat_inc(p, se.statistics.nr_forced_migrations); + } +#endif + return 1; + } + + return 1; +} + +/* + * move_specific_task tries to move a specific task. + * Returns 1 if successful and 0 otherwise. + * Called with both runqueues locked. + */ +static int move_specific_task(struct lb_env *env, struct task_struct *pm) +{ + struct task_struct *p, *n; + + list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) { + if (throttled_lb_pair(task_group(p), env->src_rq->cpu, + env->dst_cpu)) + continue; + + if (!hmp_can_migrate_task(p, env)) + continue; + /* Check if we found the right task */ + if (p != pm) + continue; + + move_task(p, env); + /* +* Right now, this is only the third place move_task() +* is called, so we can safely collect move_task() +* stats here rather than inside move_task(). +*/ + schedstat_inc(env->sd, lb_gained[env->idle]); + return 1; + } + return 0; +} + +/* + * hmp_active_task_migration_cpu_stop is run by cpu stopper and used to + * migrate a specific task from one runqueue to another. 
+ * hmp_force_up_migration uses this to push a currently running task + * off a runqueue. + * Based on active_load_balance_stop_cpu and can potentially be merged. + */ +static int hmp_active_task_migration_cpu_stop(void *data) +{ + struct rq *busiest_rq = data; + struct task_struct *p = busiest_rq->migrate_task; + int busiest_cpu = cpu_of(busiest_rq); + int target_cpu = busiest_rq->push_cpu; + struct rq *target_rq = cpu_rq(target_cpu); + struct sched_domain *sd; + + raw_spin_lock_irq(&busiest_rq->lock); + /* make sure the requested cpu hasn't gone down in the meantime */ + if (unlikely(busiest_cpu != smp_processor_id() || + !
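The cache-hot test in hmp_can_migrate_task() follows the load-balancer's usual "aggressive migration" rule; condensed, the condition is the one below (a restatement for illustration only, not new kernel code):

    #include <stdbool.h>

    static bool aggressive_migration_ok(bool cache_hot,
                                        unsigned int nr_balance_failed,
                                        unsigned int cache_nice_tries)
    {
            /* Cache-cold tasks always qualify; cache-hot tasks qualify once
             * repeated balance failures exceed the nice-tries budget. */
            return !cache_hot || nr_balance_failed > cache_nice_tries;
    }

As quoted above, hmp_can_migrate_task() actually returns 1 on both paths, so in this series the rule only drives the schedstats accounting rather than blocking the forced migration.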
[RFC PATCH 09/10] sched: Add HMP task migration ftrace event
From: Morten Rasmussen Adds ftrace event for tracing task migrations using HMP optimized scheduling. Signed-off-by: Morten Rasmussen --- include/trace/events/sched.h | 28 kernel/sched/fair.c | 15 +++ 2 files changed, 39 insertions(+), 4 deletions(-) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 847eb76..501aa32 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -555,6 +555,34 @@ TRACE_EVENT(sched_task_usage_ratio, __entry->comm, __entry->pid, __entry->ratio) ); + +/* + * Tracepoint for HMP (CONFIG_SCHED_HMP) task migrations. + */ +TRACE_EVENT(sched_hmp_migrate, + + TP_PROTO(struct task_struct *tsk, int dest, int force), + + TP_ARGS(tsk, dest, force), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(int, dest) + __field(int, force) + ), + + TP_fast_assign( + memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); + __entry->pid = tsk->pid; + __entry->dest = dest; + __entry->force = force; + ), + + TP_printk("comm=%s pid=%d dest=%d force=%d", + __entry->comm, __entry->pid, + __entry->dest, __entry->force) +); #endif /* _TRACE_SCHED_H */ /* This part must be outside protection */ diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0be53be..811b2b9 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -,10 +,16 @@ unlock: rcu_read_unlock(); #ifdef CONFIG_SCHED_HMP - if (hmp_up_migration(prev_cpu, &p->se)) - return hmp_select_faster_cpu(p, prev_cpu); - if (hmp_down_migration(prev_cpu, &p->se)) - return hmp_select_slower_cpu(p, prev_cpu); + if (hmp_up_migration(prev_cpu, &p->se)) { + new_cpu = hmp_select_faster_cpu(p, prev_cpu); + trace_sched_hmp_migrate(p, new_cpu, 0); + return new_cpu; + } + if (hmp_down_migration(prev_cpu, &p->se)) { + new_cpu = hmp_select_slower_cpu(p, prev_cpu); + trace_sched_hmp_migrate(p, new_cpu, 0); + return new_cpu; + } /* Make sure that the task stays in its previous hmp domain */ if (!cpumask_test_cpu(new_cpu, &hmp_cpu_domain(prev_cpu)->cpus)) return prev_cpu; @@ -5718,6 +5724,7 @@ static void hmp_force_up_migration(int this_cpu) target->push_cpu = hmp_select_faster_cpu(p, cpu); target->migrate_task = p; force = 1; + trace_sched_hmp_migrate(p, target->push_cpu, 1); } } raw_spin_unlock_irqrestore(&target->lock, flags); -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
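With this event compiled in and enabled, each HMP migration produces one line in the trace buffer using the TP_printk() format above, where force=0 marks a migration decided in select_task_rq_fair() and force=1 one initiated by hmp_force_up_migration(). A hypothetical example line (field values invented, timestamp and cpu columns omitted):

    sched_hmp_migrate: comm=SurfaceFlinger pid=1234 dest=2 force=1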
[RFC PATCH 05/10] ARM: Add HMP scheduling support for ARM architecture
From: Morten Rasmussen Adds Kconfig entries to enable HMP scheduling on ARM platforms. Currently, it disables CPU level sched_domain load-balacing in order to simplify things. This needs fixing in a later revision. HMP scheduling will do the load-balancing at this level instead. Signed-off-by: Morten Rasmussen --- arch/arm/Kconfig| 14 ++ arch/arm/include/asm/topology.h | 32 2 files changed, 46 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 05de193..cb80846 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1584,6 +1584,20 @@ config SCHED_HMP_PRIO_FILTER_VAL default 5 depends on SCHED_HMP_PRIO_FILTER +config HMP_FAST_CPU_MASK + string "HMP scheduler fast CPU mask" + depends on SCHED_HMP + help + Specify the cpuids of the fast CPUs in the system as a list string, + e.g. cpuid 0+1 should be specified as 0-1. + +config HMP_SLOW_CPU_MASK + string "HMP scheduler slow CPU mask" + depends on SCHED_HMP + help + Specify the cpuids of the slow CPUs in the system as a list string, + e.g. cpuid 0+1 should be specified as 0-1. + config HAVE_ARM_SCU bool help diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h index 58b8b84..13a03de 100644 --- a/arch/arm/include/asm/topology.h +++ b/arch/arm/include/asm/topology.h @@ -27,6 +27,38 @@ void init_cpu_topology(void); void store_cpu_topology(unsigned int cpuid); const struct cpumask *cpu_coregroup_mask(int cpu); +#ifdef CONFIG_DISABLE_CPU_SCHED_DOMAIN_BALANCE +/* Common values for CPUs */ +#ifndef SD_CPU_INIT +#define SD_CPU_INIT (struct sched_domain) {\ + .min_interval = 1,\ + .max_interval = 4,\ + .busy_factor= 64, \ + .imbalance_pct = 125, \ + .cache_nice_tries = 1,\ + .busy_idx = 2,\ + .idle_idx = 1,\ + .newidle_idx= 0,\ + .wake_idx = 0,\ + .forkexec_idx = 0,\ + \ + .flags = 0*SD_LOAD_BALANCE \ + | 1*SD_BALANCE_NEWIDLE \ + | 1*SD_BALANCE_EXEC \ + | 1*SD_BALANCE_FORK \ + | 0*SD_BALANCE_WAKE \ + | 1*SD_WAKE_AFFINE \ + | 0*SD_PREFER_LOCAL \ + | 0*SD_SHARE_CPUPOWER \ + | 0*SD_SHARE_PKG_RESOURCES \ + | 0*SD_SERIALIZE\ + , \ + .last_balance= jiffies, \ + .balance_interval = 1,\ +} +#endif +#endif /* CONFIG_DISABLE_CPU_SCHED_DOMAIN_BALANCE */ + #else static inline void init_cpu_topology(void) { } -- 1.7.9.5 ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
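The 0*SD_FLAG / 1*SD_FLAG pattern in SD_CPU_INIT is just a readable way of building the flags word: multiplying a flag by 0 keeps its name visible in the initializer while contributing nothing to the bitwise OR, which is how SD_LOAD_BALANCE ends up cleared at this level. A short illustration of the idiom with made-up flag values:

    #include <stdio.h>

    #define FLAG_A 0x1
    #define FLAG_B 0x2

    int main(void)
    {
            /* Same idiom as SD_CPU_INIT: 0*FLAG documents the flag but clears the bit. */
            unsigned int flags = 0*FLAG_A | 1*FLAG_B;
            printf("0x%x\n", flags);   /* prints 0x2: FLAG_A is named but not set */
            return 0;
    }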
Re: [GIT PULL] big-LITTLE-MP-v7 - IMPORTANT
Hi Viresh, On Mon, Sep 03, 2012 at 06:21:26AM +0100, Viresh Kumar wrote: > On 28 August 2012 10:37, Viresh Kumar wrote: > > I have updated > > > > https://wiki.linaro.org/WorkingGroups/PowerManagement/Process/bigLittleMPTree > > > > as per our last discussion. Please see if i have missed something. > > Hi Guys, > > I will be sending PULL request of big-LITTLE-MP-v7 today as per schedule. > Do let me know if you want anything to be included in it before that. > > @Morten: What should i do with patch reported by Santosh: > > ARM-Add-HMP-scheduling-support-for-ARM-architecture > > Do i need to apply it over your branch? The patch is already in the original patch set, so I'm not sure why it is missing. http://linux-arm.org/git?p=arm-bls.git;a=commit;h=1416200dd62551aa9ac4aa207b0c66651ccbff2c It needs to be there for the HMP scheduling to work. Regards, Morten ___ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev