Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On 01/16/2013 01:43 PM, Alex Shi wrote:
>>>> -		/* while loop will break here if sd == NULL */
>>>
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the
>>> new cpu's domain. Is this to handle some exotic sched domain
>>> configurations? If so, they probably wouldn't work with your
>>> optimizations.
>>
>> Let me explain my understanding of why the recursive search is the way
>> it is.
>>
>>          _________________________ sd0
>>         |                         |
>>         |  ___sd1___   ___sd2___  |
>>         | |         | |         | |
>>         | |  sgx    | |  sga    | |
>>         | |  sgy    | |  sgb    | |
>>         | |_________| |_________| |
>>         |_________________________|
>>
>> What the current recursive search is doing is (assuming we start with
>> sd0, the top level sched domain whose flags are rightly set): we find
>> that sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.
>>
>> We could have ideally stopped the search here. But the problem with
>> this is that there is a possibility that sgx is more loaded than sgy,
>> meaning the cpus in sgx are heavily imbalanced; say there are two cpus
>> cpux1 and cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has
>> recently gotten idle and load balancing has not come to its rescue
>> yet. According to the search above, cpux1 is idle, but is *not the
>> right candidate for scheduling the forked task; it is the right
>> candidate for relieving the load from cpux2* due to cache locality etc.
>
> The problem still exists in the current code. It still goes to cpux1,
> then goes up to sgx to seek the idlest group ... idlest cpu, and comes
> back to cpux1 again. Nothing helps.

To resolve the problem, I have tried walking the domains from the top
down, but testing shows aim9/hackbench performance is not good on our
SNB EP, and there is no change on other platforms.

---
@@ -3351,51 +3363,33 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)

-	while (sd) {
+	for_each_lower_domain(sd) {
 		int load_idx = sd->forkexec_idx;
-		struct sched_group *group;
-		int weight;
-
-		if (!(sd->flags & sd_flag)) {
-			sd = sd->child;
-			continue;
-		}
+		int local = 0;

 		if (sd_flag & SD_BALANCE_WAKE)
 			load_idx = sd->wake_idx;

-		group = find_idlest_group(sd, p, cpu, load_idx);
-		if (!group) {
-			sd = sd->child;
-			continue;
-		}
-
-		new_cpu = find_idlest_cpu(group, p, cpu);
-		if (new_cpu == -1 || new_cpu == cpu) {
-			/* Now try balancing at a lower domain level of cpu */
-			sd = sd->child;
+		group = find_idlest_group(sd, p, cpu, load_idx, &local);
+		if (local)
 			continue;
-		}
+		if (!group)
+			goto unlock;

-		/* Now try balancing at a lower domain level of new_cpu */
-		cpu = new_cpu;
-		weight = sd->span_weight;
-		sd = NULL;
-		for_each_domain(cpu, tmp) {
-			if (weight <= tmp->span_weight)
-				break;
-			if (tmp->flags & sd_flag)
+		/* go down from non-local group */
+		for_each_domain(group_first_cpu(group), tmp)
+			if (cpumask_equal(sched_domain_span(tmp),
+					sched_group_cpus(group))) {
 				sd = tmp;
-		}
-		/* while loop will break here if sd == NULL */
+				break;
+			}
 	}
+	if (group)
+		new_cpu = find_idlest_cpu(group, p, cpu);
 unlock:
 	rcu_read_unlock();

--
Thanks
    Alex
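As an aside for readers following this without the tree at hand, here is a
minimal, self-contained C model of the top-down walk the experimental hunk
above aims for. It is only a sketch under invented assumptions: the domain
struct, the two-socket topology and the load numbers are made up,
idlest_child() stands in for find_idlest_group(), and the plain descent step
plays the role of the cpumask_equal(sched_domain_span(tmp),
sched_group_cpus(group)) re-entry in the real patch.

#include <stdio.h>

/* Toy stand-in for the kernel's sched_domain/sched_group; not kernel code. */
struct domain {
	const char *name;
	unsigned long span;		/* bitmask of cpus covered */
	struct domain *kid[2];		/* child domains, i.e. the "groups" */
	int load;			/* pretend aggregate load of the span */
};

/* Stand-in for find_idlest_group(): least loaded child of sd. */
static struct domain *idlest_child(struct domain *sd)
{
	struct domain *best = NULL;
	int i;

	for (i = 0; i < 2 && sd->kid[i]; i++)
		if (!best || sd->kid[i]->load < best->load)
			best = sd->kid[i];
	return best;
}

static int first_cpu(unsigned long mask)
{
	int cpu = 0;

	while (!(mask & (1UL << cpu)))
		cpu++;
	return cpu;
}

int main(void)
{
	/* Invented topology: cpus 0-1 form socket A, cpus 2-3 socket B. */
	struct domain cpu0  = { "cpu0",  0x1, { NULL, NULL }, 5 };
	struct domain cpu1  = { "cpu1",  0x2, { NULL, NULL }, 1 };
	struct domain cpu2  = { "cpu2",  0x4, { NULL, NULL }, 4 };
	struct domain cpu3  = { "cpu3",  0x8, { NULL, NULL }, 4 };
	struct domain sockA = { "sockA", 0x3, { &cpu0, &cpu1 }, 6 };
	struct domain sockB = { "sockB", 0xc, { &cpu2, &cpu3 }, 8 };
	struct domain top   = { "top",   0xf, { &sockA, &sockB }, 14 };
	struct domain *sd = &top;

	/* One pass from the top domain down: pick the idlest group at
	 * each level and continue the search inside that group's span. */
	while (sd->kid[0]) {
		struct domain *group = idlest_child(sd);

		printf("%s: idlest group is %s\n", sd->name, group->name);
		sd = group;
	}
	printf("place task on cpu%d\n", first_cpu(sd->span));
	return 0;
}

Compiled with any C compiler, it prints the descent path (top -> sockA ->
cpu1) and the chosen cpu, without ever re-climbing the tree.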
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
>>> -		/* while loop will break here if sd == NULL */
>>
>> I agree that this should be a major optimization. I just can't figure
>> out why the existing recursive search for an idle cpu switches to the
>> new cpu near the end and then starts a search for an idle cpu in the
>> new cpu's domain. Is this to handle some exotic sched domain
>> configurations? If so, they probably wouldn't work with your
>> optimizations.
>
> Let me explain my understanding of why the recursive search is the way
> it is.
>
>          _________________________ sd0
>         |                         |
>         |  ___sd1___   ___sd2___  |
>         | |         | |         | |
>         | |  sgx    | |  sga    | |
>         | |  sgy    | |  sgb    | |
>         | |_________| |_________| |
>         |_________________________|
>
> What the current recursive search is doing is (assuming we start with
> sd0, the top level sched domain whose flags are rightly set): we find
> that sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.
>
> We could have ideally stopped the search here. But the problem with
> this is that there is a possibility that sgx is more loaded than sgy,
> meaning the cpus in sgx are heavily imbalanced; say there are two cpus
> cpux1 and cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has
> recently gotten idle and load balancing has not come to its rescue
> yet. According to the search above, cpux1 is idle, but is *not the
> right candidate for scheduling the forked task; it is the right
> candidate for relieving the load from cpux2* due to cache locality etc.

The problem still exists in the current code. It still goes to cpux1,
then goes up to sgx to seek the idlest group ... idlest cpu, and comes
back to cpux1 again. Nothing helps.
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On Tue, 2013-01-15 at 10:34 +0800, Alex Shi wrote:
> On 01/14/2013 05:03 PM, li guang wrote:
>>>> This corner case may occur after "[PATCH v3 03/22] sched: fix
>>>> find_idlest_group mess logical" brought in the local sched_group
>>>> bias, assuming balancing runs on cpux2.
>>>> Ideally, find_idlest_group should find the real idlest (in this case
>>>> sgy); then this patch is reasonable.
>>>
>>> Sure, but it seems a bit hard to go down to the idlest group.
>>>
>>> And the old logic really costs too much; on my 2 socket NHM/SNB
>>> server, hackbench can gain 2~5% performance, and there is no clear
>>> performance change on kbuild/aim7 etc.
>> what about removing the local group bias?
>
> Any theory profit for the non-local group? Usually, bias toward the
> local group will have a cache locality profit.

But the disadvantage is missing the correct balance sometimes. If, as
you said before, it will mostly bring in better performance, that's
fine; I have no strong statistics to do a more profound analysis.

--
regards!
li guang
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On 01/14/2013 05:03 PM, li guang wrote:
>>> This corner case may occur after "[PATCH v3 03/22] sched: fix
>>> find_idlest_group mess logical" brought in the local sched_group
>>> bias, assuming balancing runs on cpux2.
>>> Ideally, find_idlest_group should find the real idlest (in this case
>>> sgy); then this patch is reasonable.
>>
>> Sure, but it seems a bit hard to go down to the idlest group.
>>
>> And the old logic really costs too much; on my 2 socket NHM/SNB
>> server, hackbench can gain 2~5% performance, and there is no clear
>> performance change on kbuild/aim7 etc.
> what about removing the local group bias?

Any theory profit for the non-local group? Usually, bias toward the
local group will have a cache locality profit.

--
Thanks
    Alex
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On Fri, 2013-01-11 at 22:56 +0800, Alex Shi wrote:
> On 01/11/2013 04:01 PM, li guang wrote:
>> On Fri, 2013-01-11 at 10:26 +0530, Preeti U Murthy wrote:
>>> Hi Morten, Alex
>>>
>>> On 01/09/2013 11:51 PM, Morten Rasmussen wrote:
>>>> On Sat, Jan 05, 2013 at 08:37:34AM +0000, Alex Shi wrote:
>>>>> Guess the search for a cpu from bottom to up in the domain tree
>>>>> comes from commit 3dbd5342074a1e "sched: multilevel sbe sbf"; the
>>>>> purpose is balancing tasks over all level domains.
>>>>>
>>>>> This balancing costs much if there are many domains/groups in a
>>>>> large system. And forcibly spreading tasks among different domains
>>>>> may cause performance issues due to bad locality.
>>>>>
>>>>> If we remove this code, we will get quick fork/exec/wake, plus
>>>>> better balancing among the whole system, which also reduces
>>>>> migrations in future load balancing.
>>>>>
>>>>> This patch increases hackbench performance by 10+% on my 4 socket
>>>>> NHM and SNB machines.
>>>>>
>>>>> Signed-off-by: Alex Shi
>>>>> ---
>>>>>  kernel/sched/fair.c | 20 +-------------------
>>>>>  1 file changed, 1 insertion(+), 19 deletions(-)
>>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index ecfbf8e..895a3f4 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>>>  		goto unlock;
>>>>>  	}
>>>>>
>>>>> -	while (sd) {
>>>>> +	if (sd) {
>>>>>  		int load_idx = sd->forkexec_idx;
>>>>>  		struct sched_group *group;
>>>>> -		int weight;
>>>>> -
>>>>> -		if (!(sd->flags & sd_flag)) {
>>>>> -			sd = sd->child;
>>>>> -			continue;
>>>>> -		}
>>>>>
>>>>>  		if (sd_flag & SD_BALANCE_WAKE)
>>>>>  			load_idx = sd->wake_idx;
>>>>> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>>>  			goto unlock;
>>>>>
>>>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>>>> -
>>>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>>>> -		cpu = new_cpu;
>>>>> -		weight = sd->span_weight;
>>>>> -		sd = NULL;
>>>>> -		for_each_domain(cpu, tmp) {
>>>>> -			if (weight <= tmp->span_weight)
>>>>> -				break;
>>>>> -			if (tmp->flags & sd_flag)
>>>>> -				sd = tmp;
>>>>> -		}
>>>>> -		/* while loop will break here if sd == NULL */
>>>>
>>>> I agree that this should be a major optimization. I just can't figure
>>>> out why the existing recursive search for an idle cpu switches to the
>>>> new cpu near the end and then starts a search for an idle cpu in the
>>>> new cpu's domain. Is this to handle some exotic sched domain
>>>> configurations? If so, they probably wouldn't work with your
>>>> optimizations.
>>>
>>> Let me explain my understanding of why the recursive search is the way
>>> it is.
>>>
>>>          _________________________ sd0
>>>         |                         |
>>>         |  ___sd1___   ___sd2___  |
>>>         | |         | |         | |
>>>         | |  sgx    | |  sga    | |
>>>         | |  sgy    | |  sgb    | |
>>>         | |_________| |_________| |
>>>         |_________________________|
>>>
>>> What the current recursive search is doing is (assuming we start with
>>> sd0, the top level sched domain whose flags are rightly set): we find
>>> that sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.
>>>
>>> We could have ideally stopped the search here. But the problem with
>>> this is that there is a possibility that sgx is more loaded than sgy,
>>> meaning the cpus in sgx are heavily imbalanced; say there are two cpus
>>> cpux1 and cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has
>>> recently gotten idle and load balancing has not come to its rescue
>>> yet. According to the search above, cpux1 is idle, but is *not the
>>> right candidate for scheduling the forked task; it is the right
>>> candidate for relieving the load from cpux2* due to cache locality etc.
>>
>> This corner case may occur after "[PATCH v3 03/22] sched: fix
>> find_idlest_group mess logical" brought in the local sched_group
>> bias, assuming balancing runs on cpux2.
>> Ideally, find_idlest_group should find the real idlest (in this case
>> sgy); then this patch is reasonable.
>
> Sure, but it seems a bit hard to go down to the idlest group.
>
> And the old logic really costs too much; on my 2 socket NHM/SNB
> server, hackbench can gain 2~5% performance, and there is no clear
> performance change on kbuild/aim7 etc.

what about removing the local group bias?

--
regards!
li guang
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On 01/14/2013 04:55 PM, li guang wrote:
>>>>>> -		/* while loop will break here if sd == NULL */
>>>>
>>>> I agree that this should be a major optimization. I just can't figure
>>>> out why the existing recursive search for an idle cpu switches to the
>>>> new cpu near the end and then starts a search for an idle cpu in the
>>>> new cpu's domain. Is this to handle some exotic sched domain
>>>> configurations? If so, they probably wouldn't work with your
>>>> optimizations.
>>>
>>> I did not find an odd configuration that asks for the old logic.
>>>
>>> According to Documentation/scheduler/sched-domains.txt, maybe never:
>>> "A domain's span MUST be a superset of it child's span (this
>>> restriction could be relaxed if the need arises), and a base domain
>>> for CPU i MUST span at least i." etc. etc.
>>
>> The reason for my suspicion is the SD_OVERLAP flag, which has something
>> to do with overlapping sched domains. I haven't looked into what it
>> does or how it works. I'm just wondering if this optimization will
>> affect the use of that flag.
>
> It seems it did: SD_OVERLAP will not work after this change, though
> this flag is maybe scarcely used, because this optimization assumes
> every sched-domain span is a super-set of its child domain's.
> Isn't it, Alex?

As my understanding goes, overlap just says some cpu may appear in 2 or
more same-level sub-domains. If so, this change won't miss cpus. Am I
right, Peter?

--
Thanks
    Alex
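To make the overlap question concrete, here is a tiny standalone C
illustration that uses plain bitmasks in place of cpumasks. The masks are
invented and only loosely mimic a NUMA box whose node-level domains overlap.
It checks the two properties the argument above leans on: each domain span
is still a superset of its child's, and the union of same-level spans still
covers every cpu, so a walk descending through one group per level cannot
strand a cpu outside every searched span.

#include <stdio.h>

int main(void)
{
	unsigned long child  = 0x0f;	/* cpus 0-3: one node */
	unsigned long node_a = 0x3f;	/* node-level domain of cpu0: cpus 0-5 */
	unsigned long node_b = 0xfc;	/* node-level domain of cpu7: cpus 2-7 */
	unsigned long all    = 0xff;	/* cpus 0-7 */

	/* Same-level domains may share cpus when SD_OVERLAP-style
	 * topologies are in play. */
	printf("same-level domains overlap on mask 0x%02lx\n",
	       node_a & node_b);

	/* Superset rule from sched-domains.txt, for node_a and its child. */
	printf("node_a is a superset of its child: %s\n",
	       (node_a & child) == child ? "yes" : "no");

	/* Union of the same-level spans still covers every cpu. */
	printf("union covers all cpus: %s\n",
	       ((node_a | node_b) & all) == all ? "yes" : "no");
	return 0;
}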
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On Fri, 2013-01-11 at 10:07 +0000, Morten Rasmussen wrote:
> On Fri, Jan 11, 2013 at 02:46:31AM +0000, Alex Shi wrote:
>> On 01/10/2013 02:21 AM, Morten Rasmussen wrote:
>>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>>> -
>>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>>> -		cpu = new_cpu;
>>>> -		weight = sd->span_weight;
>>>> -		sd = NULL;
>>>> -		for_each_domain(cpu, tmp) {
>>>> -			if (weight <= tmp->span_weight)
>>>> -				break;
>>>> -			if (tmp->flags & sd_flag)
>>>> -				sd = tmp;
>>>> -		}
>>>> -		/* while loop will break here if sd == NULL */
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the
>>> new cpu's domain. Is this to handle some exotic sched domain
>>> configurations? If so, they probably wouldn't work with your
>>> optimizations.
>>
>> I did not find an odd configuration that asks for the old logic.
>>
>> According to Documentation/scheduler/sched-domains.txt, maybe never:
>> "A domain's span MUST be a superset of it child's span (this
>> restriction could be relaxed if the need arises), and a base domain
>> for CPU i MUST span at least i." etc. etc.
>
> The reason for my suspicion is the SD_OVERLAP flag, which has something
> to do with overlapping sched domains. I haven't looked into what it
> does or how it works. I'm just wondering if this optimization will
> affect the use of that flag.

It seems it did: SD_OVERLAP will not work after this change, though
this flag is maybe scarcely used, because this optimization assumes
every sched-domain span is a super-set of its child domain's.
Isn't it, Alex?

> Morten

--
regards!
li guang
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On 01/11/2013 04:01 PM, li guang wrote:
> On Fri, 2013-01-11 at 10:26 +0530, Preeti U Murthy wrote:
>> Hi Morten, Alex
>>
>> On 01/09/2013 11:51 PM, Morten Rasmussen wrote:
>>> On Sat, Jan 05, 2013 at 08:37:34AM +0000, Alex Shi wrote:
>>>> Guess the search for a cpu from bottom to up in the domain tree
>>>> comes from commit 3dbd5342074a1e "sched: multilevel sbe sbf"; the
>>>> purpose is balancing tasks over all level domains.
>>>>
>>>> This balancing costs much if there are many domains/groups in a
>>>> large system. And forcibly spreading tasks among different domains
>>>> may cause performance issues due to bad locality.
>>>>
>>>> If we remove this code, we will get quick fork/exec/wake, plus
>>>> better balancing among the whole system, which also reduces
>>>> migrations in future load balancing.
>>>>
>>>> This patch increases hackbench performance by 10+% on my 4 socket
>>>> NHM and SNB machines.
>>>>
>>>> Signed-off-by: Alex Shi
>>>> ---
>>>>  kernel/sched/fair.c | 20 +-------------------
>>>>  1 file changed, 1 insertion(+), 19 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index ecfbf8e..895a3f4 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>>  		goto unlock;
>>>>  	}
>>>>
>>>> -	while (sd) {
>>>> +	if (sd) {
>>>>  		int load_idx = sd->forkexec_idx;
>>>>  		struct sched_group *group;
>>>> -		int weight;
>>>> -
>>>> -		if (!(sd->flags & sd_flag)) {
>>>> -			sd = sd->child;
>>>> -			continue;
>>>> -		}
>>>>
>>>>  		if (sd_flag & SD_BALANCE_WAKE)
>>>>  			load_idx = sd->wake_idx;
>>>> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>>  			goto unlock;
>>>>
>>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>>> -
>>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>>> -		cpu = new_cpu;
>>>> -		weight = sd->span_weight;
>>>> -		sd = NULL;
>>>> -		for_each_domain(cpu, tmp) {
>>>> -			if (weight <= tmp->span_weight)
>>>> -				break;
>>>> -			if (tmp->flags & sd_flag)
>>>> -				sd = tmp;
>>>> -		}
>>>> -		/* while loop will break here if sd == NULL */
>>>
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the
>>> new cpu's domain. Is this to handle some exotic sched domain
>>> configurations? If so, they probably wouldn't work with your
>>> optimizations.
>>
>> Let me explain my understanding of why the recursive search is the way
>> it is.
>>
>>          _________________________ sd0
>>         |                         |
>>         |  ___sd1___   ___sd2___  |
>>         | |         | |         | |
>>         | |  sgx    | |  sga    | |
>>         | |  sgy    | |  sgb    | |
>>         | |_________| |_________| |
>>         |_________________________|
>>
>> What the current recursive search is doing is (assuming we start with
>> sd0, the top level sched domain whose flags are rightly set): we find
>> that sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.
>>
>> We could have ideally stopped the search here. But the problem with
>> this is that there is a possibility that sgx is more loaded than sgy,
>> meaning the cpus in sgx are heavily imbalanced; say there are two cpus
>> cpux1 and cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has
>> recently gotten idle and load balancing has not come to its rescue
>> yet. According to the search above, cpux1 is idle, but is *not the
>> right candidate for scheduling the forked task; it is the right
>> candidate for relieving the load from cpux2* due to cache locality etc.
>
> This corner case may occur after "[PATCH v3 03/22] sched: fix
> find_idlest_group mess logical" brought in the local sched_group
> bias, assuming balancing runs on cpux2.
> Ideally, find_idlest_group should find the real idlest (in this case
> sgy); then this patch is reasonable.

Sure, but it seems a bit hard to go down to the idlest group.

And the old logic really costs too much; on my 2 socket NHM/SNB
server, hackbench can gain 2~5% performance, and there is no clear
performance change on kbuild/aim7 etc.

--
Thanks
    Alex
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On 01/11/2013 06:07 PM, Morten Rasmussen wrote:
> On Fri, Jan 11, 2013 at 02:46:31AM +0000, Alex Shi wrote:
>> On 01/10/2013 02:21 AM, Morten Rasmussen wrote:
>>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>>> -
>>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>>> -		cpu = new_cpu;
>>>> -		weight = sd->span_weight;
>>>> -		sd = NULL;
>>>> -		for_each_domain(cpu, tmp) {
>>>> -			if (weight <= tmp->span_weight)
>>>> -				break;
>>>> -			if (tmp->flags & sd_flag)
>>>> -				sd = tmp;
>>>> -		}
>>>> -		/* while loop will break here if sd == NULL */
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the
>>> new cpu's domain. Is this to handle some exotic sched domain
>>> configurations? If so, they probably wouldn't work with your
>>> optimizations.
>>
>> I did not find an odd configuration that asks for the old logic.
>>
>> According to Documentation/scheduler/sched-domains.txt, maybe never:
>> "A domain's span MUST be a superset of it child's span (this
>> restriction could be relaxed if the need arises), and a base domain
>> for CPU i MUST span at least i." etc. etc.
>
> The reason for my suspicion is the SD_OVERLAP flag, which has something
> to do with overlapping sched domains. I haven't looked into what it
> does or how it works. I'm just wondering if this optimization will
> affect the use of that flag.

I didn't know any machine that has this flag, but if just some cpus
overlap, and none stays alone without any domain, the patch won't miss
an eligible cpu.

> Morten

--
Thanks
    Alex
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
Hi Preeti,

On Fri, Jan 11, 2013 at 04:56:09AM +0000, Preeti U Murthy wrote:
> Hi Morten, Alex
>
> On 01/09/2013 11:51 PM, Morten Rasmussen wrote:
>> On Sat, Jan 05, 2013 at 08:37:34AM +0000, Alex Shi wrote:
>>> Guess the search for a cpu from bottom to up in the domain tree
>>> comes from commit 3dbd5342074a1e "sched: multilevel sbe sbf"; the
>>> purpose is balancing tasks over all level domains.
>>>
>>> This balancing costs much if there are many domains/groups in a
>>> large system. And forcibly spreading tasks among different domains
>>> may cause performance issues due to bad locality.
>>>
>>> If we remove this code, we will get quick fork/exec/wake, plus
>>> better balancing among the whole system, which also reduces
>>> migrations in future load balancing.
>>>
>>> This patch increases hackbench performance by 10+% on my 4 socket
>>> NHM and SNB machines.
>>>
>>> Signed-off-by: Alex Shi
>>> ---
>>>  kernel/sched/fair.c | 20 +-------------------
>>>  1 file changed, 1 insertion(+), 19 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index ecfbf8e..895a3f4 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>  		goto unlock;
>>>  	}
>>>
>>> -	while (sd) {
>>> +	if (sd) {
>>>  		int load_idx = sd->forkexec_idx;
>>>  		struct sched_group *group;
>>> -		int weight;
>>> -
>>> -		if (!(sd->flags & sd_flag)) {
>>> -			sd = sd->child;
>>> -			continue;
>>> -		}
>>>
>>>  		if (sd_flag & SD_BALANCE_WAKE)
>>>  			load_idx = sd->wake_idx;
>>> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>  			goto unlock;
>>>
>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>> -
>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>> -		cpu = new_cpu;
>>> -		weight = sd->span_weight;
>>> -		sd = NULL;
>>> -		for_each_domain(cpu, tmp) {
>>> -			if (weight <= tmp->span_weight)
>>> -				break;
>>> -			if (tmp->flags & sd_flag)
>>> -				sd = tmp;
>>> -		}
>>> -		/* while loop will break here if sd == NULL */
>>
>> I agree that this should be a major optimization. I just can't figure
>> out why the existing recursive search for an idle cpu switches to the
>> new cpu near the end and then starts a search for an idle cpu in the
>> new cpu's domain. Is this to handle some exotic sched domain
>> configurations? If so, they probably wouldn't work with your
>> optimizations.
>
> Let me explain my understanding of why the recursive search is the way
> it is.
>
>          _________________________ sd0
>         |                         |
>         |  ___sd1___   ___sd2___  |
>         | |         | |         | |
>         | |  sgx    | |  sga    | |
>         | |  sgy    | |  sgb    | |
>         | |_________| |_________| |
>         |_________________________|
>
> What the current recursive search is doing is (assuming we start with
> sd0, the top level sched domain whose flags are rightly set): we find
> that sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.
>
> We could have ideally stopped the search here. But the problem with
> this is that there is a possibility that sgx is more loaded than sgy,
> meaning the cpus in sgx are heavily imbalanced; say there are two cpus
> cpux1 and cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has
> recently gotten idle and load balancing has not come to its rescue
> yet. According to the search above, cpux1 is idle, but is *not the
> right candidate for scheduling the forked task; it is the right
> candidate for relieving the load from cpux2* due to cache locality etc.
>
> Therefore in the next recursive search we go one step inside sd1, the
> chosen idlest group candidate, which also happens to be the *next level
> sched domain for cpux1, the chosen idle cpu*. It then returns sgy as
> the idlest perhaps, if the situation happens to be better than what I
> have described for sgx, and an appropriate cpu there is chosen.
>
> So in short, a bird's eye view of a large sched domain to choose the
> cpu would be very short-sighted; we could end up creating imbalances
> within lower level sched domains. To avoid this, the recursive search
> plays safe and chooses the best idle group after viewing the large
> sched domain in detail.

Thanks for your explanation. I see your point that the first search may
end at a high level in the sched domain and pick a cpu in a very
unbalanced group. The extra search will then try to put things right.

This patch set removes the recursive search completely. So the overall
balance policy is changed from trying to achieve equal load across all
groups to always putting tasks on the most idle cpu regardless of the
load of its group. I'm not sure if this is a good or bad move. It is
quicker.

Regards,
Morten

> Therefore even I feel that this patch should be implemented after
> thorough tests.
>
>> Morten
>
> Regards
> Preeti U Murthy
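To make the two policies in this exchange concrete, here is a toy,
self-contained C rendering of Preeti's sd0/sd1/sd2 example. All structs,
helper names and load numbers are invented for illustration and the search
is heavily simplified; it is not the kernel code. Pass 1 models a single
find_idlest_group()/find_idlest_cpu() shot over the whole of sd0 and lands
on the idle cpux1 inside the overloaded sgx; pass 2 models the re-descent
into sd1 that the removed while-loop performed, which compares sgx against
sgy and moves the choice to sgy.

#include <stdio.h>

struct cpu   { const char *name; int load; };
struct group { const char *name; struct cpu cpus[2]; };

static int group_load(const struct group *g)
{
	return g->cpus[0].load + g->cpus[1].load;
}

/* Least loaded group among n candidates (stand-in for find_idlest_group). */
static const struct group *idlest_group(const struct group *g, int n)
{
	const struct group *best = g;
	int i;

	for (i = 1; i < n; i++)
		if (group_load(&g[i]) < group_load(best))
			best = &g[i];
	return best;
}

/* Least loaded cpu across n groups (stand-in for find_idlest_cpu). */
static const struct cpu *idlest_cpu(const struct group *g, int n)
{
	const struct cpu *best = &g[0].cpus[0];
	int i, j;

	for (i = 0; i < n; i++)
		for (j = 0; j < 2; j++)
			if (g[i].cpus[j].load < best->load)
				best = &g[i].cpus[j];
	return best;
}

int main(void)
{
	/* sd1 holds sgx and sgy; cpux1 just went idle while cpux2 is
	 * overloaded, so sgx (total 9) is busier than sgy (total 6). */
	struct group sd1[2] = {
		{ "sgx", { { "cpux1", 0 }, { "cpux2", 9 } } },
		{ "sgy", { { "cpuy1", 3 }, { "cpuy2", 3 } } },
	};
	/* sd2 holds sga and sgb, all moderately busy (total 20). */
	struct group sd2[2] = {
		{ "sga", { { "cpua1", 5 }, { "cpua2", 5 } } },
		{ "sgb", { { "cpub1", 5 }, { "cpub2", 5 } } },
	};

	/* Pass 1, at the sd0 level: sd1 (load 15) beats sd2 (load 20),
	 * and the idlest cpu anywhere in sd1's span is cpux1. */
	int load1 = group_load(&sd1[0]) + group_load(&sd1[1]);
	int load2 = group_load(&sd2[0]) + group_load(&sd2[1]);
	const struct group *span = load1 <= load2 ? sd1 : sd2;
	const struct cpu *one_shot = idlest_cpu(span, 2);

	/* Pass 2, the re-descent: inside sd1, sgy is the idlest group,
	 * so the choice moves from cpux1 to cpuy1. */
	const struct group *g = idlest_group(sd1, 2);
	const struct cpu *redone = idlest_cpu(g, 1);

	printf("one-shot search picks %s (inside overloaded sgx)\n",
	       one_shot->name);
	printf("re-descent picks %s (inside %s)\n", redone->name, g->name);
	return 0;
}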
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On Fri, Jan 11, 2013 at 02:46:31AM +0000, Alex Shi wrote:
> On 01/10/2013 02:21 AM, Morten Rasmussen wrote:
>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>> -
>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>> -		cpu = new_cpu;
>>> -		weight = sd->span_weight;
>>> -		sd = NULL;
>>> -		for_each_domain(cpu, tmp) {
>>> -			if (weight <= tmp->span_weight)
>>> -				break;
>>> -			if (tmp->flags & sd_flag)
>>> -				sd = tmp;
>>> -		}
>>> -		/* while loop will break here if sd == NULL */
>> I agree that this should be a major optimization. I just can't figure
>> out why the existing recursive search for an idle cpu switches to the
>> new cpu near the end and then starts a search for an idle cpu in the
>> new cpu's domain. Is this to handle some exotic sched domain
>> configurations? If so, they probably wouldn't work with your
>> optimizations.
>
> I did not find an odd configuration that asks for the old logic.
>
> According to Documentation/scheduler/sched-domains.txt, maybe never:
> "A domain's span MUST be a superset of it child's span (this
> restriction could be relaxed if the need arises), and a base domain
> for CPU i MUST span at least i." etc. etc.

The reason for my suspicion is the SD_OVERLAP flag, which has something
to do with overlapping sched domains. I haven't looked into what it
does or how it works. I'm just wondering if this optimization will
affect the use of that flag.

Morten
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On Fri, 2013-01-11 at 10:26 +0530, Preeti U Murthy wrote:
> Hi Morten, Alex
>
> On 01/09/2013 11:51 PM, Morten Rasmussen wrote:
>> On Sat, Jan 05, 2013 at 08:37:34AM +0000, Alex Shi wrote:
>>> Guess the search for a cpu from bottom to up in the domain tree
>>> comes from commit 3dbd5342074a1e "sched: multilevel sbe sbf"; the
>>> purpose is balancing tasks over all level domains.
>>>
>>> This balancing costs much if there are many domains/groups in a
>>> large system. And forcibly spreading tasks among different domains
>>> may cause performance issues due to bad locality.
>>>
>>> If we remove this code, we will get quick fork/exec/wake, plus
>>> better balancing among the whole system, which also reduces
>>> migrations in future load balancing.
>>>
>>> This patch increases hackbench performance by 10+% on my 4 socket
>>> NHM and SNB machines.
>>>
>>> Signed-off-by: Alex Shi
>>> ---
>>>  kernel/sched/fair.c | 20 +-------------------
>>>  1 file changed, 1 insertion(+), 19 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index ecfbf8e..895a3f4 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>  		goto unlock;
>>>  	}
>>>
>>> -	while (sd) {
>>> +	if (sd) {
>>>  		int load_idx = sd->forkexec_idx;
>>>  		struct sched_group *group;
>>> -		int weight;
>>> -
>>> -		if (!(sd->flags & sd_flag)) {
>>> -			sd = sd->child;
>>> -			continue;
>>> -		}
>>>
>>>  		if (sd_flag & SD_BALANCE_WAKE)
>>>  			load_idx = sd->wake_idx;
>>> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>  			goto unlock;
>>>
>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>> -
>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>> -		cpu = new_cpu;
>>> -		weight = sd->span_weight;
>>> -		sd = NULL;
>>> -		for_each_domain(cpu, tmp) {
>>> -			if (weight <= tmp->span_weight)
>>> -				break;
>>> -			if (tmp->flags & sd_flag)
>>> -				sd = tmp;
>>> -		}
>>> -		/* while loop will break here if sd == NULL */
>>
>> I agree that this should be a major optimization. I just can't figure
>> out why the existing recursive search for an idle cpu switches to the
>> new cpu near the end and then starts a search for an idle cpu in the
>> new cpu's domain. Is this to handle some exotic sched domain
>> configurations? If so, they probably wouldn't work with your
>> optimizations.
>
> Let me explain my understanding of why the recursive search is the way
> it is.
>
>          _________________________ sd0
>         |                         |
>         |  ___sd1___   ___sd2___  |
>         | |         | |         | |
>         | |  sgx    | |  sga    | |
>         | |  sgy    | |  sgb    | |
>         | |_________| |_________| |
>         |_________________________|
>
> What the current recursive search is doing is (assuming we start with
> sd0, the top level sched domain whose flags are rightly set): we find
> that sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.
>
> We could have ideally stopped the search here. But the problem with
> this is that there is a possibility that sgx is more loaded than sgy,
> meaning the cpus in sgx are heavily imbalanced; say there are two cpus
> cpux1 and cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has
> recently gotten idle and load balancing has not come to its rescue
> yet. According to the search above, cpux1 is idle, but is *not the
> right candidate for scheduling the forked task; it is the right
> candidate for relieving the load from cpux2* due to cache locality etc.

This corner case may occur after "[PATCH v3 03/22] sched: fix
find_idlest_group mess logical" brought in the local sched_group bias,
assuming balancing runs on cpux2.
Ideally, find_idlest_group should find the real idlest (in this case
sgy); then this patch is reasonable.

> Therefore in the next recursive search we go one step inside sd1, the
> chosen idlest group candidate, which also happens to be the *next level
> sched domain for cpux1, the chosen idle cpu*. It then returns sgy as
> the idlest perhaps, if the situation happens to be better than what I
> have described for sgx, and an appropriate cpu there is chosen.
>
> So in short, a bird's eye view of a large sched domain to choose the
> cpu would be very short-sighted; we could end up creating imbalances
> within lower level sched domains. To avoid this, the recursive search
> plays safe and chooses the best idle group after viewing the large
> sched domain in detail.
>
> Therefore even I feel that this patch should be implemented after
> thorough tests.
>
>> Morten
>
> Regards
> Preeti U Murthy

--
regards!
li guang
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
在 2013-01-11五的 10:26 +0530,Preeti U Murthy写道: Hi Morten,Alex On 01/09/2013 11:51 PM, Morten Rasmussen wrote: On Sat, Jan 05, 2013 at 08:37:34AM +, Alex Shi wrote: Guess the search cpu from bottom to up in domain tree come from commit 3dbd5342074a1e sched: multilevel sbe sbf, the purpose is balancing over tasks on all level domains. This balancing cost much if there has many domain/groups in a large system. And force spreading task among different domains may cause performance issue due to bad locality. If we remove this code, we will get quick fork/exec/wake, plus better balancing among whole system, that also reduce migrations in future load balancing. This patch increases 10+% performance of hackbench on my 4 sockets NHM and SNB machines. Signed-off-by: Alex Shi alex@intel.com --- kernel/sched/fair.c | 20 +--- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ecfbf8e..895a3f4 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags) goto unlock; } - while (sd) { + if (sd) { int load_idx = sd-forkexec_idx; struct sched_group *group; - int weight; - - if (!(sd-flags sd_flag)) { - sd = sd-child; - continue; - } if (sd_flag SD_BALANCE_WAKE) load_idx = sd-wake_idx; @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags) goto unlock; new_cpu = find_idlest_cpu(group, p, cpu); - - /* Now try balancing at a lower domain level of new_cpu */ - cpu = new_cpu; - weight = sd-span_weight; - sd = NULL; - for_each_domain(cpu, tmp) { - if (weight = tmp-span_weight) - break; - if (tmp-flags sd_flag) - sd = tmp; - } - /* while loop will break here if sd == NULL */ I agree that this should be a major optimization. I just can't figure out why the existing recursive search for an idle cpu switches to the new cpu near the end and then starts a search for an idle cpu in the new cpu's domain. Is this to handle some exotic sched domain configurations? If so, they probably wouldn't work with your optimizations. Let me explain my understanding of why the recursive search is the way it is. _ sd0 | | | ___sd1__ ___sd2__| | || || | | | sgx| | sga | | | | sgy| | sgb | | | || || | |_| What the current recursive search is doing is (assuming we start with sd0-the top level sched domain whose flags are rightly set). we find that sd1 is the idlest group,and a cpux1 in sgx is the idlest cpu. We could have ideally stopped the search here.But the problem with this is that there is a possibility that sgx is more loaded than sgy; meaning the cpus in sgx are heavily imbalanced;say there are two cpus cpux1 and cpux2 in sgx,where cpux2 is heavily loaded and cpux1 has recently gotten idle and load balancing has not come to its rescue yet.According to the search above, cpux1 is idle,but is *not the right candidate for scheduling forked task,it is the right candidate for relieving the load from cpux2* due to cache locality etc. This corner case may occur after [PATCH v3 03/22] sched: fix find_idlest_group mess logical brought in the local sched_group bias, and assume balancing runs on cpux2. ideally, find_idlest_group should find the real idlest(this case: sgy), then, this patch is reasonable. Therefore in the next recursive search we go one step inside sd1-the chosen idlest group candidate,which also happens to be the *next level sched domain for cpux1-the chosen idle cpu*. 
It then returns sgy as the idlest perhaps,if the situation happens to be better than what i have described for sgx and an appropriate cpu there is chosen. So in short a bird's eye view of a large sched domain to choose the cpu would be very short sighted,we could end up creating imbalances within lower level sched domains.To avoid this the recursive search plays safe and chooses the best idle group after viewing the large sched domain in detail. Therefore even i feel that this patch should be implemented after thorough tests. Morten Regards Preeti U Murthy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- regards! li guang
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On Fri, Jan 11, 2013 at 02:46:31AM +, Alex Shi wrote: On 01/10/2013 02:21 AM, Morten Rasmussen wrote: new_cpu = find_idlest_cpu(group, p, cpu); - -/* Now try balancing at a lower domain level of new_cpu */ -cpu = new_cpu; -weight = sd-span_weight; -sd = NULL; -for_each_domain(cpu, tmp) { -if (weight = tmp-span_weight) -break; -if (tmp-flags sd_flag) -sd = tmp; -} -/* while loop will break here if sd == NULL */ I agree that this should be a major optimization. I just can't figure out why the existing recursive search for an idle cpu switches to the new cpu near the end and then starts a search for an idle cpu in the new cpu's domain. Is this to handle some exotic sched domain configurations? If so, they probably wouldn't work with your optimizations. I did not find odd configuration that asking for old logical. According to Documentation/scheduler/sched-domains.txt, Maybe never. A domain's span MUST be a superset of it child's span (this restriction could be relaxed if the need arises), and a base domain for CPU i MUST span at least i. etc. etc. The reason for my suspicion is the SD_OVERLAP flag, which has something to do overlapping sched domains. I haven't looked into what it does or how it works. I'm just wondering if this optimization will affect the use of that flag. Morten -- Thanks Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
Hi Preeti,

On Fri, Jan 11, 2013 at 04:56:09AM +, Preeti U Murthy wrote:
> Hi Morten, Alex
>
> On 01/09/2013 11:51 PM, Morten Rasmussen wrote:
> > On Sat, Jan 05, 2013 at 08:37:34AM +, Alex Shi wrote:
> >> The bottom-up cpu search in the domain tree presumably comes from
> >> commit 3dbd5342074a1e (sched: multilevel sbe sbf); its purpose is
> >> balancing tasks over domains at every level.
> >>
> >> This balancing costs a lot if there are many domains/groups in a
> >> large system, and forcibly spreading tasks among different domains
> >> may cause performance issues due to bad locality.
> >>
> >> If we remove this code, we get quicker fork/exec/wake paths plus
> >> better balancing across the whole system, which also reduces
> >> migrations in future load balancing.
> >>
> >> This patch increases hackbench performance by 10+% on my 4-socket
> >> NHM and SNB machines.
> >>
> >> Signed-off-by: Alex Shi alex@intel.com
> >> ---
> >>  kernel/sched/fair.c | 20 +---
> >>  1 file changed, 1 insertion(+), 19 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index ecfbf8e..895a3f4 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
> >>  		goto unlock;
> >>  	}
> >>
> >> -	while (sd) {
> >> +	if (sd) {
> >>  		int load_idx = sd->forkexec_idx;
> >>  		struct sched_group *group;
> >> -		int weight;
> >> -
> >> -		if (!(sd->flags & sd_flag)) {
> >> -			sd = sd->child;
> >> -			continue;
> >> -		}
> >>
> >>  		if (sd_flag & SD_BALANCE_WAKE)
> >>  			load_idx = sd->wake_idx;
> >> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
> >>  			goto unlock;
> >>
> >>  		new_cpu = find_idlest_cpu(group, p, cpu);
> >> -
> >> -		/* Now try balancing at a lower domain level of new_cpu */
> >> -		cpu = new_cpu;
> >> -		weight = sd->span_weight;
> >> -		sd = NULL;
> >> -		for_each_domain(cpu, tmp) {
> >> -			if (weight <= tmp->span_weight)
> >> -				break;
> >> -			if (tmp->flags & sd_flag)
> >> -				sd = tmp;
> >> -		}
> >> -		/* while loop will break here if sd == NULL */
> >
> > I agree that this should be a major optimization. I just can't figure
> > out why the existing recursive search for an idle cpu switches to the
> > new cpu near the end and then starts a search for an idle cpu in the new
> > cpu's domain. Is this to handle some exotic sched domain configurations?
> > If so, they probably wouldn't work with your optimizations.
>
> Let me explain my understanding of why the recursive search is the way
> it is.
>
>  _ sd0
> | |
> | ___sd1__ ___sd2__|
> | || || |
> | | sgx| | sga | |
> | | sgy| | sgb | |
> | || || |
> |_|
>
> What the current recursive search is doing (assuming we start with sd0,
> the top-level sched domain whose flags are rightly set) is: we find that
> sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.
>
> We could ideally have stopped the search here. But the problem with
> this is that there is a possibility that sgx is more loaded than sgy,
> meaning the cpus in sgx are heavily imbalanced; say there are two cpus
> cpux1 and cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has
> recently gone idle and load balancing has not come to its rescue yet.
> According to the search above, cpux1 is idle, but it is *not the right
> candidate for scheduling the forked task; it is the right candidate for
> relieving the load from cpux2* due to cache locality etc.
>
> Therefore in the next recursive search we go one step inside sd1, the
> chosen idlest group candidate, which also happens to be the *next level
> sched domain for cpux1, the chosen idle cpu*. It then perhaps returns
> sgy as the idlest, if the situation there happens to be better than
> what I have described for sgx, and an appropriate cpu there is chosen.
>
> So in short, a bird's eye view of a large sched domain to choose the
> cpu would be very short-sighted; we could end up creating imbalances
> within lower-level sched domains. To avoid this, the recursive search
> plays safe and chooses the best idle group after viewing the large
> sched domain in detail.

Thanks for your explanation. I see your point that the first search may
end at a high level in the sched domain hierarchy and pick a cpu in a
very unbalanced group. The extra search will then try to put things
right.

This patch set removes the recursive search completely, so the overall
balance policy changes from trying to achieve equal load across all
groups to always putting the task on the most idle cpu, regardless of
the load of its group. I'm not sure whether this is a good or a bad
move. It is quicker.

Regards,
Morten

> Therefore even I feel that this patch should be implemented after
> thorough tests.
>
> > Morten
>
> Regards
> Preeti U Murthy
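The policy change Morten describes can be made concrete with a small sketch. find_idlest_group_cpu() is a hypothetical helper standing in for one combined find_idlest_group()/find_idlest_cpu() pass at a single domain level; the two functions contrast the old multi-level descent with the patch's single-level pick:

/* hypothetical helper: one group/cpu search at the given domain level */
int find_idlest_group_cpu(int level, int cpu);

/*
 * Old policy (sketch): repeat the search at every lower level, so the
 * final cpu is also balanced within its own small domains.
 */
int select_cpu_old(int top_level, int cpu)
{
        int level;

        for (level = top_level; level >= 0; level--)
                cpu = find_idlest_group_cpu(level, cpu);
        return cpu;
}

/*
 * New policy (sketch): a single search at the highest matching level;
 * faster, but blind to imbalance inside the group the cpu belongs to.
 */
int select_cpu_new(int top_level, int cpu)
{
        return find_idlest_group_cpu(top_level, cpu);
}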
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On 01/11/2013 06:07 PM, Morten Rasmussen wrote:
> On Fri, Jan 11, 2013 at 02:46:31AM +, Alex Shi wrote:
>> On 01/10/2013 02:21 AM, Morten Rasmussen wrote:
>>>>		new_cpu = find_idlest_cpu(group, p, cpu);
>>>> -
>>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>>> -		cpu = new_cpu;
>>>> -		weight = sd->span_weight;
>>>> -		sd = NULL;
>>>> -		for_each_domain(cpu, tmp) {
>>>> -			if (weight <= tmp->span_weight)
>>>> -				break;
>>>> -			if (tmp->flags & sd_flag)
>>>> -				sd = tmp;
>>>> -		}
>>>> -		/* while loop will break here if sd == NULL */
>>>
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the new
>>> cpu's domain. Is this to handle some exotic sched domain configurations?
>>> If so, they probably wouldn't work with your optimizations.
>>
>> I did not find any odd configuration that asks for the old logic.
>> According to Documentation/scheduler/sched-domains.txt, maybe never:
>>
>> "A domain's span MUST be a superset of it child's span (this restriction
>> could be relaxed if the need arises), and a base domain for CPU i MUST
>> span at least i." etc. etc.
>
> The reason for my suspicion is the SD_OVERLAP flag, which has something
> to do with overlapping sched domains. I haven't looked into what it does
> or how it works. I'm just wondering if this optimization will affect the
> use of that flag.

I don't know of any machine that has this flag, but as long as the cpus
merely overlap and none is left alone without any domain, the patch
won't miss an eligible cpu.

> Morten

--
Thanks
Alex
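As a rough illustration of why overlap alone need not break the simplified search, here is a property-check sketch: within one domain, the union of its groups' spans should still cover the whole domain span even when group spans overlap, as in SD_OVERLAP topologies. The structures are simplified stand-ins, not the kernel's topology code:

#include <stdbool.h>

#define NR_CPUS 16

struct group {
        bool span[NR_CPUS];             /* cpus in this group */
        struct group *next;
};

struct domain {
        bool span[NR_CPUS];             /* cpus in this domain */
        struct group *groups;
};

/*
 * If every cpu of the domain appears in at least one group, a single
 * find_idlest_group() pass at this level can still reach every cpu,
 * overlapping group spans or not.
 */
bool groups_cover_domain(const struct domain *d)
{
        bool covered[NR_CPUS] = { false };
        const struct group *g;
        int cpu;

        for (g = d->groups; g; g = g->next)
                for (cpu = 0; cpu < NR_CPUS; cpu++)
                        covered[cpu] |= g->span[cpu];

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                if (d->span[cpu] && !covered[cpu])
                        return false;   /* some cpu would be unreachable */
        return true;
}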
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On 01/11/2013 04:01 PM, li guang wrote:
> On Fri, 2013-01-11 at 10:26 +0530, Preeti U Murthy wrote:
>> Hi Morten, Alex
>>
>> On 01/09/2013 11:51 PM, Morten Rasmussen wrote:
>>> On Sat, Jan 05, 2013 at 08:37:34AM +, Alex Shi wrote:
>>>> The bottom-up cpu search in the domain tree presumably comes from
>>>> commit 3dbd5342074a1e (sched: multilevel sbe sbf); its purpose is
>>>> balancing tasks over domains at every level.
>>>>
>>>> This balancing costs a lot if there are many domains/groups in a
>>>> large system, and forcibly spreading tasks among different domains
>>>> may cause performance issues due to bad locality.
>>>>
>>>> If we remove this code, we get quicker fork/exec/wake paths plus
>>>> better balancing across the whole system, which also reduces
>>>> migrations in future load balancing.
>>>>
>>>> This patch increases hackbench performance by 10+% on my 4-socket
>>>> NHM and SNB machines.
>>>>
>>>> Signed-off-by: Alex Shi alex@intel.com
>>>> ---
>>>>  kernel/sched/fair.c | 20 +---
>>>>  1 file changed, 1 insertion(+), 19 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index ecfbf8e..895a3f4 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>>  		goto unlock;
>>>>  	}
>>>>
>>>> -	while (sd) {
>>>> +	if (sd) {
>>>>  		int load_idx = sd->forkexec_idx;
>>>>  		struct sched_group *group;
>>>> -		int weight;
>>>> -
>>>> -		if (!(sd->flags & sd_flag)) {
>>>> -			sd = sd->child;
>>>> -			continue;
>>>> -		}
>>>>
>>>>  		if (sd_flag & SD_BALANCE_WAKE)
>>>>  			load_idx = sd->wake_idx;
>>>> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>>  			goto unlock;
>>>>
>>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>>> -
>>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>>> -		cpu = new_cpu;
>>>> -		weight = sd->span_weight;
>>>> -		sd = NULL;
>>>> -		for_each_domain(cpu, tmp) {
>>>> -			if (weight <= tmp->span_weight)
>>>> -				break;
>>>> -			if (tmp->flags & sd_flag)
>>>> -				sd = tmp;
>>>> -		}
>>>> -		/* while loop will break here if sd == NULL */
>>>
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the new
>>> cpu's domain. Is this to handle some exotic sched domain configurations?
>>> If so, they probably wouldn't work with your optimizations.
>>
>> Let me explain my understanding of why the recursive search is the way
>> it is.
>>
>>  _ sd0
>> | |
>> | ___sd1__ ___sd2__|
>> | || || |
>> | | sgx| | sga | |
>> | | sgy| | sgb | |
>> | || || |
>> |_|
>>
>> What the current recursive search is doing (assuming we start with sd0,
>> the top-level sched domain whose flags are rightly set) is: we find that
>> sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.
>>
>> We could ideally have stopped the search here. But the problem with
>> this is that there is a possibility that sgx is more loaded than sgy,
>> meaning the cpus in sgx are heavily imbalanced; say there are two cpus
>> cpux1 and cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has
>> recently gone idle and load balancing has not come to its rescue yet.
>> According to the search above, cpux1 is idle, but it is *not the right
>> candidate for scheduling the forked task; it is the right candidate for
>> relieving the load from cpux2* due to cache locality etc.
>
> This corner case may occur after [PATCH v3 03/22] (sched: fix
> find_idlest_group mess logical) brought in the local sched_group bias,
> assuming balancing runs on cpux2. Ideally, find_idlest_group should
> find the real idlest group (in this case sgy); then this patch is
> reasonable.

Sure, but it seems a bit hard to go further down into the idlest group,
and the old logic really costs too much: on my 2-socket NHM/SNB servers,
hackbench gains 2~5% performance, and there is no clear performance
change on kbuild/aim7 etc.

--
Thanks
Alex
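The local-group bias li guang mentions can be sketched roughly as follows. This mirrors the general shape of an idlest-group search that prefers the local group unless a remote group wins by a margin; the imbalance_pct margin and the NULL-means-stay-local convention are assumptions for illustration, not a copy of the kernel's find_idlest_group():

struct sg {
        unsigned long load;     /* aggregate load of the group */
        int is_local;           /* group containing the searching cpu */
        struct sg *next;
};

/*
 * Pick the least loaded remote group, but keep the local group (return
 * NULL) unless a remote group beats it by the imbalance margin. With
 * such a bias, a merely "good enough" local group can shadow a truly
 * idler remote group like sgy in Preeti's example. Assumes the list
 * contains exactly one local group.
 */
struct sg *pick_idlest_group(struct sg *groups, unsigned int imbalance_pct)
{
        struct sg *g, *idlest = NULL;
        unsigned long min_load = ~0UL, local_load = 0;

        for (g = groups; g; g = g->next) {
                if (g->is_local) {
                        local_load = g->load;
                } else if (g->load < min_load) {
                        min_load = g->load;
                        idlest = g;
                }
        }
        if (!idlest || 100 * local_load < imbalance_pct * min_load)
                return NULL;    /* stay in the local group */
        return idlest;
}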
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
Hi Morten, Alex

On 01/09/2013 11:51 PM, Morten Rasmussen wrote:
> On Sat, Jan 05, 2013 at 08:37:34AM +, Alex Shi wrote:
>> The bottom-up cpu search in the domain tree presumably comes from
>> commit 3dbd5342074a1e (sched: multilevel sbe sbf); its purpose is
>> balancing tasks over domains at every level.
>>
>> This balancing costs a lot if there are many domains/groups in a large
>> system, and forcibly spreading tasks among different domains may cause
>> performance issues due to bad locality.
>>
>> If we remove this code, we get quicker fork/exec/wake paths plus better
>> balancing across the whole system, which also reduces migrations in
>> future load balancing.
>>
>> This patch increases hackbench performance by 10+% on my 4-socket NHM
>> and SNB machines.
>>
>> Signed-off-by: Alex Shi
>> ---
>>  kernel/sched/fair.c | 20 +---
>>  1 file changed, 1 insertion(+), 19 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ecfbf8e..895a3f4 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>  		goto unlock;
>>  	}
>>
>> -	while (sd) {
>> +	if (sd) {
>>  		int load_idx = sd->forkexec_idx;
>>  		struct sched_group *group;
>> -		int weight;
>> -
>> -		if (!(sd->flags & sd_flag)) {
>> -			sd = sd->child;
>> -			continue;
>> -		}
>>
>>  		if (sd_flag & SD_BALANCE_WAKE)
>>  			load_idx = sd->wake_idx;
>> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>  			goto unlock;
>>
>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>> -
>> -		/* Now try balancing at a lower domain level of new_cpu */
>> -		cpu = new_cpu;
>> -		weight = sd->span_weight;
>> -		sd = NULL;
>> -		for_each_domain(cpu, tmp) {
>> -			if (weight <= tmp->span_weight)
>> -				break;
>> -			if (tmp->flags & sd_flag)
>> -				sd = tmp;
>> -		}
>> -		/* while loop will break here if sd == NULL */
>
> I agree that this should be a major optimization. I just can't figure
> out why the existing recursive search for an idle cpu switches to the
> new cpu near the end and then starts a search for an idle cpu in the new
> cpu's domain. Is this to handle some exotic sched domain configurations?
> If so, they probably wouldn't work with your optimizations.

Let me explain my understanding of why the recursive search is the way
it is.

 _ sd0
| |
| ___sd1__ ___sd2__|
| || || |
| | sgx| | sga | |
| | sgy| | sgb | |
| || || |
|_|

What the current recursive search is doing (assuming we start with sd0,
the top-level sched domain whose flags are rightly set) is: we find that
sd1 is the idlest group, and a cpux1 in sgx is the idlest cpu.

We could ideally have stopped the search here. But the problem with this
is that there is a possibility that sgx is more loaded than sgy, meaning
the cpus in sgx are heavily imbalanced; say there are two cpus cpux1 and
cpux2 in sgx, where cpux2 is heavily loaded and cpux1 has recently gone
idle and load balancing has not come to its rescue yet. According to the
search above, cpux1 is idle, but it is *not the right candidate for
scheduling the forked task; it is the right candidate for relieving the
load from cpux2* due to cache locality etc.

Therefore in the next recursive search we go one step inside sd1, the
chosen idlest group candidate, which also happens to be the *next level
sched domain for cpux1, the chosen idle cpu*. It then perhaps returns
sgy as the idlest, if the situation there happens to be better than what
I have described for sgx, and an appropriate cpu there is chosen.

So in short, a bird's eye view of a large sched domain to choose the cpu
would be very short-sighted; we could end up creating imbalances within
lower-level sched domains. To avoid this, the recursive search plays
safe and chooses the best idle group after viewing the large sched
domain in detail.

Therefore even I feel that this patch should be implemented after
thorough tests.

> Morten

Regards
Preeti U Murthy
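Preeti's description of the descent amounts to the following standalone sketch: pick the idlest group at the current level, then recurse into the child domain covering that group, so imbalance inside the chosen group is examined too. The helpers and the group_domains mapping are simplified assumptions, not the kernel's actual data layout:

struct domain {
        int ngroups;
        struct domain **group_domains;  /* child domain covering each group;
                                         * NULL entries at the bottom level */
};

/* assumed helpers: load comparison at one level */
int idlest_group(const struct domain *d);
int idlest_cpu_in_group(const struct domain *d, int group);

/*
 * In the sd0/sd1/sgx example: idlest_group(sd0) picks sd1's span, the
 * search then descends into sd1 and may settle on sgy instead of the
 * superficially idle cpux1 in sgx.
 */
int descend(const struct domain *d)
{
        int g = idlest_group(d);
        const struct domain *next = d->group_domains[g];

        if (!next)                      /* bottom: groups are single cpus */
                return idlest_cpu_in_group(d, g);
        return descend(next);           /* keep balancing at lower levels */
}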
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On 01/10/2013 02:21 AM, Morten Rasmussen wrote:
>>		new_cpu = find_idlest_cpu(group, p, cpu);
>> -
>> -		/* Now try balancing at a lower domain level of new_cpu */
>> -		cpu = new_cpu;
>> -		weight = sd->span_weight;
>> -		sd = NULL;
>> -		for_each_domain(cpu, tmp) {
>> -			if (weight <= tmp->span_weight)
>> -				break;
>> -			if (tmp->flags & sd_flag)
>> -				sd = tmp;
>> -		}
>> -		/* while loop will break here if sd == NULL */
>
> I agree that this should be a major optimization. I just can't figure
> out why the existing recursive search for an idle cpu switches to the
> new cpu near the end and then starts a search for an idle cpu in the new
> cpu's domain. Is this to handle some exotic sched domain configurations?
> If so, they probably wouldn't work with your optimizations.

I did not find any odd configuration that asks for the old logic.
According to Documentation/scheduler/sched-domains.txt, maybe never:

"A domain's span MUST be a superset of it child's span (this restriction
could be relaxed if the need arises), and a base domain for CPU i MUST
span at least i." etc. etc.

--
Thanks
Alex
Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
On Sat, Jan 05, 2013 at 08:37:34AM +, Alex Shi wrote:
> The bottom-up cpu search in the domain tree presumably comes from
> commit 3dbd5342074a1e (sched: multilevel sbe sbf); its purpose is
> balancing tasks over domains at every level.
>
> This balancing costs a lot if there are many domains/groups in a large
> system, and forcibly spreading tasks among different domains may cause
> performance issues due to bad locality.
>
> If we remove this code, we get quicker fork/exec/wake paths plus better
> balancing across the whole system, which also reduces migrations in
> future load balancing.
>
> This patch increases hackbench performance by 10+% on my 4-socket NHM
> and SNB machines.
>
> Signed-off-by: Alex Shi
> ---
>  kernel/sched/fair.c | 20 +---
>  1 file changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ecfbf8e..895a3f4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>  		goto unlock;
>  	}
>
> -	while (sd) {
> +	if (sd) {
>  		int load_idx = sd->forkexec_idx;
>  		struct sched_group *group;
> -		int weight;
> -
> -		if (!(sd->flags & sd_flag)) {
> -			sd = sd->child;
> -			continue;
> -		}
>
>  		if (sd_flag & SD_BALANCE_WAKE)
>  			load_idx = sd->wake_idx;
> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>  			goto unlock;
>
>  		new_cpu = find_idlest_cpu(group, p, cpu);
> -
> -		/* Now try balancing at a lower domain level of new_cpu */
> -		cpu = new_cpu;
> -		weight = sd->span_weight;
> -		sd = NULL;
> -		for_each_domain(cpu, tmp) {
> -			if (weight <= tmp->span_weight)
> -				break;
> -			if (tmp->flags & sd_flag)
> -				sd = tmp;
> -		}
> -		/* while loop will break here if sd == NULL */

I agree that this should be a major optimization. I just can't figure
out why the existing recursive search for an idle cpu switches to the
new cpu near the end and then starts a search for an idle cpu in the new
cpu's domain. Is this to handle some exotic sched domain configurations?
If so, they probably wouldn't work with your optimizations.

Morten

>  	}
>  unlock:
>  	rcu_read_unlock();
> --
> 1.7.12
[PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
The bottom-up cpu search in the domain tree presumably comes from
commit 3dbd5342074a1e (sched: multilevel sbe sbf); its purpose is
balancing tasks over domains at every level.

This balancing costs a lot if there are many domains/groups in a large
system, and forcibly spreading tasks among different domains may cause
performance issues due to bad locality.

If we remove this code, we get quicker fork/exec/wake paths plus better
balancing across the whole system, which also reduces migrations in
future load balancing.

This patch increases hackbench performance by 10+% on my 4-socket NHM
and SNB machines.

Signed-off-by: Alex Shi
---
 kernel/sched/fair.c | 20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ecfbf8e..895a3f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 		goto unlock;
 	}

-	while (sd) {
+	if (sd) {
 		int load_idx = sd->forkexec_idx;
 		struct sched_group *group;
-		int weight;
-
-		if (!(sd->flags & sd_flag)) {
-			sd = sd->child;
-			continue;
-		}

 		if (sd_flag & SD_BALANCE_WAKE)
 			load_idx = sd->wake_idx;
@@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 			goto unlock;

 		new_cpu = find_idlest_cpu(group, p, cpu);
-
-		/* Now try balancing at a lower domain level of new_cpu */
-		cpu = new_cpu;
-		weight = sd->span_weight;
-		sd = NULL;
-		for_each_domain(cpu, tmp) {
-			if (weight <= tmp->span_weight)
-				break;
-			if (tmp->flags & sd_flag)
-				sd = tmp;
-		}
-		/* while loop will break here if sd == NULL */
 	}
 unlock:
 	rcu_read_unlock();
--
1.7.12