On 30 November 2016 at 13:49, Morten Rasmussen <morten.rasmus...@arm.com> wrote: > On Fri, Nov 25, 2016 at 04:34:33PM +0100, Vincent Guittot wrote: >> find_idlest_group() only compares the runnable_load_avg when looking for >> the least loaded group. But on fork intensive use case like hackbench
[snip] >> + min_avg_load = avg_load; >> + idlest = group; >> + } else if ((runnable_load < (min_runnable_load + >> imbalance)) && >> + (100*min_avg_load > >> imbalance_scale*avg_load)) { >> + /* >> + * The runnable loads are close so we take >> + * into account blocked load through avg_load >> + * which is blocked + runnable load >> + */ >> + min_avg_load = avg_load; >> idlest = group; >> } >> >> @@ -5470,13 +5495,16 @@ find_idlest_group(struct sched_domain *sd, struct >> task_struct *p, >> goto no_spare; >> >> if (this_spare > task_util(p) / 2 && >> - imbalance*this_spare > 100*most_spare) >> + imbalance_scale*this_spare > 100*most_spare) >> return NULL; >> else if (most_spare > task_util(p) / 2) >> return most_spare_sg; >> >> no_spare: >> - if (!idlest || 100*this_load < imbalance*min_load) >> + if (!idlest || >> + (min_runnable_load > (this_runnable_load + imbalance)) || >> + ((this_runnable_load < (min_runnable_load + imbalance)) && >> + (100*min_avg_load > imbalance_scale*this_avg_load))) > > I don't get why you have imbalance_scale applied to this_avg_load and > not min_avg_load. IIUC, you end up preferring non-local groups? In fact, I have keep the same condition that is used when looping the group. You're right that we should prefer local rq if avg_load are close and test the condition (100*this_avg_load > imbalance_scale*min_avg_load) instead > > If we take the example where this_runnable_load == min_runnable_load and > this_avg_load == min_avg_load. In this case, and in cases where > min_avg_load is slightly bigger than this_avg_load, we end up picking > the 'idlest' group even if the local group is equally good or even > slightly better? > >> return NULL; >> return idlest; >> } > > Overall, I like that load_avg is being brought in to make better > decisions. The variable naming is a bit confusing. For example, > runnable_load is a capacity-average just like avg_load. 'imbalance' is > now an absolute capacity-average margin, but it is hard to come up with > better short alternatives. > > Although 'imbalance' is based on the existing imbalance_pct, I find > somewhat arbitrary. Why is (imbalance_pct-100)*1024/100 a good absolute > margin to define the interval where we want to consider load_avg? I > guess it is case of 'we had to pick some value', which we have done in > many other places. Though, IMHO, it is a bit strange that imbalance_pct > is used in two different ways to bias comparison in the same function. I see imbalance_pct like the definition of the acceptable imbalance % for a sched_domain. This % is then used against the current load or to define an absolute value. > It used to be only used as a scaling factor (now imbalance_scale), while > this patch proposes to use it for computing an absolute margin > (imbalance) as well. It is not major issue, but it is not clear why it > is used differently to compare two metrics that are relatively closely > related. In fact, scaling factor (imbalance) doesn't work well with small value. As an example, the use of a scaling factor fails as soon as this_runnable_load = 0 because we always selected local rq even if min_runnable_load is only 1 which doesn't really make sense because they are just the same. > > Morten