On Fri, Jul 25, 2014 at 03:32:10PM -0400, Rik van Riel wrote:
> Subject: sched: make update_sd_pick_busiest return true on a busier sd
>
> Currently update_sd_pick_busiest only identifies the busiest sd
> that is either overloaded, or has a group imbalance. When no
> sd is imbalanced or overloaded, the load balancer fails to find
> the busiest domain.
>
> This breaks load balancing between domains that are not overloaded,
> in the !SD_ASYM_PACKING case. This patch makes update_sd_pick_busiest
> return true when the busiest sd yet is encountered.
>
> Behaviour for SD_ASYM_PACKING does not seem to match the comment,
> but I have no hardware to test that so I have left the behaviour
> of that code unchanged.
>
> It is unclear what to do with the group_imb condition.
> Should group_imb override a busier load? If so, should we fix
> calculate_imbalance to return a sensible number when the "busiest"
> node found has a below average load? We probably need to fix
> calculate_imbalance anyway, to deal with an overloaded group that
> happens to have a below average load...
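(For context, the failure mode described above can be reproduced with a
stripped-down user-space model of the pre-patch selection logic. The
struct and function below are simplified stand-ins, not the kernel's
sg_lb_stats or update_sd_pick_busiest, and the numbers are made up.)

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for struct sg_lb_stats. */
struct stats {
	unsigned long avg_load;
	unsigned int sum_nr_running;
	unsigned int group_capacity_factor;
	bool group_imb;
};

/* Pre-patch style pick: only overloaded or imbalanced groups can win. */
static bool pick_busiest(struct stats *sgs, struct stats *busiest)
{
	if (sgs->avg_load <= busiest->avg_load)
		return false;
	if (sgs->sum_nr_running > sgs->group_capacity_factor)
		return true;
	if (sgs->group_imb)
		return true;
	/* Busier, but neither overloaded nor imbalanced: never selected. */
	return false;
}

int main(void)
{
	struct stats busiest = { .avg_load = 100, .sum_nr_running = 2,
				 .group_capacity_factor = 4, .group_imb = false };
	struct stats heavier = { .avg_load = 300, .sum_nr_running = 3,
				 .group_capacity_factor = 4, .group_imb = false };

	/* Prints 0: the heavier group is never picked, so nothing gets balanced. */
	printf("%d\n", pick_busiest(&heavier, &busiest));
	return 0;
}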
I think we want overloaded > group_imb > other. So prefer overloaded
groups, imbalanced groups if no overloaded and anything else if no
overloaded and imbalanced thingies.

If you look at the comment near sg_imbalanced(), in that case we want
to move tasks from the first group to the second, even though the
second group would be the heaviest.

enum group_type {
	group_other = 0,
	group_imbalanced,
	group_overloaded,
};

static enum group_type group_classify(struct sg_lb_stats *sgs)
{
	if (sgs->sum_nr_running > sgs->group_capacity_factor)
		return group_overloaded;

	if (sgs->group_imb)
		return group_imbalanced;

	return group_other;
}

>  /**
>   * update_sd_pick_busiest - return 1 on busiest group
>   * @env: The load balancing environment.
> @@ -5957,7 +5962,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>   * @sgs: sched_group statistics
>   *
>   * Determine if @sg is a busier group than the previously selected
> - * busiest group.
> + * busiest group. 

We really need that extra trailing whitespace, yes? ;-)

>   * Return: %true if @sg is a busier group than the previously selected
>   * busiest group. %false otherwise.
> @@ -5967,13 +5972,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  				   struct sched_group *sg,
>  				   struct sg_lb_stats *sgs)
>  {

	if (group_classify(sgs) < group_classify(&sds->busiest_stat))
		return false;

>  	if (sgs->avg_load <= sds->busiest_stat.avg_load)
>  		return false;
>
> +	/* This is the busiest node. */
> +	if (!(env->sd->flags & SD_ASYM_PACKING))
>  		return true;
>
>  	/*

We could replace sg_lb_stats::group_imb with the above and avoid the
endless recomputation I suppose.

Also, we still need a little change to calculate_imbalance() where we
assume sum_nr_running > group_capacity_factor.
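(To make the overloaded > group_imb > other ordering concrete, here is a
small user-space sketch of that comparison. The struct is again a
simplified stand-in for sg_lb_stats, and the pick below goes slightly
beyond the snippet above in that a strictly higher class also wins
outright; treat it as an illustration of the intended preference, not
the final kernel code.)

#include <stdbool.h>
#include <stdio.h>

enum group_type {
	group_other = 0,
	group_imbalanced,
	group_overloaded,
};

/* Simplified stand-in for struct sg_lb_stats. */
struct stats {
	unsigned long avg_load;
	unsigned int sum_nr_running;
	unsigned int group_capacity_factor;
	bool group_imb;
};

static enum group_type group_classify(struct stats *sgs)
{
	if (sgs->sum_nr_running > sgs->group_capacity_factor)
		return group_overloaded;
	if (sgs->group_imb)
		return group_imbalanced;
	return group_other;
}

/*
 * Overloaded beats imbalanced beats other; within the same class the
 * higher average load wins.
 */
static bool pick_busiest(struct stats *sgs, struct stats *busiest)
{
	if (group_classify(sgs) < group_classify(busiest))
		return false;
	if (group_classify(sgs) > group_classify(busiest))
		return true;
	return sgs->avg_load > busiest->avg_load;
}

int main(void)
{
	struct stats imbalanced = { .avg_load = 400, .sum_nr_running = 2,
				    .group_capacity_factor = 4, .group_imb = true };
	struct stats overloaded = { .avg_load = 200, .sum_nr_running = 6,
				    .group_capacity_factor = 4, .group_imb = false };

	/* Prints 0 then 1: the lighter but overloaded group is still preferred. */
	printf("%d\n", pick_busiest(&imbalanced, &overloaded));
	printf("%d\n", pick_busiest(&overloaded, &imbalanced));
	return 0;
}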