On Fri, Jul 25, 2014 at 03:32:10PM -0400, Rik van Riel wrote:
> Subject: sched: make update_sd_pick_busiest return true on a busier sd
> 
> Currently update_sd_pick_busiest only identifies the busiest sd
> that is either overloaded, or has a group imbalance. When no
> sd is imbalanced or overloaded, the load balancer fails to find
> the busiest domain.
> 
> This breaks load balancing between domains that are not overloaded,
> in the !SD_ASYM_PACKING case. This patch makes update_sd_pick_busiest
> return true when the busiest sd seen so far is encountered.
> 
> Behaviour for SD_ASYM_PACKING does not seem to match the comment,
> but I have no hardware to test that so I have left the behaviour
> of that code unchanged.
> 
> It is unclear what to do with the group_imb condition.
> Should group_imb override a busier load? If so, should we fix
> calculate_imbalance to return a sensible number when the "busiest"
> node found has a below average load? We probably need to fix
> calculate_imbalance anyway, to deal with an overloaded group that
> happens to have a below average load...

I think we want overloaded > group_imb > other. That is, prefer overloaded
groups; take imbalanced groups only if there are no overloaded ones; and fall
back to anything else only when there are neither overloaded nor imbalanced
groups.

If you look at the comment near sg_imbalanced(), in that case we want to
move tasks from the first group to the second, even though the second
group would be the heaviest.

/* The order matters; these get compared below, higher means busier. */
enum group_type {
        group_other = 0,
        group_imbalanced,
        group_overloaded,
};

/* Classify a group by the most severe condition it suffers from. */
static enum group_type group_classify(struct sg_lb_stats *sgs)
{
        if (sgs->sum_nr_running > sgs->group_capacity_factor)
                return group_overloaded;

        if (sgs->group_imb)
                return group_imbalanced;

        return group_other;
}

>  /**
>   * update_sd_pick_busiest - return 1 on busiest group
>   * @env: The load balancing environment.
> @@ -5957,7 +5962,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>   * @sgs: sched_group statistics
>   *
>   * Determine if @sg is a busier group than the previously selected
> - * busiest group.
> + * busiest group. 

We really need that extra trailing whitespace, yes? ;-)

>   * Return: %true if @sg is a busier group than the previously selected
>   * busiest group. %false otherwise.
> @@ -5967,13 +5972,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>                                  struct sched_group *sg,
>                                  struct sg_lb_stats *sgs)
>  {

        if (group_classify(sgs) < group_classify(&sds->busiest_stat))
                return false;

>       if (sgs->avg_load <= sds->busiest_stat.avg_load)
>               return false;
>  

> +     /* This is the busiest node. */
> +     if (!(env->sd->flags & SD_ASYM_PACKING))
>               return true;
>  
>       /*
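
Spelled out, the top of update_sd_pick_busiest() would then look something
like this (sketch only; note the extra early return true so that a strictly
busier class wins even when its avg_load happens to be lower, and the
SD_ASYM_PACKING tail stays as-is):

        struct sg_lb_stats *busiest = &sds->busiest_stat;

        /* A strictly busier class always wins, regardless of avg_load. */
        if (group_classify(sgs) > group_classify(busiest))
                return true;

        if (group_classify(sgs) < group_classify(busiest))
                return false;

        /* Same class; fall back to comparing average load. */
        if (sgs->avg_load <= busiest->avg_load)
                return false;

        /* This is the busiest group so far. */
        if (!(env->sd->flags & SD_ASYM_PACKING))
                return true;

        /* ... SD_ASYM_PACKING handling unchanged from here ... */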

We could replace sg_lb_stats::group_imb with the above and avoid the
endless recomputation I suppose.
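
Something like this, say (the group_type field name is made up for the
sketch; update_sg_lb_stats() would fill it in once per group):

        /* In struct sg_lb_stats, replacing the group_imb flag: */
        enum group_type group_type;

        /* Set once, at the end of update_sg_lb_stats(): */
        sgs->group_type = group_classify(sgs);

        /* update_sd_pick_busiest() then compares the cached values: */
        if (sgs->group_type > sds->busiest_stat.group_type)
                return true;

        if (sgs->group_type < sds->busiest_stat.group_type)
                return false;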

Also, we still need a little change to calculate_imbalance() where we
assume sum_nr_running > group_capacity_factor.
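
Roughly this, I think (from memory, not compile-tested; busiest being
&sds->busiest_stat as in calculate_imbalance() today):

        /*
         * Only compute load_above_capacity when the busiest group really is
         * overloaded; today we assume sum_nr_running > group_capacity_factor
         * there and the (unsigned) subtraction wraps otherwise.
         */
        if (group_classify(busiest) == group_overloaded)
                load_above_capacity =
                        busiest->sum_nr_running - busiest->group_capacity_factor;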
