On Wed, 31 Jul 2019 at 15:44, Srikar Dronamraju <[email protected]> wrote: > > * Vincent Guittot <[email protected]> [2019-07-26 16:42:53]: > > > On Fri, 26 Jul 2019 at 15:59, Srikar Dronamraju > > <[email protected]> wrote: > > > > @@ -7361,19 +7357,46 @@ static int detach_tasks(struct lb_env *env) > > > > if (!can_migrate_task(p, env)) > > > > goto next; > > > > > > > > - load = task_h_load(p); > > > > + if (env->src_grp_type == migrate_load) { > > > > + unsigned long load = task_h_load(p); > > > > > > > > - if (sched_feat(LB_MIN) && load < 16 && > > > > !env->sd->nr_balance_failed) > > > > - goto next; > > > > + if (sched_feat(LB_MIN) && > > > > + load < 16 && !env->sd->nr_balance_failed) > > > > + goto next; > > > > + > > > > + if ((load / 2) > env->imbalance) > > > > + goto next; > > > > > > I know this existed before too but if the load is exactly or around 2x of > > > env->imbalance, the resultant imbalance after the load balance operation > > > would still be around env->imbalance. We may lose some cache affinity too. > > > > > > Can we do something like. > > > if (2 * load > 3 * env->imbalance) > > > goto next; > > > > TBH, I don't know what should be the best value and it's probably > > worth doing some investigation but i would prefer to do that as a > > separate patch to get a similar behavior in the overloaded case > > Why do you propose 3/2 instead of 2 ? > > > > If the imbalance is exactly or around load/2, then we still select the task > to migrate > However after the migrate the imbalance will still be load/2. > - Can this lead to ping/pong?
In some case you're probably right but this might also help to move other tasks when 3/2 will not. 1st example with 3 tasks with almost same load on 2 cpus, we will probably end up migrating the waiting task with a load / 2 ~< env->imbalance whereas 2 * load > 3 * env->imbalance will not move any task. Note that we don't ensure fairness between the 3 tasks in the latter case. TBH I don't know what is better 2nd example with 2 tasks TA and TB with a load around L and 4 tasks T0 T1 T2 T3 with a load around L/4 TA and TB are on CPU0 and T0 to T3 are on CPU1, we have the same imbalance as previous example 2L on CPU0 and L on CPU1. With load / 2 > env->imbalance, we will migrate TA or TB to CPU1 and then, we might end up migrating 2 tasks among T0 to T3 and the system will be balanced With 2 * load > 3 * env->imbalance, we will not migrate TA or TB because task load is higher the the imbalance which is L/2 and the scheduler will never get a chance to balance the system whereas it could have reached a balanced state Just to say that I'm not sure that there one best ratio to use > - Did we lose out of cache though we didn't gain from an imbalance. > > > > > > > > > > > > > - if (sgs->sum_h_nr_running) > > > > - sgs->load_per_task = sgs->group_load / > > > > sgs->sum_h_nr_running; > > > > + sgs->group_capacity = group->sgc->capacity; > > > > > > > > sgs->group_weight = group->group_weight; > > > > > > > > - sgs->group_no_capacity = group_is_overloaded(env, sgs); > > > > - sgs->group_type = group_classify(group, sgs); > > > > + sgs->group_type = group_classify(env, group, sgs); > > > > + > > > > + /* Computing avg_load makes sense only when group is overloaded */ > > > > + if (sgs->group_type != group_overloaded) > > > > + sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / > > > > + sgs->group_capacity; > > > > > > Mismatch in comment and code? > > > > I may need to add more comments but at this step, the group should be > > either overloaded or fully busy but it can also be imbalanced. > > In case of a group fully busy or imbalanced (sgs->group_type != > > group_overloaded), we haven't computed avg_load yet so we have to do > > so because: > > -In the case of fully_busy, we are going to be overloaded which the > > next step after fully busy when you are about to pull more load > > -In case of imbalance, we don't know the real state of the local group > > so we fall back to this default behavior > > > > We seem to be checking for avg_load when the group_type is group_overloaded. > But somehow I am don't see where sgs->avg_load is calculated for > group_overloaded case. My fault, I read to quickly your comment and thought that you were referring to calculte_imbalance() but your comment is about update_sg_lb_stats(). As Peter mentioned previously this should be if (sgs->group_type == group_overloaded) instead of if (sgs->group_type != group_overloaded) > > > > > > > We calculated avg_load for !group_overloaded case, but seem to be using > > > for > > > group_overloaded cases too. > > > > for group_overloaded case, we already computed it in update_sg_lb_stats() > > > > > From update_sg_lb_stats() > > if (sgs->group_type != group_overloaded) > sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / > sgs->group_capacity; > > So we seem to be skipping calculation of avg_load for group_overloaded. No? yes. Thanks Vincent > > > -- > Thanks and Regards > Srikar Dronamraju >

