On Tue, Sep 18, 2007 at 11:03:59PM -0700, Tong Li wrote:
> This patch attempts to improve CFS's SMP global fairness based on the new 
> virtual time design.
> 
> Removed vruntime adjustment in set_task_cpu() as it skews global fairness.
> 
> Modified small_imbalance logic in find_busiest_group(). If there's small 
> imbalance, move tasks from busiest to local sched_group only if the local 
> group contains a CPU whose min_vruntime is the maximum among all CPUs in 
> the same sched_domain. This prevents any CPU from advancing too far ahead 
> in virtual time and avoids tasks thrashing between two CPUs without 
> utilizing other CPUs in the system. For example, for 10 tasks on 8 CPUs, 
> since the load is not evenly divisible by the number of CPUs, we want the 
> extra load to have a fair use of every CPU in the system.
> 
> Tested with a microbenchmark running 10 nice-0 tasks on 8 CPUs. Each task 

Just as an experiment, can you run 82 tasks on 8 CPUs. Current
imbalance_pct logic will not detect and fix the global fairness issue
even with this patch.

>       if (*imbalance < busiest_load_per_task) {
> -             unsigned long tmp, pwr_now, pwr_move;
> -             unsigned int imbn;
> -
>  small_imbalance:
> -             pwr_move = pwr_now = 0;
> -             imbn = 2;
> -             if (this_nr_running) {
> -                     this_load_per_task /= this_nr_running;
> -                     if (busiest_load_per_task > this_load_per_task)
> -                             imbn = 1;
> -             } else
> -                     this_load_per_task = SCHED_LOAD_SCALE;
> -
> -             if (max_load - this_load + SCHED_LOAD_SCALE_FUZZ >=
> -                                     busiest_load_per_task * imbn) {
> -                     *imbalance = busiest_load_per_task;
> -                     return busiest;
> -             }

This patch removes quite a few lines and all this is logic is not for
fairness :( Especially the above portion handles some of the HT/MC
optimizations.

> -
> -             /*
> -              * OK, we don't have enough imbalance to justify moving 
> tasks,
> -              * however we may be able to increase total CPU power used by
> -              * moving them.
> +             /* 
> +              * When there's small imbalance, move tasks only if this
> +              * sched_group contains a CPU whose min_vruntime is the 
> +              * maximum among all CPUs in the same domain.
>                */
> -
> -             pwr_now += busiest->__cpu_power *
> -                             min(busiest_load_per_task, max_load);
> -             pwr_now += this->__cpu_power *
> -                             min(this_load_per_task, this_load);
> -             pwr_now /= SCHED_LOAD_SCALE;
> -
> -             /* Amount of load we'd subtract */
> -             tmp = sg_div_cpu_power(busiest,
> -                             busiest_load_per_task * SCHED_LOAD_SCALE);
> -             if (max_load > tmp)
> -                     pwr_move += busiest->__cpu_power *
> -                             min(busiest_load_per_task, max_load - tmp);
> -
> -             /* Amount of load we'd add */
> -             if (max_load * busiest->__cpu_power <
> -                             busiest_load_per_task * SCHED_LOAD_SCALE)
> -                     tmp = sg_div_cpu_power(this,
> -                                     max_load * busiest->__cpu_power);
> -             else
> -                     tmp = sg_div_cpu_power(this,
> -                             busiest_load_per_task * SCHED_LOAD_SCALE);
> -             pwr_move += this->__cpu_power *
> -                             min(this_load_per_task, this_load + tmp);
> -             pwr_move /= SCHED_LOAD_SCALE;
> -
> -             /* Move if we gain throughput */
> -             if (pwr_move > pwr_now)
> +             if (max_vruntime_group == this)
>                       *imbalance = busiest_load_per_task;
> +             else
> +                     *imbalance = 0;

Not sure how this all interacts when some of the cpu's are idle. I have to
look more closely.

thanks,
suresh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to