On Fri, 16 Feb 2007 18:08:42 -0800
"Siddha, Suresh B" <[EMAIL PROTECTED]> wrote:

> Changes since v1:
> 
>   - Move the idle load balancer selection from schedule()
>     to the first busy scheduler_tick() after restarting the tick.
>     This will avoid the unnecessay ownership changes when 
>     softirq's(which are run in softirqd context in certain -rt
>     configurations) like timer, sched are invoked for every idle tick
>     that happens.
> 
>   - Misc cleanups.
> 
> ---
> Fix the process idle load balancing in the presence of dynticks.
> cpus for which ticks are stopped will sleep till the next event wakes it up.
> Potentially these sleeps can be for large durations and during which today,
> there is no periodic idle load balancing being done.
> 
> This patch nominates an owner among the idle cpus, which does the idle load
> balancing on behalf of the other idle cpus. And once all the cpus are
> completely idle, then we can stop this idle load balancing too. Checks added
> in fast path are minimized.  Whenever there are busy cpus in the system, there
> will be an owner(idle cpu) doing the system wide idle load balancing.
> 
> Open items:
> 1. Intelligent owner selection (like an idle core in a busy package).
> 2. Merge with rcu's nohz_cpu_mask?
> 

I spose I'll hold my nose and merge this, but it creates too much of a mess
to be mergeable into the CPU scheduler, IMO.

Can we please do something to reduce the ifdef density?  And if possible,
all the newly-added returns-from-the-middle-of-a-function?


> +#ifdef CONFIG_NO_HZ
> +static struct {
> +     int load_balancer;
> +     cpumask_t  cpu_mask;
> +} nohz ____cacheline_aligned = {
> +     .load_balancer = -1,
> +     .cpu_mask = CPU_MASK_NONE,
> +};
> +
> +/*
> + * This routine will try to nominate the ilb (idle load balancing)
> + * owner among the cpus whose ticks are stopped. ilb owner will do the idle
> + * load balancing on behalf of all those cpus. If all the cpus in the system
> + * go into this tickless mode, then there will be no ilb owner (as there is
> + * no need for one) and all the cpus will sleep till the next wakeup event
> + * arrives...
> + *
> + * For the ilb owner, tick is not stopped. And this tick will be used
> + * for idle load balancing. ilb owner will still be part of
> + * nohz.cpu_mask..
> + *
> + * While stopping the tick, this cpu will become the ilb owner if there
> + * is no other owner. And will be the owner till that cpu becomes busy
> + * or if all cpus in the system stop their ticks at which point
> + * there is no need for ilb owner.
> + *
> + * When the ilb owner becomes busy, it nominates another owner, during the
> + * next busy scheduler_tick()
> + */
> +int select_nohz_load_balancer(int stop_tick)
> +{
> +     int cpu = smp_processor_id();
> +
> +     if (stop_tick) {
> +             cpu_set(cpu, nohz.cpu_mask);
> +             cpu_rq(cpu)->in_nohz_recently = 1;
> +
> +             /*
> +              * If we are going offline and still the leader, give up!
> +              */
> +             if (cpu_is_offline(cpu) && nohz.load_balancer == cpu) {
> +                     if (cmpxchg(&nohz.load_balancer,  cpu, -1) != cpu)

So we require that architectures which implement CONFIG_NO_HZ also
implement cmpxchg.

> +                             BUG();
> +                     return 0;
> +             }
> +
> +             /* time for ilb owner also to sleep */
> +             if (cpus_weight(nohz.cpu_mask) == num_online_cpus()) {
> +                     if (nohz.load_balancer == cpu)
> +                             nohz.load_balancer = -1;
> +                     return 0;
> +             }
> +
> +             if (nohz.load_balancer == -1) {
> +                     /* make me the ilb owner */
> +                     if (cmpxchg(&nohz.load_balancer, -1, cpu) == -1)
> +                             return 1;
> +             } else if (nohz.load_balancer == cpu)
> +                     return 1;
> +     } else {
> +             if (!cpu_isset(cpu, nohz.cpu_mask))
> +                     return 0;
> +
> +             cpu_clear(cpu, nohz.cpu_mask);
> +
> +             if (nohz.load_balancer == cpu)
> +                     if (cmpxchg(&nohz.load_balancer,  cpu, -1) != cpu)
> +                             BUG();
> +     }
> +     return 0;
> +}
> +#endif
> +
>  /*
>   * run_rebalance_domains is triggered when needed from the scheduler tick.
>   *
> @@ -3347,15 +3437,46 @@ static DEFINE_SPINLOCK(balancing);
>  
>  static void run_rebalance_domains(struct softirq_action *h)
>  {
> -     int this_cpu = smp_processor_id(), balance = 1;
> -     struct rq *this_rq = cpu_rq(this_cpu);
> -     unsigned long interval;
> +     int balance_cpu = smp_processor_id(), balance;
> +     struct rq *balance_rq = cpu_rq(balance_cpu);
> +     unsigned long interval, next_balance;

One definition per line is preferred.

>       struct sched_domain *sd;
> -     enum idle_type idle = this_rq->idle_at_tick ? SCHED_IDLE : NOT_IDLE;
> +     enum idle_type idle;
> +
> +#ifdef CONFIG_NO_HZ
> +     cpumask_t cpus = nohz.cpu_mask;
> +     int local_cpu = balance_cpu;
> +     struct rq *local_rq = balance_rq;
> +
> +restart:
> +
> +     if (local_rq->idle_at_tick && nohz.load_balancer == local_cpu) {
> +             if (need_resched())
> +                     return;
> +
> +             balance_cpu = first_cpu(cpus);
> +             if (unlikely(balance_cpu >= NR_CPUS))
> +                     return;
> +             balance_rq = cpu_rq(balance_cpu);
> +             cpu_clear(balance_cpu, cpus);
> +     }
> +
> +     /*
> +      * We must be doing idle load balancing for some other idle cpu
> +      */
> +     if (local_cpu != balance_cpu)
> +             idle = SCHED_IDLE;
> +     else
> +#endif
> +             idle = balance_rq->idle_at_tick ? SCHED_IDLE : NOT_IDLE;
> +

It's not just that this is an eyesore (it is) - it introduces some ongoing
maintenance problems.  I expect this can be pulled out into a separate
function with no loss in quality of generated code.

And that function would be a nice site for a nice comment explaining the
design.  Why do we return if need_resched()?  What is the significance of
(balance_cpu >= NR_CPUS)?


> +
>       /* Earliest time when we have to call run_rebalance_domains again */
> -     unsigned long next_balance = jiffies + 60*HZ;
> +     next_balance = jiffies + 60*HZ;
> +
> +     for_each_domain(balance_cpu, sd) {
>  
> -     for_each_domain(this_cpu, sd) {
>               if (!(sd->flags & SD_LOAD_BALANCE))
>                       continue;
>  
> @@ -3374,7 +3495,7 @@ static void run_rebalance_domains(struct
>               }
>  
>               if (time_after_eq(jiffies, sd->last_balance + interval)) {
> -                     if (load_balance(this_cpu, this_rq, sd, idle, 
> &balance)) {
> +                     if (load_balance(balance_cpu, balance_rq, sd, idle, 
> &balance)) {
>                               /*
>                                * We've pulled tasks over so either we're no
>                                * longer idle, or one of our SMT siblings is
> @@ -3398,7 +3519,16 @@ out:
>               if (!balance)
>                       break;
>       }
> -     this_rq->next_balance = next_balance;
> +     balance_rq->next_balance = next_balance;
> +
> +#ifdef CONFIG_NO_HZ
> +     if (local_rq->idle_at_tick && nohz.load_balancer == local_cpu) {
> +             if (time_after(next_balance, local_rq->next_balance))
> +                     local_rq->next_balance = next_balance;
> +             if (!cpus_empty(cpus))
> +                     goto restart;
> +     }
> +#endif
>  }

A goto in an ifdef :(

Perhaps it can be removed by teaching the caller to call this function again?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to