Chris Redpath <chris.redp...@arm.com> writes:

> If we migrate a sleeping task away from a CPU which has the
> tick stopped, then both the clock_task and decay_counter will
> be out of date for that CPU and we will not decay load correctly
> regardless of how often we update the blocked load.
>
> This is only an issue for tasks which are not on a runqueue
> (because otherwise that CPU would be awake) while, at the same
> time, the CPU the task previously ran on has the tick stopped.
>
> Signed-off-by: Chris Redpath <chris.redp...@arm.com>

This looks basically correct, but it seems unfortunate to take a
remote rq lock on these ttwu paths. I don't know enough about the
nohz machinery to say whether that's avoidable.
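To spell out the failure mode for anyone else reading: blocked load
decays geometrically, halving roughly every 32ms, and the number of
elapsed decay periods comes from the cfs_rq's decay_counter, which
stops advancing once the CPU's tick is stopped. A quick user-space
sketch of the effect (my own illustration with made-up numbers, not
the kernel's fixed-point decay_load()):

#include <stdio.h>
#include <math.h>

/* Illustration only: load halves once per 32 periods (~32ms). */
static double decay(double load, unsigned int periods)
{
	return load * pow(0.5, periods / 32.0);
}

int main(void)
{
	/*
	 * Suppose 64 periods really elapsed while the task slept,
	 * but the nohz-idle CPU's counter only advanced 8 periods
	 * before its tick stopped.
	 */
	printf("correctly decayed: %6.1f\n", decay(1024.0, 64));
	printf("stale counter:     %6.1f\n", decay(1024.0, 8));
	return 0;
}

With the stale counter the periods are under-counted, so the load
removed from the old CPU stays inflated; the patch's
update_rq_clock()/update_cfs_rq_blocked_load() pair brings the
counter up to date before __synchronize_entity_decay() reads it.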


> ---
>  kernel/sched/fair.c |   30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b7e5945..0af1dc2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4324,6 +4324,7 @@ unlock:
>       return new_cpu;
>  }
>  
> +static int nohz_test_cpu(int cpu);
>  /*
>   * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
>   * cfs_rq_of(p) references at time of call are still valid and identify the
> @@ -4343,6 +4344,25 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>        * be negative here since on-rq tasks have decay-count == 0.
>        */
>       if (se->avg.decay_count) {
> +             /*
> +              * If we migrate a sleeping task away from a CPU
> +              * which has the tick stopped, then both the clock_task
> +              * and decay_counter will be out of date for that CPU
> +              * and we will not decay load correctly.
> +              */
> +             if (!se->on_rq && nohz_test_cpu(task_cpu(p))) {
Regarding the !se->on_rq test: se->on_rq must already be false for
set_task_cpu() to be called at all, so that part of the check is
redundant. That said, barring bugs like the one you fixed in patch 1,
I think decay_count != 0 should also imply !p->on_rq.
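If so, maybe it's worth documenting that invariant with something
like the following (untested, just to illustrate what I mean) at the
top of this block:

	if (se->avg.decay_count) {
		/* decay_count != 0 should imply the task is off the rq */
		WARN_ON_ONCE(p->on_rq);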

> +                     struct rq *rq = cpu_rq(task_cpu(p));
> +                     unsigned long flags;
> +                     /*
> +                      * Current CPU cannot be holding rq->lock in this
> +                      * circumstance, but another might be. We must hold
> +                      * rq->lock before we go poking around in its clocks
> +                      */
> +                     raw_spin_lock_irqsave(&rq->lock, flags);
> +                     update_rq_clock(rq);
> +                     update_cfs_rq_blocked_load(cfs_rq, 0);
> +                     raw_spin_unlock_irqrestore(&rq->lock, flags);
> +             }
>               se->avg.decay_count = -__synchronize_entity_decay(se);
>               atomic_long_add(se->avg.load_avg_contrib,
>                                               &cfs_rq->removed_load);
> @@ -6507,6 +6527,11 @@ static struct {
>       unsigned long next_balance;     /* in jiffy units */
>  } nohz ____cacheline_aligned;
>  
> +static int nohz_test_cpu(int cpu)
> +{
> +     return cpumask_test_cpu(cpu, nohz.idle_cpus_mask);
> +}
> +
>  static inline int find_new_ilb(int call_cpu)
>  {
>       int ilb = cpumask_first(nohz.idle_cpus_mask);
> @@ -6619,6 +6644,11 @@ static int sched_ilb_notifier(struct notifier_block *nfb,
>               return NOTIFY_DONE;
>       }
>  }
> +#else
> +static int nohz_test_cpu(int cpu)
> +{
> +     return 0;
> +}
>  #endif
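
Small cosmetic nit, entirely untested: the stub (and arguably the
real definition too) could be static inline so the
CONFIG_NO_HZ_COMMON=n case compiles away to nothing:

static inline int nohz_test_cpu(int cpu)
{
	return 0;
}

GCC will almost certainly inline the trivial static function anyway,
so this is purely about matching the usual idiom for config stubs.
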
>  
>  static DEFINE_SPINLOCK(balancing);