* Peter Zijlstra <pet...@infradead.org> [2018-07-23 12:38:30]:

> On Wed, Jun 20, 2018 at 10:32:52PM +0530, Srikar Dronamraju wrote:
> > Since task migration under numa balancing can happen in parallel, more
> > than one task might choose to move to the same node at the same time.
> > This can cause load imbalances at the node level.
> > 
> > The problem is more likely if there are more cores per node or more
> > nodes in system.
> > 
> > Use a per-node variable to indicate if task migration
> > to the node under numa balance is currently active.
> > This per-node variable will not track swapping of tasks.
> 
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 50c7727..87fb20e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1478,11 +1478,22 @@ struct task_numa_env {
> >  static void task_numa_assign(struct task_numa_env *env,
> >                          struct task_struct *p, long imp)
> >  {
> > +   pg_data_t *pgdat = NODE_DATA(cpu_to_node(env->dst_cpu));
> >     struct rq *rq = cpu_rq(env->dst_cpu);
> >  
> >     if (xchg(&rq->numa_migrate_on, 1))
> >             return;
> >  
> > +   if (!env->best_task && env->best_cpu != -1)
> > +           WRITE_ONCE(pgdat->active_node_migrate, 0);
> > +
> > +   if (!p) {
> > +           if (xchg(&pgdat->active_node_migrate, 1)) {
> > +                   WRITE_ONCE(rq->numa_migrate_on, 0);
> > +                   return;
> > +           }
> > +   }
> > +
> >     if (env->best_cpu != -1) {
> >             rq = cpu_rq(env->best_cpu);
> >             WRITE_ONCE(rq->numa_migrate_on, 0);
> 
> 
> Urgh, that's pretty magical code. And it doesn't even have a comment.
> 
> For instance, I cannot tell why we clear that active_node_migrate thing
> right there.
> 

active_node_migrate doesn't track swaps; it only tracks task movement to
a node. Suppose the task first finds a cpu which is idle. It would then
have set pgdat->active_node_migrate, and at that point env->best_task is
NULL while env->best_cpu is set.

Next, the task might find another cpu where a swap is more beneficial
than a move, i.e. there is a pair of tasks to be swapped. Now we have to
reset pgdat->active_node_migrate. The test on best_task and best_cpu
tells us whether we had set active_node_migrate.
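
The sequence above can be replayed with a small userspace sketch. This is
only an illustration, not the kernel code: struct node_state, struct
numa_env, assign_candidate() and replay_move_then_swap() are hypothetical
names, C11 atomics stand in for xchg()/WRITE_ONCE(), a single node is
assumed, and the rq->numa_migrate_on handling is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-node flag the patch adds to pg_data_t. */
struct node_state {
	atomic_int active_node_migrate;
};

/* Hypothetical, trimmed-down stand-in for struct task_numa_env. */
struct numa_env {
	struct node_state *node;
	const char *best_task;	/* NULL: best candidate so far is a plain move */
	int best_cpu;		/* -1: no candidate chosen yet */
};

/*
 * Record a new best candidate. task == NULL means "move to an idle cpu",
 * a non-NULL task means "swap with that task". Returns false if another
 * mover already holds the node flag.
 */
static bool assign_candidate(struct numa_env *env, const char *task, int cpu)
{
	/* Previous best was a move, so we are the ones who set the flag. */
	if (!env->best_task && env->best_cpu != -1)
		atomic_store(&env->node->active_node_migrate, 0);

	/* Only plain moves claim the flag; swaps are not tracked. */
	if (!task && atomic_exchange(&env->node->active_node_migrate, 1))
		return false;

	env->best_task = task;
	env->best_cpu = cpu;
	return true;
}

/* Replays "idle cpu first, then a better swap"; returns the final flag. */
static int replay_move_then_swap(void)
{
	struct node_state node;
	struct numa_env env = { .node = &node, .best_task = NULL, .best_cpu = -1 };
	bool ok;

	atomic_init(&node.active_node_migrate, 0);

	/* Step 1: an idle cpu is found, so the move claims the node flag. */
	ok = assign_candidate(&env, NULL, 3);
	assert(ok && atomic_load(&node.active_node_migrate) == 1);

	/* Step 2: a swap looks better, so the flag we set is released. */
	ok = assign_candidate(&env, "peer", 5);
	assert(ok);

	return atomic_load(&node.active_node_migrate);
}
```

After the second call the flag is clear again, because the best candidate
is now a swap. If the first candidate had been a swap, best_task would be
non-NULL and the clear would be skipped, since we never set the flag.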

-- 
Thanks and Regards
Srikar Dronamraju
