On Thu, Nov 26, 2020 at 11:17:48AM +0100, Vincent Guittot wrote:

> > Something like so then?
> 
> yes. it seems ok
> 
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c

> > @@ -6982,20 +6982,29 @@ static void check_preempt_wakeup(struct
> >  #ifdef CONFIG_SMP
> >  static struct task_struct *pick_task_fair(struct rq *rq)
> >  {
> >         struct sched_entity *se;
> > +       struct cfs_rq *cfs_rq;
> > +
> > +again:
> > +       cfs_rq = &rq->cfs;
> >         if (!cfs_rq->nr_running)
> >                 return NULL;
> >
> >         do {
> >                 struct sched_entity *curr = cfs_rq->curr;
> >
> > +               /* When we pick for a remote RQ, we'll not have done 
> > put_prev_entity() */
> > +               if (curr) {
> > +                       if (curr->on_rq)
> > +                               update_curr(cfs_rq);
> > +                       else
> > +                               curr = NULL;
> >
> > +                       if (unlikely(check_cfs_rq_runtime(cfs_rq)))
> > +                               goto again;

Head-ache though; pick_task() was supposed to be stateless, but now
we're modifying a remote runqueue... I suppose it still works, because
irrespective of which task we end up picking (even idle), we'll schedule
the remote CPU, which would've resulted in the same (and possibly
triggered a reschedule if we'd not done it here).

There's a wrinkle through, other than in schedule(), where we dequeue()
and keep running with the current task while we release rq->lock, this
has preemption enabled as well.

This means that if we do this, the remote CPU could preempt, but the
task is then no longer on the runqueue.

I _think_ it all still works, but yuck!

> > +               }
> >
> > +               se = pick_next_entity(cfs_rq, curr);
> >                 cfs_rq = group_cfs_rq(se);
> >         } while (cfs_rq);
> >

Reply via email to