On Sat, May 18, 2019 at 11:37:56PM +0800 Aubrey Li wrote: > On Wed, Apr 24, 2019 at 12:18 AM Vineeth Remanan Pillai > <vpil...@digitalocean.com> wrote: > > > > From: Peter Zijlstra (Intel) <pet...@infradead.org> > > > > Instead of only selecting a local task, select a task for all SMT > > siblings for every reschedule on the core (irrespective which logical > > CPU does the reschedule). > > > > NOTE: there is still potential for siblings rivalry. > > NOTE: this is far too complicated; but thus far I've failed to > > simplify it further. > > > > Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> > > --- > > kernel/sched/core.c | 222 ++++++++++++++++++++++++++++++++++++++++++- > > kernel/sched/sched.h | 5 +- > > 2 files changed, 224 insertions(+), 3 deletions(-) > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index e5bdc1c4d8d7..9e6e90c6f9b9 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -3574,7 +3574,7 @@ static inline void schedule_debug(struct task_struct > > *prev) > > * Pick up the highest-prio task: > > */ > > static inline struct task_struct * > > -pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags > > *rf) > > +__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags > > *rf) > > { > > const struct sched_class *class; > > struct task_struct *p; > > @@ -3619,6 +3619,220 @@ pick_next_task(struct rq *rq, struct task_struct > > *prev, struct rq_flags *rf) > > BUG(); > > } > > > > +#ifdef CONFIG_SCHED_CORE > > + > > +static inline bool cookie_match(struct task_struct *a, struct task_struct > > *b) > > +{ > > + if (is_idle_task(a) || is_idle_task(b)) > > + return true; > > + > > + return a->core_cookie == b->core_cookie; > > +} > > + > > +// XXX fairness/fwd progress conditions > > +static struct task_struct * > > +pick_task(struct rq *rq, const struct sched_class *class, struct > > task_struct *max) > > +{ > > + struct task_struct *class_pick, *cookie_pick; > > + unsigned long cookie = 0UL; > > + > > + /* > > + * We must not rely on rq->core->core_cookie here, because we fail > > to reset > > + * rq->core->core_cookie on new picks, such that we can detect if > > we need > > + * to do single vs multi rq task selection. > > + */ > > + > > + if (max && max->core_cookie) { > > + WARN_ON_ONCE(rq->core->core_cookie != max->core_cookie); > > + cookie = max->core_cookie; > > + } > > + > > + class_pick = class->pick_task(rq); > > + if (!cookie) > > + return class_pick; > > + > > + cookie_pick = sched_core_find(rq, cookie); > > + if (!class_pick) > > + return cookie_pick; > > + > > + /* > > + * If class > max && class > cookie, it is the highest priority > > task on > > + * the core (so far) and it must be selected, otherwise we must go > > with > > + * the cookie pick in order to satisfy the constraint. > > + */ > > + if (cpu_prio_less(cookie_pick, class_pick) && core_prio_less(max, > > class_pick)) > > + return class_pick; > > + > > + return cookie_pick; > > +} > > + > > +static struct task_struct * > > +pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags > > *rf) > > +{ > > + struct task_struct *next, *max = NULL; > > + const struct sched_class *class; > > + const struct cpumask *smt_mask; > > + int i, j, cpu; > > + > > + if (!sched_core_enabled(rq)) > > + return __pick_next_task(rq, prev, rf); > > + > > + /* > > + * If there were no {en,de}queues since we picked (IOW, the task > > + * pointers are all still valid), and we haven't scheduled the last > > + * pick yet, do so now. > > + */ > > + if (rq->core->core_pick_seq == rq->core->core_task_seq && > > + rq->core->core_pick_seq != rq->core_sched_seq) { > > + WRITE_ONCE(rq->core_sched_seq, rq->core->core_pick_seq); > > + > > + next = rq->core_pick; > > + if (next != prev) { > > + put_prev_task(rq, prev); > > + set_next_task(rq, next); > > + } > > + return next; > > + } > > + > > The following patch improved my test cases. > Welcome any comments. >
This is certainly better than violating the point of the core scheduler :) If I'm understanding this right what will happen in this case is instead of using the idle process selected by the sibling we do the core scheduling again. This may start with a newidle_balance which might bring over something to run that matches what we want to put on the sibling. If that works then I can see this helping. But I'd be a little concerned that we could end up thrashing. Once we do core scheduling again here we'd force the sibling to resched and if we got a different result which "helped" him pick idle we'd go around again. I think inherent in the concept of core scheduling (barring a perfectly aligned set of jobs) is some extra idle time on siblings. Cheers, Phil > Thanks, > -Aubrey > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 3e3162f..86031f4 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3685,10 +3685,12 @@ pick_next_task(struct rq *rq, struct > task_struct *prev, struct rq_flags *rf) > /* > * If there were no {en,de}queues since we picked (IOW, the task > * pointers are all still valid), and we haven't scheduled the last > - * pick yet, do so now. > + * pick yet, do so now. If the last pick is idle task, we abandon > + * last pick and try to pick up task this time. > */ > if (rq->core->core_pick_seq == rq->core->core_task_seq && > - rq->core->core_pick_seq != rq->core_sched_seq) { > + rq->core->core_pick_seq != rq->core_sched_seq && > + !is_idle_task(rq->core_pick)) { > WRITE_ONCE(rq->core_sched_seq, rq->core->core_pick_seq); > > next = rq->core_pick; --