On 03/05/21 15:41, Valentin Schneider wrote:
> On 05/03/21 15:56, Peter Zijlstra wrote:
> > On Sat, Dec 26, 2020 at 01:54:45PM +0000, Qais Yousef wrote:
> >>
> >> > +static inline struct task_struct *get_push_task(struct rq *rq)
> >> > +{
> >> > +        struct task_struct *p = rq->curr;
> >>
> >> Shouldn't we verify the class of the task here? The RT task in migration
> >> disabled could have been preempted by a dl or stopper task. Similarly, the 
> >> dl
> >> task could have been preempted by a stopper task.
> >>
> >> I don't think an RT task should be allowed to push a dl task under any
> >> circumstances?
> >
> > Hmm, quite. Fancy doing a patch?
> 
> Last time we talked about this, I looked into
> 
>   push_rt_task() + find_lowest_rq()
> 
> IIRC, with how
> 
>   find_lowest_rq() + cpupri_find_fitness()
> 
> currently work, find_lowest_rq() should return -1 in push_rt_task() if
> rq->curr is DL (CPUPRI_INVALID). IOW, Migration-Disabled RT tasks shouldn't
> actually interfere with DL tasks (unless a DL task gets scheduled after we
> drop the rq lock and kick the stopper, but we have that problem everywhere
> including CFS active balance).

This makes it less of a problem true, but AFAICT this can still happen in the
pull path.

Anyways, here's the patch as extra bolts and braces to be considered.

Thanks

--
Qais Yousef

--->8----

>From 2df733d381f636cc185944c7eda86c824a9a785e Mon Sep 17 00:00:00 2001
From: Qais Yousef <[email protected]>
Date: Tue, 12 Jan 2021 11:54:16 +0000
Subject: [PATCH] sched: Don't push a higher priority class in get_push_task()

Commit a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
will attempt to push/pull a higher priority task if the candidate task
is in migrate_disable() section. This is an attempt to prevent
starvation of these lower priority task that, in theory at least, could
end up in a situation where they're forever in migrate disable section
with no CPU time to run.

One issue with that is get_push_task() assumes rq->curr is of the same
sched_class, which AFAICT is not guaranteed to be true.

This patch adds extra bolts and braces to ensure that this voluntary
push operation is performed on a task of the same scheduling class only.

Otherwise an RT task could end up causing a DL task to be pushed away.
Which breaks the strict priority between sched classes.

We could also end up trying to push the migration task. Which I think is
harmless and is nothing but a wasted effort.

Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Qais Yousef <[email protected]>
---
 kernel/sched/deadline.c |  2 +-
 kernel/sched/rt.c       |  4 ++--
 kernel/sched/sched.h    | 17 ++++++++++++++++-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index aac3539aa0fe..afadc7e1f968 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2276,7 +2276,7 @@ static void pull_dl_task(struct rq *this_rq)
                                goto skip;
 
                        if (is_migration_disabled(p)) {
-                               push_task = get_push_task(src_rq);
+                               push_task = get_push_task(src_rq, 
SCHED_DEADLINE);
                        } else {
                                deactivate_task(src_rq, p, 0);
                                set_task_cpu(p, this_cpu);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8f720b71d13d..c2c5c08e3030 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1892,7 +1892,7 @@ static int push_rt_task(struct rq *rq, bool pull)
                 * to this other CPU, instead attempt to push the current
                 * running task on this CPU away.
                 */
-               push_task = get_push_task(rq);
+               push_task = get_push_task(rq, SCHED_FIFO);
                if (push_task) {
                        raw_spin_unlock(&rq->lock);
                        stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
@@ -2225,7 +2225,7 @@ static void pull_rt_task(struct rq *this_rq)
                                goto skip;
 
                        if (is_migration_disabled(p)) {
-                               push_task = get_push_task(src_rq);
+                               push_task = get_push_task(src_rq, SCHED_FIFO);
                        } else {
                                deactivate_task(src_rq, p, 0);
                                set_task_cpu(p, this_cpu);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10a1522b1e30..4e156f008d22 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1954,12 +1954,27 @@ extern void trigger_load_balance(struct rq *rq);
 
 extern void set_cpus_allowed_common(struct task_struct *p, const struct 
cpumask *new_mask, u32 flags);
 
-static inline struct task_struct *get_push_task(struct rq *rq)
+static inline struct task_struct *get_push_task(struct rq *rq, int policy)
 {
        struct task_struct *p = rq->curr;
 
        lockdep_assert_held(&rq->lock);
 
+       switch(policy) {
+       case SCHED_FIFO:
+       case SCHED_RR:
+               if (!rt_task(p))
+                       return NULL;
+               break;
+       case SCHED_DEADLINE:
+               if (!dl_task(p))
+                       return NULL;
+               break;
+       default:
+               WARN_ON_ONCE(1);
+               return NULL;
+       }
+
        if (rq->push_busy)
                return NULL;
 
-- 
2.25.1

Reply via email to