On Wed,  4 Feb 2015 09:12:20 +0800
Xunlei Pang <xlp...@126.com> wrote:

> From: Xunlei Pang <pang.xun...@linaro.org>
> 
> We may suffer from extra rt overload rq due to the affinity,
> so when the affinity of any runnable rt task is changed, we
> should check to trigger balancing, otherwise it will cause
> some unnecessary delayed real-time response. Unfortunately,
> current RT global scheduler doesn't trigger anything.
> 
> For example: a 2-cpu system with two runnable FIFO tasks(same
> rt_priority) bound on CPU0, let's name them rt1(running) and
> rt2(runnable) respectively; CPU1 has no RTs. Then, someone sets
> the affinity of rt2 to 0x3(i.e. CPU0 and CPU1), but after this,
> rt2 still can't be scheduled until rt1 enters schedule(), this
> definitely causes some/big response latency for rt2.
> 

I understand the issue you point out, but I have to be honest and say
that I really do not like this approach.

> So, when doing set_cpus_allowed_rt(), if detecting such cases,
> check to trigger a push behaviour.
> 
> Signed-off-by: Xunlei Pang <pang.xun...@linaro.org>
> ---
>  kernel/sched/rt.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 59 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index f4d4b07..4dacb6e 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1428,7 +1428,7 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
>       return next;
>  }
>  
> -static struct task_struct *_pick_next_task_rt(struct rq *rq)
> +static struct task_struct *_pick_next_task_rt(struct rq *rq, int peek_only)
>  {

peek_only should be bool, but don't worry about it; I don't think this
is needed.

>       struct sched_rt_entity *rt_se;
>       struct task_struct *p;
> @@ -1441,7 +1441,8 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
>       } while (rt_rq);
>  
>       p = rt_task_of(rt_se);
> -     p->se.exec_start = rq_clock_task(rq);
> +     if (!peek_only)
> +             p->se.exec_start = rq_clock_task(rq);
>  
>       return p;
>  }
> @@ -1476,7 +1477,7 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
>  
>       put_prev_task(rq, prev);
>  
> -     p = _pick_next_task_rt(rq);
> +     p = _pick_next_task_rt(rq, 0);
>  
>       /* The running task is never eligible for pushing */
>       dequeue_pushable_task(rq, p);
> @@ -1886,28 +1887,69 @@ static void set_cpus_allowed_rt(struct task_struct *p,
>                               const struct cpumask *new_mask)
>  {
>       struct rq *rq;
> -     int weight;
> +     int old_weight, new_weight;
> +     int preempt_push = 0, direct_push = 0;
>  
>       BUG_ON(!rt_task(p));
>  
>       if (!task_on_rq_queued(p))
>               return;
>  
> -     weight = cpumask_weight(new_mask);
> +     old_weight = p->nr_cpus_allowed;
> +     new_weight = cpumask_weight(new_mask);
> +
> +     rq = task_rq(p);
> +
> +     if (new_weight > 1 &&
> +         rt_task(rq->curr) &&
> +         !test_tsk_need_resched(rq->curr)) {
> +             /*
> +              * Set new mask information to prepare pushing.
> +              * It's safe to do this here.

Please explain why it is safe.

> +              */
> +             cpumask_copy(&p->cpus_allowed, new_mask);
> +             p->nr_cpus_allowed = new_weight;
> +
> +             if (task_running(rq, p) &&
> +                 cpumask_test_cpu(task_cpu(p), new_mask) &&
> +                 cpupri_find(&rq->rd->cpupri, p, NULL)) {

Hmm, you called cpupri_find(), which should also return a mask of the
CPUs with the lowest priorities. I wonder if we could utilize this
information instead of doing the search twice? Of course, things could
change by the time the task migrates.
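
To illustrate the point about reusing the mask, here is a rough sketch.
It is a userspace toy model, not kernel code: toy_cpupri,
toy_cpupri_find() and toy_pick_target() are hypothetical names, and the
priority encoding (higher number = higher priority, 0 = no RT task) is
an assumption made for the example. The only thing it is meant to show
is one lookup filling a mask that the push step then consumes, rather
than passing NULL and searching again later.

```c
/* Userspace toy model, not kernel code. All names here are hypothetical. */
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Assumed encoding: higher value = higher priority, 0 = no RT task. */
struct toy_cpupri {
	int cpu_prio[TOY_NR_CPUS];
};

/*
 * Report whether any CPU in 'allowed' runs at a priority strictly lower
 * than 'task_prio'. If 'lowest' is non-NULL, it is filled with the
 * allowed CPUs that share the lowest such priority.
 */
static int toy_cpupri_find(const struct toy_cpupri *cp, int task_prio,
			   uint32_t allowed, uint32_t *lowest)
{
	int best = task_prio;
	uint32_t mask = 0;
	int cpu;

	assert(cp);

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!(allowed & (1u << cpu)))
			continue;
		/* Only strictly lower priorities are candidates. */
		if (cp->cpu_prio[cpu] >= task_prio)
			continue;
		if (cp->cpu_prio[cpu] < best) {
			best = cp->cpu_prio[cpu];
			mask = 1u << cpu;
		} else if (cp->cpu_prio[cpu] == best) {
			mask |= 1u << cpu;
		}
	}

	if (lowest)
		*lowest = mask;
	return mask != 0;
}

/* Pick a push target from a mask the caller already holds; -1 if empty. */
static int toy_pick_target(uint32_t lowest)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (lowest & (1u << cpu))
			return cpu;
	return -1;
}
```

The caveat above still applies: in the real scheduler the mask can be
stale by the time the migration happens, so a target taken from it
would need to be rechecked.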

> +                     /*
> +                      * At this point, current task gets migratable most
> +                      * likely due to the change of its affinity, let's
> +                      * figure out if we can migrate it.
> +                      *
> +                      * Is there any task with the same priority as that
> +                      * of current task? If found one, we should resched.
> +                      * NOTE: The target may be unpushable.
> +                      */
> +                     if (p->prio == rq->rt.highest_prio.next) {
> +                             /* One target just in pushable_tasks list. */
> +                             requeue_task_rt(rq, p, 0);
> +                             preempt_push = 1;
> +                     } else if (rq->rt.rt_nr_total > 1) {
> +                             struct task_struct *next;
> +
> +                             requeue_task_rt(rq, p, 0);
> +                             /* peek only */
> +                             next = _pick_next_task_rt(rq, 1);
> +                             if (next != p && next->prio == p->prio)
> +                                     preempt_push = 1;
> +                     }

I'm thinking it would be better to just send an IPI to that CPU, and
have it figure this out and push a task off of itself.
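
Very roughly, the shape I have in mind looks like the sketch below. It
is a userspace toy model of the 2-CPU example from the changelog, not
kernel code: toy_rq, toy_set_affinity() and toy_handle_push_ipi() are
hypothetical names, a flag stands in for the IPI, and each queue tracks
at most one waiting task. The affinity setter only records the new mask
and flags the owning CPU; the owning CPU does the push itself.

```c
/* Userspace toy model, not kernel code. All names here are hypothetical. */
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 2
#define TOY_NO_TASK 0	/* assumed encoding: priority 0 = no RT task */

struct toy_rq {
	int curr_prio;			/* priority of the running task */
	int queued_prio;		/* one waiting task, or TOY_NO_TASK */
	uint32_t queued_allowed;	/* affinity mask of the waiting task */
	int push_pending;		/* stands in for a pending IPI */
};

/* Affinity setter: record the new mask and "send the IPI", nothing more. */
static void toy_set_affinity(struct toy_rq *rqs, int cpu, uint32_t new_mask)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	rqs[cpu].queued_allowed = new_mask;
	if (rqs[cpu].queued_prio != TOY_NO_TASK)
		rqs[cpu].push_pending = 1;
}

/*
 * "IPI handler", run by the CPU that owns the queue: try to push the
 * waiting task to an allowed CPU running something of lower priority.
 * Returns the target CPU, or -1 if nothing was pushed.
 * Toy simplification: a task preempted on the target is not tracked.
 */
static int toy_handle_push_ipi(struct toy_rq *rqs, int cpu)
{
	struct toy_rq *rq = &rqs[cpu];
	int target;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	if (!rq->push_pending)
		return -1;
	rq->push_pending = 0;

	for (target = 0; target < TOY_NR_CPUS; target++) {
		if (target == cpu || !(rq->queued_allowed & (1u << target)))
			continue;
		if (rqs[target].curr_prio < rq->queued_prio) {
			rqs[target].curr_prio = rq->queued_prio;
			rq->queued_prio = TOY_NO_TASK;
			return target;
		}
	}
	return -1;
}
```

With rt1 running and rt2 waiting on CPU0 (same priority, rt2 bound to
mask 0x1), toy_set_affinity(rqs, 0, 0x3) followed by
toy_handle_push_ipi(rqs, 0) moves rt2 to the idle CPU1 without the
setter having to peek at the run queue.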

> +             } else if (!task_running(rq, p))
> +                     direct_push = 1;
> +     }
>  
>       /*
>        * Only update if the process changes its state from whether it
>        * can migrate or not.
>        */
> -     if ((p->nr_cpus_allowed > 1) == (weight > 1))
> -             return;
> -
> -     rq = task_rq(p);
> +     if ((old_weight > 1) == (new_weight > 1))
> +             goto out;
>  
>       /*
>        * The process used to be able to migrate OR it can now migrate
>        */
> -     if (weight <= 1) {
> +     if (new_weight <= 1) {
>               if (!task_current(rq, p))
>                       dequeue_pushable_task(rq, p);
>               BUG_ON(!rq->rt.rt_nr_migratory);
> @@ -1919,6 +1961,13 @@ static void set_cpus_allowed_rt(struct task_struct *p,
>       }
>  
>       update_rt_migration(&rq->rt);
> +
> +out:
> +     if (direct_push)
> +             push_rt_tasks(rq);
> +
> +     if (preempt_push)
> +             resched_curr(rq);

I don't know. This just doesn't seem clean.

-- Steve

>  }
>  
>  /* Assumes rq->lock is held */
