On Wed, 4 Feb 2015 09:12:20 +0800 Xunlei Pang <xlp...@126.com> wrote:

> From: Xunlei Pang <pang.xun...@linaro.org>
>
> We may suffer from extra rt overload rq due to the affinity,
> so when the affinity of any runnable rt task is changed, we
> should check to trigger balancing, otherwise it will cause
> some unnecessary delayed real-time response. Unfortunately,
> current RT global scheduler doesn't trigger anything.
>
> For example: a 2-cpu system with two runnable FIFO tasks(same
> rt_priority) bound on CPU0, let's name them rt1(running) and
> rt2(runnable) respectively; CPU1 has no RTs. Then, someone sets
> the affinity of rt2 to 0x3(i.e. CPU0 and CPU1), but after this,
> rt2 still can't be scheduled until rt1 enters schedule(), this
> definitely causes some/big response latency for rt2.

I understand the issue you point out, but I have to be honest and say
that I really do not like this approach.

> So, when doing set_cpus_allowed_rt(), if detecting such cases,
> check to trigger a push behaviour.
>
> Signed-off-by: Xunlei Pang <pang.xun...@linaro.org>
> ---
>  kernel/sched/rt.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 59 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index f4d4b07..4dacb6e 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1428,7 +1428,7 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
>  	return next;
>  }
>
> -static struct task_struct *_pick_next_task_rt(struct rq *rq)
> +static struct task_struct *_pick_next_task_rt(struct rq *rq, int peek_only)
>  {

peek_only should be bool, but don't worry about it; I think this isn't
needed.
>  	struct sched_rt_entity *rt_se;
>  	struct task_struct *p;
> @@ -1441,7 +1441,8 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
>  	} while (rt_rq);
>
>  	p = rt_task_of(rt_se);
> -	p->se.exec_start = rq_clock_task(rq);
> +	if (!peek_only)
> +		p->se.exec_start = rq_clock_task(rq);
>
>  	return p;
>  }
> @@ -1476,7 +1477,7 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
>
>  	put_prev_task(rq, prev);
>
> -	p = _pick_next_task_rt(rq);
> +	p = _pick_next_task_rt(rq, 0);
>
>  	/* The running task is never eligible for pushing */
>  	dequeue_pushable_task(rq, p);
> @@ -1886,28 +1887,69 @@ static void set_cpus_allowed_rt(struct task_struct *p,
>  				  const struct cpumask *new_mask)
>  {
>  	struct rq *rq;
> -	int weight;
> +	int old_weight, new_weight;
> +	int preempt_push = 0, direct_push = 0;
>
>  	BUG_ON(!rt_task(p));
>
>  	if (!task_on_rq_queued(p))
>  		return;
>
> -	weight = cpumask_weight(new_mask);
> +	old_weight = p->nr_cpus_allowed;
> +	new_weight = cpumask_weight(new_mask);
> +
> +	rq = task_rq(p);
> +
> +	if (new_weight > 1 &&
> +	    rt_task(rq->curr) &&
> +	    !test_tsk_need_resched(rq->curr)) {
> +		/*
> +		 * Set new mask information to prepare pushing.
> +		 * It's safe to do this here.

Please explain why it is safe.

> +		 */
> +		cpumask_copy(&p->cpus_allowed, new_mask);
> +		p->nr_cpus_allowed = new_weight;
> +
> +		if (task_running(rq, p) &&
> +		    cpumask_test_cpu(task_cpu(p), new_mask) &&
> +		    cpupri_find(&rq->rd->cpupri, p, NULL)) {

Hmm, you called cpupri_find(), which should also return a mask of the
CPUs with the lowest priorities. I wonder if we could utilize that
information instead of doing the search twice? Of course, things could
change by the time the task migrates.

> +			/*
> +			 * At this point, current task gets migratable most
> +			 * likely due to the change of its affinity, let's
> +			 * figure out if we can migrate it.
> +			 *
> +			 * Is there any task with the same priority as that
> +			 * of current task? If found one, we should resched.
> +			 * NOTE: The target may be unpushable.
> +			 */
> +			if (p->prio == rq->rt.highest_prio.next) {
> +				/* One target just in pushable_tasks list. */
> +				requeue_task_rt(rq, p, 0);
> +				preempt_push = 1;
> +			} else if (rq->rt.rt_nr_total > 1) {
> +				struct task_struct *next;
> +
> +				requeue_task_rt(rq, p, 0);
> +				/* peek only */
> +				next = _pick_next_task_rt(rq, 1);
> +				if (next != p && next->prio == p->prio)
> +					preempt_push = 1;
> +			}

I'm thinking it would be better just to send an IPI to the CPU, and
have that CPU figure this out and push a task off of itself.

> +		} else if (!task_running(rq, p))
> +			direct_push = 1;
> +	}
>
>  	/*
>  	 * Only update if the process changes its state from whether it
>  	 * can migrate or not.
>  	 */
> -	if ((p->nr_cpus_allowed > 1) == (weight > 1))
> -		return;
> -
> -	rq = task_rq(p);
> +	if ((old_weight > 1) == (new_weight > 1))
> +		goto out;
>
>  	/*
>  	 * The process used to be able to migrate OR it can now migrate
>  	 */
> -	if (weight <= 1) {
> +	if (new_weight <= 1) {
>  		if (!task_current(rq, p))
>  			dequeue_pushable_task(rq, p);
>  		BUG_ON(!rq->rt.rt_nr_migratory);
> @@ -1919,6 +1961,13 @@ static void set_cpus_allowed_rt(struct task_struct *p,
>  	}
>
>  	update_rt_migration(&rq->rt);
> +
> +out:
> +	if (direct_push)
> +		push_rt_tasks(rq);
> +
> +	if (preempt_push)
> +		resched_curr(rq);

I don't know. This just doesn't seem clean.

-- Steve

>  }
>
>  /* Assumes rq->lock is held */