At 2021-02-04 16:01:57, "Vincent Guittot" <vincent.guit...@linaro.org> wrote:
>On Tue, 2 Feb 2021 at 08:56, chin <ultrac...@163.com> wrote:
>>
>>
>>
>>
>> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guit...@linaro.org> wrote:
>> >On Wed, 13 Jan 2021 at 04:14, chin <ultrac...@163.com> wrote:
>> >>
>> >>
>> >>
>> >>
>> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guit...@linaro.org> 
>> >> wrote:
>> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrac...@163.com> wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guit...@linaro.org> 
>> >> >> wrote:
>> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrac...@163.com> wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" 
>> >> >> >> <vincent.guit...@linaro.org> wrote:
>> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrac...@163.com> wrote:
>> >> >> >> >>
>> >> >> >> >> From: Chen Xiaoguang <xiaoggc...@tencent.com>
>> >> >> >> >>
>> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
>> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
>> >> >> >> >
>> >> >> >> >Could you explain more in detail why you only care about this use 
>> >> >> >> >case
>> >> >> >>
>> >> >> >> >in particular and not the general case?
>> >> >> >>
>> >> >> >>
>> >> >> >> We want to run online tasks using SCHED_NORMAL policy and offline 
>> >> >> >> tasks
>> >> >> >> using SCHED_IDLE policy. The online tasks and the offline tasks run 
>> >> >> >> in
>> >> >> >> the same computer in order to use the computer efficiently.
>> >> >> >> The online tasks are in sleep in most times but should responce 
>> >> >> >> soon once
>> >> >> >> wake up. The offline tasks are in low priority and will run only 
>> >> >> >> when no online
>> >> >> >> tasks.
>> >> >> >>
>> >> >> >> The online tasks are more important than the offline tasks and are 
>> >> >> >> latency
>> >> >> >> sensitive we should make sure the online tasks preempt the offline 
>> >> >> >> tasks
>> >> >> >> as soon as possilbe while there are online tasks waiting to run.
>> >> >> >> So in our situation we hope the SCHED_NORMAL to run if has any.
>> >> >> >>
>> >> >> >> Let's assume we have 2 CPUs,
>> >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
>> >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
>> >> >> >>
>> >> >> >>              CPU1                      CPU2
>> >> >> >>         curr       rq1            curr          rq2
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>                                  NORMAL exits or blocked
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >>
>> >> >> >>                                  pick_next_task_fair
>> >> >> >>       +------+ | +------+         +----+ | +----+
>> >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
>> >> >> >>       +------+ | +------+         +----+ | +----+
>> >> >> >>
>> >> >> >>                                  SCHED_IDLE running
>> >> >> >> t3    +------+ | +------+        +----+  | +----+
>> >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>> >> >> >>       +------+ | +------+        +----+  | +----+
>> >> >> >>
>> >> >> >>                                  run_rebalance_domains
>> >> >> >>       +------+ |                +------+ | +----+ +----+
>> >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ |                +------+ | +----+ +----+
>> >> >> >>
>> >> >> >> As we can see
>> >> >> >> t1: NORMAL task in CPU2 exits or blocked
>> >> >> >> t2: CPU2 pick_next_task_fair would pick a SCHED_IDLE to run while
>> >> >> >> another SCHED_NORMAL in rq1 is waiting.
>> >> >> >> t3: SCHED_IDLE run in CPU2 while a SCHED_NORMAL wait in CPU1.
>> >> >> >> t4: after a short time, periodic load_balance triggerd and pull
>> >> >> >> SCHED_NORMAL in rq1 to rq2, and SCHED_NORMAL likely preempts 
>> >> >> >> SCHED_IDLE.
>> >> >> >>
>> >> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is 
>> >> >> >> waiting to run.
>> >> >> >> The latency of this SCHED_NORMAL will be high which is not 
>> >> >> >> acceptble.
>> >> >> >>
>> >> >> >> Do a load_balance before running the SCHED_IDLE may fix this 
>> >> >> >> problem.
>> >> >> >>
>> >> >> >> This patch works as below:
>> >> >> >>
>> >> >> >>              CPU1                      CPU2
>> >> >> >>         curr       rq1            curr          rq2
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>                                  NORMAL exits or blocked
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >>
>> >> >> >> t2                            pick_next_task_fair (all se are 
>> >> >> >> SCHED_IDLE)
>> >> >> >>
>> >> >> >>                                  newidle_balance
>> >> >> >>       +------+ |                 +------+ | +----+ +----+
>> >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ |                 +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>
>> >> >> >> t1: NORMAL task in CPU2 exits or blocked
>> >> >> >> t2: pick_next_task_fair check all se in rbtree are SCHED_IDLE and 
>> >> >> >> calls
>> >> >> >> newidle_balance who tries to pull a SCHED_NORMAL(if has).
>> >> >> >> t3: pick_next_task_fair would pick a SCHED_NORMAL to run instead of
>> >> >> >> SCHED_IDLE(likely).
>> >> >> >>
>> >> >> >> >
>> >> >> >> >> CPU by doing load_balance first.
>> >> >> >> >>
>> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggc...@tencent.com>
>> >> >> >> >> Signed-off-by: Chen He <heddc...@tencent.com>
>> >> >> >> >> ---
>> >> >> >> >>  kernel/sched/fair.c | 5 +++++
>> >> >> >> >>  1 file changed, 5 insertions(+)
>> >> >> >> >>
>> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> >> >> >> >> index ae7ceba..0a26132 100644
>> >> >> >> >> --- a/kernel/sched/fair.c
>> >> >> >> >> +++ b/kernel/sched/fair.c
>> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
>> >> >> >> >>         struct task_struct *p;
>> >> >> >> >>         int new_tasks;
>> >> >> >> >>
>> >> >> >> >> +       if (prev &&
>> >> >> >> >> +           fair_policy(prev->policy) &&
>> >> >> >> >
>> >> >> >> >Why do you need a prev and fair task  ? You seem to target the 
>> >> >> >> >special
>> >> >> >> >case of pick_next_task  but in this case why not only testing 
>> >> >> >> >rf!=null
>> >> >> >> > to make sure to not return immediately after jumping to the idle
>> >> >> >>
>> >> >> >> >label?
>> >> >> >> We just want to do load_balance only when CPU switches from 
>> >> >> >> SCHED_NORMAL
>> >> >> >> to SCHED_IDLE.
>> >> >> >> If not check prev, when the running tasks are all SCHED_IDLE, we 
>> >> >> >> would
>> >> >> >> do newidle_balance everytime in pick_next_task_fair, it makes no 
>> >> >> >> sense
>> >> >> >> and kind of wasting.
>> >> >> >
>> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
>> >> >> >called when there are only sched_idle tasks is useless.
>> >> >> >But you also have to take into account cases where there was another
>> >> >> >class of task running on the cpu like RT one. In your example above,
>> >> >> >if you replace the normal task on CPU2 by a RT task, you still want to
>> >> >>
>> >> >> >pick the normal task on CPU1 once RT task goes to sleep.
>> >> >> Sure, this case should be taken into account,  we should also try to
>> >> >> pick normal task in this case.
>> >> >>
>> >> >> >
>> >> >> >Another point that you will have to consider the impact on
>> >> >> >rq->idle_stamp because newidle_balance is assumed to be called before
>> >> >>
>> >> >> >going idle which is not the case anymore with your use case
>> >> >> Yes. rq->idle_stamp should not be changed in this case.
>> >> >>
>> >> >>
>> >> >>
>> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when 
>> >> >> a cpu is
>> >> >> about to run SCHED_IDLE task. But currently newidle_balance is not
>> >> >> designed for SCHED_IDLE  so SCHED_IDLE can also be pulled which
>> >> >> is useless in our situation.
>> >> >
>> >> >newidle_balance will pull a sched_idle task only if there is an
>> >> >imbalance which is the right thing to do IMO to ensure fairness
>> >> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
>> >> >we should break the fairness
>> >> >
>> >> >>
>> >> >> So we plan to add a new function sched_idle_balance which only try to
>> >> >> pull SCHED_NORMAL tasks from the busiest cpu. And we will call
>> >> >> sched_idle_balance when the previous task is normal or RT and
>> >> >> hoping we can pull a SCHED_NORMAL task to run.
>> >> >>
>> >> >> Do you think it is ok to add a new sched_idle_balance?
>> >> >
>> >> >I don't see any reason why the scheduler should not pull a sched_idle
>> >> >task if there is an imbalance. That will happen anyway during the next
>> >>
>> >> >periodic load balance
>> >> OK. We should not pull the SCHED_IDLE tasks only in load_balance.
>> >>
>> >>
>> >> Do you think it make sense to do an extra load_balance when cpu is
>> >> about to run SCHED_IDLE task (switched from normal/RT)?
>> >
>> >I'm not sure to get your point here.
>> >Do you mean if a sched_idle task is picked to become the running task
>> >whereas there are runnable normal tasks ? This can happen if normal
>> >tasks are long running tasks. We should not in this case. The only
>> >case is when the running task, which is not a sched_idle task but a
>> >normal/rt/deadline one, goes to sleep and there are only sched_idle
>> >tasks enqueued. In this case and only in this case, we should trigger
>> >a load_balance to get a chance to pull a waiting normal task from
>> >another CPU.
>> >
>> >This means checking this state in pick_next_task_fair() and in 
>> >balance_fair()
>>
>> We made another change would you please give some comments?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04a3ce2..2357301 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7029,6 +7029,10 @@ struct task_struct *
>>         struct task_struct *p;
>>         int new_tasks;
>>
>> +       if (sched_idle_rq(rq) && prev && prev->state &&
>> +           prev->policy != SCHED_IDLE)
>

>This need a comment to explain what it want to achieve
Sure we will add a comment when we send version 2.

>

>Why do you need to test prev->state ?
We only want to do this when a normal task goes to sleep.
If a long running normal task reaches its time slice then it is the time
for the SCHED_IDLE task to run.

>
>> +               goto idle;
>> +
>>  again:
>>         if (!sched_fair_runnable(rq))
>>                 goto idle;
>> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, 
>> struct rq_flags *rf)
>>          * We must set idle_stamp _before_ calling idle_balance(), such that 
>> we
>>          * measure the duration of idle_balance() as idle time.
>>          */
>> -       this_rq->idle_stamp = rq_clock(this_rq);
>> +       if (!rq->nr_running)
>> +               this_rq->idle_stamp = rq_clock(this_rq);
>>
>>         /*
>>          * Do not pull tasks towards !active CPUs...
>>
>> >
>> >> By doing this SCHED_NORMAL tasks waiting on other cpus would get
>> >> a chance to be pulled to this cpu and run, it is helpful to reduce the 
>> >> latency
>> >> of SCHED_NORMAL tasks.
>> >>
>> >>
>> >> >>>
>> >> >> >
>> >> >> >>
>> >> >> >> >
>> >> >> >>
>> >> >> >> >Also why not doing that for default case too ? i.e. balance_fair() 
>> >> >> >> >?
>> >> >> >> You are right, if you think this scenario makes sense, we will send 
>> >> >> >> a
>> >> >> >> refined patch soon :-)
>> >> >> >>
>> >> >> >> >
>> >> >> >> >> +           sched_idle_cpu(rq->cpu))
>> >> >> >> >> +               goto idle;
>> >> >> >> >> +
>> >> >> >> >>  again:
>> >> >> >> >>         if (!sched_fair_runnable(rq))
>> >> >> >> >>                 goto idle;
>> >> >> >> >> --
>> >> >> >> >> 1.8.3.1
>> >> >> >> >>
>> >> >> >> >>

Reply via email to