At 2021-02-04 16:01:57, "Vincent Guittot" <vincent.guit...@linaro.org> wrote:
>On Tue, 2 Feb 2021 at 08:56, chin <ultrac...@163.com> wrote:
>>
>> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guit...@linaro.org> wrote:
>> >On Wed, 13 Jan 2021 at 04:14, chin <ultrac...@163.com> wrote:
>> >>
>> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guit...@linaro.org> wrote:
>> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrac...@163.com> wrote:
>> >> >>
>> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guit...@linaro.org> wrote:
>> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrac...@163.com> wrote:
>> >> >> >>
>> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guit...@linaro.org> wrote:
>> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrac...@163.com> wrote:
>> >> >> >> >>
>> >> >> >> >> From: Chen Xiaoguang <xiaoggc...@tencent.com>
>> >> >> >> >>
>> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
>> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
>> >> >> >> >
>> >> >> >> >Could you explain more in detail why you only care about this use
>> >> >> >> >case in particular and not the general case?
>> >> >> >>
>> >> >> >> We want to run online tasks using the SCHED_NORMAL policy and
>> >> >> >> offline tasks using the SCHED_IDLE policy. The online tasks and
>> >> >> >> the offline tasks run on the same computer in order to use the
>> >> >> >> computer efficiently.
>> >> >> >> The online tasks sleep most of the time but should respond soon
>> >> >> >> after they wake up. The offline tasks have low priority and run
>> >> >> >> only when there are no online tasks.
>> >> >> >>
>> >> >> >> The online tasks are more important than the offline tasks and
>> >> >> >> are latency sensitive, so we should make sure the online tasks
>> >> >> >> preempt the offline tasks as soon as possible while there are
>> >> >> >> online tasks waiting to run.
>> >> >> >> So in our situation we want a SCHED_NORMAL task to run whenever
>> >> >> >> there is one.
>> >> >> >>
>> >> >> >> Let's assume we have 2 CPUs.
>> >> >> >> On CPU1 we have 2 SCHED_NORMAL tasks.
>> >> >> >> On CPU2 we have 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
>> >> >> >>
>> >> >> >>            CPU1                      CPU2
>> >> >> >>      curr       rq1            curr          rq2
>> >> >> >>    +------+ | +------+       +------+ | +----+ +----+
>> >> >> >> t0 |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >> >>    +------+ | +------+       +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>                              NORMAL exits or blocked
>> >> >> >>    +------+ | +------+                | +----+ +----+
>> >> >> >> t1 |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >> >>    +------+ | +------+                | +----+ +----+
>> >> >> >>
>> >> >> >>                              pick_next_task_fair
>> >> >> >>    +------+ | +------+        +----+  | +----+
>> >> >> >> t2 |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>> >> >> >>    +------+ | +------+        +----+  | +----+
>> >> >> >>
>> >> >> >>                              SCHED_IDLE running
>> >> >> >>    +------+ | +------+        +----+  | +----+
>> >> >> >> t3 |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>> >> >> >>    +------+ | +------+        +----+  | +----+
>> >> >> >>
>> >> >> >>                              run_rebalance_domains
>> >> >> >>    +------+ |                +------+ | +----+ +----+
>> >> >> >> t4 |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>> >> >> >>    +------+ |                +------+ | +----+ +----+
>> >> >> >>
>> >> >> >> As we can see:
>> >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
>> >> >> >> t2: pick_next_task_fair on CPU2 picks a SCHED_IDLE task to run
>> >> >> >> while another SCHED_NORMAL task in rq1 is waiting.
>> >> >> >> t3: SCHED_IDLE runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
>> >> >> >> t4: after a short time, the periodic load_balance is triggered and
>> >> >> >> pulls the SCHED_NORMAL task from rq1 to rq2, and SCHED_NORMAL
>> >> >> >> likely preempts SCHED_IDLE.
>> >> >> >>
>> >> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is
>> >> >> >> waiting to run.
>> >> >> >> The latency of this SCHED_NORMAL task will be high, which is not
>> >> >> >> acceptable.
>> >> >> >>
>> >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix
>> >> >> >> this problem.
>> >> >> >>
>> >> >> >> This patch works as below:
>> >> >> >>
>> >> >> >>            CPU1                      CPU2
>> >> >> >>      curr       rq1            curr          rq2
>> >> >> >>    +------+ | +------+       +------+ | +----+ +----+
>> >> >> >> t0 |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >> >>    +------+ | +------+       +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>                              NORMAL exits or blocked
>> >> >> >>    +------+ | +------+                | +----+ +----+
>> >> >> >> t1 |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >> >>    +------+ | +------+                | +----+ +----+
>> >> >> >>
>> >> >> >> t2                           pick_next_task_fair (all se are
>> >> >> >>                              SCHED_IDLE)
>> >> >> >>
>> >> >> >>                              newidle_balance
>> >> >> >>    +------+ |                +------+ | +----+ +----+
>> >> >> >> t3 |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>> >> >> >>    +------+ |                +------+ | +----+ +----+
>> >> >> >>
>> >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
>> >> >> >> t2: pick_next_task_fair checks that all se in the rbtree are
>> >> >> >> SCHED_IDLE and calls newidle_balance, which tries to pull a
>> >> >> >> SCHED_NORMAL task (if there is one).
>> >> >> >> t3: pick_next_task_fair would (likely) pick a SCHED_NORMAL task to
>> >> >> >> run instead of SCHED_IDLE.
>> >> >> >>
>> >> >> >> >
>> >> >> >> >> CPU by doing load_balance first.
>> >> >> >> >>
>> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggc...@tencent.com>
>> >> >> >> >> Signed-off-by: Chen He <heddc...@tencent.com>
>> >> >> >> >> ---
>> >> >> >> >>  kernel/sched/fair.c | 5 +++++
>> >> >> >> >>  1 file changed, 5 insertions(+)
>> >> >> >> >>
>> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> >> >> >> >> index ae7ceba..0a26132 100644
>> >> >> >> >> --- a/kernel/sched/fair.c
>> >> >> >> >> +++ b/kernel/sched/fair.c
>> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
>> >> >> >> >>         struct task_struct *p;
>> >> >> >> >>         int new_tasks;
>> >> >> >> >>
>> >> >> >> >> +       if (prev &&
>> >> >> >> >> +           fair_policy(prev->policy) &&
>> >> >> >> >
>> >> >> >> >Why do you need a prev and fair task ? You seem to target the
>> >> >> >> >special case of pick_next_task but in this case why not only
>> >> >> >> >testing rf!=null to make sure to not return immediately after
>> >> >> >> >jumping to the idle label?
>> >> >> >>
>> >> >> >> We just want to do load_balance only when the CPU switches from
>> >> >> >> SCHED_NORMAL to SCHED_IDLE.
>> >> >> >> If we do not check prev, when the running tasks are all SCHED_IDLE
>> >> >> >> we would do newidle_balance every time in pick_next_task_fair,
>> >> >> >> which makes no sense and is wasteful.
>> >> >> >
>> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
>> >> >> >called when there are only sched_idle tasks is useless.
>> >> >> >But you also have to take into account cases where there was another
>> >> >> >class of task running on the cpu like RT one. In your example above,
>> >> >> >if you replace the normal task on CPU2 by a RT task, you still want to
>> >> >> >pick the normal task on CPU1 once RT task goes to sleep.
>> >> >>
>> >> >> Sure, this case should be taken into account; we should also try to
>> >> >> pick a normal task in this case.
>> >> >>
>> >> >> >
>> >> >> >Another point that you will have to consider the impact on
>> >> >> >rq->idle_stamp because newidle_balance is assumed to be called before
>> >> >> >going idle which is not the case anymore with your use case
>> >> >>
>> >> >> Yes. rq->idle_stamp should not be changed in this case.
>> >> >>
>> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run
>> >> >> when a cpu is about to run a SCHED_IDLE task. But currently
>> >> >> newidle_balance is not designed for SCHED_IDLE, so a SCHED_IDLE task
>> >> >> can also be pulled, which is useless in our situation.
>> >> >
>> >> >newidle_balance will pull a sched_idle task only if there is an
>> >> >imbalance which is the right thing to do IMO to ensure fairness
>> >> >between sched_idle tasks. Being a sched_idle task doesn't mean that
>> >> >we should break the fairness
>> >> >
>> >> >>
>> >> >> So we plan to add a new function sched_idle_balance which only tries
>> >> >> to pull SCHED_NORMAL tasks from the busiest cpu. We will call
>> >> >> sched_idle_balance when the previous task is normal or RT, hoping
>> >> >> we can pull a SCHED_NORMAL task to run.
>> >> >>
>> >> >> Do you think it is ok to add a new sched_idle_balance?
>> >> >
>> >> >I don't see any reason why the scheduler should not pull a sched_idle
>> >> >task if there is an imbalance. That will happen anyway during the next
>> >> >periodic load balance
>> >>
>> >> OK. We should not pull the SCHED_IDLE tasks only in load_balance.
>> >>
>> >> Do you think it makes sense to do an extra load_balance when a cpu is
>> >> about to run a SCHED_IDLE task (switched from normal/RT)?
>> >
>> >I'm not sure to get your point here.
>> >Do you mean if a sched_idle task is picked to become the running task
>> >whereas there are runnable normal tasks ? This can happen if normal
>> >tasks are long running tasks. We should not in this case. The only
>> >case is when the running task, which is not a sched_idle task but a
>> >normal/rt/deadline one, goes to sleep and there are only sched_idle
>> >tasks enqueued. In this case and only in this case, we should trigger
>> >a load_balance to get a chance to pull a waiting normal task from
>> >another CPU.
>> >
>> >This means checking this state in pick_next_task_fair() and in
>> >balance_fair()
>>
>> We made another change; would you please give some comments?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04a3ce2..2357301 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7029,6 +7029,10 @@ struct task_struct *
>>         struct task_struct *p;
>>         int new_tasks;
>>
>> +       if (sched_idle_rq(rq) && prev && prev->state &&
>> +           prev->policy != SCHED_IDLE)
>
>This needs a comment to explain what it wants to achieve

Sure, we will add a comment when we send version 2.

>
>Why do you need to test prev->state ?

We only want to do this when a normal task goes to sleep. If a
long-running normal task has used up its time slice, then it is time
for the SCHED_IDLE task to run.

>
>> +               goto idle;
>> +
>>  again:
>>         if (!sched_fair_runnable(rq))
>>                 goto idle;
>> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>>          * We must set idle_stamp _before_ calling idle_balance(), such that we
>>          * measure the duration of idle_balance() as idle time.
>>          */
>> -       this_rq->idle_stamp = rq_clock(this_rq);
>> +       if (!rq->nr_running)
>> +               this_rq->idle_stamp = rq_clock(this_rq);
>>
>>         /*
>>          * Do not pull tasks towards !active CPUs...
>>
>
>> >> By doing this, SCHED_NORMAL tasks waiting on other cpus would get
>> >> a chance to be pulled to this cpu and run, which helps reduce the
>> >> latency of SCHED_NORMAL tasks.
>> >>
>> >> >> >> >
>> >> >> >> >Also why not doing that for default case too ? i.e. balance_fair()
>> >> >> >> >?
>> >> >> >>
>> >> >> >> You are right. If you think this scenario makes sense, we will
>> >> >> >> send a refined patch soon :-)
>> >> >> >>
>> >> >> >> >
>> >> >> >> >> +           sched_idle_cpu(rq->cpu))
>> >> >> >> >> +               goto idle;
>> >> >> >> >> +
>> >> >> >> >>  again:
>> >> >> >> >>         if (!sched_fair_runnable(rq))
>> >> >> >> >>                 goto idle;
>> >> >> >> >> --
>> >> >> >> >> 1.8.3.1
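
For readers who want to play with the condition outside the kernel, here
is a minimal userspace sketch of the check discussed in this thread:
try a balance only when the runqueue holds nothing but SCHED_IDLE tasks
and the previous task was a non-SCHED_IDLE task that went to sleep.
Everything prefixed with toy_ is hypothetical and invented for this
illustration. The structs are simplified stand-ins for the kernel's rq
and task_struct, "sleeping" stands in for a non-zero prev->state, and
toy_sched_idle_rq() is only an assumption about what sched_idle_rq()
tests. It is not the patch itself.

```c
/* Userspace toy model. All toy_ names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_policy { TOY_NORMAL, TOY_RT, TOY_IDLE };

struct toy_task {
	enum toy_policy policy;
	bool sleeping;			/* stands in for prev->state != 0 */
};

struct toy_rq {
	unsigned int nr_running;	/* all runnable tasks on this CPU */
	unsigned int idle_nr_running;	/* runnable SCHED_IDLE tasks only */
};

/* True when the runqueue is non-empty and holds only SCHED_IDLE tasks. */
static bool toy_sched_idle_rq(const struct toy_rq *rq)
{
	return rq->nr_running && rq->nr_running == rq->idle_nr_running;
}

/*
 * Models the proposed test: only SCHED_IDLE tasks are left to run, and
 * the previous task was not SCHED_IDLE and went to sleep (it was not
 * merely preempted at the end of its time slice).
 */
static bool toy_should_try_balance(const struct toy_rq *rq,
				   const struct toy_task *prev)
{
	return toy_sched_idle_rq(rq) && prev != NULL && prev->sleeping &&
	       prev->policy != TOY_IDLE;
}

/* Walks through the cases raised in the discussion. */
static void toy_selftest(void)
{
	struct toy_rq only_idle = { .nr_running = 2, .idle_nr_running = 2 };
	struct toy_rq mixed = { .nr_running = 3, .idle_nr_running = 2 };
	struct toy_rq empty = { .nr_running = 0, .idle_nr_running = 0 };
	struct toy_task normal_asleep = { TOY_NORMAL, true };
	struct toy_task normal_preempted = { TOY_NORMAL, false };
	struct toy_task rt_asleep = { TOY_RT, true };
	struct toy_task idle_asleep = { TOY_IDLE, true };

	/* A normal or RT task slept and only SCHED_IDLE tasks remain. */
	assert(toy_should_try_balance(&only_idle, &normal_asleep));
	assert(toy_should_try_balance(&only_idle, &rt_asleep));
	/* A long-running normal task that was preempted: no extra balance. */
	assert(!toy_should_try_balance(&only_idle, &normal_preempted));
	/* Switching between SCHED_IDLE tasks: no extra balance. */
	assert(!toy_should_try_balance(&only_idle, &idle_asleep));
	/* A normal task is still queued locally, or nothing is queued. */
	assert(!toy_should_try_balance(&mixed, &normal_asleep));
	assert(!toy_should_try_balance(&empty, &normal_asleep));
	/* No previous task to inspect. */
	assert(!toy_should_try_balance(&only_idle, NULL));
}
```

Calling toy_selftest() from a small main() exercises each case. The
preempted-task case is the reason for the prev->state test above: a
normal task that was only preempted is still runnable, so the extra
balance is meant for the sleep case only.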