On 2020/7/23 15:47, benbjiang(蒋彪) wrote: > Hi, > >> On Jul 23, 2020, at 1:39 PM, Li, Aubrey <aubrey...@linux.intel.com> wrote: >> >> On 2020/7/23 12:23, benbjiang(蒋彪) wrote: >>> Hi, >>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey <aubrey...@linux.intel.com> wrote: >>>> >>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote: >>>>> Hi, >>>>> >>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey <aubrey...@linux.intel.com> >>>>>> wrote: >>>>>> >>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote: >>>>>>> Hi, >>>>>>> >>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey <aubrey...@linux.intel.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote: >>>>>>>>> Hi, Aubrey, >>>>>>>>> >>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai >>>>>>>>>> <vpil...@digitalocean.com> wrote: >>>>>>>>>> >>>>>>>>>> From: Aubrey Li <aubrey...@intel.com> >>>>>>>>>> >>>>>>>>>> - Don't migrate if there is a cookie mismatch >>>>>>>>>> Load balance tries to move task from busiest CPU to the >>>>>>>>>> destination CPU. When core scheduling is enabled, if the >>>>>>>>>> task's cookie does not match with the destination CPU's >>>>>>>>>> core cookie, this task will be skipped by this CPU. This >>>>>>>>>> mitigates the forced idle time on the destination CPU. >>>>>>>>>> >>>>>>>>>> - Select cookie matched idle CPU >>>>>>>>>> In the fast path of task wakeup, select the first cookie matched >>>>>>>>>> idle CPU instead of the first idle CPU. >>>>>>>>>> >>>>>>>>>> - Find cookie matched idlest CPU >>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core >>>>>>>>>> cookie matches with task's cookie >>>>>>>>>> >>>>>>>>>> - Don't migrate task if cookie not match >>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose >>>>>>>>>> core cookie does not match with task's cookie >>>>>>>>>> >>>>>>>>>> Signed-off-by: Aubrey Li <aubrey...@linux.intel.com> >>>>>>>>>> Signed-off-by: Tim Chen <tim.c.c...@linux.intel.com> >>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai <vpil...@digitalocean.com> >>>>>>>>>> --- >>>>>>>>>> kernel/sched/fair.c | 64 >>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++---- >>>>>>>>>> kernel/sched/sched.h | 29 ++++++++++++++++++++ >>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>>>>>>>>> index d16939766361..33dc4bf01817 100644 >>>>>>>>>> --- a/kernel/sched/fair.c >>>>>>>>>> +++ b/kernel/sched/fair.c >>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct >>>>>>>>>> task_numa_env *env, >>>>>>>>>> if (!cpumask_test_cpu(cpu, env->p->cpus_ptr)) >>>>>>>>>> continue; >>>>>>>>>> >>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>> + /* >>>>>>>>>> + * Skip this cpu if source task's cookie does not match >>>>>>>>>> + * with CPU's core cookie. >>>>>>>>>> + */ >>>>>>>>>> + if (!sched_core_cookie_match(cpu_rq(cpu), env->p)) >>>>>>>>>> + continue; >>>>>>>>>> +#endif >>>>>>>>>> + >>>>>>>>>> env->dst_cpu = cpu; >>>>>>>>>> if (task_numa_compare(env, taskimp, groupimp, maymove)) >>>>>>>>>> break; >>>>>>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group >>>>>>>>>> *group, struct task_struct *p, int this >>>>>>>>>> >>>>>>>>>> /* Traverse only the allowed CPUs */ >>>>>>>>>> for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) { >>>>>>>>>> + struct rq *rq = cpu_rq(i); >>>>>>>>>> + >>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>> + if (!sched_core_cookie_match(rq, p)) >>>>>>>>>> + continue; >>>>>>>>>> +#endif >>>>>>>>>> + >>>>>>>>>> if (sched_idle_cpu(i)) >>>>>>>>>> return i; >>>>>>>>>> >>>>>>>>>> if (available_idle_cpu(i)) { >>>>>>>>>> - struct rq *rq = cpu_rq(i); >>>>>>>>>> struct cpuidle_state *idle = idle_get_state(rq); >>>>>>>>>> if (idle && idle->exit_latency < >>>>>>>>>> min_exit_latency) { >>>>>>>>>> /* >>>>>>>>>> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct task_struct >>>>>>>>>> *p, struct sched_domain *sd, int t >>>>>>>>>> for_each_cpu_wrap(cpu, cpus, target) { >>>>>>>>>> if (!--nr) >>>>>>>>>> return -1; >>>>>>>>>> - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) >>>>>>>>>> - break; >>>>>>>>>> + >>>>>>>>>> + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) { >>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>> + /* >>>>>>>>>> + * If Core Scheduling is enabled, select this >>>>>>>>>> cpu >>>>>>>>>> + * only if the process cookie matches core >>>>>>>>>> cookie. >>>>>>>>>> + */ >>>>>>>>>> + if (sched_core_enabled(cpu_rq(cpu)) && >>>>>>>>>> + p->core_cookie == >>>>>>>>>> cpu_rq(cpu)->core->core_cookie) >>>>>>>>> Why not also add similar logic in select_idle_smt to reduce >>>>>>>>> forced-idle? :) >>>>>>>> We hit select_idle_smt after we scaned the entire LLC domain for idle >>>>>>>> cores >>>>>>>> and idle cpus and failed,so IMHO, an idle smt is probably a good >>>>>>>> choice under >>>>>>>> this scenario. >>>>>>> >>>>>>> AFAIC, selecting idle sibling with unmatched cookie will cause >>>>>>> unnecessary fored-idle, unfairness and latency, compared to choosing >>>>>>> *target* cpu. >>>>>> Choosing target cpu could increase the runnable task number on the >>>>>> target runqueue, this >>>>>> could trigger busiest->nr_running > 1 logic and makes the idle sibling >>>>>> trying to pull but >>>>>> not success(due to cookie not match). Putting task to the idle sibling >>>>>> is relatively stable IMHO. >>>>> >>>>> I’m afraid that *unsuccessful* pullings between smts would not result in >>>>> unstableness, because >>>>> the load-balance always do periodicly , and unsuccess means nothing >>>>> happen. >>>> unsuccess pulling means more unnecessary overhead in load balance. >>>> >>>>> On the contrary, unmatched sibling tasks running concurrently could bring >>>>> forced-idle to each other repeatedly, >>>>> Which is more unstable, and more costly when pick_next_task for all >>>>> siblings. >>>> Not worse than two tasks ping-pong on the same target run queue I guess, >>>> and better if >>>> - task1(cookie A) is running on the target, and task2(cookie B) in the >>>> runqueue, >>>> - task3(cookie B) coming >>>> >>>> If task3 chooses target's sibling, it could have a chance to run >>>> concurrently with task2. >>>> But if task3 chooses target, it will wait for next pulling luck of load >>>> balancer >>> That’s more interesting. :) >>> Distributing different cookie tasks onto different cpus(or cpusets) could >>> be the *ideal stable status* we want, as I understood. >>> Different cookie tasks running on sibling smts could hurt performance, and >>> that should be avoided with best effort. >> We already tried to avoid when we scan idle cores and idle cpus in llc >> domain. > > I’m afraid that’s not enough either, :) > 1. Scanning Idle cpus is not a full scan, there is limit according to scan > cost. > 2. That's only trying at the *core/cpu* level, *SMT* level should be > considered too. > >> >>> For above case, selecting idle sibling cpu can improve the concurrency >>> indeed, but it decrease the imbalance for load-balancer. >>> In that case, load-balancer could not notice the imbalance, and would do >>> nothing to improve the unmatched situation. >>> On the contrary, choosing the *target* cpu could enhance the imbalance, and >>> load-balancer could try to pull unmatched task away, >> Pulling away to where needs another bunch of elaboration. > > Still with the SMT2+3tasks case, > if *idle sibling* chosen, > Smt1’s load = task1+task2, smt2’s load = task3. Task3 will run intermittently > because of forced-idle, > so smt2’s real load could low enough, that it could not be pulled away > forever. That’s indeed a stable state, > but with performance at a discount. > > If *target sibling* chose, > Smt1’s load = task1+task2+task3, smt2’s load=0. It’s a obvious imbalance, and > load-balancer will pick a task to pull, > 1. If task1(cookie A) picked, that’s done for good. > 2. If task2(cookie B) or task3(cookie B) picked, that’s ok too, the rest > task(cookie B) could be pulled away at next balance(maybe need to improve the > pulling to tend to pull matched task more aggressively). > And then, we may reach a more stable state *globally* without performance > discount.
I'm not sure what you mean pulled away, - if you mean pulled away from this core, cookieA in idle sibling case can be pulled away too. - and if you mean pulled away but within this core, I guess cookieB in target sibling case can't be pulled away either, as nr_running difference = 1 Thanks, -Aubrey