Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

Li, Aubrey Thu, 23 Jul 2020 01:07:34 -0700

On 2020/7/23 15:47, benbjiang(蒋彪) wrote:
> Hi,
> 
>> On Jul 23, 2020, at 1:39 PM, Li, Aubrey <aubrey...@linux.intel.com> wrote:
>>
>> On 2020/7/23 12:23, benbjiang(蒋彪) wrote:
>>> Hi,
>>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey <aubrey...@linux.intel.com> wrote:
>>>>
>>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote:
>>>>> Hi,
>>>>>
>>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey <aubrey...@linux.intel.com> 
>>>>>> wrote:
>>>>>>
>>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey <aubrey...@linux.intel.com> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>>>>>>>>> Hi, Aubrey,
>>>>>>>>>
>>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>>>>>>>> <vpil...@digitalocean.com> wrote:
>>>>>>>>>>
>>>>>>>>>> From: Aubrey Li <aubrey...@intel.com>
>>>>>>>>>>
>>>>>>>>>> - Don't migrate if there is a cookie mismatch
>>>>>>>>>> Load balance tries to move task from busiest CPU to the
>>>>>>>>>> destination CPU. When core scheduling is enabled, if the
>>>>>>>>>> task's cookie does not match with the destination CPU's
>>>>>>>>>> core cookie, this task will be skipped by this CPU. This
>>>>>>>>>> mitigates the forced idle time on the destination CPU.
>>>>>>>>>>
>>>>>>>>>> - Select cookie matched idle CPU
>>>>>>>>>> In the fast path of task wakeup, select the first cookie matched
>>>>>>>>>> idle CPU instead of the first idle CPU.
>>>>>>>>>>
>>>>>>>>>> - Find cookie matched idlest CPU
>>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core
>>>>>>>>>> cookie matches with task's cookie
>>>>>>>>>>
>>>>>>>>>> - Don't migrate task if cookie not match
>>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose
>>>>>>>>>> core cookie does not match with task's cookie
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Aubrey Li <aubrey...@linux.intel.com>
>>>>>>>>>> Signed-off-by: Tim Chen <tim.c.c...@linux.intel.com>
>>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai <vpil...@digitalocean.com>
>>>>>>>>>> ---
>>>>>>>>>> kernel/sched/fair.c  | 64 
>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++----
>>>>>>>>>> kernel/sched/sched.h | 29 ++++++++++++++++++++
>>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>>>>>> index d16939766361..33dc4bf01817 100644
>>>>>>>>>> --- a/kernel/sched/fair.c
>>>>>>>>>> +++ b/kernel/sched/fair.c
>>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct 
>>>>>>>>>> task_numa_env *env,
>>>>>>>>>>              if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>>>>>>>>>>                      continue;
>>>>>>>>>>
>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>>>>> +            /*
>>>>>>>>>> +             * Skip this cpu if source task's cookie does not match
>>>>>>>>>> +             * with CPU's core cookie.
>>>>>>>>>> +             */
>>>>>>>>>> +            if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
>>>>>>>>>> +                    continue;
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>>              env->dst_cpu = cpu;
>>>>>>>>>>              if (task_numa_compare(env, taskimp, groupimp, maymove))
>>>>>>>>>>                      break;
>>>>>>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group 
>>>>>>>>>> *group, struct task_struct *p, int this
>>>>>>>>>>
>>>>>>>>>>      /* Traverse only the allowed CPUs */
>>>>>>>>>>      for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>>>>>>>>>> +            struct rq *rq = cpu_rq(i);
>>>>>>>>>> +
>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>>>>> +            if (!sched_core_cookie_match(rq, p))
>>>>>>>>>> +                    continue;
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>>              if (sched_idle_cpu(i))
>>>>>>>>>>                      return i;
>>>>>>>>>>
>>>>>>>>>>              if (available_idle_cpu(i)) {
>>>>>>>>>> -                    struct rq *rq = cpu_rq(i);
>>>>>>>>>>                      struct cpuidle_state *idle = idle_get_state(rq);
>>>>>>>>>>                      if (idle && idle->exit_latency < 
>>>>>>>>>> min_exit_latency) {
>>>>>>>>>>                              /*
>>>>>>>>>> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct task_struct 
>>>>>>>>>> *p, struct sched_domain *sd, int t
>>>>>>>>>>      for_each_cpu_wrap(cpu, cpus, target) {
>>>>>>>>>>              if (!--nr)
>>>>>>>>>>                      return -1;
>>>>>>>>>> -            if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
>>>>>>>>>> -                    break;
>>>>>>>>>> +
>>>>>>>>>> +            if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) {
>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>>>>> +                    /*
>>>>>>>>>> +                     * If Core Scheduling is enabled, select this 
>>>>>>>>>> cpu
>>>>>>>>>> +                     * only if the process cookie matches core 
>>>>>>>>>> cookie.
>>>>>>>>>> +                     */
>>>>>>>>>> +                    if (sched_core_enabled(cpu_rq(cpu)) &&
>>>>>>>>>> +                        p->core_cookie == 
>>>>>>>>>> cpu_rq(cpu)->core->core_cookie)
>>>>>>>>> Why not also add similar logic in select_idle_smt to reduce 
>>>>>>>>> forced-idle? :)
>>>>>>>> We hit select_idle_smt after we scaned the entire LLC domain for idle 
>>>>>>>> cores
>>>>>>>> and idle cpus and failed,so IMHO, an idle smt is probably a good 
>>>>>>>> choice under
>>>>>>>> this scenario.
>>>>>>>
>>>>>>> AFAIC, selecting idle sibling with unmatched cookie will cause 
>>>>>>> unnecessary fored-idle, unfairness and latency, compared to choosing 
>>>>>>> *target* cpu.
>>>>>> Choosing target cpu could increase the runnable task number on the 
>>>>>> target runqueue, this
>>>>>> could trigger busiest->nr_running > 1 logic and makes the idle sibling 
>>>>>> trying to pull but
>>>>>> not success(due to cookie not match). Putting task to the idle sibling 
>>>>>> is relatively stable IMHO.
>>>>>
>>>>> I’m afraid that *unsuccessful* pullings between smts would not result in 
>>>>> unstableness, because
>>>>> the load-balance always do periodicly , and unsuccess means nothing 
>>>>> happen.
>>>> unsuccess pulling means more unnecessary overhead in load balance.
>>>>
>>>>> On the contrary, unmatched sibling tasks running concurrently could bring 
>>>>> forced-idle to each other repeatedly,
>>>>> Which is more unstable, and more costly when pick_next_task for all 
>>>>> siblings.
>>>> Not worse than two tasks ping-pong on the same target run queue I guess, 
>>>> and better if
>>>> - task1(cookie A) is running on the target, and task2(cookie B) in the 
>>>> runqueue,
>>>> - task3(cookie B) coming
>>>>
>>>> If task3 chooses target's sibling, it could have a chance to run 
>>>> concurrently with task2.
>>>> But if task3 chooses target, it will wait for next pulling luck of load 
>>>> balancer
>>> That’s more interesting. :)
>>> Distributing different cookie tasks onto different cpus(or cpusets) could 
>>> be the *ideal stable status* we want, as I understood.
>>> Different cookie tasks running on sibling smts could hurt performance, and 
>>> that should be avoided with best effort.
>> We already tried to avoid when we scan idle cores and idle cpus in llc 
>> domain.
> 
> I’m afraid that’s not enough either, :)
> 1. Scanning Idle cpus is not a full scan, there is limit according to scan 
> cost.
> 2. That's only trying at the *core/cpu* level, *SMT* level should be 
> considered too. 
> 
>>
>>> For above case, selecting idle sibling cpu can improve the concurrency 
>>> indeed, but it decrease the imbalance for load-balancer.
>>> In that case, load-balancer could not notice the imbalance, and would do 
>>> nothing to improve the unmatched situation.
>>> On the contrary, choosing the *target* cpu could enhance the imbalance, and 
>>> load-balancer could try to pull unmatched task away,
>> Pulling away to where needs another bunch of elaboration.
> 
> Still with the SMT2+3tasks case, 
> if *idle sibling* chosen, 
> Smt1’s load = task1+task2, smt2’s load = task3. Task3 will run intermittently 
> because of forced-idle, 
> so smt2’s real load could low enough, that it could not be pulled away 
> forever. That’s indeed a stable state, 
> but with performance at a discount. 
> 
> If *target sibling* chose,
> Smt1’s load = task1+task2+task3, smt2’s load=0. It’s a obvious imbalance, and 
> load-balancer will pick a task to pull,
> 1. If task1(cookie A) picked, that’s done for good.
> 2. If task2(cookie B) or task3(cookie B) picked, that’s ok too, the rest 
> task(cookie B) could be pulled away at next balance(maybe need to improve the 
> pulling to tend to pull matched task more aggressively).
> And then, we may reach a more stable state *globally* without performance 
> discount.


I'm not sure what you mean pulled away,
- if you mean pulled away from this core, cookieA in idle sibling case can be
  pulled away too.
- and if you mean pulled away but within this core, I guess cookieB in target
  sibling case can't be pulled away either, as nr_running difference = 1

Thanks,
-Aubrey

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

Reply via email to