On 2020/9/15 17:23, Vincent Guittot wrote:
> On Tue, 15 Sep 2020 at 10:47, Jiang Biao <benbji...@gmail.com> wrote:
>>
>> Hi, Vincent
>>
>> On Mon, 14 Sep 2020 at 20:26, Vincent Guittot
>> <vincent.guit...@linaro.org> wrote:
>>>
>>> On Sun, 13 Sep 2020 at 05:59, Jiang Biao <benbji...@gmail.com> wrote:
>>>>
>>>> Hi, Aubrey
>>>>
>>>> On Fri, 11 Sep 2020 at 23:48, Aubrey Li <aubrey...@intel.com> wrote:
>>>>>
>>>>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>>>>> enters idle, its corresponding bit in the idle cpumask will be set,
>>>>> and when the CPU exits idle, its bit will be cleared.
>>>>>
>>>>> When a task wakes up to select an idle cpu, scanning the idle cpumask
>>>>> has lower cost than scanning all the cpus in the last level cache
>>>>> domain, especially when the system is heavily loaded.
>>>>>
>>>>> Signed-off-by: Aubrey Li <aubrey...@linux.intel.com>
>>>>> ---
>>>>>  include/linux/sched/topology.h | 13 +++++++++++++
>>>>>  kernel/sched/fair.c            |  4 +++-
>>>>>  kernel/sched/topology.c        |  2 +-
>>>>>  3 files changed, 17 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>>>>> index fb11091129b3..43a641d26154 100644
>>>>> --- a/include/linux/sched/topology.h
>>>>> +++ b/include/linux/sched/topology.h
>>>>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>>>>>         atomic_t        ref;
>>>>>         atomic_t        nr_busy_cpus;
>>>>>         int             has_idle_cores;
>>>>> +       /*
>>>>> +        * Span of all idle CPUs in this domain.
>>>>> +        *
>>>>> +        * NOTE: this field is variable length. (Allocated dynamically
>>>>> +        * by attaching extra space to the end of the structure,
>>>>> +        * depending on how many CPUs the kernel has booted up with)
>>>>> +        */
>>>>> +       unsigned long   idle_cpus_span[];
>>>>>  };
>>>>>
>>>>> +static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
>>>>> +{
>>>>> +       return to_cpumask(sds->idle_cpus_span);
>>>>> +}
>>>>> +
>>>>>  struct sched_domain {
>>>>>         /* These fields must be setup */
>>>>>         struct sched_domain __rcu *parent;      /* top domain must be null terminated */
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 6b3b59cc51d6..3b6f8a3589be 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>>>>>
>>>>>         time = cpu_clock(this);
>>>>>
>>>>> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>>>>> +       cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);
>>>>
>>>> Is the sds_idle_cpus() always empty if nohz=off?
>>>
>>> Good point
>>>
>>>> Do we need to initialize the idle_cpus_span with sched_domain_span(sd)?
>>>>
>>>>>
>>>>>         for_each_cpu_wrap(cpu, cpus, target) {
>>>>>                 if (!--nr)
>>>>> @@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
>>>>>                 sd->nohz_idle = 0;
>>>>>
>>>>>         atomic_inc(&sd->shared->nr_busy_cpus);
>>>>> +       cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
>>>>>  unlock:
>>>>>         rcu_read_unlock();
>>>>>  }
>>>>> @@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
>>>>>                 sd->nohz_idle = 1;
>>>>>
>>>>>         atomic_dec(&sd->shared->nr_busy_cpus);
>>>>> +       cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
>>>>
>>>> This only works when entering/exiting tickless mode? :)
>>>> Why not update idle_cpus_span during tick_nohz_idle_enter()/exit()?
>>>
>>> set_cpu_sd_state_busy is only called during a tick in order to limit
>>> the rate of the update to at most once per tick per cpu, and to prevent
>>> any kind of update storm if short-running tasks wake/sleep all the
>>> time. We don't want to update a cpumask at each and every enter/leave
>>> idle.
>>>
>> Agree. But set_cpu_sd_state_busy seems not to be reached when
>> nohz=off, which means it will not work for that case? :)
>
> Yes, set_cpu_sd_state_idle/busy are nohz functions
Thanks Biao for pointing this out. If the shared idle cpumask is initialized
with sched_domain_span(sd), then the nohz=off case will keep the previous
behavior.

Thanks,
-Aubrey