Hi Prateek,
On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak <[email protected]>
wrote:
> Hello Srikar,
>
> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
> > L2 Cache reported here is for SMT8 Core aka CACHE domain.
>
> Apart for the scheduler, nothing in tree currently cares about
> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>
> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
> mask in this case.)
>
> Power anyways adds its own topology via set_sched_topology() so the
> default_topology from kernel/sched/topology.c remains unused.
>
> ...
>
> > Shouldnt cache-aware scheduling be worried about cpuset partitions too.
> > If a cpuset has subset of LLC cores, then we should Scheduler assume it can
> > control complete LLC?
>
> Well, the scheduling takes care of partitions and the cache aware
> scheduling bits take care of looking at the full system perspective
> for stats aggregation and pointing to a particular LLc.
>
> We don't compare llc_id across cpusets so we keeping one unique llc_id
> per H/W LLC instance is feasible and it enables us to keep llc_id space
> limited for optimizing cache-aware scheduling.
>
> Now if we have threads of same process across partitions, we'll
> still aggregate the utilization numbers across the full LLC but
> the load balancers at individual partitions will make a call on
> the aggregation.
>
> --
> Thanks and Regards,
> Prateek
>
>
I suppose what you suggested looks like below:
powerpc/smp: make cpu_coregroup_mask() return the LLC
On pSeries shared LPARs(or coregroup_enabled is false on
Power9 and earlier) the hemisphere map is not allocated, so
build_sched_domains() dereferences a NULL cpumask and crashes.
The generic scheduler expects cpu_coregroup_mask() to span the LLC.
On powerpc the LLC is the L2. Return cpu_l2_cache_mask() instead of
the hemisphere map. Use a coregroup_map() helper for the in-file
hemisphere users, and a powerpc_tl_mc_mask() wrapper for the MC
sched-domain level.
Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
Reported-by: Venkat Rao Bagalkote <[email protected]>
Suggested-by: K Prateek Nayak <[email protected]>
---
arch/powerpc/kernel/smp.c | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1040,11 +1040,22 @@ static const struct cpumask
*tl_smallcore_smt_mask(struct sched_domain_topology_
}
#endif
+static inline struct cpumask *coregroup_map(int cpu)
+{
+ return per_cpu(cpu_coregroup_map, cpu);
+}
+
struct cpumask *cpu_coregroup_mask(int cpu)
{
- return per_cpu(cpu_coregroup_map, cpu);
+ return cpu_l2_cache_mask(cpu);
+}
+
+static const struct cpumask *
+powerpc_tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+ return coregroup_map(cpu);
}
static bool has_coregroup_support(void)
{
if (is_shared_processor())
@@ -1155,7 +1166,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
if (has_coregroup_support())
- cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
+ cpumask_set_cpu(boot_cpuid, coregroup_map(boot_cpuid));
init_big_cores();
if (has_big_cores) {
@@ -1520,8 +1531,8 @@ static void remove_cpu_from_masks(int cpu)
set_cpus_unrelated(cpu, i, cpu_core_mask);
if (has_coregroup_support()) {
- for_each_cpu(i, cpu_coregroup_mask(cpu))
- set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
+ for_each_cpu(i, coregroup_map(cpu))
+ set_cpus_unrelated(cpu, i, coregroup_map);
}
}
#endif
@@ -1553,7 +1564,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t
*mask)
if (!*mask) {
/* Assume only siblings are part of this CPU's coregroup */
for_each_cpu(i, submask_fn(cpu))
- set_cpus_related(cpu, i, cpu_coregroup_mask);
+ set_cpus_related(cpu, i, coregroup_map);
return;
}
@@ -1561,18 +1572,18 @@ static void update_coregroup_mask(int cpu,
cpumask_var_t *mask)
cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
/* Update coregroup mask with all the CPUs that are part of submask */
- or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
+ or_cpumasks_related(cpu, cpu, submask_fn, coregroup_map);
/* Skip all CPUs already part of coregroup mask */
- cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
+ cpumask_andnot(*mask, *mask, coregroup_map(cpu));
for_each_cpu(i, *mask) {
/* Skip all CPUs not part of this coregroup */
if (coregroup_id == cpu_to_coregroup_id(i)) {
- or_cpumasks_related(cpu, i, submask_fn,
cpu_coregroup_mask);
+ or_cpumasks_related(cpu, i, submask_fn, coregroup_map);
cpumask_andnot(*mask, *mask, submask_fn(i));
} else {
- cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
+ cpumask_andnot(*mask, *mask, coregroup_map(i));
}
}
}
@@ -1733,7 +1744,7 @@ static void __init build_sched_topology(void)
if (has_coregroup_support()) {
powerpc_topology[i++] =
- SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
+ SDTL_INIT(powerpc_tl_mc_mask,
powerpc_shared_proc_flags, MC);
}
powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask,
powerpc_shared_proc_flags, PKG);
--
2.43.0
Thanks,
Yu