On Mon, Sep 15, 2014 at 03:26:41PM -0700, Dave Hansen wrote:
> 
> I'm getting the spew below when booting with Haswell (Xeon
> E5-2699) CPUs and the "Cluster-on-Die" (CoD) feature enabled in
> the BIOS. 

What is that cluster-on-die thing? I've heard it before but never could
find anything on it.

> This also fixes sysfs because CPUs with the same 'physical_package_id'
> in /sys/devices/system/cpu/cpu*/topology/ are not listed together
> in the same 'core_siblings_list'.  This violates a statement from
> Documentation/ABI/testing/sysfs-devices-system-cpu:
> 
>       core_siblings: internal kernel map of cpu#'s hardware threads
>       within the same physical_package_id.
> 
>       core_siblings_list: human-readable list of the logical CPU
>       numbers within the same physical_package_id as cpu#.

No that statement is wrong; it assumes physical_package_id is a good
identifier for nodes. Clearly this is no longer true.

The idea is that core_siblings (or rather cpu_core_mask) is a mask of
all cores on a node. 

> The sysfs effects here cause an issue with the hwloc tool where
> it gets confused and thinks there are more sockets than are
> physically present.

Meh, so then we need another mask.

The important bit you didn't show was the scheduler domain setup. I
suspect it all works by accident, not by design.

> diff -puN arch/x86/kernel/smpboot.c~hsw-cod-is-sane arch/x86/kernel/smpboot.c
> --- a/arch/x86/kernel/smpboot.c~hsw-cod-is-sane       2014-09-15 
> 14:56:20.012314468 -0700
> +++ b/arch/x86/kernel/smpboot.c       2014-09-15 14:58:58.837506644 -0700
> @@ -344,10 +344,13 @@ static bool match_llc(struct cpuinfo_x86
>  static bool match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  {
>       if (c->phys_proc_id == o->phys_proc_id) {
> -             if (cpu_has(c, X86_FEATURE_AMD_DCM))
> -                     return true;
> -
> -             return topology_sane(c, o, "mc");
> +             /*
> +              * We used to enforce that 'c' and 'o' be on the
> +              * same node, but AMD's DCM and Intel's Cluster-
> +              * on-Die (CoD) support both have physical
> +              * processors that span NUMA nodes.
> +              */
> +             return true;
>       }
>       return false;
>  }

This is wrong (and I suppose the AMD case was already wrong). That
function is supposed to match a multi-core group which is very much
supposed to be smaller-or-equal to a node, not spanning nodes.

The scheduler assumes: SMT <= LLC <= MC <= NODE and if setting the MC
mask to cover multiple nodes works, its by accident.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to