On 22/12/2017 at 11:42, Samuel Thibault wrote:
> Hello,
>
> Brice Goglin, on Tue 19 Dec 2017 11:48:39 +0100, wrote:
>>   + Memory, I/O and Misc objects are now stored in dedicated children lists,
>>     not in the usual children list that is now only used for CPU-side
>>     objects.
>>     - hwloc_get_next_child() may still be used to iterate over these 4 lists
>>       of children at once.
> I hadn't realized this before: so the NUMA-related hierarchy level can
> not be easily obtained with hwloc_get_type_depth and such, that's really
> a concern. For instance in slurm-llnl one can find
>
>       if (hwloc_get_type_depth(topology, HWLOC_OBJ_NODE) >
>           hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET)) {
>
> and probably others are doing this too, e.g. looking up from a CPU to
> find the NUMA level becomes very different from looking up from a CPU to
> find the L3 level etc.
>
> Instead of moving these objects to another place that is much harder
> to find, can't we rather create another type of object, e.g.
> HWLOC_OBJ_MEMORY, to represent the different kinds of memories that can
> be found in a given NUMA level, and keep HWLOC_OBJ_NODE as it is?
>

That won't work. You can have memory attached at different levels of the
hierarchy (things like HBM inside a die, normal memory attached to a
package, and slow memory attached to the memory interconnect). The
notion of NUMA node and proximity domain is changing. It's not a set of
CPU+memory anymore. Things are moving towards the separation of "memory
initiator" (CPUs) and "memory target" (memory banks, possibly behind
memory-side caches). And those targets can be attached to different things.



I agree that finding local NUMA nodes is harder now. I thought about
having an explicit type saying "I have memory children, others don't"
(you propose NUMA with MEMORY children; I was thinking instead of MEMORY
with NUMA children, because people are used to NUMA node numbers and
bind memory to NUMA nodes). But again, there's no guarantee that they
will be at the same depth in the hierarchy since they might be attached
to different kinds of resources. Things like comparing their depth with
socket depth won't work either. So we'd end up with multiple levels just
like Groups.

I will add helpers to simplify the lookup (give me my local NUMA node if
there's a single one, give me the number of "normal" NUMA nodes so I can
split the machine in parts, ...), but it's too early to add these. We
need more feedback first.
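
To make the first kind of helper more concrete, here is a minimal sketch
of "give me my local NUMA node if there's a single one". It does not use
hwloc at all: struct toy_obj is a hypothetical stand-in for an hwloc
object, and every name starting with toy_ is invented for illustration.
The field names (parent, next_sibling, memory_arity, memory_first_child)
only assume that memory children live in their own list next to the
CPU-side children, as described above. This is not a proposed API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an hwloc object; NOT the real hwloc_obj_t. */
enum toy_type { TOY_MACHINE, TOY_PACKAGE, TOY_DIE, TOY_CORE, TOY_PU, TOY_NUMANODE };

struct toy_obj {
    enum toy_type type;
    struct toy_obj *parent;              /* CPU-side parent, or attach point for memory */
    struct toy_obj *next_sibling;        /* next object in the same children list */
    unsigned memory_arity;               /* number of memory children */
    struct toy_obj *memory_first_child;  /* head of the dedicated memory children list */
};

/* Attach a memory object to a CPU-side object by pushing it on its memory list. */
static void toy_attach_memory(struct toy_obj *where, struct toy_obj *mem)
{
    assert(where != NULL && mem != NULL);
    mem->parent = where;
    mem->next_sibling = where->memory_first_child;
    where->memory_first_child = mem;
    where->memory_arity++;
}

/* Walk up from obj until obj itself or an ancestor has memory children.
 * Returns NULL when nothing on the way up has memory attached. */
static struct toy_obj *toy_nearest_memory_parent(struct toy_obj *obj)
{
    while (obj != NULL && obj->memory_arity == 0)
        obj = obj->parent;
    return obj;
}

/* Return the local NUMA node only when it is unambiguous, else NULL. */
static struct toy_obj *toy_single_local_numa(struct toy_obj *obj)
{
    struct toy_obj *holder = toy_nearest_memory_parent(obj);
    if (holder == NULL || holder->memory_arity != 1)
        return NULL;
    return holder->memory_first_child;
}
```

Returning NULL as soon as the nearest attach point has more than one
memory child is a deliberate choice in this sketch: with, say, both
normal memory and HBM attached to the same package, the caller has to
decide which one it wants, and that is exactly where more feedback is
needed.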



About slurm-llnl, their code is obsolete anyway. NUMA is inside Socket
in all modern architectures, so they expose a "Socket" resource that is
actually a NUMA node. They used an easy way to detect whether there are
multiple NUMA nodes per socket or the opposite. We can still detect that
in v2.0, even if the code is different. Once we understand what they
*really* want to do, we'll help them update that code to v2.0.
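
One way such a check could look without comparing depths is to compare
which CPUs each object covers. The sketch below is only an illustration
under assumptions: toy_cpuset is a plain 64-bit mask standing in for an
hwloc bitmap, toy_isincluded imitates the inclusion test that hwloc's
bitmap API provides (hwloc_bitmap_isincluded), and the toy_relation
names are made up. It is not what slurm-llnl does and not necessarily
what they need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy cpuset: bit i set means logical CPU i is covered by the object. */
typedef uint64_t toy_cpuset;

/* Nonzero when every CPU of 'sub' is also in 'super'. */
static int toy_isincluded(toy_cpuset sub, toy_cpuset super)
{
    return (sub & ~super) == 0;
}

enum toy_relation {
    TOY_NUMA_INSIDE_PACKAGE,  /* the NUMA node covers a strict subset of the package */
    TOY_PACKAGE_INSIDE_NUMA,  /* the package covers a strict subset of the NUMA node */
    TOY_ONE_TO_ONE,           /* both cover exactly the same CPUs */
    TOY_UNRELATED             /* neither includes the other */
};

/* Classify how one NUMA node and one package relate, using only their cpusets. */
static enum toy_relation toy_compare(toy_cpuset numa, toy_cpuset package)
{
    int numa_in_package, package_in_numa;

    assert(numa != 0 && package != 0);  /* empty cpusets carry no locality */
    numa_in_package = toy_isincluded(numa, package);
    package_in_numa = toy_isincluded(package, numa);

    if (numa_in_package && package_in_numa)
        return TOY_ONE_TO_ONE;
    if (numa_in_package)
        return TOY_NUMA_INSIDE_PACKAGE;
    if (package_in_numa)
        return TOY_PACKAGE_INSIDE_NUMA;
    return TOY_UNRELATED;
}
```

For example, a NUMA node covering CPUs 0-3 compared with a package
covering CPUs 0-7 gives TOY_NUMA_INSIDE_PACKAGE, whatever depth either
object sits at in the tree.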

Brice

_______________________________________________
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel
