Brice Goglin wrote:

Brock Palen wrote:
Has anyone done work with hwloc on ScaleMP systems? They provide
their own tool, numabind, but we are looking for a more generic
solution to process placement and control that works well inside our
MPI library (Open MPI in most cases).

Any input on this would be great!

Hello Brock,

From what I remember, ScaleMP runs a hypervisor on each node that
virtually merges all of them into one big shared-memory machine, and a
vanilla Linux kernel runs on top of it. So hwloc should just see
regular cores and NUMA node information, assuming the virtual "merged"
hardware reports all the necessary information to the OS.
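
For reference, here is a minimal sketch (mine, not from the thread, written
against the hwloc 1.1+ C API) of how a generic consumer such as Open MPI
would enumerate whatever NUMA nodes and cores the virtualized OS reports:

/* List the NUMA nodes hwloc discovers, as any generic tool would. */
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topology;
    int i, nnodes;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);   /* reads whatever the (virtualized) OS exposes */

    nnodes = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
    for (i = 0; i < nnodes; i++) {
        hwloc_obj_t node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, i);
        char *cpus;
        hwloc_bitmap_asprintf(&cpus, node->cpuset);
        printf("NUMA node %u: cpuset %s\n", node->os_index, cpus);
        free(cpus);
    }

    hwloc_topology_destroy(topology);
    return 0;
}

If the hypervisor forwards the hardware details correctly, this should list
the same eight 4-core NUMA nodes that lstopo shows below.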


Running lstopo 0.9.3, it appears that hwloc does see the extra layer of complexity:

[brockp@nyx0809 INTEL]$ lstopo -
System(79GB)
  Misc0
    Node#0(10GB) + Socket#1 + L3(8192KB)
      L2(256KB) + L1(32KB) + Core#0 + P#0
      L2(256KB) + L1(32KB) + Core#1 + P#1
      L2(256KB) + L1(32KB) + Core#2 + P#2
      L2(256KB) + L1(32KB) + Core#3 + P#3
    Node#1(10GB) + Socket#0 + L3(8192KB)
      L2(256KB) + L1(32KB) + Core#0 + P#4
      L2(256KB) + L1(32KB) + Core#1 + P#5
      L2(256KB) + L1(32KB) + Core#2 + P#6
      L2(256KB) + L1(32KB) + Core#3 + P#7
  Misc0
    Node#2(10GB) + Socket#3 + L3(8192KB)
      L2(256KB) + L1(32KB) + Core#0 + P#8
      L2(256KB) + L1(32KB) + Core#1 + P#9
      L2(256KB) + L1(32KB) + Core#2 + P#10
      L2(256KB) + L1(32KB) + Core#3 + P#11
    Node#3(10GB) + Socket#2 + L3(8192KB)
      L2(256KB) + L1(32KB) + Core#0 + P#12
      L2(256KB) + L1(32KB) + Core#1 + P#13
      L2(256KB) + L1(32KB) + Core#2 + P#14
      L2(256KB) + L1(32KB) + Core#3 + P#15
  Misc0
    Node#4(10GB) + Socket#5 + L3(8192KB)
      L2(256KB) + L1(32KB) + Core#0 + P#16
      L2(256KB) + L1(32KB) + Core#1 + P#17
      L2(256KB) + L1(32KB) + Core#2 + P#18
      L2(256KB) + L1(32KB) + Core#3 + P#19
    Node#5(10GB) + Socket#4 + L3(8192KB)
      L2(256KB) + L1(32KB) + Core#0 + P#20
      L2(256KB) + L1(32KB) + Core#1 + P#21
      L2(256KB) + L1(32KB) + Core#2 + P#22
      L2(256KB) + L1(32KB) + Core#3 + P#23
  Misc0
    Node#6(10GB) + Socket#7 + L3(8192KB)
      L2(256KB) + L1(32KB) + Core#0 + P#24
      L2(256KB) + L1(32KB) + Core#1 + P#25
      L2(256KB) + L1(32KB) + Core#2 + P#26
      L2(256KB) + L1(32KB) + Core#3 + P#27
    Node#7(10GB) + Socket#6 + L3(8192KB)
      L2(256KB) + L1(32KB) + Core#0 + P#28
      L2(256KB) + L1(32KB) + Core#1 + P#29
      L2(256KB) + L1(32KB) + Core#2 + P#30
      L2(256KB) + L1(32KB) + Core#3 + P#31

I don't know why they are all labeled Misc0, but hwloc does see the extra layer.
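
That Misc level is still usable for placement: each Misc object carries the
cpuset of one board, so a generic launcher can bind a process inside one of
them. A hedged sketch, assuming hwloc 1.x where these grouping objects are
HWLOC_OBJ_GROUP (0.9.x prints them as Misc):

/* Bind the current process inside the first board-level group. */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_obj_t group;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    group = hwloc_get_obj_by_type(topology, HWLOC_OBJ_GROUP, 0);
    if (group) {
        /* restrict this process to the cores of that group */
        if (hwloc_set_cpubind(topology, group->cpuset, HWLOC_CPUBIND_PROCESS))
            perror("hwloc_set_cpubind");
    } else {
        fprintf(stderr, "no group objects in this topology\n");
    }

    hwloc_topology_destroy(topology);
    return 0;
}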

If you want other information let me know.

There's a bit of ScaleMP code in the Linux kernel, but it does pretty
much nothing; it does not seem to add anything to /proc or /sys, for
instance. So I am not sure how hwloc could get any specialized knowledge
of ScaleMP machines. Maybe their custom numabind tool knows that ScaleMP
only runs on hardware with well-defined types/counts/numbering of
processors and NUMA nodes, and uses this information to group
sockets/NUMA nodes by their physical distance.

Brice
