I'd like to report the following bug with hwloc-1.0.1:
When a Linux cpuset (see cpuset(7)) is created with a subset of the
current machine's resources and a hwloc application is bound to it, the
hwloc API may return a broken topology once the topology is restricted
to the levels that add hierarchical structure (as done by
hwloc_topology_ignore_all_keep_structure() and lstopo --merge).
Reproducible example on a machine running Linux kernel 2.6.16.60-0.42.5-smp
with two quad-core Intel Xeon X5570 (Nehalem) sockets and Hyper-Threading
enabled (the shell prompt is >).
We start in the root cpuset, /, which contains all 16 logical processing
units (PUs) and both memory nodes, and run lstopo with and without --merge.
Then we create a cpuset containing the first 5 PUs, bind the current shell
to it, and run lstopo again. This time the --merge output looks wrong: the
L3 cache of the second NUMA node (L3 #1) is missing, and PU #4 is left on
its own instead of appearing under L3 #1. Finally, we compile a small
program that computes the common parent of every pair of PUs in the
topology. This program crashes with SIGSEGV.
> cat /proc/self/cpuset
/
> /sw/local/packages/hwloc-1.0.1/bin/lstopo
Machine (142GB)
  NUMANode #0 (phys=0 71GB) + Socket #0 + L3 #0 (8192KB)
    L2 #0 (256KB) + L1 #0 (32KB) + Core #0
      PU #0 (phys=0)
      PU #1 (phys=8)
    L2 #1 (256KB) + L1 #1 (32KB) + Core #1
      PU #2 (phys=1)
      PU #3 (phys=9)
    L2 #2 (256KB) + L1 #2 (32KB) + Core #2
      PU #4 (phys=2)
      PU #5 (phys=10)
    L2 #3 (256KB) + L1 #3 (32KB) + Core #3
      PU #6 (phys=3)
      PU #7 (phys=11)
  NUMANode #1 (phys=1 71GB) + Socket #1 + L3 #1 (8192KB)
    L2 #4 (256KB) + L1 #4 (32KB) + Core #4
      PU #8 (phys=4)
      PU #9 (phys=12)
    L2 #5 (256KB) + L1 #5 (32KB) + Core #5
      PU #10 (phys=5)
      PU #11 (phys=13)
    L2 #6 (256KB) + L1 #6 (32KB) + Core #6
      PU #12 (phys=6)
      PU #13 (phys=14)
    L2 #7 (256KB) + L1 #7 (32KB) + Core #7
      PU #14 (phys=7)
      PU #15 (phys=15)
> /sw/local/packages/hwloc-1.0.1/bin/lstopo --merge
Machine
  L3 #0 (8192KB)
    Core #0
      PU #0 (phys=0)
      PU #1 (phys=8)
    Core #1
      PU #2 (phys=1)
      PU #3 (phys=9)
    Core #2
      PU #4 (phys=2)
      PU #5 (phys=10)
    Core #3
      PU #6 (phys=3)
      PU #7 (phys=11)
  L3 #1 (8192KB)
    Core #4
      PU #8 (phys=4)
      PU #9 (phys=12)
    Core #5
      PU #10 (phys=5)
      PU #11 (phys=13)
    Core #6
      PU #12 (phys=6)
      PU #13 (phys=14)
    Core #7
      PU #14 (phys=7)
      PU #15 (phys=15)
> /bin/echo 0-4 > /dev/cpuset/mycpuset/cpus
> /bin/echo 0-1 > /dev/cpuset/mycpuset/mems
> /bin/echo $$ > /dev/cpuset/mycpuset/tasks
> /sw/local/packages/hwloc-1.0.1/bin/lstopo
Machine (142GB)
  NUMANode #0 (phys=0 71GB) + Socket #0 + L3 #0 (8192KB)
    L2 #0 (256KB) + L1 #0 (32KB) + Core #0 + PU #0 (phys=0)
    L2 #1 (256KB) + L1 #1 (32KB) + Core #1 + PU #1 (phys=1)
    L2 #2 (256KB) + L1 #2 (32KB) + Core #2 + PU #2 (phys=2)
    L2 #3 (256KB) + L1 #3 (32KB) + Core #3 + PU #3 (phys=3)
  NUMANode #1 (phys=1 71GB) + Socket #1 + L3 #1 (8192KB) + L2 #4 (256KB) + L1 #4 (32KB) + Core #4 + PU #4 (phys=4)
> /sw/local/packages/hwloc-1.0.1/bin/lstopo --merge
Machine
L3 #0 (8192KB)
PU #0 (phys=0)
PU #1 (phys=1)
PU #2 (phys=2)
PU #3 (phys=3)
PU #4 (phys=4)
> cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>

int main(void) {
  int npu, i, j;
  hwloc_topology_t topology;
  hwloc_obj_t *pu, parent;

  /* Allocate and initialize topology object. */
  hwloc_topology_init(&topology);
  /* Keep only levels that add structure (as lstopo --merge does),
     then perform the topology detection. */
  hwloc_topology_ignore_all_keep_structure(topology);
  hwloc_topology_load(topology);

  /* Collect all HWLOC_OBJ_PU: the first PU, followed by the other
     npu - 1 PUs ordered from closest to farthest. */
  npu = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU);
  pu = malloc(npu * sizeof(hwloc_obj_t));
  pu[0] = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_PU, NULL);
  hwloc_get_closest_objs(topology, pu[0], &pu[1], npu - 1);

  /* Determine the common parent of every PU pair. */
  for (i = 0; i < npu - 1; i++) {
    for (j = i + 1; j < npu; j++) {
      parent = hwloc_get_common_ancestor_obj(topology, pu[i], pu[j]);
      printf("%2d %2d common parent type %d\n", i, j, parent->type);
    }
  }

  free(pu);
  hwloc_topology_destroy(topology);
  return 0;
}
> gcc -I/sw/local/packages/hwloc-1.0.1/include \
    -L/sw/local/packages/hwloc-1.0.1/lib \
    -Wl,-rpath,/sw/local/packages/hwloc-1.0.1/lib test.c -lhwloc
> ./a.out
0 1 common parent type 4
0 2 common parent type 4
0 3 common parent type 4
Segmentation fault
--
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr. 7
14195 Berlin
Tel: +49-30-84185-270
Fax: +49-30-84185-311