Hi all,

I have a 6-node cluster with the H8QG6 AMD boards that have the L3
bug. Brice Goglin recently provided a fix to Fabian Wein, and I applied
the same fix to our XML by diffing Fabian's original XML against
Brice's fixed one and incorporating the equivalent changes. It worked
perfectly with openmpi-1.10.0 and hwloc 1.11.1. I then produced a
patched XML for each node in our cluster.

The problem arises when I try to combine the XMLs. To test the assembly
of just two XMLs, I ran:

hwloc-assembler combo.xml \
        --name clusty clusty_fixed.xml \
        --name node1 node1_fixed.xml

I then set HWLOC_XMLFILE to combo.xml. When I try to mpirun a test
program on either of the two nodes, I get a segfault:

andrej@clusty:~/MPI$ mpirun -np 44 python testmpi.py
[clusty:19136] *** Process received signal ***
[clusty:19136] Signal: Segmentation fault (11)
[clusty:19136] Signal code: Address not mapped (1)
[clusty:19136] Failing at address: (nil)
[clusty:19136] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fdf37f38340]
[clusty:19136] [ 1] /usr/local/hwloc/lib/libhwloc.so.5(hwloc_bitmap_and+0x17)[0x7fdf37934e77]
[clusty:19136] [ 2] /opt/openmpi-1.10.0/lib/libopen-pal.so.13(opal_hwloc_base_filter_cpus+0x37c)[0x7fdf381b239c]
[clusty:19136] [ 3] /opt/openmpi-1.10.0/lib/libopen-pal.so.13(opal_hwloc_base_get_topology+0xcb)[0x7fdf381b412b]
[clusty:19136] [ 4] /opt/openmpi-1.10.0/lib/openmpi/mca_ess_hnp.so(+0x47ea)[0x7fdf35c1c7ea]
[clusty:19136] [ 5] /opt/openmpi-1.10.0/lib/libopen-rte.so.12(orte_init+0x168)[0x7fdf384062b8]
[clusty:19136] [ 6] mpirun[0x404497]
[clusty:19136] [ 7] mpirun[0x40363d]
[clusty:19136] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fdf37b81ec5]
[clusty:19136] [ 9] mpirun[0x403559]
[clusty:19136] *** End of error message ***
Segmentation fault (core dumped)
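
The crash is a NULL dereference inside hwloc_bitmap_and, called from
opal_hwloc_base_filter_cpus. One thing that seems worth checking is
whether the root object of the assembled file carries the same cpuset
bitmaps as the individual files. The Python sketch below (the script
name check_topo.py is hypothetical) lists, for the root object and its
direct children, which of cpuset, complete_cpuset, online_cpuset and
allowed_cpuset are absent. It assumes the hwloc 1.x XML layout, with
bitmaps stored as attributes on each <object> and the host name in an
<info name="HostName"> child. Whether a missing bitmap is really what
Open MPI trips over is a guess; the trace alone does not prove it.

```python
import sys
import xml.etree.ElementTree as ET

# Bitmap attributes we expect hwloc 1.x to write on every <object> in its
# XML export (assumption based on the 1.11 format; adjust if files differ).
CPUSET_ATTRS = ("cpuset", "complete_cpuset", "online_cpuset", "allowed_cpuset")


def describe(obj):
    """Return (type, hostname, missing cpuset attributes) for one <object>."""
    info = obj.find("info[@name='HostName']")  # direct <info> child only
    hostname = info.get("value") if info is not None else None
    missing = [a for a in CPUSET_ATTRS if a not in obj.attrib]
    return obj.get("type"), hostname, missing


def inspect_topology(topology):
    """Describe the root <object> and each of its direct child objects."""
    top = topology.find("object")  # the root object sits directly under <topology>
    if top is None:
        raise ValueError("no <object> element under <topology>")
    return [describe(top)] + [describe(child) for child in top.findall("object")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_topo.py combo.xml
    for typ, host, missing in inspect_topology(ET.parse(sys.argv[1]).getroot()):
        status = "missing " + ", ".join(missing) if missing else "all present"
        print(f"{typ:<10} {host or '-':<12} {status}")
```

Running it on combo.xml and then on clusty_fixed.xml would show whether
the assembler left any of these bitmaps off the new root object.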

Each individual XML file works (i.e., hwloc raises no complaints and
mpirun runs perfectly), but the assembled one does not. I'm attaching
all three XMLs: clusty.xml, node1.xml and combo.xml. Any ideas?

Thanks,
Andrej

Attachment: clusty.xml
Description: XML document

Attachment: combo.xml
Description: XML document

Attachment: node1.xml
Description: XML document