Hi all,

I have a 6-node cluster with the buggy L3 H8QG6 AMD boards. Brice Goglin recently provided a fix to Fabian Wein, and I applied the same fix by diffing Fabian's original XML against Brice's fixed XML and incorporating the equivalent changes into ours. It did the trick perfectly with openmpi-1.10.0 and hwloc 1.11.1. I then produced a patched XML for each node in our cluster.
The problem arises when I try to combine the XMLs. To test the assembly of just two XMLs, I ran:

    hwloc-assembler combo.xml \
        --name clusty clusty_fixed.xml \
        --name node1 node1_fixed.xml

I then set HWLOC_XMLFILE to combo.xml and, when trying to mpirun a test program on either of the two nodes, I get a segfault:

    andrej@clusty:~/MPI$ mpirun -np 44 python testmpi.py
    [clusty:19136] *** Process received signal ***
    [clusty:19136] Signal: Segmentation fault (11)
    [clusty:19136] Signal code: Address not mapped (1)
    [clusty:19136] Failing at address: (nil)
    [clusty:19136] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fdf37f38340]
    [clusty:19136] [ 1] /usr/local/hwloc/lib/libhwloc.so.5(hwloc_bitmap_and+0x17)[0x7fdf37934e77]
    [clusty:19136] [ 2] /opt/openmpi-1.10.0/lib/libopen-pal.so.13(opal_hwloc_base_filter_cpus+0x37c)[0x7fdf381b239c]
    [clusty:19136] [ 3] /opt/openmpi-1.10.0/lib/libopen-pal.so.13(opal_hwloc_base_get_topology+0xcb)[0x7fdf381b412b]
    [clusty:19136] [ 4] /opt/openmpi-1.10.0/lib/openmpi/mca_ess_hnp.so(+0x47ea)[0x7fdf35c1c7ea]
    [clusty:19136] [ 5] /opt/openmpi-1.10.0/lib/libopen-rte.so.12(orte_init+0x168)[0x7fdf384062b8]
    [clusty:19136] [ 6] mpirun[0x404497]
    [clusty:19136] [ 7] mpirun[0x40363d]
    [clusty:19136] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fdf37b81ec5]
    [clusty:19136] [ 9] mpirun[0x403559]
    [clusty:19136] *** End of error message ***
    Segmentation fault (core dumped)

Each individual XML file works (i.e. no hwloc complaints and mpirun works perfectly), but the assembled one does not. I'm attaching all three XMLs: clusty.xml, node1.xml and combo.xml.

Any ideas?

Thanks,
Andrej
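The backtrace fails at address (nil) inside hwloc_bitmap_and, called from opal_hwloc_base_filter_cpus, which would be consistent with some object in the assembled topology having no cpuset. That is only a guess, not a confirmed diagnosis. A minimal Python sketch for checking it is below. It assumes the hwloc 1.x XML stores topology objects as `object` elements carrying a `cpuset` attribute. The function name `objects_without_cpuset` and the inline sample are hypothetical and are not taken from the attached files.

```python
import xml.etree.ElementTree as ET


def objects_without_cpuset(root):
    """Return (type, depth) for every <object> element lacking a cpuset attribute.

    Assumption: topology objects are <object> elements with a "cpuset" attribute.
    """
    missing = []

    def walk(elem, depth):
        if elem.tag == "object":
            if "cpuset" not in elem.attrib:
                missing.append((elem.get("type"), depth))
            depth += 1  # children of an object sit one level deeper
        for child in elem:
            walk(child, depth)

    walk(root, 0)
    return missing


# Tiny hand-written sample for illustration only; NOT the real combo.xml.
SAMPLE = """<topology>
  <object type="System">
    <object type="Machine" cpuset="0x0000000f">
      <object type="PU" os_index="0" cpuset="0x00000001"/>
    </object>
    <object type="Machine" cpuset="0x0000000f"/>
  </object>
</topology>"""

if __name__ == "__main__":
    for obj_type, depth in objects_without_cpuset(ET.fromstring(SAMPLE)):
        print(f"depth {depth}: {obj_type} has no cpuset")
```

To run it against a real file, replace `ET.fromstring(SAMPLE)` with `ET.parse("combo.xml").getroot()`. If the per-node XMLs report nothing while the assembled one reports its root object, that would point at the caller not expecting a topology whose root lacks a cpuset.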
Attachments: clusty.xml, combo.xml, node1.xml (XML documents)