I guess it could be some invalid bus information in PCI bridges. Maybe try to shut down the node completely and restart it. I've seen other strange PCI issues disappear like this in the past...
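If you want to look at the bridge bus numbers yourself first, here is a rough Python sketch. It is not hwloc's code and does not reproduce the check behind the assertion. It only reads the standard PCI-to-PCI bridge header fields (header type at offset 0x0e, secondary bus at 0x19, subordinate bus at 0x1a) from the sysfs `config` files and flags ranges that look odd. All function names below are made up for illustration, and treating "inverted", "equal" and "partially overlapping" ranges as suspicious is my own heuristic (unconfigured bridges that report 0/0 may show up as false positives).

```python
import glob
import os
from enum import Enum
from itertools import combinations
from typing import Dict, List, NamedTuple, Optional


class BusRange(NamedTuple):
    domain: int
    secondary: int    # first bus number behind the bridge
    subordinate: int  # last bus number behind the bridge


class Relation(Enum):
    DISJOINT = "disjoint"
    EQUAL = "equal"
    INCLUDED = "included"        # a lies inside b
    SUPERSET = "superset"        # a contains b
    PARTIAL = "partial overlap"


def parse_bridge(domain: int, config: bytes) -> Optional[BusRange]:
    """Return the bus range if the config header describes a PCI-to-PCI bridge."""
    if len(config) < 0x1B:
        return None
    if config[0x0E] & 0x7F != 0x01:  # header type 1 = PCI-to-PCI bridge
        return None
    return BusRange(domain, config[0x19], config[0x1A])


def compare(a: BusRange, b: BusRange) -> Relation:
    """Classify how the bus range of a relates to the bus range of b."""
    if (a.domain != b.domain or a.subordinate < b.secondary
            or b.subordinate < a.secondary):
        return Relation.DISJOINT
    if a == b:
        return Relation.EQUAL
    if b.secondary <= a.secondary and a.subordinate <= b.subordinate:
        return Relation.INCLUDED
    if a.secondary <= b.secondary and b.subordinate <= a.subordinate:
        return Relation.SUPERSET
    return Relation.PARTIAL


def suspicious(bridges: Dict[str, BusRange]) -> List[str]:
    """Heuristic (an assumption): report inverted, equal or overlapping ranges."""
    problems = []
    for name, r in sorted(bridges.items()):
        if r.secondary > r.subordinate:
            problems.append(f"{name}: secondary bus {r.secondary:#04x} > "
                            f"subordinate bus {r.subordinate:#04x}")
    valid = {n: r for n, r in bridges.items() if r.secondary <= r.subordinate}
    for (n1, r1), (n2, r2) in combinations(sorted(valid.items()), 2):
        rel = compare(r1, r2)
        if rel in (Relation.EQUAL, Relation.PARTIAL):
            problems.append(f"{n1} vs {n2}: {rel.value}")
    return problems


def scan_sysfs(root: str = "/sys/bus/pci/devices") -> Dict[str, BusRange]:
    """Collect bridge bus ranges from sysfs; unreadable devices are skipped."""
    bridges = {}
    for path in glob.glob(os.path.join(root, "*")):
        name = os.path.basename(path)  # e.g. "0000:00:1c.0"
        try:
            with open(os.path.join(path, "config"), "rb") as f:
                config = f.read(64)
            domain = int(name.split(":")[0], 16)
        except (OSError, ValueError):
            continue
        r = parse_bridge(domain, config)
        if r is not None:
            bridges[name] = r
    return bridges
```

Running `print("\n".join(suspicious(scan_sysfs())))` on the failing node and on one of the working nodes, then comparing the two outputs, might show whether a bridge reports something different there.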
Otherwise, please send the tarball generated by "hwloc-gather-topology --io foo". Send it only to me; it will likely be big because --io gathers many more sysfs files for PCI.

Brice


On 10/09/2015 21:23, George Bosilca wrote:
> It used to work. I don't know exactly when I last updated the
> trunk version on the cluster, but it was not more than 10 days ago.
>
> lstopo complains with the same assert. Interestingly enough, the same
> binary succeeds on the other nodes of the same cluster ...
>
> George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
>     Did it work on the same machine before? Or did OMPI enable hwloc's
>     PCI discovery recently?
>
>     Does lstopo complain in the same way?
>
>     Brice
>
>
>     On 10/09/2015 21:10, George Bosilca wrote:
>>     With the current trunk version I keep getting an assert deep down
>>     in orted.
>>
>>     orted:
>>
>>     ../../../../../../../ompi/opal/mca/hwloc/hwloc1110/hwloc/src/pci-common.c:177:
>>     hwloc_pci_try_insert_siblings_below_new_bridge: Assertion `comp != HWLOC_PCI_BUSID_SUPERSET' failed.
>>
>>     The stack looks like this:
>>
>>     [dancer18:21100] *** Process received signal ***
>>     [dancer18:21100] Signal: Aborted (6)
>>     [dancer18:21100] Signal code: (-6)
>>     [dancer18:21100] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7fc22ce61710]
>>     [dancer18:21100] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x7fc22caf0625]
>>     [dancer18:21100] [ 2] /lib64/libc.so.6(abort+0x175)[0x7fc22caf1e05]
>>     [dancer18:21100] [ 3] /lib64/libc.so.6(+0x2b74e)[0x7fc22cae974e]
>>     [dancer18:21100] [ 4] /lib64/libc.so.6(__assert_perror_fail+0x0)[0x7fc22cae9810]
>>     [dancer18:21100] [ 5] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xb0a62)[0x7fc22ddc6a62]
>>     [dancer18:21100] [ 6] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xb0b60)[0x7fc22ddc6b60]
>>     [dancer18:21100] [ 7] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc1110_hwloc_insert_pci_device_list+0x8f)[0x7fc22ddc724c]
>>     [dancer18:21100] [ 8] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xbf2d6)[0x7fc22ddd52d6]
>>     [dancer18:21100] [ 9] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xd22f7)[0x7fc22dde82f7]
>>     [dancer18:21100] [10] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc1110_hwloc_topology_load+0x1a3)[0x7fc22dde8ee1]
>>     [dancer18:21100] [11] /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc_base_get_topology+0x80)[0x7fc22ddb6ece]
>>     [dancer18:21100] [12] /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_ess_base_orted_setup+0x127)[0x7fc22e0b3523]
>>     [dancer18:21100] [13] /home/bosilca/opt/trunk/debug/lib/openmpi/mca_ess_env.so(+0xe45)[0x7fc22c6bbe45]
>>     [dancer18:21100] [14] /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_init+0x2c6)[0x7fc22e06b55a]
>>     [dancer18:21100] [15] /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_daemon+0x5c1)[0x7fc22e09a895]
>>     [dancer18:21100] [16] /home/bosilca/opt/trunk/debug/bin/orted[0x40082a]
>>     [dancer18:21100] [17] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fc22cadcd5d]
>>     [dancer18:21100] [18] /home/bosilca/opt/trunk/debug/bin/orted[0x4006e9]
>>
>>     Any ideas?
>>
>>     George.
>>
>>     _______________________________________________
>>     devel mailing list
>>     de...@open-mpi.org
>>     Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>     Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/17993.php