Brice,

I confirm your patch solves the issue I reported earlier for OMPI. I did
not try it on a standalone HWLOC, so I am not sure that it maintains the
coherency of the output. If you want I can give it a try.

Thanks,
  George.


On Thu, Sep 10, 2015 at 6:08 PM, Brice Goglin <brice.gog...@inria.fr> wrote:

> Try this patch (it applies to hwloc v1.9-v1.11, it should be OK against
> OMPI's tree).
> Your bridge 22:00.0 says it contains the master bus 00. It causes a cycle
> in hwloc's insert algorithm, caught by the assertion. The patch just
> removes this invalid bridge entirely.
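
A minimal sketch of the kind of sanity check described above, not the actual hwloc patch. It assumes the usual PCI numbering convention, where the buses behind a bridge are numbered higher than the bus the bridge sits on. The struct, its field names, and both function names are hypothetical. The subordinate value used for bridge 22:00.0 in the usage note is made up, since the thread only says the bridge claims bus 00.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified bridge record (not hwloc's internal type). */
struct pci_bridge {
    unsigned bus;             /* bus the bridge itself is attached to */
    unsigned secondary_bus;   /* first bus behind the bridge */
    unsigned subordinate_bus; /* last bus behind the bridge */
};

/* Returns 1 if the downstream bus range looks plausible, 0 otherwise.
 * Assumption: downstream buses are numbered strictly above the bridge's
 * own bus, so a bridge claiming a lower or equal bus (such as bus 00)
 * would make the bus tree cyclic. */
static int bridge_range_is_valid(const struct pci_bridge *b)
{
    return b->secondary_bus > b->bus
        && b->secondary_bus <= b->subordinate_bus;
}

/* Compacts the array in place, keeping only plausible bridges.
 * Returns the number of bridges kept. */
static size_t drop_invalid_bridges(struct pci_bridge *list, size_t n)
{
    size_t kept = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (bridge_range_is_valid(&list[i]))
            list[kept++] = list[i];
    }
    return kept;
}
```

With a bridge on bus 0x22 that reports secondary bus 0x00 and (assumed) subordinate bus 0x00, bridge_range_is_valid() returns 0 and drop_invalid_bridges() discards the entry before any tree insertion is attempted.
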
>
> Brice
>
>
>
> On 10/09/2015 21:23, George Bosilca wrote:
>
> It used to work. Now I don't know exactly when I last updated the trunk
> version on the cluster, but not more than 10 days ago.
>
> lstopo complains with the same assert. Interestingly enough, the same
> binary succeeds on the other nodes of the same cluster ...
>
>   George.
>
>
> On Thu, Sep 10, 2015 at 3:20 PM, Brice Goglin <brice.gog...@inria.fr>
> wrote:
>
>> Did it work on the same machine before? Or did OMPI enable hwloc's PCI
>> discovery recently?
>>
>> Does lstopo complain the same?
>>
>> Brice
>>
>>
>>
>> On 10/09/2015 21:10, George Bosilca wrote:
>>
>> With the current trunk version I keep getting an assert deep down in
>> orted.
>>
>> orted:
>> ../../../../../../../ompi/opal/mca/hwloc/hwloc1110/hwloc/src/pci-common.c:177:
>> hwloc_pci_try_insert_siblings_below_new_bridge: Assertion `comp !=
>> HWLOC_PCI_BUSID_SUPERSET' failed.
>>
>> The stack looks like this:
>>
>> [dancer18:21100] *** Process received signal ***
>> [dancer18:21100] Signal: Aborted (6)
>> [dancer18:21100] Signal code:  (-6)
>> [dancer18:21100] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7fc22ce61710]
>> [dancer18:21100] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x7fc22caf0625]
>> [dancer18:21100] [ 2] /lib64/libc.so.6(abort+0x175)[0x7fc22caf1e05]
>> [dancer18:21100] [ 3] /lib64/libc.so.6(+0x2b74e)[0x7fc22cae974e]
>> [dancer18:21100] [ 4]
>> /lib64/libc.so.6(__assert_perror_fail+0x0)[0x7fc22cae9810]
>> [dancer18:21100] [ 5]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xb0a62)[0x7fc22ddc6a62]
>> [dancer18:21100] [ 6]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xb0b60)[0x7fc22ddc6b60]
>> [dancer18:21100] [ 7]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc1110_hwloc_insert_pci_device_list+0x8f)[0x7fc22ddc724c]
>> [dancer18:21100] [ 8]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xbf2d6)[0x7fc22ddd52d6]
>> [dancer18:21100] [ 9]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(+0xd22f7)[0x7fc22dde82f7]
>> [dancer18:21100] [10]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc1110_hwloc_topology_load+0x1a3)[0x7fc22dde8ee1]
>> [dancer18:21100] [11]
>> /home/bosilca/opt/trunk/debug/lib/libopen-pal.so.0(opal_hwloc_base_get_topology+0x80)[0x7fc22ddb6ece]
>> [dancer18:21100] [12]
>> /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_ess_base_orted_setup+0x127)[0x7fc22e0b3523]
>> [dancer18:21100] [13]
>> /home/bosilca/opt/trunk/debug/lib/openmpi/mca_ess_env.so(+0xe45)[0x7fc22c6bbe45]
>> [dancer18:21100] [14]
>> /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_init+0x2c6)[0x7fc22e06b55a]
>> [dancer18:21100] [15]
>> /home/bosilca/opt/trunk/debug/lib/libopen-rte.so.0(orte_daemon+0x5c1)[0x7fc22e09a895]
>> [dancer18:21100] [16] /home/bosilca/opt/trunk/debug/bin/orted[0x40082a]
>> [dancer18:21100] [17]
>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fc22cadcd5d]
>> [dancer18:21100] [18] /home/bosilca/opt/trunk/debug/bin/orted[0x4006e9]
>>
>> Any ideas?
>>
>>   George.
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/09/17993.php
>>
>
