Can you please let me know when you fix this? I intend to release 1.8.4 by the 
end of the week. Since Mellanox is the only member with IB, you folks have been 
maintaining this BTL.


> On Nov 3, 2014, at 6:26 AM, Alina Sklarevich <ali...@dev.mellanox.co.il> 
> wrote:
> 
> Hi,
> 
> On 1.8.4rc1 we observe the following assertion failure in the osu_mbw_mr test
> when using the openib BTL.
> 
> When compiled in production mode (i.e. no --enable-debug) the test simply 
> hangs.
> 
> When using either the tcp BTL or the cm PML, the benchmark completes without 
> error.
> 
> The command line to reproduce this is:
> 
> $ mpirun --bind-to core -display-map -mca btl_openib_if_include mlx5_0:1 -np 2 -mca pml ob1 -mca btl openib,self,sm ./osu_mbw_mr
> 
> # OSU MPI Multiple Bandwidth / Message Rate Test v4.4
> # [ pairs: 1 ] [ window size: 64 ]
> # Size                  MB/s        Messages/s
> osu_mbw_mr: ../../../../opal/class/opal_list.h:547: _opal_list_append: Assertion `0 == item->opal_list_item_refcount' failed.
> [vegas15:30395] *** Process received signal ***
> [vegas15:30395] Signal: Aborted (6)
> [vegas15:30395] Signal code:  (-6)
> [vegas15:30395] [ 0] /lib64/libpthread.so.0[0x30bc40f500]
> [vegas15:30395] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x30bc0328a5]
> [vegas15:30395] [ 2] /lib64/libc.so.6(abort+0x175)[0x30bc034085]
> [vegas15:30395] [ 3] /lib64/libc.so.6[0x30bc02ba1e]
> [vegas15:30395] [ 4] /lib64/libc.so.6(__assert_perror_fail+0x0)[0x30bc02bae0]
> [vegas15:30395] [ 5] /labhome/alinas/workspace/tt/ompi_rc1/openmpi-1.8.4rc1/install/lib/openmpi/mca_btl_openib.so(+0x9087)[0x7ffff3f70087]
> [vegas15:30395] [ 6] /labhome/alinas/workspace/tt/ompi_rc1/openmpi-1.8.4rc1/install/lib/openmpi/mca_btl_openib.so(mca_btl_openib_alloc+0x403)[0x7ffff3f754b3]
> [vegas15:30395] [ 7] /labhome/alinas/workspace/tt/ompi_rc1/openmpi-1.8.4rc1/install/lib/openmpi/mca_btl_openib.so(mca_btl_openib_sendi+0xf9e)[0x7ffff3f785b4]
> [vegas15:30395] [ 8] /labhome/alinas/workspace/tt/ompi_rc1/openmpi-1.8.4rc1/install/lib/openmpi/mca_pml_ob1.so(+0xed08)[0x7ffff3308d08]
> [vegas15:30395] [ 9] /labhome/alinas/workspace/tt/ompi_rc1/openmpi-1.8.4rc1/install/lib/openmpi/mca_pml_ob1.so(+0xf8ba)[0x7ffff33098ba]
> [vegas15:30395] [10] /labhome/alinas/workspace/tt/ompi_rc1/openmpi-1.8.4rc1/install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x108)[0x7ffff3309a1f]
> [vegas15:30395] [11] /labhome/alinas/workspace/tt/ompi_rc1/openmpi-1.8.4rc1/install/lib/libmpi.so.1(MPI_Isend+0x2ec)[0x7ffff7cff5e8]
> [vegas15:30395] [12] /hpc/local/benchmarks/hpc-stack-gcc/install/ompi-mellanox-v1.8/tests/osu-micro-benchmarks-4.4/osu_mbw_mr[0x400fa4]
> [vegas15:30395] [13] /hpc/local/benchmarks/hpc-stack-gcc/install/ompi-mellanox-v1.8/tests/osu-micro-benchmarks-4.4/osu_mbw_mr[0x40167d]
> [vegas15:30395] [14] /lib64/libc.so.6(__libc_start_main+0xfd)[0x30bc01ecdd]
> [vegas15:30395] [15] /hpc/local/benchmarks/hpc-stack-gcc/install/ompi-mellanox-v1.8/tests/osu-micro-benchmarks-4.4/osu_mbw_mr[0x400db9]
> [vegas15:30395] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 30395 on node vegas15 exited on 
> signal 6 (Aborted).
> --------------------------------------------------------------------------
> 
> 
> Thanks,
> Alina.
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/11/16142.php
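
For readers unfamiliar with the check that fired: the assertion in the trace
(`0 == item->opal_list_item_refcount` in _opal_list_append) is a debug-build
sanity check that an item is not already on a list when it is appended. The
sketch below is NOT Open MPI's code; every name in it (item_t, list_t,
list_append, list_walk_terminates) is hypothetical. It only illustrates, under
the assumption of a simple circular doubly linked list, how appending the same
item twice is caught when the check is enabled and how it can silently create
a cycle when the check is compiled out. That may be one way a debug-mode
assert turns into a hang in production mode, but the thread does not confirm
this is what happens in the openib BTL.

```c
#include <stddef.h>

/* Hypothetical list item; refcount mimics a debug-only "on a list" counter. */
typedef struct item {
    struct item *next;
    struct item *prev;
    int refcount;            /* number of lists this item is currently on */
} item_t;

/* Hypothetical circular list with one sentinel node. */
typedef struct {
    item_t sentinel;         /* sentinel.next is the head, sentinel.prev the tail */
} list_t;

static void list_init(list_t *l)
{
    l->sentinel.next = &l->sentinel;
    l->sentinel.prev = &l->sentinel;
    l->sentinel.refcount = 0;
}

static void item_init(item_t *it)
{
    it->next = NULL;
    it->prev = NULL;
    it->refcount = 0;
}

/*
 * Append an item at the tail. With `checked` nonzero (think: debug build),
 * an item that is already on a list is rejected and -1 is returned.
 * With `checked` zero (think: production build), the pointers are simply
 * overwritten. Returns 0 when the item was linked in.
 */
static int list_append(list_t *l, item_t *it, int checked)
{
    if (checked && it->refcount != 0) {
        return -1;
    }
    it->prev = l->sentinel.prev;
    it->next = &l->sentinel;
    l->sentinel.prev->next = it;   /* if `it` is already the tail, this sets it->next = it */
    l->sentinel.prev = it;
    it->refcount++;
    return 0;
}

/* Bounded traversal: returns 1 if the sentinel is reached within max_steps, else 0. */
static int list_walk_terminates(const list_t *l, int max_steps)
{
    const item_t *p = l->sentinel.next;
    for (int i = 0; i < max_steps; i++) {
        if (p == &l->sentinel) {
            return 1;
        }
        p = p->next;
    }
    return 0;
}
```

In this sketch, appending the same item twice with checked=1 is rejected on
the second call and the list stays walkable. With checked=0 the second append
leaves the item pointing at itself, so a traversal never reaches the sentinel.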
