Hi,
I am currently working on an application that is to run on multiple
clusters simultaneously, each with a private Myri-10G network
(see also my recent mail to the users list, 2007-08-21:
"MX/BTL eager_limit/min_send_size").
Simultaneous use of the MX and TCP BTLs used to work fine after
patch r14963 by George Bosilca, but with the latest ompi revisions
I get the initialization crash shown below when I start the app
on multiple clusters with the MX BTL enabled. After some
further searching among the revisions since r14963, the problem
appears to occur consistently as of revision r15011 (related to
the mutex implementation). Maybe someone could have a look?
If you need more info, just let me know.
Thanks!
Kees Verstoep
-----
[node010:11628] *** Process received signal ***
[node010:11628] Signal: Segmentation fault (11)
[node010:11628] Signal code: Address not mapped (1)
[node010:11628] Failing at address: 0xc0
[node010:11628] [ 0] /lib64/tls/libpthread.so.0 [0x38cf60c4f0]
[node010:11628] [ 1] /usr/local/package/openmpi-trunk/r15011/lib/openmpi/mca_btl_mx.so(mca_btl_mx_add_procs+0x3f8) [0x2b5008abbb04]
[node010:11628] [ 2] /usr/local/package/openmpi-trunk/r15011/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x2d2) [0x2b50089b4acf]
[node010:11628] [ 3] /usr/local/package/openmpi-trunk/r15011/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x10f) [0x2b5008566ccd]
[node010:11628] [ 4] /usr/local/package/openmpi-trunk/default/lib/libmpi.so.0(ompi_mpi_init+0x9de) [0x2b5005d04426]
[node010:11628] [ 5] /usr/local/package/openmpi-trunk/default/lib/libmpi.so.0(MPI_Init+0x159) [0x2b5005d4b5fd]