Hi folks!
I am trying to launch the *Open MPI master branch* with srun (a simple send/recv program, see the attached source), using the *openib* BTL, but unfortunately I get a *segfault*.
Below is my workflow.
1) I configured ompi/master with the following line:
./autogen.sh && ./configure --prefix=$PWD/install --with-openib --with-pmi
&& make -j3 && make install -j3
2) exported (along with PATH and LD_LIBRARY_PATH) the OMPI_MCA_btl variable:
export OMPI_MCA_btl=self,openib
3) and launched with the following line:
mpicc ~/usefull_tests/mpi_init.c && srun -n 2 ./a.out
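(In case extra diagnostics help narrow this down: as far as I know, the openib component and the BTL selection can be double-checked with ompi_info and the btl_base_verbose MCA parameter, roughly like this:
ompi_info | grep openib                 # confirm the openib BTL component was built into this install
export OMPI_MCA_btl_base_verbose=100    # trace BTL selection during the run
srun -n 2 ./a.out
I can rerun with that and post the output if it is useful.)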
When I run this, I get the following error:
srun: error: mir6: task 1: Segmentation fault (core dumped)
srun: Terminating job step 17309.2
with the following backtrace:
#0 0x00007f856c47b1d0 in ?? ()
#1 <signal handler called>
#2 0x00007f856d12d721 in rml_recv_cb (status=0, process_name=0x2027c50, buffer=0x7f857084ed10, tag=102, cbdata=0x0) at connect/btl_openib_connect_oob.c:823
#3 0x00007f857553ffb0 in orte_rml_base_process_msg (fd=-1, flags=4, cbdata=0x2027b80) at base/rml_base_msg_handlers.c:172
#4 0x00007f857522a6c6 in event_process_active_single_queue (base=0x1ed6c60, activeq=0x1ec9210) at event.c:1367
#5 0x00007f857522a93e in event_process_active (base=0x1ed6c60) at event.c:1437
#6 0x00007f857522afbc in opal_libevent2021_event_base_loop (base=0x1ed6c60, flags=1) at event.c:1645
#7 0x00007f85754ccc19 in orte_progress_thread_engine (obj=0x7f857577cf20) at runtime/orte_init.c:180
#8 0x0000003b5a6077f1 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003b59ee570d in clone () from /lib64/libc.so.6
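(If a fuller trace would help, I can load the dumped core into gdb and post "bt full" or "thread apply all bt" output, e.g.
gdb ./a.out <corefile>     # <corefile> is a placeholder for whatever core.<pid> file the node produces
(gdb) bt full
and, if missing symbols are the problem, I can also rebuild with --enable-debug added to the configure line above.)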
Can anybody please help me figure out the reason for this failure?
P.S. I am using Red Hat Enterprise Linux Server release 6.2 with InfiniBand cards.
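(And since the openib BTL sits on top of the verbs stack, I assume the HCA/port state can be confirmed with ibv_devinfo, e.g.
ibv_devinfo | grep -i state    # ports should report PORT_ACTIVE
so I can include that output too if it is relevant.)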
Thanks in advance,
Victor Kocheganov.
#include "mpi.h" /* PROVIDES THE BASIC MPI DEFINITION AND TYPES */
#include "stdio.h"
int main(int argc, char **argv) {
int rank, size, i;
int buffer[10];
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (size < 2)
{
printf("Please run with two processes.\n");fflush(stdout);
MPI_Finalize();
return 0;
}
if (rank == 0)
{
for (i=0; i<10; i++)
buffer[i] = i;
MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
}
if (rank == 1)
{
for (i=0; i<10; i++)
buffer[i] = -1;
MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
for (i=0; i<10; i++)
{
if (buffer[i] != i)
printf("Error: buffer[%d] = %d but is expected to be %d\n", i, buffer[i], i);
}
fflush(stdout);
}
MPI_Finalize();
return 0;
}