FWIW: I can't recall ever seeing anyone use --enable-mca-dso, though I don't know if that is the source of the problem.
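One way to narrow this down is to check which installed libraries reference or define the missing symbol. The following is only a minimal sketch: it assumes GNU binutils `nm` is available, and `find_symbol` is a hypothetical helper written for illustration, not an Open MPI tool. The symbol name and install prefix in the commented example come from the error message and configure line quoted below.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print every entry in the
# dynamic symbol tables of the shared objects under DIR and DIR/openmpi
# that matches SYMBOL. In nm output, "U" marks an undefined reference
# and "T" marks a definition in the text section.
find_symbol() {
    sym=$1
    dir=$2
    for lib in "$dir"/*.so "$dir"/openmpi/*.so; do
        # Skip unmatched glob patterns and anything that is not a regular file.
        [ -f "$lib" ] || continue
        # -A prefixes each line with the file name; -D reads dynamic symbols.
        nm -A -D "$lib" 2>/dev/null | grep -w "$sym"
    done
}

# Example call, assuming the /usr/local prefix from the configure line:
# find_symbol progress_one_cuda_htod_event /usr/local/lib
```

If the symbol shows up only as "U" and no installed library lists it as "T", that would suggest the component expected to provide it was not built or not installed, which may help when comparing configure options.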
On Nov 7, 2013, at 6:00 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> Hello Solibakke:
> Let me try and reproduce with your configure options.
>
> Rolf
>
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Solibakke Per Bjarte
> Sent: Thursday, November 07, 2013 8:40 AM
> To: 'de...@open-mpi.org'
> Subject: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...
>
> Hello
> System with:
> Cuda 5.5 and OpenMPI-1.7.3 with system: quadro K5000 and 8 CPUs each with 192 GPUs (=1536 cores)
>
> ./configure --with-cuda --with-hwloc --enable-dlopen --enable-mca-dso --enable-shared --enable-vt --with-threads=posix --enable-mpi-thread-multiple --prefix=/usr/local
>
> Works fine under installation: ./configure and make, make install
>
> Error message during mpirun --hostfile…. ./snp_mpi:
>
> /home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
> [identical message repeated many times]
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 2 with PID 18385 on
> node PBS-GPU1 exiting improperly. There are three reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
> orte_create_session_dirs is set to false. In this case, the run-time cannot
> detect that the abort call was an abnormal termination. Hence, the only
> error message you will receive is this one.
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
> You can avoid this message by specifying -quiet on the mpirun command line.
>
>
> Some suggestions for configure options or mpirun options?
>
> The options: --enable-mca-no-build=pml-bfo removes the message. However, I cannot reach any of my GPUs only the CPUs.
> In configure I assume: --enable-mca-dso must be effective.
>
> Any suggestions for the CUDA (GPU support) for massive parallel running?
>
> Regards
> PBSolibakke
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel