Re: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...
Solibakke:

I have not reproduced the issue, but I think I have an idea of what is happening. What type of interconnect are you running over in this cluster? Note that in the Open MPI 1.7.3 series, CUDA-aware support is only available within a node and between nodes using the verbs interface over InfiniBand.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 07, 2013 10:00 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...

FWIW: I can never recall seeing someone use --enable-mca-dso...though I don't know if that is the source of the problem.

On Nov 7, 2013, at 6:00 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

Hello Solibakke:
Let me try and reproduce with your configure options.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Solibakke Per Bjarte
Sent: Thursday, November 07, 2013 8:40 AM
To: 'de...@open-mpi.org'
Subject: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...
Hello

System with:
Cuda 5.5 and OpenMPI-1.7.3 with system: Quadro K5000 and 8 CPUs each with 192 GPUs (= 1536 cores)

./configure --with-cuda --with-hwloc --enable-dlopen --enable-mca-dso --enable-shared --enable-vt --with-threads=posix --enable-mpi-thread-multiple --prefix=/usr/local

Works fine under installation: ./configure and make, make install

Error message during mpirun --hostfile ... ./snp_mpi:

/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
Re: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...
FWIW: I can never recall seeing someone use --enable-mca-dso...though I don't know if that is the source of the problem.

On Nov 7, 2013, at 6:00 AM, Rolf vandeVaart wrote:

> Hello Solibakke:
> Let me try and reproduce with your configure options.
>
> Rolf
>
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Solibakke Per Bjarte
> Sent: Thursday, November 07, 2013 8:40 AM
> To: 'de...@open-mpi.org'
> Subject: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...
>
> Hello
> System with:
> Cuda 5.5 and OpenMPI-1.7.3 with system: Quadro K5000 and 8 CPUs each with 192 GPUs (= 1536 cores)
>
> ./configure --with-cuda --with-hwloc --enable-dlopen --enable-mca-dso --enable-shared --enable-vt --with-threads=posix --enable-mpi-thread-multiple --prefix=/usr/local
>
> Works fine under installation: ./configure and make, make install
>
> Error message during mpirun --hostfile ... ./snp_mpi:
>
> /home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
> /home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
> /home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
> /home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
> /home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
> /home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol:
Re: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...
Hello Solibakke:

Let me try and reproduce with your configure options.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Solibakke Per Bjarte
Sent: Thursday, November 07, 2013 8:40 AM
To: 'de...@open-mpi.org'
Subject: [OMPI devel] MPIRUN error message after ./configure and sudo make all install...

Hello

System with:
Cuda 5.5 and OpenMPI-1.7.3 with system: Quadro K5000 and 8 CPUs each with 192 GPUs (= 1536 cores)

./configure --with-cuda --with-hwloc --enable-dlopen --enable-mca-dso --enable-shared --enable-vt --with-threads=posix --enable-mpi-thread-multiple --prefix=/usr/local

Works fine under installation: ./configure and make, make install

Error message during mpirun --hostfile ... ./snp_mpi:

/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
[OMPI devel] MPIRUN error message after ./configure and sudo make all install...
Hello

System with:
Cuda 5.5 and OpenMPI-1.7.3 with system: Quadro K5000 and 8 CPUs each with 192 GPUs (= 1536 cores)

./configure --with-cuda --with-hwloc --enable-dlopen --enable-mca-dso --enable-shared --enable-vt --with-threads=posix --enable-mpi-thread-multiple --prefix=/usr/local

Works fine under installation: ./configure and make, make install

Error message during mpirun --hostfile ... ./snp_mpi:

/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
--
mpirun has exited due to process rank 2 with PID 18385 on node PBS-GPU1 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefini
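
When triaging output like the report above, it can help to collapse the repeated loader errors into one entry per library and symbol, and then to look at what the failing component lists in its dynamic symbol table. The following is a minimal Python sketch under stated assumptions, not a diagnosis of the root cause: the helper names summarize_lookup_errors and symbol_status are hypothetical, and SAMPLE_NM is made-up text standing in for the output of `nm -D /usr/local/lib/openmpi/mca_pml_ob1.so`, which was not posted in this thread.

```python
import re
from collections import Counter

# Format of the dynamic-loader message quoted in the report:
#   <program>: symbol lookup error: <library>: undefined symbol: <symbol>
LOOKUP_ERROR = re.compile(
    r"^(?P<program>\S+): symbol lookup error: "
    r"(?P<library>\S+): undefined symbol: (?P<symbol>\S+)$"
)


def summarize_lookup_errors(stderr_text):
    """Count loader error lines, keyed by (library, symbol)."""
    counts = Counter()
    for line in stderr_text.splitlines():
        match = LOOKUP_ERROR.match(line.strip())
        if match:
            counts[(match.group("library"), match.group("symbol"))] += 1
    return counts


def symbol_status(nm_output, symbol):
    """Classify a symbol from `nm -D` text as 'defined', 'undefined' or 'absent'.

    Assumes the default nm layout: defined symbols print three fields
    (address, type, name); undefined ones print two (type, name).
    """
    status = "absent"
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1].split("@")[0] != symbol:
            continue
        if len(fields) == 3:
            return "defined"
        status = "undefined"
    return status


# One error line copied from the report, repeated three times for brevity.
ERROR_LINE = (
    "/home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: "
    "symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: "
    "undefined symbol: progress_one_cuda_htod_event"
)
SAMPLE_STDERR = "\n".join([ERROR_LINE] * 3 + ["mpirun has exited due to ..."])

# Hypothetical nm output; example_defined_symbol is an invented name.
SAMPLE_NM = (
    "                 U progress_one_cuda_htod_event\n"
    "0000000000004f20 T example_defined_symbol\n"
)

for (library, symbol), count in summarize_lookup_errors(SAMPLE_STDERR).items():
    print(f"{count} x {symbol} missing when loading {library}")
    print("status in sample nm text:", symbol_status(SAMPLE_NM, symbol))
```

An 'undefined' result only says the component expects another loaded library to provide the symbol; which library should export it in a given build is something to check against the actual install, for example by running nm on the other libraries under the same prefix.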