FWIW: I can't recall ever seeing anyone use --enable-mca-dso, though I don't 
know if that is the source of the problem.
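
One way to narrow this down might be to check which installed shared objects actually define the symbol named in the error quoted below. The following is only a rough sketch, not a confirmed diagnosis: it assumes GNU binutils `nm` is available and that the install prefix is /usr/local as in the quoted configure line. `find_definers`, `PREFIX`, and `SYMBOL` are names made up for this illustration.

```shell
#!/bin/sh
# Sketch: list the shared objects under an install prefix that define a
# given dynamic symbol. Defaults come from the error quoted in this thread;
# PREFIX, SYMBOL and find_definers are illustrative names, not Open MPI tools.
PREFIX="${PREFIX:-/usr/local}"
SYMBOL="${SYMBOL:-progress_one_cuda_htod_event}"

# Print every library in "$1/lib" and "$1/lib/openmpi" that defines symbol "$2".
find_definers() {
    for lib in "$1"/lib/*.so "$1"/lib/openmpi/*.so; do
        [ -f "$lib" ] || continue   # skip glob patterns that matched nothing
        # nm -D reads the dynamic symbol table; --defined-only drops imports
        if nm -D --defined-only "$lib" 2>/dev/null | grep -qw "$2"; then
            echo "$lib"
        fi
    done
}

find_definers "$PREFIX" "$SYMBOL"
```

If nothing is printed, no library under that prefix exports the symbol, which would be consistent with components left over from a different build. If a library is printed, running `ldd /usr/local/lib/openmpi/mca_pml_ob1.so` shows whether ob1 is linked against it.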

On Nov 7, 2013, at 6:00 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> Hello Solibakke:
> Let me try and reproduce with your configure options.
>  
> Rolf 
>  
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Solibakke Per 
> Bjarte
> Sent: Thursday, November 07, 2013 8:40 AM
> To: 'de...@open-mpi.org'
> Subject: [OMPI devel] MPIRUN error message after ./configure and sudo make 
> all install...
>  
> Hello
> System with:
> CUDA 5.5 and OpenMPI-1.7.3 on a system with a Quadro K5000 (8 CPUs, each with 
> 192 GPUs = 1536 cores)
>  
> ./configure --with-cuda --with-hwloc --enable-dlopen --enable-mca-dso 
> --enable-shared --enable-vt --with-threads=posix --enable-mpi-thread-multiple 
> --prefix=/usr/local
>  
> Installation works fine: ./configure, make, and make install all succeed.
>  
> Error message during mpirun --hostfile…. ./snp_mpi:
>  
> /home/solibakk/econometrics/snp_applik/npmarkets/elreprorun/snp_mpi: symbol 
> lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: 
> progress_one_cuda_htod_event
> [the same message is repeated 23 more times, 24 in total]
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 2 with PID 18385 on
> node PBS-GPU1 exiting improperly. There are three reasons this could occur:
>  
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>  
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>  
> 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
> orte_create_session_dirs is set to false. In this case, the run-time cannot
> detect that the abort call was an abnormal termination. Hence, the only
> error message you will receive is this one.
>  
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>  
> You can avoid this message by specifying -quiet on the mpirun command line.
>  
>  
> Any suggestions for configure or mpirun options?
>  
> The option --enable-mca-no-build=pml-bfo removes the message. However, I 
> then cannot reach any of my GPUs, only the CPUs.
> I assume --enable-mca-dso must be in effect in configure.
>  
> Any suggestions for CUDA (GPU) support for massively parallel runs?
>  
> Regards
> PBSolibakke
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
