[OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

2013-01-24 Thread Alessandro Fanfarillo
Dear all,
I would like to report a bug in the CUDA support in the last 5 trunk
versions.
The attached code is a simple send/receive test case that works correctly
with version 1.9a1r27844.

From version 1.9a1r27862 through 1.9a1r27897, I get the following
message:

./test: symbol lookup error: /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
./test: symbol lookup error: /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
--
mpirun has exited due to process rank 0 with PID 21641 on
node ip-10-16-24-100 exiting improperly. There are three reasons this could
occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

-

I'm using gcc-4.7.2 and CUDA 4.2. The test also fails with CUDA 4.1.

Thanks in advance.

Best regards.

Alessandro Fanfarillo


test.tar.bz2
Description: BZip2 compressed data
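
The symbol lookup error above says that mca_pml_ob1.so references
progress_one_cuda_htod_event but that nothing loaded at run time defines it.
One way to narrow this down is to inspect the dynamic symbol tables of the
installed libraries with `nm -D` (from binutils). The following is only a
minimal sketch: the helper names parse_lookup_error, classify_symbol and
check_library are hypothetical and not part of Open MPI, and the thread does
not say which library is expected to define the symbol.

```python
import re
import subprocess

# Matches the dynamic loader's message, for example:
# "./test: symbol lookup error: /path/lib.so: undefined symbol: name"
LOOKUP_ERROR = re.compile(
    r"^(?P<program>\S+): symbol lookup error: (?P<library>\S+): "
    r"undefined symbol: (?P<symbol>\S+)$"
)


def parse_lookup_error(line):
    """Return (program, library, symbol), or None if the text does not match."""
    # Collapse whitespace so a message wrapped over two lines still matches.
    match = LOOKUP_ERROR.match(" ".join(line.split()))
    if match is None:
        return None
    return match.group("program"), match.group("library"), match.group("symbol")


def classify_symbol(nm_output, symbol):
    """Classify `symbol` in `nm -D` output as 'defined', 'undefined' or 'absent'."""
    for entry in nm_output.splitlines():
        fields = entry.split()
        if len(fields) < 2:
            continue
        # Drop an optional version suffix such as name@@VERSION.
        name = fields[-1].split("@")[0]
        if name != symbol:
            continue
        # nm prints type "U" for symbols the object needs but does not define.
        return "undefined" if fields[-2] == "U" else "defined"
    return "absent"


def check_library(path, symbol):
    """Run `nm -D` on one shared object and classify the symbol."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return classify_symbol(result.stdout, symbol)
```

For instance, check_library("/usr/local/openmpi/lib/openmpi/mca_pml_ob1.so",
"progress_one_cuda_htod_event") would be expected to report "undefined" on an
affected install. Running the same check over the other shared objects under
the install prefix shows whether any of them reports "defined".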


Re: [OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

2013-01-24 Thread Rolf vandeVaart
Thanks for this report. I will look into this. Can you tell me what your
mpirun command looked like, and do you know which transport you are running
over? Specifically, is this on a single node or multiple nodes?

Rolf



Re: [OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

2013-01-24 Thread Alessandro Fanfarillo
I usually run "mpirun -np 2 ./test". I always run on a single node. The
message appears with either 1 or 2 GPUs on that node.

