[OMPI users] Cuda Aware MPI Problem

2013-12-13 Thread Özgür Pekçağlıyan
Hello,

I am having difficulty compiling Open MPI with CUDA support. I have
followed this FAQ entry
(http://www.open-mpi.org/faq/?category=building#build-cuda), as below:

$ cd openmpi-1.7.3/
$ ./configure --with-cuda=/usr/local/cuda-5.5
$ make all install

Everything goes perfectly during compilation, but when I try to execute
the simplest MPI hello world application I get the following error:

$ mpicc hello.c -o hello
$ mpirun -np 2 hello

hello: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so:
undefined symbol: progress_one_cuda_htod_event
hello: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so:
undefined symbol: progress_one_cuda_htod_event
--
mpirun has exited due to process rank 0 with PID 30329 on
node cudalab1 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

--

$ mpirun -np 1 hello

hello: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so:
undefined symbol: progress_one_cuda_htod_event
--
mpirun has exited due to process rank 0 with PID 30327 on
node cudalab1 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

--


Any suggestions?
I have two PCs with Intel i3 CPUs and GeForce GTX 480 GPUs.


And here is the hello.c file:

#include <stdio.h>
#include <mpi.h>


int main (int argc, char **argv)
{
  int rank, size;

  MPI_Init (&argc, &argv); /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}
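Once the build issue is resolved, a minimal sketch for exercising the CUDA-aware path would pass device pointers directly to MPI. This assumes a CUDA-aware build and exactly two ranks; error checking is omitted for brevity:

```c
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Allocate a buffer in GPU memory. A CUDA-aware MPI can send and
       receive such device pointers directly, without the application
       staging the data through host memory first. */
    double *d_buf;
    cudaMalloc((void **)&d_buf, 1024 * sizeof(double));

    if (rank == 0) {
        MPI_Send(d_buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received device buffer\n");
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

Run with, e.g., mpirun -np 2 ./cuda_hello; on a non-CUDA-aware build the same code would require copying to a host buffer before the send.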




-- 
Özgür Pekçağlıyan
B.Sc. in Computer Engineering
M.Sc. in Computer Engineering


Re: [OMPI users] Cuda Aware MPI Problem

2013-12-13 Thread Özgür Pekçağlıyan
Hello again,

I have compiled openmpi-1.9a1r29873 from the nightly trunk build and so far
everything looks all right, but I have not tested the CUDA support yet.




-- 
Özgür Pekçağlıyan
B.Sc. in Computer Engineering
M.Sc. in Computer Engineering


Re: [OMPI users] Cuda Aware MPI Problem

2013-12-13 Thread Rolf vandeVaart
Yes, this was a bug in Open MPI 1.7.3.  I could not reproduce it, but it was
definitely an issue in certain configurations.
Here is the fix: https://svn.open-mpi.org/trac/ompi/changeset/29762

We fixed it in Open MPI 1.7.4 and on trunk, so, as you have seen, those
versions do not have the problem.

Rolf
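A quick way to confirm whether a given installation was actually built with CUDA support is to query the build flag from ompi_info; the parameter name below is the one assumed for the 1.7-era releases:

```shell
# Prints a line ending in ":value:true" if this Open MPI build
# was configured with --with-cuda
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```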




Re: [OMPI users] environment variables and MPI_Comm_spawn

2013-12-13 Thread tom fogal
Hi Ralph, thanks for your help!

Ralph Castain writes:
> It would have to be done via MPI_Info arguments, and we never had a
> request to do so (and hence, don't define such an argument). It would
> be easy enough to do so (look in the ompi/mca/dpm/orte/dpm_orte.c
> code).

Well, I wanted to just report success, but I've only got the easy
side of it: saving the arguments from the MPI_Info arguments into
the orte_job_t struct.  See attached "0003" patch (against trunk).
However, I couldn't figure out how to get the other side: reading out
the environment variables and setting them at fork.  Maybe you could
help with (or do :-) that?

Or just point me to where: I threw abort()s into 'spawn'
functions I found under plm/, but my programs didn't abort, so I'm
not sure which code path they took.

> MPI implementations generally don't forcibly propagate envars because
> it is so hard to know which ones to handle - it is easy to propagate
> a system envar that causes bad things to happen on the remote end.

I understand.  Though in this case, I'm /trying/ to make Bad Things
(tm) happen ;-).

> One thing you could do, of course, is add that envar to your default
> shell setup (.bashrc or whatever). This would set the variable by
> default on your remote locations (assuming you are using rsh/ssh
> for your launcher), and then any process you start would get
> it. However, that won't help if this is an envar intended only for
> the comm_spawned process.

Unfortunately what I want to play with at the moment are LD_*
variables, and fiddling with these in my .bashrc will mess up a lot
more than just the simulation I am presently hacking.
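For variables that only need to reach the ranks started by mpirun itself (not comm-spawned children, which is the harder case discussed here), mpirun's -x flag forwards them per-run without touching .bashrc; the library path below is a made-up example:

```shell
# Forward LD_PRELOAD only for this run; the .so path is hypothetical
mpirun -x LD_PRELOAD=/opt/debug/libtrace.so -n 4 ./simulation
```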

> I can add this capability to the OMPI trunk, and port it to the 1.7
> release - but we don't go all the way back to the 1.4 series any
> more.

Yes, having this in a 1.7 release would be great!


BTW, I encountered a couple other small things while grepping through
source/waiting for trunk to build, so there are two other small patches
attached.  One gets rid of warnings about unused functions in generated
lexing code.  I believe the second fixes resource leaks on error paths.
However, it turned out none of my user-level code hit that function at
all, so I haven't been able to test it.  Take from it what you will...

-tom

> On Wed, Dec 11, 2013 at 2:10 PM, tom fogal  wrote:
> 
> > Hi all,
> >
> > I'm developing on Open MPI 1.4.5-ubuntu2 on Ubuntu 13.10 (so, Ubuntu's
> > packaged Open MPI) at the moment.
> >
> > I'd like to pass environment variables to processes started via
> > MPI_Comm_spawn.  Unfortunately, the MPI 3.0 standard (at least) does
> > not seem to specify a way to do this; thus I have been searching for
> > implementation-specific ways to accomplish my task.
> >
> > I have tried setting the environment variable using the POSIX setenv(3)
> > call, but it seems that Open MPI comm-spawn'd processes do not inherit
> > environment variables.  See the attached 2 C99 programs; one prints
> > out the environment it receives, and one sets the MEANING_OF_LIFE
> > environment variable, spawns the previous 'env printing' program, and
> > exits.  I run via:
> >
> >   $ env -i HOME=/home/tfogal \
> >   PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin \
> >   mpirun -x TJFVAR=testing -n 5 ./mpienv ./envpar
> >
> > and expect (well, hope) to find the MEANING_OF_LIFE in 'envpar's
> > output.  I do see TJFVAR, but the MEANING_OF_LIFE sadly does not
> > propagate.  Perhaps I am asking the wrong question...
> >
> > I found another MPI implementation which allowed passing such
> > information via the MPI_Info argument, however I could find no
> > documentation of similar functionality in Open MPI.
> >
> > Is there a way to accomplish what I'm looking for?  I could even be
> > convinced to hack source, but a starting pointer would be appreciated.
> >
> > Thanks,
> >
> > -tom

From 8285a7625e5ea014b9d4df5dd65a7642fd4bc322 Mon Sep 17 00:00:00 2001
From: Tom Fogal 
Date: Fri, 13 Dec 2013 12:03:56 +0100
Subject: [PATCH 1/3] btl: Remove warnings about unused lexing functions.

---
 ompi/mca/btl/openib/btl_openib_lex.l | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ompi/mca/btl/openib/btl_openib_lex.l b/ompi/mca/btl/openib/btl_openib_lex.l
index 2aa6059..7455b78 100644
--- a/ompi/mca/btl/openib/btl_openib_lex.l
+++ b/ompi/mca/btl/openib/btl_openib_lex.l
@@ -1,3 +1,5 @@
+%option nounput
+%option noinput
 %{ /* -*- C -*- */
 /*
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
-- 
1.8.3.2

From dff9fd5ef69f09de6d0fee2236c39a79e8674f92 Mon Sep 17 00:00:00 2001
From: Tom Fogal 
Date: Fri, 13 Dec 2013 13:06:41 +0100
Subject: [PATCH 2/3] mca: cleanup buf, ps when errors occur.

---
 orte/mca/plm/base/plm_base_proxy.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/orte/mca/plm/base/plm_base_proxy.c b/orte/mca/plm/base/plm_base_proxy.c
index 5d2b100..275cb3a 100644
--- a/orte/m

Re: [OMPI users] environment variables and MPI_Comm_spawn

2013-12-13 Thread Jeff Squyres (jsquyres)
Thanks for the first two patches, Tom -- I applied them to the SVN trunk and
scheduled them to go into the v1.7 series.  I don't know whether they will make
1.7.4 or be pushed to 1.7.5, but they will get there.

I'll defer to Ralph for the rest of the discussion about info keys.

