[OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-20 Thread Brock Palen
Another of our users has hit the bug that causes collectives (this time 
MPI_Bcast()) to hang on IB. It was fixed by setting:

btl_openib_cpc_include rdmacm

My question: if we make this the default on our system with an environment 
variable, does it introduce any performance or other issues we should be aware 
of?

Is there a reason we should not use rdmacm?
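
For reference, a minimal sketch of how such a default could be set, assuming the usual Open MPI convention that an MCA parameter can be supplied through an environment variable named OMPI_MCA_<parameter>. The application name ./my_app is a placeholder, and the sketch only shows the mechanism; it does not speak to the performance question.

```shell
# Per-shell default: Open MPI reads MCA parameters from OMPI_MCA_<name> variables.
export OMPI_MCA_btl_openib_cpc_include=rdmacm

# Per-run alternative (left as a comment so this snippet runs without MPI installed):
#   mpirun --mca btl_openib_cpc_include rdmacm -np 4 ./my_app

# System-wide alternative: a line in <prefix>/etc/openmpi-mca-params.conf
#   btl_openib_cpc_include = rdmacm

# Show what the current shell would pass on to Open MPI.
echo "OMPI_MCA_btl_openib_cpc_include=$OMPI_MCA_btl_openib_cpc_include"
```

Putting the export in a site-wide login script or an environment module would make it the default for every user, while still letting an individual run override it on the mpirun command line.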

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] Need help building OpenMPI with Intel v12.0 compilers on Linux

2011-04-20 Thread Gus Correa

Ormiston, Scott J. wrote:

At Tue, 19 Apr 2011 15:30:41 -0400, Gus Correa wrote:


Is it possible that the Intel compiler environment is not set?

Just in case, did you source the right Intel scripts to set up
the icc, icpc, and ifort environment?
Something like this (for a 64-bit machine):

source /opt/intel/composerxe-2011.1.107/bin/compilervars.csh intel64

and perhaps a similar command for icc/icpc.
Check the compiler documentation for details.


I did source that same startup file.


I only have ifort 12.0 in one of our machines here, no icc or icpc.
However, the OS is CentOS 5.4 64-bit,
and I compiled OpenMPI 1.4.3 there with gcc, g++ and ifort
without any problem.
I would guess you can do it with icc, icpc and ifort too.

Another possibility is some name mangling issue.
Maybe the leading double underscore on the C symbols?


The OS is CentOS 5 (not sure which version), 64-bit, and I am building OpenMPI 1.4.3.

I originally thought the configure was fine, but now that I check through 
the config.log, I see that it had errors:


conftest.c(49): error #2379: cannot open source file "ac_nonexistent.h"
  #include <ac_nonexistent.h>

conftest.c(58): catastrophic error: #error directive: Normal Unix environment
  #error Normal Unix environment

conftest.c(102): error: expected an expression
 if (sizeof (( long long )))
  ^
conftest.c(103): error: expected an expression
 if (sizeof (( long double )))
^
conftest.c(104): error: expected an expression
 if (sizeof (( int8_t )))
   ^
conftest.c(105): error: expected an expression
 if (sizeof (( uint8_t )))
^
conftest.c(106): error: expected an expression
 if (sizeof (( int16_t )))
^
conftest.c(107): error: expected an expression
 if (sizeof (( uint16_t )))
 ^
and so on. Other errors occurred in
conftest.cpp
conftest.f
conftest.F
conftest.f90

Does anyone know what I am missing here?

Thanks.
Scott Ormiston
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Hi Scott

I would guess ac_nonexistent.h is supposed to be what the name says,
a non-existent file.
A little googling around gave me this wisdom!  :)
In any case, I have the same message on my config.log,
but no harm was caused, and OpenMPI built correctly.

However, as for the 'catastrophic error', I don't have it here.

Would you perhaps be missing some RPMs on your system, e.g. 
autoconf, automake, libtool, or gmake, or some 'devel' or 'compat' RPMs 
for gcc and friends?

Anyway, this is just a wild guess.
For one thing, on some computers here we always have to install
some of these 'devel' and 'compat' packages
to build the Intel compiler right.
The standard CentOS (and Fedora, and RHEL)
installation skips quite a bit of the code development tools.
See the compiler release notes and other documentation for details.
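
A quick way to see which of these tools are visible in the current shell is sketched below, assuming a POSIX shell. The helper name check_tools is made up for illustration, and the tool list simply repeats the names mentioned in this thread. It checks the PATH only, not which RPMs are installed.

```shell
# Report, for each named tool, whether it can be found on PATH.
check_tools() {
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "found:   $tool -> $path"
        else
            echo "missing: $tool"
        fi
    done
}

# Build tools and compilers mentioned in this thread.
check_tools autoconf automake libtool make gcc g++ icc icpc ifort
```

If icc, icpc or ifort show up as missing after sourcing the Intel script, the compiler environment is probably not set in the shell that runs configure.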

I think it may be time for the OpenMPI developers
to take over this thread and give you a hand.

My two meager and useless cents,
Gus Correa


Re: [OMPI users] mpirun unsuccessful when run across multiple nodes

2011-04-20 Thread Jeff Squyres
You need to compile your cpi.c to get an executable.  This is not an MPI issue. 
 :-)
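
A minimal sketch of that step, assuming the Open MPI wrapper compiler mpicc is on the PATH; the output name cpi is an arbitrary choice. The "Permission denied" message most likely appears because ./cpi.c is a source file without execute permission, not a program.

```shell
# Build the executable from the source file, then launch the binary (not the .c file).
src=cpi.c
exe=./cpi

if command -v mpicc >/dev/null 2>&1 && [ -f "$src" ]; then
    mpicc "$src" -o "$exe"      # compile and link against the MPI library
    mpiexec -np 4 "$exe"        # run 4 processes of the compiled program
else
    echo "mpicc or $src not found; skipping build" >&2
fi
```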

Also, mpdboot is part of a different MPI implementation named MPICH; you don't 
need to run mpdboot with Open MPI.  If you have further questions about MPICH, 
you'll need to ping them on their mailing list -- we aren't able to answer 
MPICH questions here, sorry!
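
As a rough sketch of the Open MPI way to launch across nodes without mpdboot, assuming a hostfile is used: the host name f2 comes from the quoted output below, f1 is a hypothetical second node, and the file name my_hostfile and the slot counts are made up for illustration.

```shell
# Write a hostfile listing the nodes to use.
# "slots" is the number of processes to place on each node.
cat > my_hostfile <<'EOF'
f1 slots=2
f2 slots=2
EOF

# Launch directly with Open MPI's mpiexec; there is no separate daemon boot step.
# Guarded so the snippet does nothing when MPI or the compiled program is absent.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./cpi ]; then
    mpiexec --hostfile my_hostfile -np 4 ./cpi
fi
```

This still relies on passwordless ssh (or rsh) between the nodes, as discussed further down the thread.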

(background: MPI = a book.  It's a specification.  There's a bunch of different 
implementations of that specification available; Open MPI is one [great] one.  
:-)  MPICH is another.  There are also others.)


On Apr 20, 2011, at 10:24 AM, mohd naseem wrote:

> The following error shows:
> 
> [mpiuser@f2 programrun]$ mpiexec -np 4 ./cpi.c
> problem with execution of ./cpi.c  on  f2:  [Errno 13] Permission denied 
> problem with execution of ./cpi.c  on  f2:  [Errno 13] Permission denied 
> [mpiuser@f2 programrun]$ mpdboot -n 2 -v
> totalnum=2  numhosts=1
> there are not enough hosts on which to start all processes
> 
> 
> 
> On Wed, Apr 20, 2011 at 7:51 PM, mohd naseem  wrote:
> Sir, I am still not able to trace all the hosts.
> The following error shows:
> 
> 
> 
> [mpiuser@f2 programrun]$ mpiexec -np 4 ./cpi.c
> problem with execution of ./cpi.c  on  f2:  [Errno 13] Permission denied 
> problem with execution of ./cpi.c  on  f2:  [Errno 13] Permission denied 
> 
> 
> 
> On Tue, Apr 19, 2011 at 8:25 PM, Ralph Castain  wrote:
> You have to tell mpiexec what nodes you want to use for your application. 
> This is typically done either on the command line or in a file. For now, you 
> could just do this:
> 
> mpiexec -host node1,node2,node3 -np N ./my_app
> 
> where node1,node2,node3,... are the names or IP addresses of the nodes you 
> want to run on, and N is the total number of processes you want executed.
> 
> 
> On Apr 19, 2011, at 8:47 AM, mohd naseem wrote:
> 
>> 
>> Sorry sir,
>> 
>> I am unable to understand what you are saying, because I am a new user of MPI.
>> 
>> Please give me the details and the command as well.
>> 
>> Thanks
>> 
>> 
>>  
>> On Tue, Apr 19, 2011 at 2:32 PM, Reuti  wrote:
>> Good, then please supply a hostfile with the names of the machines you want 
>> to use for a particular run and give it as an option to `mpiexec`. See options 
>> -np and -machinefile.
>> 
>> -- Reuti
>> 
>> 
>> Am 19.04.2011 um 06:38 schrieb mohd naseem:
>> 
>> > Sir,
>> > when I run the `mpiexec hostname` command,
>> > it only gives one hostname. The rest are not shown.
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Apr 18, 2011 at 7:46 PM, Reuti  wrote:
>> > Am 18.04.2011 um 15:40 schrieb chenjie gu:
>> >
>> > > I am new to Open MPI. I have the following Open MPI setup, 
>> > > but it has problems when running across multiple nodes.
>> > > I am trying to build a Beowulf cluster from 6 nodes of our servers (HP 
>> > > Proliant G460 G7). I have installed Open MPI on one node (say, at 
>> > > /mirror):
>> > > ./configure --prefix=/mirror/openmpi CC=icc CXX=icpc F77=ifort FC=ifort
>> > > make all install
>> > >
>> > > Using NFS, the /mirror directory was successfully exported to the 
>> > > other 5 nodes. When I test Open MPI, it runs very well on a 
>> > > single node,
>> > > but it hangs across multiple nodes.
>> > >
>> > > One possible reason, as far as I know, is that Open MPI uses TCP to exchange 
>> > > data between nodes, so I am worried about
>> > > whether there are firewalls between the nodes, which may be factory 
>> > > integrated somewhere (switch/NIC). Could anyone give me some
>> > > information on this point?
>> >
>> > It's not only about MPI communication. Before that, you need some means to allow 
>> > the startup of the local orte daemons on each machine: passphraseless 
>> > ssh keys or, better, hostbased authentication 
>> > http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html , or enable `rsh` on the 
>> > machines and tell Open MPI to use it. Does:
>> >
>> > mpiexec hostname
>> >
>> > give you a list of the involved machines?
>> >
>> > -- Reuti
>> >
>> >
>> > > Thanks a lot,
>> > > Regards,
>> > > ArchyGU
>> > > Nanyang Technological University

Re: [OMPI users] Need help building OpenMPI with Intel v12.0 compilers on Linux

2011-04-20 Thread Ormiston, Scott J.

At Tue, 19 Apr 2011 15:30:41 -0400, Gus Correa wrote:


Is it possible that the Intel compiler environment is not set?

Just in case, did you source the right Intel scripts to set up
the icc, icpc, and ifort environment?
Something like this (for a 64-bit machine):

source /opt/intel/composerxe-2011.1.107/bin/compilervars.csh intel64

and perhaps a similar command for icc/icpc.
Check the compiler documentation for details.


I did source that same startup file.


I only have ifort 12.0 in one of our machines here, no icc or icpc.
However, the OS is CentOS 5.4 64-bit,
and I compiled OpenMPI 1.4.3 there with gcc, g++ and ifort
without any problem.
I would guess you can do it with icc, icpc and ifort too.

Another possibility is some name mangling issue.
Maybe the leading double underscore on the C symbols?


The OS is CentOS 5 (not sure which version), 64-bit, and I am building OpenMPI 1.4.3.

I originally thought the configure was fine, but now that I check 
through the config.log, I see that it had errors:


conftest.c(49): error #2379: cannot open source file "ac_nonexistent.h"
  #include <ac_nonexistent.h>

conftest.c(58): catastrophic error: #error directive: Normal Unix environment
  #error Normal Unix environment

conftest.c(102): error: expected an expression
 if (sizeof (( long long )))
  ^
conftest.c(103): error: expected an expression
 if (sizeof (( long double )))
^
conftest.c(104): error: expected an expression
 if (sizeof (( int8_t )))
   ^
conftest.c(105): error: expected an expression
 if (sizeof (( uint8_t )))
^
conftest.c(106): error: expected an expression
 if (sizeof (( int16_t )))
^
conftest.c(107): error: expected an expression
 if (sizeof (( uint16_t )))
 ^
and so on. Other errors occurred in
conftest.cpp
conftest.f
conftest.F
conftest.f90

Does anyone know what I am missing here?

Thanks.
Scott Ormiston


Re: [OMPI users] mpirun unsuccessful when run across multiple nodes

2011-04-20 Thread mohd naseem
The following error shows:

[mpiuser@f2 programrun]$ mpiexec -np 4 ./cpi.c
problem with execution of ./cpi.c  on  f2:  [Errno 13] Permission denied
problem with execution of ./cpi.c  on  f2:  [Errno 13] Permission denied
[mpiuser@f2 programrun]$ mpdboot -n 2 -v
totalnum=2  numhosts=1
there are not enough hosts on which to start all processes





Re: [OMPI users] mpirun unsuccessful when run across multiple nodes

2011-04-20 Thread mohd naseem
Sir, I am still not able to trace all the hosts.
The following error shows:



[mpiuser@f2 programrun]$ mpiexec -np 4 ./cpi.c
problem with execution of ./cpi.c  on  f2:  [Errno 13] Permission denied
problem with execution of ./cpi.c  on  f2:  [Errno 13] Permission denied


