[OMPI users] 100% CPU doing nothing!?

2009-04-21 Thread Ross Boylan
I'm using Rmpi (a pretty thin wrapper around MPI for R) on Debian Lenny
(amd64).  My setup has a central calculator and a bunch of slaves to
which work is distributed.

The slaves wait like this:
mpi.send(as.double(0), doubleType, root, requestCode, comm=comm)
request <- request+1
cases <- mpi.recv(cases, integerType, root, mpi.any.tag(),
comm=comm)

I.e., they do a simple send and then a receive.

It's possible there's no one to talk to, so it could be stuck at
mpi.send or mpi.recv.

Is either of those an operation that should chew up CPU?  At this point,
I'm just trying to figure out where to look for the source of the
problem.

Running openmpi-bin 1.2.7~rc2-2

Ross
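For what it's worth, Open MPI's blocking calls busy-poll by default, so a process parked in MPI_Send/MPI_Recv (and hence in Rmpi's mpi.send/mpi.recv) shows 100% CPU even when it is doing no useful work. A hedged sketch of the usual mitigation follows; the MCA parameter is real, but how much it helps depends on the release, and the application name and conf path are illustrative (the conf file lives under your install prefix):

```shell
# Make Open MPI call sched_yield() while waiting in blocking calls,
# instead of hard-spinning.  Per run:
mpirun --mca mpi_yield_when_idle 1 -np 4 ./my_app   # my_app is illustrative

# ...or for every run, via the MCA parameter file under the install prefix:
echo "mpi_yield_when_idle = 1" >> /usr/local/etc/openmpi-mca-params.conf
</parameter>
```

This makes waiting processes polite to other runnable processes; it does not make the wait truly idle, so top may still show noticeable CPU.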



Re: [OMPI users] COMM_ACCEPT/COMM_CONNECT: what BTL will the connected processes use?

2009-04-21 Thread Jeff Squyres

On Apr 21, 2009, at 12:14 PM, Katz, Jacob wrote:

So, sm will never be chosen in this case in the current  
implementation, correct?




Correct.  This is mainly a limitation of our current implementation.   
There have been some ideas kicked around on how to fix it, and I think  
there's even been a little work in this area, but nothing has been  
finalized yet.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] COMM_ACCEPT/COMM_CONNECT: what BTL will the connected processes use?

2009-04-21 Thread George Bosilca
No, we do not expose that kind of information to the upper layer. If
you really want, I can tell you how to do it in a dirty way, but only
if you really need to know...


  george.

On Apr 21, 2009, at 12:14 , Katz, Jacob wrote:

So, sm will never be chosen in this case in the current  
implementation, correct?
Is there an API or another method to find out what BTL is currently  
used (either inside the application code or externally)?


Thanks.

Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet:  
(8)-465-5726



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]  
On Behalf Of George Bosilca

Sent: Tuesday, April 21, 2009 17:48
To: Open MPI Users
Subject: Re: [OMPI users] COMM_ACCEPT/COMM_CONNECT: what BTL will  
the connected processes use?


With a few exceptions, Open MPI will choose the best BTL. There are two
exceptions I know about:
1. sm - we didn't figure out a clean way to do it, nor did we spend much
time trying to.
2. elan - the initialization of the device is a global operation, and
we cannot guarantee that all nodes are involved in the accept/connect.

  george.

On Apr 21, 2009, at 09:28 , Katz, Jacob wrote:


Hi,

In a dynamically connected client/server-style application, where
the server uses MPI_OPEN_PORT/MPI_COMM_ACCEPT and the client uses
MPI_COMM_CONNECT, what will be the communication method (BTL) chosen
by OMPI? Will the communication through the resultant
inter-communicator use TCP, or will OMPI choose the best possible method
(e.g. sm if the client and the server are on the same node)?

Thanks.

Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet:
(8)-465-5726

-
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users








Re: [OMPI users] Problem with running openMPI program

2009-04-21 Thread Gus Correa

Hi Ankush

Ankush Kaul wrote:

@Eugene
they are OK, but we wanted something better, which would more clearly
show the difference between using a single PC and the cluster.


@Prakash
I had problems running the programs, as they were compiling with mpcc and
not mpicc.


@gus
we are trying to figure out the HPL config; it's quite complicated,


I sent you some sketchy instructions to build HPL,
on my last message to this thread.
I built HPL and ran it here yesterday that way.
Did you try my suggestions?
Where did you get stuck?

also the
locate command lists lots of confusing results.




I would say the list is just long, not really confusing.
You can find what you need if you want.
Pipe the output of locate through "more", and search carefully.
If you are talking about BLAS, try "locate libblas.a" and
"locate libgoto.a".
Those are the libraries you need, and if they are not there
you need to install one of them.
Read my previous email for details.
I hope it will help you get HPL working, if you are interested in HPL.

I hope this helps.

Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


@jeff
I think you are correct, we may have installed Open MPI without VT support,
but is there anything we can do now?


One more thing: I found this program but don't know how to run it:
http://www.cis.udel.edu/~pollock/367/manual/node35.html


Thanks to all you guys for putting in so much effort to help us out.








Re: [OMPI users] Why do I only see 1 process running? Please help!

2009-04-21 Thread Jeff Squyres
It depends on how you configured Open MPI (i.e., ran the "configure"
script).  If you don't specify, Open MPI will install itself into
/usr/local/bin.  Or you can specify where to install it via the --prefix
parameter to configure.  For example:

./configure --prefix=/opt/openmpi-1.3.1

will put the executables in /opt/openmpi-1.3.1/bin.

See the README file for a bunch of relevant information on building  
and installing Open MPI.
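The sequence Jeff describes might look like this end to end; the prefix below is only the example from his message, and your shell init file may differ:

```shell
# Build and install Open MPI under an explicit prefix
./configure --prefix=/opt/openmpi-1.3.1
make all install

# Point the environment at that prefix (e.g. in ~/.bash_profile)
export PATH=/opt/openmpi-1.3.1/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-1.3.1/lib:$LD_LIBRARY_PATH

# Sanity check: ompi_info reports the version and prefix actually in use,
# which helps catch the compiled-with-one-version/ran-with-another mismatch
which mpirun
ompi_info | grep -i -e prefix -e "open mpi:"
```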



On Apr 21, 2009, at 12:47 PM, Grady Laksmono wrote:


hey, thanks a lot,
well, I built the open-mpi package on the Desktop of RHEL 4.7 and
then I followed the instructions to put the path, which I believed
were written as /etc/openmpi/bin and /etc/openmpi/lib, but there's
no such path on my Linux installation. I'm wondering if there's a
tutorial that specifies the specific steps that I need to take for RHEL?


On Tue, Apr 21, 2009 at 6:45 AM, Jeff Squyres   
wrote:
These kinds of messages are symptomatic that you compiled your  
applications with one version of Open MPI and ran with another.  You  
might want to ensure that your examples are compiled against the  
same version of Open MPI that you're running with.



On Apr 17, 2009, at 5:38 PM, Grady Laksmono wrote:

Hi, here's what I have:

hello_cxx example
[hpc@localhost examples]$ mpirun -n 2 hello_cxx
hello_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in
shared object, consider re-linking
hello_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in
shared object, consider re-linking

Hello, world!  I am 0 of 1
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,0,0]: OpenIB on host localhost.localdomain was unable to find any  
HCAs.

Another transport will be used instead, although this may result in
lower performance.
--
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,0,0]: OpenIB on host localhost.localdomain was unable to find any  
HCAs.

Another transport will be used instead, although this may result in
lower performance.
--
Hello, world!  I am 0 of 1

ring_cxx example
[hpc@localhost examples]$ mpirun -n 2 ring_cxx
ring_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in shared  
object, consider re-linking
ring_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in shared  
object, consider re-linking

libibverbs: Fatal: couldn't read uverbs ABI version.
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,0,0]: OpenIB on host localhost.localdomain was unable to find any  
HCAs.

Another transport will be used instead, although this may result in
lower performance.
--
--
[0,0,0]: OpenIB on host localhost.localdomain was unable to find any  
HCAs.

Another transport will be used instead, although this may result in
lower performance.
--
Process 0 sending 10 to 0, tag 201 (1 processes in ring)
Process 0 sending 10 to 0, tag 201 (1 processes in ring)
Process 0 sent to 0
Process 0 sent to 0
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting

which is weird. I'm not sure what's wrong, but one thing that I
realized is that the documentation for running Open MPI is outdated?
Here's my $PATH and $LD_LIBRARY_PATH:


[hpc@localhost ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
   . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin:/usr/lib/openmpi/1.2.5-gcc/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/1.2.5-gcc/lib

export PATH
export LD_LIBRARY_PATH
unset USERNAME

It's different from what the documentation had, because I
couldn't find the files in /opt/openmpi.

I hope that anyone could help?

Thanks a lot!

-- Grady

Re: [OMPI users] Why do I only see 1 process running? Please help!

2009-04-21 Thread Grady Laksmono
hey, thanks a lot,
well, I built the open-mpi package on the Desktop of RHEL 4.7 and then I
followed the instructions to put the path, which I believed were written as
/etc/openmpi/bin and /etc/openmpi/lib, but there's no such path on my
Linux installation. I'm wondering if there's a tutorial that specifies the
specific steps that I need to take for RHEL?

On Tue, Apr 21, 2009 at 6:45 AM, Jeff Squyres  wrote:

> These kinds of messages are symptomatic that you compiled your applications
> with one version of Open MPI and ran with another.  You might want to ensure
> that your examples are compiled against the same version of Open MPI that
> you're running with.
>
>
> On Apr 17, 2009, at 5:38 PM, Grady Laksmono wrote:
>
>  Hi, here's what I have:
>>
>> hello_cxx example
>> [hpc@localhost examples]$ mpirun -n 2 hello_cxx
>> hello_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in shared
>> object, consider re-linking
>> hello_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in shared
>> object, consider re-linking
>> Hello, world!  I am 0 of 1
>> libibverbs: Fatal: couldn't read uverbs ABI version.
>> --
>> [0,0,0]: OpenIB on host localhost.localdomain was unable to find any HCAs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> libibverbs: Fatal: couldn't read uverbs ABI version.
>> --
>> [0,0,0]: OpenIB on host localhost.localdomain was unable to find any HCAs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> Hello, world!  I am 0 of 1
>>
>> ring_cxx example
>> [hpc@localhost examples]$ mpirun -n 2 ring_cxx
>> ring_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in shared
>> object, consider re-linking
>> ring_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in shared
>> object, consider re-linking
>> libibverbs: Fatal: couldn't read uverbs ABI version.
>> libibverbs: Fatal: couldn't read uverbs ABI version.
>> --
>> [0,0,0]: OpenIB on host localhost.localdomain was unable to find any HCAs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> --
>> [0,0,0]: OpenIB on host localhost.localdomain was unable to find any HCAs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> Process 0 sending 10 to 0, tag 201 (1 processes in ring)
>> Process 0 sending 10 to 0, tag 201 (1 processes in ring)
>> Process 0 sent to 0
>> Process 0 sent to 0
>> Process 0 decremented value: 9
>> Process 0 decremented value: 8
>> Process 0 decremented value: 7
>> Process 0 decremented value: 6
>> Process 0 decremented value: 5
>> Process 0 decremented value: 4
>> Process 0 decremented value: 3
>> Process 0 decremented value: 2
>> Process 0 decremented value: 1
>> Process 0 decremented value: 0
>> Process 0 exiting
>> Process 0 decremented value: 9
>> Process 0 decremented value: 8
>> Process 0 decremented value: 7
>> Process 0 decremented value: 6
>> Process 0 decremented value: 5
>> Process 0 decremented value: 4
>> Process 0 decremented value: 3
>> Process 0 decremented value: 2
>> Process 0 decremented value: 1
>> Process 0 decremented value: 0
>> Process 0 exiting
>>
>> which is weird. I'm not sure what's wrong, but one thing that I realized
>> is that the documentation for running Open MPI is outdated? Here's my $PATH
>> and $LD_LIBRARY_PATH:
>>
>> [hpc@localhost ~]$ cat .bash_profile
>> # .bash_profile
>>
>> # Get the aliases and functions
>> if [ -f ~/.bashrc ]; then
>>. ~/.bashrc
>> fi
>>
>> # User specific environment and startup programs
>>
>> PATH=$PATH:$HOME/bin:/usr/lib/openmpi/1.2.5-gcc/bin
>> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/1.2.5-gcc/lib
>>
>> export PATH
>> export LD_LIBRARY_PATH
>> unset USERNAME
>>
>> It's different from what the documentation had, because I couldn't
>> find the files in /opt/openmpi.
>> I hope that anyone could help?
>>
>> Thanks a lot!
>>
>> -- Grady
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>



-- 
Grady Laksmono
gradyfau...@laksmono.com
www.laksmono.com


Re: [OMPI users] COMM_ACCEPT/COMM_CONNECT: what BTL will the connected processes use?

2009-04-21 Thread Katz, Jacob
So, sm will never be chosen in this case in the current implementation, correct?
Is there an API or another method to find out what BTL is currently used 
(either inside the application code or externally)?

Thanks.

Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet: 
(8)-465-5726


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of George Bosilca
Sent: Tuesday, April 21, 2009 17:48
To: Open MPI Users
Subject: Re: [OMPI users] COMM_ACCEPT/COMM_CONNECT: what BTL will the connected 
processes use?

With a few exceptions, Open MPI will choose the best BTL. There are two
exceptions I know about:
1. sm - we didn't figure out a clean way to do it, nor did we spend much
time trying to.
2. elan - the initialization of the device is a global operation, and
we cannot guarantee that all nodes are involved in the accept/connect.

   george.

On Apr 21, 2009, at 09:28 , Katz, Jacob wrote:

> Hi,
>
> In a dynamically connected client/server-style application, where
> the server uses MPI_OPEN_PORT/MPI_COMM_ACCEPT and the client uses
> MPI_COMM_CONNECT, what will be the communication method (BTL) chosen
> by OMPI? Will the communication through the resultant
> inter-communicator use TCP, or will OMPI choose the best possible method
> (e.g. sm if the client and the server are on the same node)?
>
> Thanks.
> 
> Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet:
> (8)-465-5726
>





Re: [OMPI users] Automatic checkpoint/restart in OpenMPI

2009-04-21 Thread Josh Hursey


On Apr 20, 2009, at 9:29 PM, ESTEBAN MENESES ROJAS wrote:


   Hello.
   Is there any way to automatically checkpoint/restart an
application in Open MPI? That is, checkpointing the application
without using the command ompi-checkpoint, perhaps via a function
call in the application's code itself. The same with the restart
after a failure.


Currently Open MPI only supports checkpointing applications with the
ompi-checkpoint command and restarting them with the ompi-restart
command. We do not expose a function call for the application to start
the checkpoint operation internally.
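For reference, the command-line flow Josh refers to looks roughly like this in the 1.3 series; the application name and PID are illustrative, and the job must be started with checkpoint/restart support enabled:

```shell
# Start the job with the checkpoint/restart aggregate MCA profile enabled
mpirun -am ft-enable-cr -np 4 ./my_app &

# From another shell, checkpoint the job by the PID of its mpirun;
# --term additionally terminates the job once the checkpoint is taken
ompi-checkpoint <PID_of_mpirun>
ompi-checkpoint --term <PID_of_mpirun>

# Later, restart from the global snapshot handle that ompi-checkpoint printed
ompi-restart ompi_global_snapshot_<PID>.ckpt
```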


On a temporary branch, I developed an interface as part of a proposal
to the MPI Forum. It works for a coordinated checkpoint (all processes
must call the function, similar to a barrier). In its current state, it
is not ready to come to the trunk just yet, since there is some support
structure missing that I am still working on.


This branch does not expose an interface to restart a process. What  
that interface should look like quickly becomes a much more difficult  
question. If you have ideas on the interface signature and semantics I  
would be interested in hearing about them.




   On a related note, what is the default behavior of an OpenMPI  
application after one process fails? Does the runtime shut down the  
whole application?


If a process fails, Open MPI will, by default, terminate the whole
application. Work is in progress by a couple of the core development
teams to provide alternative failure modes, but I do not think any of
this work has made it to the development trunk yet.


Best,
Josh



   Thanks.




Re: [OMPI users] Problem with running openMPI program

2009-04-21 Thread Eugene Loh




Ankush Kaul wrote:
@Eugene
they are OK, but we wanted something better, which would more clearly
show the difference between using a single PC and the cluster.
  
Another option is the NAS Parallel Benchmarks.  They are older, but
well known, self-verifying, report performance, and relatively small
and accessible.
@Prakash
  I had problems running the
programs, as they were compiling with mpcc and not mpicc
  
@gus
we are trying to figure out the HPL config; it's quite complicated, also the
locate command lists lots of confusing results.
  
@jeff
I think you are correct, we may have installed Open MPI without VT support,
but is there anything we can do now?

Reinstall OMPI?
One more thing: I found this program but don't know how to
run it: http://www.cis.udel.edu/~pollock/367/manual/node35.html

That may depend on more than just MPI.  You need some graphics.  You
might need the MPICH MPE environment.

If I understand where you're at on this, you might also try writing
your own MPI programs.  Run something simple.  Then something a little
more complicated.  And so on.  Build something bit by bit.  Good luck.




Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-21 Thread Ralph Castain
I'm working on it - the code was not written for multiple app_contexts, and
I have to fix a few compensating errors as well.

Hope to have it in the next couple of days.



On Tue, Apr 21, 2009 at 8:24 AM, Geoffroy Pignot wrote:

> Hi Lenny,
>
> Here is the basic mpirun command I would like to run :
>
> mpirun -rf rankfile -n 1 -host r001n001 master.x options1  : -n 1 -host
> r001n002 master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1
> -host r001n002 slave.x options4
>
> with cat rankfile
> rank 0=r001n001 slot=0:*
> rank 1=r001n002 slot=0:*
> rank 3=r001n001 slot=1:*
> rank 4=r001n002 slot=1:*
>
> It should be equivalent and more elegant to run :
> mpirun -hostfile myhostfile -rf rankfile -n 1 master.x options1 : -n 1
> master.x options2 : -n 1 slave.x options3 : -n 1 slave.x options4
>
> with cat myhostfile
> r001n001 slots=2
> r001n002 slots=2
>
> I hope these examples will set you straight about what I want to do
>
> Regards
>
> Geoffroy
>
>
>
>>
>> It's something in the basis, right,
>> I tried to investigate it yesterday and saw that for some reason
>> jdata->bookmark->index is 2 instead of 1 ( in this example ).
>>
>> [dellix7:28454] [
>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c
>> +417 ]  node->index = 1, jdata->bookmark->index=2
>> [dellix7:28454] [
>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c
>> +417 ]  node->index = 2, jdata->bookmark->index=2
>> I am not so familiar with this part of the code, since it appears in every
>> rmaps component and I just copied it :).
>>
>> I also don't quite understand what Geoffroy is trying to run, so I
>> cannot think of a workaround.
>> Lenny.
>>
>>
>


Re: [OMPI users] Problem with running openMPI program

2009-04-21 Thread Ankush Kaul
@Eugene
they are OK, but we wanted something better, which would more clearly show the
difference between using a single PC and the cluster.

@Prakash
I had problems running the programs, as they were compiling with mpcc and not
mpicc.

@gus
we are trying to figure out the HPL config; it's quite complicated, also the
locate command lists lots of confusing results.

@jeff
I think you are correct, we may have installed Open MPI without VT support, but
is there anything we can do now?

One more thing: I found this program but don't know how to run it:
http://www.cis.udel.edu/~pollock/367/manual/node35.html

Thanks to all you guys for putting in so much effort to help us out.


[OMPI users] Could the following situation be caused by RDMA MCA parameters?

2009-04-21 Thread Tsung Han Shie
Dear all

I tried to increase the speed of a program with openmpi-1.1.3 by adding the
following 4 parameters to the openmpi-mca-params.conf file:

mpi_leave_pinned=1
btl_openib_eager_rdma_num=128
btl_openib_max_eager_rdma=128
btl_openib_eager_limit=1024

I then ran my program twice (124 processes on 31 nodes), once with
"mpi_leave_pinned=1" and once with "mpi_leave_pinned=0".
Both runs were stopped abnormally with "ctrl+c" and "killall -9
".
After that, I couldn't start the program again.
I checked every node with "free -m" and found that a huge amount of cached
memory was in use on each node.
Could this situation be caused by those 4 parameters? Is there any way to
free them?

Best regards,

T. H. Hsieh
M.S. student, NTU, Taiwan.
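On the "cached memory" observation: the cached column of free -m is the kernel page cache, which Linux reclaims automatically under pressure, so it should not by itself stop a new run (pinned/registered memory from mpi_leave_pinned is released when the processes die). If you still want to drop the cache, here is a sketch using the standard kernel interface (requires root, Linux 2.6.16 or later):

```shell
sync                               # write dirty pages out first
echo 3 > /proc/sys/vm/drop_caches  # drop page cache plus dentries and inodes
free -m                            # confirm the cached figure dropped
```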


Re: [OMPI users] COMM_ACCEPT/COMM_CONNECT: what BTL will the connected processes use?

2009-04-21 Thread George Bosilca
With a few exceptions, Open MPI will choose the best BTL. There are two
exceptions I know about:
1. sm - we didn't figure out a clean way to do it, nor did we spend much
time trying to.
2. elan - the initialization of the device is a global operation, and
we cannot guarantee that all nodes are involved in the accept/connect.


  george.

On Apr 21, 2009, at 09:28 , Katz, Jacob wrote:


Hi,

In a dynamically connected client/server-style application, where  
the server uses MPI_OPEN_PORT/MPI_COMM_ACCEPT and the client uses  
MPI_COMM_CONNECT, what will be the communication method (BTL) chosen  
by OMPI? Will the communication through the resultant
inter-communicator use TCP, or will OMPI choose the best possible method
(e.g. sm if the client and the server are on the same node)?


Thanks.

Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet:  
(8)-465-5726






[OMPI users] Problems with SSH

2009-04-21 Thread Luis Vitorio Cargnini

Hi,
I did as described in the FAQ for password-less SSH, but mpirun is
still requesting passwords.


-bash-3.2$ mpirun -d -v -hostfile chosts -np 16  ./hello
[cluster-srv0.logti.etsmtl.ca:31929] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv0.logti.etsmtl.ca_0/41688/0/0
[cluster-srv0.logti.etsmtl.ca:31929] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv0.logti.etsmtl.ca_0/41688/0
[cluster-srv0.logti.etsmtl.ca:31929] top: openmpi-sessions-AH72000@cluster-srv0.logti.etsmtl.ca_0
[cluster-srv0.logti.etsmtl.ca:31929] tmp: /tmp
[cluster-srv0.logti.etsmtl.ca:31929] mpirun: reset PATH: /export/cluster/appl/x86_64/llvm/bin:/bin:/sbin:/export/cluster/appl/x86_64/llvm/bin:/usr/local/llvm/bin:/usr/local/bin:/usr/bin:/usr/sbin:/home/GTI420/AH72000/oe/bitbake/bin
[cluster-srv0.logti.etsmtl.ca:31929] mpirun: reset LD_LIBRARY_PATH: /export/cluster/appl/x86_64/llvm/lib:/lib64:/lib:/export/cluster/appl/x86_64/llvm/lib:/usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib
ah72...@cluster-srv1.logti.etsmtl.ca's password: ah72...@cluster-srv2.logti.etsmtl.ca's password: ah72...@cluster-srv3.logti.etsmtl.ca's password:
[cluster-srv1.logti.etsmtl.ca:02621] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv1.logti.etsmtl.ca_0/41688/0/1
[cluster-srv1.logti.etsmtl.ca:02621] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv1.logti.etsmtl.ca_0/41688/0
[cluster-srv1.logti.etsmtl.ca:02621] top: openmpi-sessions-AH72000@cluster-srv1.logti.etsmtl.ca_0
[cluster-srv1.logti.etsmtl.ca:02621] tmp: /tmp


Permission denied, please try again.
ah72...@cluster-srv2.logti.etsmtl.ca's password:
[cluster-srv3.logti.etsmtl.ca:09730] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv3.logti.etsmtl.ca_0/41688/0/3
[cluster-srv3.logti.etsmtl.ca:09730] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv3.logti.etsmtl.ca_0/41688/0
[cluster-srv3.logti.etsmtl.ca:09730] top: openmpi-sessions-AH72000@cluster-srv3.logti.etsmtl.ca_0
[cluster-srv3.logti.etsmtl.ca:09730] tmp: /tmp

Permission denied, please try again.
ah72...@cluster-srv2.logti.etsmtl.ca's password:
[cluster-srv2.logti.etsmtl.ca:12802] procdir: /tmp/openmpi-sessions-AH72000@cluster-srv2.logti.etsmtl.ca_0/41688/0/2
[cluster-srv2.logti.etsmtl.ca:12802] jobdir: /tmp/openmpi-sessions-AH72000@cluster-srv2.logti.etsmtl.ca_0/41688/0
[cluster-srv2.logti.etsmtl.ca:12802] top: openmpi-sessions-AH72000@cluster-srv2.logti.etsmtl.ca_0
[cluster-srv2.logti.etsmtl.ca:12802] tmp: /tmp



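Since the log shows three nodes still prompting for a password, here is a hedged recap of the usual password-less setup; the key type and paths are the common defaults, and the host names are taken from the log above (adjust if home directories are not shared across nodes):

```shell
# On the launch node: create a passphrase-less key if one doesn't exist yet
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Install the public key on every compute node
for host in cluster-srv1 cluster-srv2 cluster-srv3; do
    ssh-copy-id "$host"
done

# sshd silently ignores keys when these permissions are too open
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# Each of these must log in without prompting before mpirun will
ssh cluster-srv1 true && ssh cluster-srv2 true && ssh cluster-srv3 true
```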


[OMPI users] A few problems

2009-04-21 Thread Luis Vitorio Cargnini

Hi,
Can someone please tell me what the following problem could be?
 daemon INVALID arch ffc91200




the debug output:
[[41704,1],14] node[4].name cluster-srv4 daemon INVALID arch ffc91200
[cluster-srv3:09684] [[41704,1],13] node[0].name cluster-srv0 daemon 0  
arch ffc91200
[cluster-srv3:09684] [[41704,1],13] node[1].name cluster-srv1 daemon 1  
arch ffc91200
[cluster-srv3:09684] [[41704,1],13] node[2].name cluster-srv2 daemon 2  
arch ffc91200
[cluster-srv3:09684] [[41704,1],13] node[3].name cluster-srv3 daemon 3  
arch ffc91200
[cluster-srv3:09684] [[41704,1],13] node[4].name cluster-srv4 daemon  
INVALID arch ffc91200


ORTE_ERROR_LOG: A message is attempting to be sent to a process whose  
contact information is unknown in file rml_oob_send.c at line 105



Re: [OMPI users] Open-MPI and gprof

2009-04-21 Thread Eugene Loh

jody wrote:


Hi
I wanted to profile my application using gprof, and proceeded like
when profiling a normal application:
- compile everything with option -pg
- run application
- call gprof
This returns normal-looking output, but I don't know
whether this is the data for node 0 only or accumulated for all nodes.

Does anybody have experience in profiling parallel applications?
Is there a way to have profile data for each node separately?
If not, is there another profiling tool which can?
 

Gosh, I'm trying not to sound like a repeating commercial, but this is a 
rather direct answer to your question.


If you use Sun Studio compilers and tools, there is a Performance
Analyzer.  The basic mode of operation is that it samples the callstack
periodically.  So, you don't get the huge data volumes that tracing
tools generate, but you do get statistically fair data that shows where
time is spent.  If you preface your "mpirun" command with "collect",
then you get data for all the MPI processes in your job.  You can look
at data aggregated over all processes or for some subset.  You can get
gprof-style information about where time is spent.  You can also trace
MPI calls, the memory heap, hardware events (like cache misses), etc.
The tool is available from http://developers.sun.com/sunstudio/ via free
download for Linux and Solaris on x86 and SPARC.  You don't need to
compile your program specially (I mean, no -pg).  Fine print applies to
every statement I'm making in this paragraph, but I'm trying to keep it
short.


Again, sorry if it sounds like a commercial, but it's intended to be a 
direct answer to your question.


P.S.  If you go to 
http://developers.sun.com/sunstudio/documentation/demos/index.jsp , 
"halfway down" is a set of presentations on "How to Perform Analysis".  
This can give you more information on Performance Analyzer.  I don't 
know how much, if any, is specific to MPI, but should be helpful.


Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-21 Thread Geoffroy Pignot
Hi Lenny,

Here is the basic mpirun command I would like to run :

mpirun -rf rankfile -n 1 -host r001n001 master.x options1  : -n 1 -host
r001n002 master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1
-host r001n002 slave.x options4

with cat rankfile
rank 0=r001n001 slot=0:*
rank 1=r001n002 slot=0:*
rank 3=r001n001 slot=1:*
rank 4=r001n002 slot=1:*

It should be equivalent and more elegant to run :
mpirun -hostfile myhostfile -rf rankfile -n 1 master.x options1 : -n 1
master.x options2 : -n 1 slave.x options3 : -n 1 slave.x options4

with cat myhostfile
r001n001 slots=2
r001n002 slots=2

I hope these examples will set you straight about what I want to do

Regards

Geoffroy


>
> It's something in the basis, right,
> I tried to investigate it yesterday and saw that for some reason
> jdata->bookmark->index is 2 instead of 1 ( in this example ).
>
> [dellix7:28454] [ ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c
> +417 ]  node->index = 1, jdata->bookmark->index=2
> [dellix7:28454] [ ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c
> +417 ]  node->index = 2, jdata->bookmark->index=2
> I am not so familiar with this part of the code, since it appears in every
> rmaps component and I just copied it :).
>
> I also don't quite understand what Geoffroy is trying to run, so I
> cannot think of a workaround.
> Lenny.
>
>


[OMPI users] Open-MPI and gprof

2009-04-21 Thread jody
Hi
I wanted to profile my application using gprof, and proceeded like
when profiling a normal application:
- compile everything with option -pg
- run application
- call gprof
This returns normal-looking output, but I don't know
whether this is the data for node 0 only or accumulated for all nodes.

Does anybody have experience in profiling parallel applications?
Is there a way to have profile data for each node separately?
If not, is there another profiling tool which can?

Thank You
  Jody


Re: [OMPI users] Why do I only see 1 process running? Please help!

2009-04-21 Thread Jeff Squyres
These kinds of messages are symptomatic that you compiled your  
applications with one version of Open MPI and ran with another.  You  
might want to ensure that your examples are compiled against the same  
version of Open MPI that you're running with.


On Apr 17, 2009, at 5:38 PM, Grady Laksmono wrote:


Hi, here's what I have:

hello_cxx example
[hpc@localhost examples]$ mpirun -n 2 hello_cxx
hello_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in
shared object, consider re-linking
hello_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in
shared object, consider re-linking

Hello, world!  I am 0 of 1
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,0,0]: OpenIB on host localhost.localdomain was unable to find any  
HCAs.

Another transport will be used instead, although this may result in
lower performance.
--
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,0,0]: OpenIB on host localhost.localdomain was unable to find any  
HCAs.

Another transport will be used instead, although this may result in
lower performance.
--
Hello, world!  I am 0 of 1

ring_cxx example
[hpc@localhost examples]$ mpirun -n 2 ring_cxx
ring_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in shared  
object, consider re-linking
ring_cxx: Symbol `_ZN3MPI10COMM_WORLDE' has different size in shared  
object, consider re-linking

libibverbs: Fatal: couldn't read uverbs ABI version.
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,0,0]: OpenIB on host localhost.localdomain was unable to find any  
HCAs.

Another transport will be used instead, although this may result in
lower performance.
--
--
[0,0,0]: OpenIB on host localhost.localdomain was unable to find any  
HCAs.

Another transport will be used instead, although this may result in
lower performance.
--
Process 0 sending 10 to 0, tag 201 (1 processes in ring)
Process 0 sending 10 to 0, tag 201 (1 processes in ring)
Process 0 sent to 0
Process 0 sent to 0
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting

which is weird. I'm not sure what's wrong, but one thing that I
realized is that the documentation for running Open MPI seems outdated?
Here are my $PATH and $LD_LIBRARY_PATH:


[hpc@localhost ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin:/usr/lib/openmpi/1.2.5-gcc/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/1.2.5-gcc/lib

export PATH
export LD_LIBRARY_PATH
unset USERNAME

It's different from what the documentation had, because I
couldn't find the files in /opt/openmpi

I hope that someone can help.

Thanks a lot!

-- Grady
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



[OMPI users] COMM_ACCEPT/COMM_CONNECT: what BTL will the connected processes use?

2009-04-21 Thread Katz, Jacob
Hi,

In a dynamically connected client/server-style application, where the server 
uses MPI_OPEN_PORT/MPI_COMM_ACCEPT and the client uses MPI_COMM_CONNECT, what 
will be the communication method (BTL) chosen by OMPI? Will the communication 
through the resultant inter-communicator use TCP, or will OMPI choose the best 
possible method (e.g. sm if the client and the server are on the same node)?
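For reference, the dynamic-connection pattern being asked about looks roughly like this (a minimal sketch, not runnable as-is: it assumes an MPI installation, the server and client launched as separate mpirun jobs, the port string passed out of band, and all error handling omitted):

```c
#include <mpi.h>
#include <stdio.h>

/* Server side: open a port and wait for one client. */
void server(void) {
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;
    MPI_Open_port(MPI_INFO_NULL, port);
    printf("port: %s\n", port);   /* hand this string to the client somehow */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    /* Traffic over 'inter' uses whatever BTL was negotiated at connect
     * time -- per this thread, currently TCP even on the same node. */
    MPI_Close_port(port);
    MPI_Comm_disconnect(&inter);
}

/* Client side: 'port' is the string obtained from the server. */
void client(char *port) {
    MPI_Comm inter;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    /* ... communicate over 'inter' ... */
    MPI_Comm_disconnect(&inter);
}
```

The resulting 'inter' is an inter-communicator bridging the two originally separate jobs, which is exactly the case where the BTL selection question above arises.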

Thanks.

Jacob M. Katz | jacob.k...@intel.com | Work: 
+972-4-865-5726 | iNet: (8)-465-5726

-
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: [OMPI users] Problem with running openMPI program

2009-04-21 Thread Jeff Squyres

On Apr 20, 2009, at 11:08 AM, Ankush Kaul wrote:


I try to run mpicc-vt -c hello.c -o hello

but it gives a error
bash: mpicc-vt: command not found



It sounds like your Open MPI installation was not built with  
VampirTrace support.  Note that OMPI only included VT in Open MPI v1.3  
and later.  When Open MPI is installed with VT support, mpicc-vt  
should be in $prefix/bin.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Reduce with XOR with MPI_Double

2009-04-21 Thread Richard Treumann

Santolo

The MPI standard defines reduction operations where the operand/operation
pair has a meaningful semantic.  I cannot picture a well defined semantic
for:
999.0 BXOR 0.009.  Maybe you can, but it is
not an error that the MPI standard leaves out BXOR on floating point
operands.  That means you are not going to "Fix" it.

With more than one floating point representation in use by various
machines, the result of:

printf("%f\n", 999.0 BXOR 0.009)

could be vastly different from machine to machine (pseudo code obviously -
BXOR is not a C operator)

If you agree that BXOR on floating point data has no well defined or
portable meaning and you still have a need for it in your application on
your hardware then you can try cheating.  Use MPI_Reduce but tell it the
data is an integer type.  Libmpi will apply the bitwise XOR to the bytes
you have pretended are integers, and if you get the result you want you may
have solved your problem.

Just understand that because what you are wanting to do has no defined
meaning you cannot assume portability.  You also cannot assume results that
match your expectations unless you fully understand the floating point
representations and fully understand your own goals.  E.g., 9.8 BXOR
9.9 may give you what you expect while 2.2 BXOR 2.3 does not.

  Dick

Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


users-boun...@open-mpi.org wrote on 04/21/2009 08:26:27 AM:

> [image removed]
>
> Re: [OMPI users] Reduce with XOR with MPI_Double
>
> Jeff Squyres
>
> to:
>
> Open MPI Users
>
> 04/21/2009 08:27 AM
>
> Sent by:
>
> users-boun...@open-mpi.org
>
> Please respond to Open MPI Users
>
> I'm not quite sure what you're asking.  MPI_BXOR is valid on a variety
> of Fortran and C integer types; see MPI-2.1 p162 for the full table.
>
>  http://www.mpi-forum.org/docs/mpi21-report.pdf
>
>
>
> On Apr 19, 2009, at 3:46 PM, Santolo Felaco wrote:
>
> > I mean the bitwise XOR.  Pardon; per the standard the operation is valid
> > only on integer data.
> > Bye.
> >
> > 2009/4/19 Santolo Felaco 
> > Hi,
> > I want to use the XOR operation of reduce with double data.  For the MPI
> > standard the operation is valid only on MPI_Char data.
> > How can I fix this?
> >
> > Thanks. Bye
> >
> >
>
>
> --
> Jeff Squyres
> Cisco Systems
>