Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?

2009-05-04 Thread Brian Barrett

Gus -

Thanks for the info - it pretty much confirms my suspicion.  In 1.3.0  
and 1.3.1, we configured the glibc memory allocator to not give memory  
back to the OS.  If most of the allocations are similarly sized or  
small, this all works out, because the allocator reuses the old  
allocations.  However, when the allocations are large (like in HPL)  
and differ in size between allocate / free cycles (like in HPL with  
different NBs), the allocator has a real hard time reusing old  
allocations and ends up allocating more and more memory.


Anyway, we deal with the InfiniBand pinning problem in a different,  
hopefully less broken, way for 1.3.2 and later, so this particular  
problem should go away with the upgrade.  If not, please let us know  
as we're trying to minimize the impact our pin cache has on real  
applications.  Sometimes, this doesn't work as we intended, which is  
how we ended up with the issues you ran into.  We were trying to fix a  
different issue related to linkers that existed in 1.0 - 1.2.x, and  
did, only to break something else.  1.3.2 takes yet another approach,  
which we believe is more flexible than both previous approaches.


Good luck!

Brian


On May 1, 2009, at 7:30 PM, Gus Correa wrote:


Hi Brian

Thank you very much for the instant help!

I just tried "-mca btl openib,sm,self" and
"-mca mpi_leave_pinned 0" together (still with OpenMPI 1.3.1).

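(For reference, the full command line would have been along these lines; the
executable name and process count are placeholders, not necessarily the exact
ones used:

  mpiexec -np 16 -mca btl openib,sm,self -mca mpi_leave_pinned 0 ./xhpl
)
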
So far so good, it passed through two NB cases/linear system solutions,
it is running the third NB, and the memory use hasn't increased.
On the failed runs the second NB already used more memory than the  
first, and the third would blow up memory use.


If the run was bound to fail it would be swapping memory at this
point, and it is not.

This is a good sign, I hope I am not speaking too early,
but it looks like your suggestion fixed the problem.
Thanks!

It was interesting to observe using Ganglia
that on the failed runs the memory use "jumps"
happened whenever HPL switched from one NB to another.
At every NB transition (i.e., each time HPL started to solve a
new linear system, and probably generated a new random matrix)
the memory use would jump to a (significantly) higher value.
Anyway, this is just in case the info tells you something about what
might be going on.

I will certainly follow your advice and upgrade to OpenMPI 1.3.2,
which I just downloaded.
You guys are prolific, a new edition per month! :)

Many thanks!
Gus Correa

Brian W. Barrett wrote:

Gus -
Open MPI 1.3.0 & 1.3.1 attempted to use some controls in the glibc  
malloc implementation to handle memory registration caching for  
InfiniBand. Unfortunately, it was not only buggy in that it
didn't work, but it also had the side effect that certain memory
usage patterns can cause the memory allocator to use much more  
memory than it normally would.  The configuration options were set  
any time the openib module was loaded, even if it wasn't used in  
communication.  Can you try running with the extra option:

 -mca mpi_leave_pinned 0
I'm guessing that will fix the problem.  If you're using  
InfiniBand, you probably want to upgrade to 1.3.2, as there are  
known data corruption issues in 1.3.0 and 1.3.1 with openib.

Brian
On Fri, 1 May 2009, Gus Correa wrote:

Hi Ralph

Thank you very much for the prompt answer.
Sorry for being so confusing on my original message.

Yes, I am saying that the inclusion of openib is causing the
difference in behavior.
It runs with "sm,self", it fails with "openib,sm,self".
I am as puzzled as you are, because I thought the "openib" parameter
was simply ignored when running on a single node, exactly like you  
said.

After your message arrived, I ran HPL once more with "openib",
just in case.
Sure enough it failed just as I described.

And yes, all the procs run on a single node in both cases.
It doesn't seem to be a problem caused by a particular
node hardware either, as I already
tried three different nodes with similar results.

BTW, I successfully ran HPL across the whole cluster two days ago,
with IB ("openib,sm,self"),
but using a modest (for the cluster) problem size: N=50,000.
The total cluster memory is 24*16=384GB,
which gives a max HPL problem size N=195,000.
I have yet to try the large problem on the whole cluster,
but I am afraid I will stumble on the same memory problem.

Finally, on your email you use the syntax "btl=openib,sm,self",
with an "=" sign between the btl key and its values.
However, the mpiexec man page uses the syntax "btl openib,sm,self",
with a blank space between the btl key and its values.
I've been following the man page syntax.
The "=" sign doesn't seem to work, and aborts with the error:
"No executable was specified on the mpiexec command line.".
Could this possibly be the issue (say, wrong parsing of mca  
options)?
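
(For reference, both spellings exist, but in different places: on the mpirun
command line the key and value are space-separated, while the "=" form is used
for the equivalent environment variable or parameter-file entry. Roughly:

  mpiexec -mca btl openib,sm,self -np 4 ./a.out
  export OMPI_MCA_btl=openib,sm,self        # same setting via the environment
  echo "btl = openib,sm,self" >> $HOME/.openmpi/mca-params.conf

The ./a.out is just a placeholder.)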


Many thanks!
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-04 Thread Geoffroy Pignot
Hi ,

I got the openmpi-1.4a1r21095.tar.gz tarball,
but unfortunately my command doesn't work

cat rankf:
rank 0=node1 slot=*
rank 1=node2 slot=*

cat hostf:
node1 slots=2
node2 slots=2

mpirun  --rankfile rankf --hostfile hostf  --host node1 -n 1 hostname :
--host node2 -n 1 hostname

Error, invalid rank (1) in the rankfile (rankf)

--
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file
rmaps_rank_file.c at line 403
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_map_job.c at line 86
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file
base/plm_base_launch_support.c at line 86
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file
plm_rsh_module.c at line 1016


Ralph, could you tell me if my command syntax is correct or not ? if not,
give me the expected one ?

Regards

Geoffroy




2009/4/30 Geoffroy Pignot 

> Immediately Sir !!! :)
>
> Thanks again Ralph
>
> Geoffroy
>
>
>
>>
>>
>> --
>>
>> Message: 2
>> Date: Thu, 30 Apr 2009 06:45:39 -0600
>> From: Ralph Castain 
>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> To: Open MPI Users 
>> Message-ID:
>><71d2d8cc0904300545v61a42fe1k50086d2704d0f...@mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> I believe this is fixed now in our development trunk - you can download
>> any
>> tarball starting from last night and give it a try, if you like. Any
>> feedback would be appreciated.
>>
>> Ralph
>>
>>
>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>
>> Ah now, I didn't say it -worked-, did I? :-)
>>
>> Clearly a bug exists in the program. I'll try to take a look at it (if
>> Lenny
>> doesn't get to it first), but it won't be until later in the week.
>>
>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
>>
>> I agree with you Ralph , and that 's what I expect from openmpi but my
>> second example shows that it's not working
>>
>> cat hostfile.0
>>   r011n002 slots=4
>>   r011n003 slots=4
>>
>>  cat rankfile.0
>>rank 0=r011n002 slot=0
>>rank 1=r011n003 slot=1
>>
>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname
>> ### CRASHED
>>
>> > > Error, invalid rank (1) in the rankfile (rankfile.0)
>> > >
>> >
>> --
>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> > > rmaps_rank_file.c at line 404
>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> > > base/rmaps_base_map_job.c at line 87
>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> > > base/plm_base_launch_support.c at line 77
>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> > > plm_rsh_module.c at line 985
>> > >
>> >
>> --
>> > > A daemon (pid unknown) died unexpectedly on signal 1  while
>> > attempting to
>> > > launch so we are aborting.
>> > >
>> > > There may be more information reported by the environment (see
>> > above).
>> > >
>> > > This may be because the daemon was unable to find all the needed
>> > shared
>> > > libraries on the remote node. You may set your LD_LIBRARY_PATH to
>> > have the
>> > > location of the shared libraries on the remote nodes and this will
>> > > automatically be forwarded to the remote nodes.
>> > >
>> >
>> --
>> > >
>> >
>> --
>> > > orterun noticed that the job aborted, but has no info as to the
>> > process
>> > > that caused that situation.
>> > >
>> >
>> --
>> > > orterun: clean termination accomplished
>>
>>
>>
>> Message: 4
>> Date: Tue, 14 Apr 2009 06:55:58 -0600
>> From: Ralph Castain 
>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> To: Open MPI Users 
>> Message-ID: 
>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>>   DelSp="yes"
>>
>> The rankfile cuts across the entire job - it isn't applied on an
>> app_context basis. So the ranks in your rankfile must correspond to
>> the eventual rank of each process in the cmd line.
>>
>> Unfortunately, that means you have to count ranks. In your case, you
>> only have four, so that makes life easier. Your rankfile would look
>> something like this:
>>
>> rank 0=r001n001 slot=0
>> rank 1=r001n002 slot=1
>> rank 2=r001n001 slot=1
>> rank 3=r001n002 slot=2
>>
>> HTH
>> Ralph
>>
>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
>>
>> > Hi,
>> >
>> > I agree that my examples are not very clear. What I want to do is to
>> > launch a multiexes application (masters-slaves) and benefit from 

[OMPI users] Fw: users Digest, Vol 1217, Issue 2, Message3

2009-05-04 Thread jan

Hi Jeff,

I have updated the firmware of the InfiniBand module on the Dell M600, but the 
problem still occurred.


===

$ mpirun -hostfile clusternode -np 16 --byslot --mca btl openib,sm,self 
$HOME/test/cpi

Process 1 on node1
Process 11 on node2
Process 8 on node2
Process 6 on node1
Process 4 on node1
Process 14 on node2
Process 3 on node1
Process 2 on node1
Process 9 on node2
Process 5 on node1
Process 0 on node1
Process 7 on node1
Process 10 on node2
Process 15 on node2
Process 13 on node2
Process 12 on node2
[node1][[3175,1],0][btl_openib_component.c:3029:poll_device] error polling 
HP CQ with -2 errno says Success

=

Is this problem unsolvable?


Best Regards,

Gloria Jan
Wavelink Technology Inc



I can confirm that I have exactly the same problem, also on a Dell
system, even with the latest Open MPI.

Our system is:

Dell M905
OpenSUSE 11.1
kernel: 2.6.27.21-0.1-default
ofed-1.4-21.12 from SUSE repositories.
OpenMPI-1.3.2


But I can also add that it does not only affect Open MPI: if these messages
are triggered after mpirun:
[node032][[9340,1],11][btl_openib_component.c:3002:poll_device] error
polling HP CQ with -2 errno says Success

Then the IB stack hangs. You cannot even reload it; you have to reboot the node.




Something that severe should not be able to be caused by Open MPI.
Specifically: Open MPI should not be able to hang the OFED stack.
Have you run layer 0 diagnostics to know that your fabric is clean?
You might want to contact your IB vendor to find out how to do that.

--
Jeff Squyres
Cisco Systems





On Apr 24, 2009, at 5:21 AM, jan wrote:


Dear Sir,

I'm running a cluster with OpenMPI.

$mpirun --mca mpi_show_mpi_alloc_mem_leaks 8 --mca
mpi_show_handle_leaks 1 $HOME/test/cpi

I got the error message as job failed:

Process 15 on node2
Process 6 on node1
Process 14 on node2
...
Process 0 on node1
Process 10 on node2
[node2][[9340,1],13][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
[node2][[9340,1],9][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
[node2][[9340,1],10][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
[node2][[9340,1],11][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
[node2][[9340,1],8][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
[node2][[9340,1],15][btl_openib_component.c:3002:poll_device] [node2][[9340,1],12][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
error polling HP CQ with -2 errno says Success
[node2][[9340,1],14][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
mpirun: killing job...

--
mpirun noticed that process rank 0 with PID 28438 on node node1
exited on signal 0 (Unknown signal 0).
--
mpirun: clean termination accomplished

and got the message as job success

Process 1 on node1
Process 2 on node1
...
Process 13 on node2
Process 14 on node2
--
The following memory locations were allocated via MPI_ALLOC_MEM but
not freed via MPI_FREE_MEM before invoking MPI_FINALIZE:

Process ID: [[13692,1],12]
Hostname:   node2
PID:30183

(null)
--
[node1:32276] 15 more processes have sent help message help-mpool-base.txt / all mem leaks
[node1:32276] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


It occurred periodically, i.e. twice success, then twice failed, twice
success, then twice failed ... I downloaded OFED-1.4.1-rc3 from
The OpenFabrics Alliance and installed it on a Dell PowerEdge M600 Blade
Server. The InfiniBand mezzanine cards are Mellanox ConnectX QDR &
DDR, and the InfiniBand switch module is a Mellanox M2401G. The OS is CentOS
5.3, kernel 2.6.18-128.1.6.el5, with the PGI V7.2-5 compiler. It's
running the OpenSM subnet manager.

Best Regards,

Gloria Jan

Wavelink Technology Inc.

The output of the "ompi_info --all" command as:

 Package: Open MPI root@vortex Distribution
Open MPI: 1.3.1
   Open MPI SVN revision: r20826
   Open MPI release date: Mar 18, 2009
Open RTE: 1.3.1
   Open RTE SVN revision: r20826
   Open RTE release date: Mar 18, 2009
OPAL: 1.3.1
   OPAL SVN revision: r20826
   OPAL release date: Mar 18, 2009
Ident string: 1.3.1
   MCA backtrace: execinfo (MCA v2.0, API v2.0, Component
v1.3.1)
  MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component
v1.3.1)
   MCA paffinity: linux (MCA v2.0, API v2.0, Component

Re: [OMPI users] MPI processes hang when using OpenMPI 1.3.2 and Gcc-4.4.0

2009-05-04 Thread Simone Pellegrini

Hi,
sorry for the delay, but I did some additional experiments to find out 
whether the problem was Open MPI or gcc!


Attached you will find the program that causes the problem mentioned before.
I compile the program with the following line:

$HOME/openmpi-1.3.2-gcc44/bin/mpicc -O3 -g -Wall -fmessage-length=0 -m64 
bug.c -o bug


When I run the program using Open MPI 1.3.2 compiled with gcc 4.4 in the 
following way:


$HOME/openmpi-1.3.2-gcc44/bin/mpirun --mca btl self,sm --np 32 ./bug 1024

The program just hangs... and never terminates! I am running on an SMP 
machine with 32 cores; actually it is a Sun Fire X4600 X2 (8 quad-core 
Barcelona AMD chips), the OS is CentOS 5 and the kernel is 
2.6.18-92.el5.src-PAPI (patched with PAPI).
I use an N of 1024, and if I print out the value of the iterator i, 
sometimes it stops around 165, other times around 520... and it doesn't 
make any sense.


If I run the program with the mpirun from a different MPI version (and it's 
important to notice that I don't recompile it, I just use another mpirun), it 
works fine. I did some experiments during the weekend, and if I use 
openmpi-1.3.2 compiled with gcc 4.3.3 everything works fine.


So I really think the problem is strictly related to the usage of 
gcc-4.4.0! ...and it doesn't depend on Open MPI, as the program hangs 
even when I use Open MPI 1.3.1 compiled with gcc 4.4!


I hope everything is clear now.

regards, Simone
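
(The kernel quoted below is truncated by the mail archive, so here is a minimal
self-contained sketch of the kind of neighbour-exchange loop being described.
The matrix layout, the number of local rows, and the top/bottom neighbour
definitions are assumptions for illustration, not Simone's exact code:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, i, k;
    const int N = 1024;       /* row length; matches the "N of 1024" above */
    const int rows = 8;       /* local rows per rank; assumption */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* assumed ring neighbours; the real top/bottom may differ */
    int top    = (rank + 1) % size;
    int bottom = (rank - 1 + size) % size;

    float *A   = malloc((size_t)rows * N * sizeof *A);  /* local block */
    float *row = malloc((size_t)N * sizeof *row);       /* halo row from "bottom" */
    for (i = 0; i < rows; i++)
        for (k = 0; k < N; k++)
            A[i * N + k] = (float)rank;

    MPI_Barrier(MPI_COMM_WORLD);
    double total = MPI_Wtime();

    for (i = 1; i < rows; i++) {
        /* ship the previous local row to "top", receive one from "bottom" */
        MPI_Sendrecv(&A[(i - 1) * N], N, MPI_FLOAT, top, 0,
                     row, N, MPI_FLOAT, bottom, 0,
                     MPI_COMM_WORLD, &status);
        for (k = 0; k < N; k++)   /* some local work on the received row */
            A[i * N + k] += row[k];
    }

    total = MPI_Wtime() - total;
    if (rank == 0)
        printf("elapsed: %f s\n", total);

    free(A);
    free(row);
    MPI_Finalize();
    return 0;
}

Built and launched like the bug.c reproducer above (mpicc ...; mpirun --mca btl
self,sm -np 32 ...), this exercises the same sm-BTL MPI_Sendrecv pattern.)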

Eugene Loh wrote:
So far, I'm unable to reproduce this problem.  I haven't exactly 
reproduced your test conditions, but then I can't.  At a minimum, I 
don't have exactly the code you ran (and not convinced I want to!).  So:


*) Can you reproduce the problem with the stand-alone test case I sent 
out?
*) Does the problem correlate with OMPI version?  (I.e., 1.3.1 versus 
1.3.2.)

*) Does the problem occur at lower np?
*) Does the problem correlate with the compiler version?  (I.e., GCC 
4.4 versus 4.3.3.)
*) What is the failure rate?  How many times should I expect to run to 
see failures?

*) How large is N?

Eugene Loh wrote:


Simone Pellegrini wrote:


Dear all,
I have successfully compiled and installed openmpi 1.3.2 on a 8 
socket quad-core machine from Sun.


I have used both Gcc-4.4 and Gcc-4.3.3 during the compilation phase 
but when I try to run simple MPI programs processes hangs. Actually 
this is the kernel of the application I am trying to run:


MPI_Barrier(MPI_COMM_WORLD);
total = MPI_Wtime();
for(i=0; i0)
MPI_Sendrecv(A[i-1], N, MPI_FLOAT, top, 0, row, N, 
MPI_FLOAT, bottom, 0, MPI_COMM_WORLD, &status);

for(k=0; k


Do you know if this kernel is sufficient to reproduce the problem?  
How large is N?  Evidently, it's greater than 1600, but I'm still 
curious how big.  What are top and bottom?  Are they rank+1 and rank-1?



Sometimes the program terminates correctly, sometimes it doesn't!



Roughly, what fraction of runs hang?  50%?  1%?  <0.1%?

I am running the program using the shared memory module because I am 
using just one multi-core with the following command:


mpirun --mca btl self,sm --np 32 ./my_prog prob_size



Any idea if this fails at lower np?

If I print the index number during the program execution I can see 
that the program stops running around index value 1600... but it actually 
doesn't crash. It just stops! :(


I run the program under strace to see what's going on and this is 
the output:

[...]
futex(0x2b20c02d9790, FUTEX_WAKE, 1)= 1
futex(0x2afcf2b0, FUTEX_WAKE, 1)= 0
readv(100, 
[{"n\267\0\1\0\0\0\0n\267\0\1\0\0\0\0n\267\0\0\0\0\0\0\0\0\0\4\0\0\0\34"..., 
36}], 1) = 36
readv(100, 
[{"n\267\0\1\0\0\0\0n\267\0\1\0\0\0\4\0\0\0jj\0\0\0\1\0\0\0", 28}], 
1) = 28

futex(0x19e93fd8, FUTEX_WAKE, 1)= 1
futex(0x2afcf5e0, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)

futex(0x2afcf5e0, FUTEX_WAKE, 1)= 0
writev(102, 
[{"n\267\0\1\0\0\0\0n\267\0\0\0\0\0\0n\267\0\1\0\0\0\4\0\0\0\4\0\0\0\34"..., 
36}, {"n\267\0\1\0\0\0\0n\267\0\1\0\0\0\7\0\0\0jj\0\0\0\1\0\0\0", 
28}], 2) = 64
poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, 
events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, 
{fd=11, events=POLLIN}, {fd=21, events=POLLIN}, {fd=25, 
events=POLLIN}, {fd=27, events=POLLIN}, {fd=33, events=POLLIN}, 
{fd=37, events=POLLIN}, {fd=39, events=POLLIN}, {fd=44, 
events=POLLIN}, {fd=48, events=POLLIN}, {fd=50, events=POLLIN}, 
{fd=55, events=POLLIN}, {fd=59, events=POLLIN}, {fd=61, 
events=POLLIN}, {fd=66, events=POLLIN}, {fd=70, events=POLLIN}, 
{fd=72, events=POLLIN}, {fd=77, events=POLLIN}, {fd=81, 
events=POLLIN}, {fd=83, events=POLLIN}, {fd=88, events=POLLIN}, 
{fd=92, events=POLLIN}, {fd=94, events=POLLIN}, {fd=99, 
events=POLLIN}, {fd=103, events=POLLIN}, {fd=105, events=POLLIN}, 
{fd=0, events=POLLIN}, {fd=100, events=POLLIN, revents=POLLIN}, 
...], 39, 1000) = 1
readv(100, 
[{"n\267\0\1\0\0\0\0n\267\0\1\0\0\0\0n\267\0\0\0\0\0\0\0\0\0\4\0\0\0\34"..., 
36}], 1) = 36
readv(100, 
[{"n\267\0\1\0\0\0\0n\267\0\

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-04 Thread Ralph Castain
My apologies - I wasn't clear enough. You need a tarball from r2  
or greater...such as:


http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz

HTH
Ralph


On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:


Hi ,

I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my  
command doesn't work


cat rankf:
rank 0=node1 slot=*
rank 1=node2 slot=*

cat hostf:
node1 slots=2
node2 slots=2

mpirun  --rankfile rankf --hostfile hostf  --host node1 -n 1  
hostname : --host node2 -n 1 hostname


Error, invalid rank (1) in the rankfile (rankf)

--
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file  
rmaps_rank_file.c at line 403
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file  
base/rmaps_base_map_job.c at line 86
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file  
base/plm_base_launch_support.c at line 86
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file  
plm_rsh_module.c at line 1016



Ralph, could you tell me if my command syntax is correct or not ? if  
not, give me the expected one ?


Regards

Geoffroy




2009/4/30 Geoffroy Pignot 
Immediately Sir !!! :)

Thanks again Ralph

Geoffroy





--

Message: 2
Date: Thu, 30 Apr 2009 06:45:39 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
To: Open MPI Users 
Message-ID:
   <71d2d8cc0904300545v61a42fe1k50086d2704d0f...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

I believe this is fixed now in our development trunk - you can  
download any

tarball starting from last night and give it a try, if you like. Any
feedback would be appreciated.

Ralph


On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:

Ah now, I didn't say it -worked-, did I? :-)

Clearly a bug exists in the program. I'll try to take a look at it  
(if Lenny

doesn't get to it first), but it won't be until later in the week.

On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:

I agree with you Ralph , and that 's what I expect from openmpi but my
second example shows that it's not working

cat hostfile.0
  r011n002 slots=4
  r011n003 slots=4

 cat rankfile.0
   rank 0=r011n002 slot=0
   rank 1=r011n003 slot=1

mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1  
hostname

### CRASHED

> > Error, invalid rank (1) in the rankfile (rankfile.0)
> >
>  
--
> > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in  
file

> > rmaps_rank_file.c at line 404
> > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in  
file

> > base/rmaps_base_map_job.c at line 87
> > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in  
file

> > base/plm_base_launch_support.c at line 77
> > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in  
file

> > plm_rsh_module.c at line 985
> >
>  
--

> > A daemon (pid unknown) died unexpectedly on signal 1  while
> attempting to
> > launch so we are aborting.
> >
> > There may be more information reported by the environment (see
> above).
> >
> > This may be because the daemon was unable to find all the needed
> shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to
> have the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> >
>  
--

> >
>  
--

> > orterun noticed that the job aborted, but has no info as to the
> process
> > that caused that situation.
> >
>  
--

> > orterun: clean termination accomplished



Message: 4
Date: Tue, 14 Apr 2009 06:55:58 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
To: Open MPI Users 
Message-ID: 
Content-Type: text/plain; charset="us-ascii"; Format="flowed";
  DelSp="yes"

The rankfile cuts across the entire job - it isn't applied on an
app_context basis. So the ranks in your rankfile must correspond to
the eventual rank of each process in the cmd line.

Unfortunately, that means you have to count ranks. In your case, you
only have four, so that makes life easier. Your rankfile would look
something like this:

rank 0=r001n001 slot=0
rank 1=r001n002 slot=1
rank 2=r001n001 slot=1
rank 3=r001n002 slot=2

HTH
Ralph

On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:

> Hi,
>
> I agree that my examples are not very clear. What I want to do is to
> launch a multiexes application (masters-slaves) and benefit from the
> processor affinity.
> Could you show me how to convert this command , using -rf option
> (whatever the affinity is)
>
> mpirun -n

Re: [OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-05-04 Thread Jeff Squyres

As I've indicated a few times in this thread:

>> Have you run layer 0 diagnostics to know that your fabric is clean?
>> You might want to contact your IB vendor to find out how to do that.



On May 4, 2009, at 4:34 AM, jan wrote:


Hi Jeff,

I have updated the firmware of Infiniband module on Dell M600, but the
problem still occured.

===============================================================

$ mpirun -hostfile clusternode -np 16 --byslot --mca btl  
openib,sm,self

$HOME/test/cpi
Process 1 on node1
Process 11 on node2
Process 8 on node2
Process 6 on node1
Process 4 on node1
Process 14 on node2
Process 3 on node1
Process 2 on node1
Process 9 on node2
Process 5 on node1
Process 0 on node1
Process 7 on node1
Process 10 on node2
Process 15 on node2
Process 13 on node2
Process 12 on node2
[node1][[3175,1],0][btl_openib_component.c:3029:poll_device] error  
polling

HP CQ with -2 errno says Success
===============================================================


Is this problem unsolvable?


Best Regards,

 Gloria Jan
Wavelink Technology Inc


>>> I can confirm that I have exactly the same problem, also on Dell
>>> system, even with latest openpmpi.
>>>
>>> Our system is:
>>>
>>> Dell M905
>>> OpenSUSE 11.1
>>> kernel: 2.6.27.21-0.1-default
>>> ofed-1.4-21.12 from SUSE repositories.
>>> OpenMPI-1.3.2
>>>
>>>
>>> But what I can also add, it not only affect openmpi, if this  
messages

>>> are triggered after mpirun:
>>> [node032][[9340,1],11][btl_openib_component.c:3002:poll_device]  
error

>>> polling HP CQ with -2 errno says Success
>>>
>>> Then IB stack hangs. You cannot even reload it, have to reboot  
node.

>>>
>>
>>
>> Something that severe should not be able to be caused by Open MPI.
>> Specifically: Open MPI should not be able to hang the OFED stack.
>> Have you run layer 0 diagnostics to know that your fabric is clean?
>> You might want to contact your IB vendor to find out how to do  
that.

>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>>
>
> On Apr 24, 2009, at 5:21 AM, jan wrote:
>
>> Dear Sir,
>>
>> I?m running a cluster with OpenMPI.
>>
>> $mpirun --mca mpi_show_mpi_alloc_mem_leaks 8 --mca
>> mpi_show_handle_leaks 1 $HOME/test/cpi
>>
>> I got the error message as job failed:
>>
>> Process 15 on node2
>> Process 6 on node1
>> Process 14 on node2
>> ? ? ?
>> Process 0 on node1
>> Process 10 on node2
>> [node2][[9340,1],13][btl_openib_component.c:3002:poll_device] error
>> polling HP C
>> Q with -2 errno says Success
>> [node2][[9340,1],9][btl_openib_component.c:3002:poll_device] error
>> polling HP CQ
>>  with -2 errno says Success
>> [node2][[9340,1],10][btl_openib_component.c:3002:poll_device] error
>> polling HP C
>> Q with -2 errno says Success
>> [node2][[9340,1],11][btl_openib_component.c:3002:poll_device] error
>> polling HP C
>> Q with -2 errno says Success
>> [node2][[9340,1],8][btl_openib_component.c:3002:poll_device] error
>> polling HP CQ
>>  with -2 errno says Success
>> [node2][[9340,1],15][btl_openib_component.c:3002:poll_device]  
[node2]

>> [[9340,1],1
>> 2][btl_openib_component.c:3002:poll_device] error polling HP CQ  
with

>> -2 errno sa
>> ys Success
>> error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],14][btl_openib_component.c:3002:poll_device] error
>> polling HP C
>> Q with -2 errno says Success
>> mpirun: killing job...
>>
>>  
--

>> mpirun noticed that process rank 0 with PID 28438 on node node1
>> exited on signal
>>  0 (Unknown signal 0).
>>  
--

>> mpirun: clean termination accomplished
>>
>> and got the message as job success
>>
>> Process 1 on node1
>> Process 2 on node1
>> ? ? ?
>> Process 13 on node2
>> Process 14 on node2
>>  
--

>> The following memory locations were allocated via MPI_ALLOC_MEM but
>> not freed via MPI_FREE_MEM before invoking MPI_FINALIZE:
>>
>> Process ID: [[13692,1],12]
>> Hostname:   node2
>> PID:30183
>>
>> (null)
>>  
--

>> [node1:32276] 15 more processes have sent help message help-mpool-
>> base.txt / all
>>  mem leaks
>> [node1:32276] Set MCA parameter "orte_base_help_aggregate" to 0 to
>> see all help
>> / error messages
>>
>>
>> It  occurred periodic, ie. twice success, then twice failed, twice
>> success, then twice failed ? . I download the OFED-1.4.1-rc3 from
>> The OpenFabrics Alliance and installed on Dell PowerEdge M600 Blade
>> Server. The infiniband Mezzanine Cards is Mellanox ConnectX QDR &
>> DDR. And infiniband switch module is Mellanox M2401G. OS is CentOS
>> 5.3, kernel  2.6.18-128.1.6.el5, with PGI V7.2-5 compiler. It?s
>> running OpenSM subnet manager.
>>
>> Best Regards,
>>
>> Gloria Jan
>>
>> Wavelink Technology Inc.
>>
>> The output of the "ompi_inf

[OMPI users] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_crs_blcr: file not found (ignored)

2009-05-04 Thread Kritiraj Sajadah

Dear All,
  Thanks to Josh and Yaakoub, I was able to configure my Open MPI as 
follows:

raj@raj:./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread 
--enable-mpi-threads --with-blcr=/usr/local.

raj@raj:make all install

I tried to checkpoint an MPI application using the following command, running on 
a single node:

raj@raj:mpirun -np 1 -am ft-enable-cr mpisleep

I got the following with no checkpointing performed:
raj@raj:mca: base: component_find: unable to open 
/usr/local/lib/openmpi/mca_crs_blcr: file not found (ignored)

Please help.

Regards,

Raj





Re: [OMPI users] mca: base: component_find: unable to open/usr/local/lib/openmpi/mca_crs_blcr: file not found (ignored)

2009-05-04 Thread Jeff Squyres

On May 4, 2009, at 9:06 AM, Kritiraj Sajadah wrote:


raj@raj:mpirun -np 1 -am ft-enable-cr mpisleep

I got the following with no checkpointing performed:
raj@raj:mca: base: component_find: unable to open /usr/local/lib/ 
openmpi/mca_crs_blcr: file not found (ignored)




This is usually a faulty error message from libltdl.  It usually means  
that the dependent libraries for a component cannot be found -- e.g.,  
is blcr installed on every node where you're trying to use it?
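
(A quick way to check that, assuming BLCR's libraries landed under
/usr/local/lib as in the configure line above, is to ask the loader directly;
the component filename may differ slightly between builds:

  ldd /usr/local/lib/openmpi/mca_crs_blcr.so    # look for "not found" entries
  export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
)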


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-05-04 Thread jan
Thank you Jeff. I have passed the mail to the IB vendor, Dell (the 
blade was ordered from Dell Taiwan), but he told me that he didn't 
understand "layer 0 diagnostics". Could you help us to get more 
information about "layer 0 diagnostics"? Thanks again.

Regards,

Gloria Jan
Wavelink Technology Inc.
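
(For context, "layer 0" here just means the raw InfiniBand fabric underneath
MPI. Typical OFED-level checks are along these lines; exact tool availability
depends on the installed OFED stack:

  ibstat          # port state, link width and speed of each HCA
  ibv_devinfo     # verbs-level view of the adapters
  ibdiagnet       # sweep the fabric and report bad links / error counters
)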


> As I've indicated a few times in this thread:
>
> >> Have you run layer 0 diagnostics to know that your fabric is clean?
> >> You might want to contact your IB vendor to find out how to do that.
>
>
>
> On May 4, 2009, at 4:34 AM, jan wrote:
>
>> Hi Jeff,
>>
>> I have updated the firmware of Infiniband module on Dell M600, but the
>> problem still occured.
>>
>> ===============================================================
>>
>> $ mpirun -hostfile clusternode -np 16 --byslot --mca btl  openib,sm,self
>> $HOME/test/cpi
>> Process 1 on node1
>> Process 11 on node2
>> Process 8 on node2
>> Process 6 on node1
>> Process 4 on node1
>> Process 14 on node2
>> Process 3 on node1
>> Process 2 on node1
>> Process 9 on node2
>> Process 5 on node1
>> Process 0 on node1
>> Process 7 on node1
>> Process 10 on node2
>> Process 15 on node2
>> Process 13 on node2
>> Process 12 on node2
>> [node1][[3175,1],0][btl_openib_component.c:3029:poll_device] error 
>> polling
>> HP CQ with -2 errno says Success
>> ===============================================================
>>
>> Is this problem unsolvable?
>>
>>
>> Best Regards,
>>
>>  Gloria Jan
>> Wavelink Technology Inc
>>





Re: [OMPI users] mca: base: component_find: unable to open/usr/local/lib/openmpi/mca_crs_blcr: file not found (ignored)

2009-05-04 Thread Kritiraj Sajadah

Hi Jeff,
  In fact I am testing it on my laptop before installing it on the 
cluster. 

I downloaded BLCR and installed it in /usr/local on my laptop

Then i installed openmpi using the following option:

 ./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread 
--enable-mpi-threads --with-blcr=/usr/local/lib

So, everything is installed and tested on my laptop for now, but I am still 
getting the error.

Please help.

Thanks 

Raj



--- On Mon, 5/4/09, Jeff Squyres  wrote:

> From: Jeff Squyres 
> Subject: Re: [OMPI users] mca: base: component_find: unable to 
> open/usr/local/lib/openmpi/mca_crs_blcr: file not found (ignored)
> To: "Open MPI Users" 
> Date: Monday, May 4, 2009, 2:09 PM
> On May 4, 2009, at 9:06 AM, Kritiraj
> Sajadah wrote:
> 
> > raj@raj:mpirun -np 1 -am ft-enable-cr mpisleep
> > 
> > I got the following with no checkpointing performed:
> > raj@raj:mca: base: component_find: unable to open
> /usr/local/lib/openmpi/mca_crs_blcr: file not found
> (ignored)
> > 
> 
> This is usually a faulty error message from libltdl. 
> It usually means that the dependent libraries for a
> component cannot be found -- e.g., is blcr installed on
> every node where you're trying to use it?
> 
> --Jeff Squyres
> Cisco Systems
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 






Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-04 Thread Geoffroy Pignot
Hi,

So, there are no more crashes with my "crazy" mpirun command. But the
paffinity feature seems to be broken. Indeed I am not able to pin my
processes.

Simple test with a program using your plpa library :

r011n006% cat hostf
r011n006 slots=4

r011n006% cat rankf
rank 0=r011n006 slot=0   > bind to CPU 0 , exact ?

r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile
rankf --wdir /tmp -n 1 a.out
 >>> PLPA Number of processors online: 4
 >>> PLPA Number of processor sockets: 2
 >>> PLPA Socket 0 (ID 0): 2 cores
 >>> PLPA Socket 1 (ID 3): 2 cores

Ctrl+Z
r011n006%bg

r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
R+   gpignot3  9271 97.8 a.out

In fact, whatever the slot number I put in my rankfile, a.out always runs on
CPU 3. I was looking for it on CPU 0 according to my cpuinfo file (see
below).
The result is the same if I try another syntax (rank 0=r011n006 slot=0:0
bind to socket 0 - core 0, exact?)
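
(One way to double-check the binding from outside the process, independent of
the psr column of ps, is the Linux taskset utility; the PID is of course just
the one reported by ps:

  taskset -pc 9271    # prints the list of CPUs the process may run on
)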

Thanks in advance

Geoffroy

PS: I run on rhel5

r011n006% uname -a
Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT 2008
x86_64 x86_64 x86_64 GNU/Linux

My configure is :
 ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64'
--disable-dlopen --disable-mpi-cxx --enable-heterogeneous


r011n006% cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
stepping: 6
cpu MHz : 2660.007
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips: 5323.68
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
stepping: 6
cpu MHz : 2660.007
cache size  : 4096 KB
physical id : 3
siblings: 2
core id : 0
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips: 5320.03
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
stepping: 6
cpu MHz : 2660.007
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips: 5319.39
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
stepping: 6
cpu MHz : 2660.007
cache size  : 4096 KB
physical id : 3
siblings: 2
core id : 1
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips: 5320.03
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:


> --
>
> Message: 2
> Date: Mon, 4 May 2009 04:45:57 -0600
> From: Ralph Castain 
> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> To: Open MPI Users 
> Message-ID: 
> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>DelSp="yes"
>
> My apologies - I wasn't clear enough. You need a tarball from r2
> or greater...such as:
>
> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
>
> HTH
> Ralph
>
>
> On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:
>
> > Hi ,
> >
> > I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my
> > command doesn't work
> >
> > cat rankf:
> > rank 0=node1 slot=*
> 

[OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente
Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in  
Xcode", but it's only for MPICC. I am using MPIF90, so I did the same,  
but changing MPICC for MPIF90, and also the path, but it did not work.


Building target “fortran” of project “fortran” with configuration  
“Debug”



Checking Dependencies
Invalid value 'MPIF90' for GCC_VERSION


The file "MPIF90.cpcompspec" looks like this:

  1 /**
  2 Xcode Coompiler Specification for MPIF90
  3
  4 */
  5
  6 {   Type = Compiler;
  7 Identifier = com.apple.compilers.mpif90;
  8 BasedOn = com.apple.compilers.gcc.4_0;
  9 Name = "MPIF90";
 10 Version = "Default";
 11 Description = "MPI GNU C/C++ Compiler 4.0";
 12 ExecPath = "/usr/local/bin/mpif90";  // This gets  
converted to the g++ variant automatically

 13 PrecompStyle = pch;
 14 }

and is located in "/Developer/Library/Xcode/Plug-ins"

and when I do mpif90 -v on terminal it works well:

Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure -- 
prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/ 
tmp/gfortran-20090321/gfortran_libs --enable-bootstrap

Thread model: posix
gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)


Any idea??

Thanks.

Vincent

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-04 Thread Ralph Castain
Unfortunately, I didn't write any of that code - I was just fixing the
mapper so it would properly map the procs. From what I can tell, the proper
things are happening there.

I'll have to dig into the code that specifically deals with parsing the
results to bind the processes. Afraid that will take awhile longer - pretty
dark in that hole.


On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot  wrote:

> Hi,
>
> So, there are no more crashes with my "crazy" mpirun command. But the
> paffinity feature seems to be broken. Indeed I am not able to pin my
> processes.
>
> Simple test with a program using your plpa library :
>
> r011n006% cat hostf
> r011n006 slots=4
>
> r011n006% cat rankf
> rank 0=r011n006 slot=0   > bind to CPU 0 , exact ?
>
> r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile
> rankf --wdir /tmp -n 1 a.out
>  >>> PLPA Number of processors online: 4
>  >>> PLPA Number of processor sockets: 2
>  >>> PLPA Socket 0 (ID 0): 2 cores
>  >>> PLPA Socket 1 (ID 3): 2 cores
>
> Ctrl+Z
> r011n006%bg
>
> r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
> R+   gpignot3  9271 97.8 a.out
>
> In fact whatever the slot number I put in my rankfile , a.out always runs
> on the CPU 3. I was looking for it on CPU 0 accordind to my cpuinfo file
> (see below)
> The result is the same if I try another syntax (rank 0=r011n006 slot=0:0
> bind to socket 0 - core 0  , exact ? )
>
> Thanks in advance
>
> Geoffroy
>
> PS: I run on rhel5
>
> r011n006% uname -a
> Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT 2008
> x86_64 x86_64 x86_64 GNU/Linux
>
> My configure is :
>  ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64'
> --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
>
>
> r011n006% cat /proc/cpuinfo
> processor   : 0
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 15
> model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
> stepping: 6
> cpu MHz : 2660.007
> cache size  : 4096 KB
> physical id : 0
> siblings: 2
> core id : 0
> cpu cores   : 2
> fpu : yes
> fpu_exception   : yes
> cpuid level : 10
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> bogomips: 5323.68
> clflush size: 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> processor   : 1
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 15
> model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
> stepping: 6
> cpu MHz : 2660.007
> cache size  : 4096 KB
> physical id : 3
> siblings: 2
> core id : 0
> cpu cores   : 2
> fpu : yes
> fpu_exception   : yes
> cpuid level : 10
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> bogomips: 5320.03
> clflush size: 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> processor   : 2
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 15
> model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
> stepping: 6
> cpu MHz : 2660.007
> cache size  : 4096 KB
> physical id : 0
> siblings: 2
> core id : 1
> cpu cores   : 2
> fpu : yes
> fpu_exception   : yes
> cpuid level : 10
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> bogomips: 5319.39
> clflush size: 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> processor   : 3
> vendor_id   : GenuineIntel
> cpu family  : 6
> model   : 15
> model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
> stepping: 6
> cpu MHz : 2660.007
> cache size  : 4096 KB
> physical id : 3
> siblings: 2
> core id : 1
> cpu cores   : 2
> fpu : yes
> fpu_exception   : yes
> cpuid level : 10
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
> bogomips: 5320.03
> clflush size: 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
>
>> ---

Re: [OMPI users] Myrinet optimization with OMP1.3 and macosX

2009-05-04 Thread Ricardo Fernández-Perea
I finally have the opportunity to run the IMB-3.2 benchmark over Myrinet. I am
running on a cluster of 16 Xserve nodes connected with Myrinet; 15 of them
are 8-core ones and the last one is a 4-core one, giving a limit of 124
processes.
I have run the tests with the bynode option, so from the 2-process to the 16-process
test it is always running 1 process per node.

The following tests (pingpong, pingping, sendrecv, exchange) present a strong
drop in performance at the 64k message size.

Any idea where I should look for the cause?

Ricardo

On Fri, Mar 20, 2009 at 7:32 PM, Ricardo Fernández-Perea <
rfernandezpe...@gmail.com> wrote:

> It is the F-2M, but I think for inter-node communication they should be
> equivalent.
> I have not run an MPI pingpong benchmark yet.
>
> The truth is I have a 10-day trip coming next week and I thought I could
> take some optimization "light reading" with me.
>
> so I know what I must look for when I come back.
>
> Ricardo
>
>
> On Fri, Mar 20, 2009 at 5:10 PM, Scott Atchley  wrote:
>
>> On Mar 20, 2009, at 11:33 AM, Ricardo Fernández-Perea wrote:
>>
>>  This are the results initially
>>> Running 1000 iterations.
>>>   Length   Latency(us)Bandwidth(MB/s)
>>>0   2.738  0.000
>>>1   2.718  0.368
>>>2   2.707  0.739
>>> 
>>>  10485764392.217238.735
>>>  20971528705.028240.913
>>>  4194304   17359.166241.619
>>>
>>> with  export MX_RCACHE=1
>>>
>>> Running 1000 iterations.
>>>   Length   Latency(us)Bandwidth(MB/s)
>>>0   2.731  0.000
>>>1   2.705  0.370
>>>2   2.719  0.736
>>> 
>>>  10485764265.846245.807
>>>  20971528491.122246.982
>>>  4194304   16953.997247.393
>>>
>>
>> Ricardo,
>>
>> I am assuming that these are PCI-X NICs. Given the latency and bandwidth,
>> are these "D" model NICs (see the top of the mx_info output)? If so, that
>> looks about as good as you can expect.
>>
>> Have you run Intel MPI Benchmark (IMB) or another MPI pingpong type
>> benchmark?
>>
>> Scott
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>


IMB1-Exchange.results
Description: Binary data


IMB1-PingPing.results
Description: Binary data


IMB1-PingPong.results
Description: Binary data


IMB1-Sendrecv.results
Description: Binary data


Re: [OMPI users] Myrinet optimization with OMP1.3 and macosX

2009-05-04 Thread Bogdan Costescu

On Mon, 4 May 2009, Ricardo Fernández-Perea wrote:


any idea where I should look for the cause.


Can you try adding to the mpirun/mpiexec command line '--mca mtl 
mx --mca pml cm' to specify usage of the non-default MX MTL ? (sorry 
if you already do, I haven't found it in your e-mail)


--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.coste...@iwr.uni-heidelberg.de

[OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Warner Yuen
Have you installed a Fortran compiler? Mac OS X's developer tools do  
not come with a Fortran compiler, so you'll need to install one if you  
haven't already done so. I routinely use the Intel IFORT compilers  
with success. However, I hear many good things about the gfortran  
compilers on Mac OS X, you can't beat the price of gfortran!



Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859




On May 4, 2009, at 7:28 AM, users-requ...@open-mpi.org wrote:


Send users mailing list submissions to
us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-requ...@open-mpi.org

You can reach the person managing the list at
users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

  1. How do I compile OpenMPI in Xcode 3.1 (Vicente)
  2. Re: 1.3.1 -rf rankfile behaviour ?? (Ralph Castain)


--

Message: 1
Date: Mon, 4 May 2009 16:12:44 +0200
From: Vicente 
Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1
To: us...@open-mpi.org
Message-ID: <1c2c0085-940f-43bb-910f-975871ae2...@gmail.com>
Content-Type: text/plain; charset="windows-1252"; Format="flowed";
DelSp="yes"

Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in
Xcode", but it's only for MPICC. I am using MPIF90, so I did the same,
but changing MPICC for MPIF90, and also the path, but it did not work.

Building target “fortran” of project “fortran” with configuration
“Debug”


Checking Dependencies
Invalid value 'MPIF90' for GCC_VERSION


The file "MPIF90.cpcompspec" looks like this:

  1 /**
  2 Xcode Coompiler Specification for MPIF90
  3
  4 */
  5
  6 {   Type = Compiler;
  7 Identifier = com.apple.compilers.mpif90;
  8 BasedOn = com.apple.compilers.gcc.4_0;
  9 Name = "MPIF90";
 10 Version = "Default";
 11 Description = "MPI GNU C/C++ Compiler 4.0";
 12 ExecPath = "/usr/local/bin/mpif90";  // This gets
converted to the g++ variant automatically
 13 PrecompStyle = pch;
 14 }

and is located in "/Developer/Library/Xcode/Plug-ins"

and when I do mpif90 -v on terminal it works well:

Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --
prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/
tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
Thread model: posix
gcc version 4.4.0 20090321 (experimental) [trunk revision 144983]  
(GCC)



Any idea??

Thanks.

Vincent
-- next part --
HTML attachment scrubbed and removed

--

Message: 2
Date: Mon, 4 May 2009 08:28:26 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
To: Open MPI Users 
Message-ID:
<71d2d8cc0905040728h2002f4d7s4c49219eee29e...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Unfortunately, I didn't write any of that code - I was just fixing the
mapper so it would properly map the procs. From what I can tell, the  
proper

things are happening there.

I'll have to dig into the code that specifically deals with parsing  
the
results to bind the processes. Afraid that will take awhile longer -  
pretty

dark in that hole.


On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot  
 wrote:



Hi,

So, there are no more crashes with my "crazy" mpirun command. But the
paffinity feature seems to be broken. Indeed I am not able to pin my
processes.

Simple test with a program using your plpa library :

r011n006% cat hostf
r011n006 slots=4

r011n006% cat rankf
rank 0=r011n006 slot=0   > bind to CPU 0 , exact ?

r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf -- 
rankfile

rankf --wdir /tmp -n 1 a.out

PLPA Number of processors online: 4
PLPA Number of processor sockets: 2
PLPA Socket 0 (ID 0): 2 cores
PLPA Socket 1 (ID 3): 2 cores


Ctrl+Z
r011n006%bg

r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
R+   gpignot3  9271 97.8 a.out

In fact whatever the slot number I put in my rankfile , a.out  
always runs
on the CPU 3. I was looking for it on CPU 0 accordind to my cpuinfo  
file

(see below)
The result is the same if I try another syntax (rank 0=r011n006  
slot=0:0

bind to socket 0 - core 0  , exact ? )

Thanks in advance

Geoffroy

PS: I run on rhel5

r011n006% uname -a
Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39  
CDT 2008

x86_64 x86_64 x86_64 GNU/Linux

My configure is :
./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/ 
lib64'

--disable-dlopen --disable-mpi-cxx --enable-heterogeneous


r011n006% cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   

Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Jeff Squyres
FWIW, I don't use Xcode, but I use the precompiled gcc/gfortran from  
here with good success:


http://hpc.sourceforge.net/



On May 4, 2009, at 11:38 AM, Warner Yuen wrote:


Have you installed a Fortran compiler? Mac OS X's developer tools do
not come with a Fortran compiler, so you'll need to install one if you
haven't already done so. I routinely use the Intel IFORT compilers
with success. However, I hear many good things about the gfortran
compilers on Mac OS X, you can't beat the price of gfortran!


Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859




On May 4, 2009, at 7:28 AM, users-requ...@open-mpi.org wrote:

> Send users mailing list submissions to
>   us...@open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>   http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
>   users-requ...@open-mpi.org
>
> You can reach the person managing the list at
>   users-ow...@open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
>   1. How do I compile OpenMPI in Xcode 3.1 (Vicente)
>   2. Re: 1.3.1 -rf rankfile behaviour ?? (Ralph Castain)
>
>
>  
--

>
> Message: 1
> Date: Mon, 4 May 2009 16:12:44 +0200
> From: Vicente 
> Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1
> To: us...@open-mpi.org
> Message-ID: <1c2c0085-940f-43bb-910f-975871ae2...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"; Format="flowed";
>   DelSp="yes"
>
> Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in
> Xcode", but it's only for MPICC. I am using MPIF90, so I did the  
same,
> but changing MPICC for MPIF90, and also the path, but it did not  
work.

>
> Building target “fortran” of project “fortran” with configuration
> “Debug”
>
>
> Checking Dependencies
> Invalid value 'MPIF90' for GCC_VERSION
>
>
> The file "MPIF90.cpcompspec" looks like this:
>
>   1 /**
>   2 Xcode Coompiler Specification for MPIF90
>   3
>   4 */
>   5
>   6 {   Type = Compiler;
>   7 Identifier = com.apple.compilers.mpif90;
>   8 BasedOn = com.apple.compilers.gcc.4_0;
>   9 Name = "MPIF90";
>  10 Version = "Default";
>  11 Description = "MPI GNU C/C++ Compiler 4.0";
>  12 ExecPath = "/usr/local/bin/mpif90";  // This gets
> converted to the g++ variant automatically
>  13 PrecompStyle = pch;
>  14 }
>
> and is located in "/Developer/Library/Xcode/Plug-ins"
>
> and when I do mpif90 -v on terminal it works well:
>
> Using built-in specs.
> Target: i386-apple-darwin8.10.1
> Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --
> prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/
> tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
> Thread model: posix
> gcc version 4.4.0 20090321 (experimental) [trunk revision 144983]
> (GCC)
>
>
> Any idea??
>
> Thanks.
>
> Vincent

Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente
Yes, I already have the gfortran compiler in /usr/local/bin, the same path
as my mpif90 compiler. But I've seen that when I use the mpif90 in /usr/bin
or in /Developer/usr/bin, it says:


"Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional."


That should be the problem, I will have to change the path to use the  
gfortran I have installed.

How could I do it? (Sorry, I am a beginner)

Thanks.
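(A minimal sketch of one way to do that, assuming -- as described above -- that both gfortran and the manually built mpif90 live in /usr/local/bin: put that directory first on the search path, then check which wrapper the shell now resolves.)

  export PATH=/usr/local/bin:$PATH   # prefer the manually installed Open MPI over the Apple-shipped one
  which mpif90                       # should now print /usr/local/bin/mpif90
  mpif90 -v                          # should report the gfortran build shown above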


El 04/05/2009, a las 17:38, Warner Yuen escribió:

Have you installed a Fortran compiler? Mac OS X's developer tools do  
not come with a Fortran compiler, so you'll need to install one if  
you haven't already done so. I routinely use the Intel IFORT  
compilers with success. However, I hear many good things about the  
gfortran compilers on Mac OS X, you can't beat the price of gfortran!



Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859





Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente
I can use openmpi from the terminal, but I am having problems with gdb, so
I wanted to know if it was possible to use openmpi with Xcode.


However, for Mac users, what is the best way to compile and debug an
MPI program?


Thanks.

Vincent


El 04/05/2009, a las 17:42, Jeff Squyres escribió:

FWIW, I don't use Xcode, but I use the precompiled gcc/gfortran from  
here with good success:


   http://hpc.sourceforge.net/




Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Jeff Squyres
Open MPI comes pre-installed in Leopard; as Warner noted, since  
Leopard doesn't ship with a Fortran compiler, the Open MPI that Apple  
ships has non-functional mpif77 and mpif90 wrapper compilers.


So the Open MPI that you installed manually will use your Fortran
compilers, and therefore will have functional mpif77 and mpif90
wrapper compilers.  Hence, you probably need to be sure to use the
"right" wrapper compilers.  It looks like you specified the full path
in ExecPath, so I'm not sure why Xcode wouldn't work with
that (like I mentioned, I unfortunately don't use Xcode myself, so I
don't know why that wouldn't work).
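(As a quick sanity check -- a sketch, assuming the manual install lives under /usr/local -- the Open MPI wrapper compilers accept a --showme option that prints the underlying command they would run, which makes it easy to tell a functional mpif90 from the stub that ships with Leopard:)

  which mpif90                      # which wrapper the shell picks up first
  /usr/local/bin/mpif90 --showme    # should print a gfortran command line for a Fortran-enabled build
  mpif90 -v                         # version banner of the underlying Fortran compiler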





Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
If I cannot make it work with Xcode, which one could I use? Which one do
you use to compile and debug OpenMPI?
Thanks

Vincent



[OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Warner Yuen

Admittedly, I don't use Xcode to build Open MPI either.

You can just compile Open MPI from the command line and install  
everything in /usr/local/. Make sure that gfortran is set in your path  
and you should just be able to do a './configure --prefix=/usr/local'


After the installation, just make sure that your path is set correctly  
when you go to use the newly installed Open MPI. If you don't set your  
path, it will always default to using the version of OpenMPI that  
ships with Leopard.
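(Spelled out, the steps might look roughly like this -- a sketch only, assuming the source tree has been unpacked somewhere and gfortran is already on the PATH; the directory name is hypothetical:)

  cd openmpi-1.3.2                    # or wherever the tarball was unpacked
  ./configure --prefix=/usr/local
  make all
  sudo make install
  export PATH=/usr/local/bin:$PATH    # pick up the new wrappers instead of the Leopard ones
  which mpif90 && mpif90 -v           # sanity check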



Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859





Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
I can run openmpi perfectly with command line, but I wanted a graphic
interface for debugging because I was having problems.
Thanks anyway.

Vincent


Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
Maybe I should have opened a new thread, but do you have any idea why I receive this
when I use gdb to debug an openmpi program:
warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_umoddi3_s.o" - no debug
information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
debug information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udivmoddi4_s.o" - no
debug information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2_s.o" - no debug
information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2-fde-darwin_s.o"
- no debug information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2-fde-darwin.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-c_s.o" - no debug
information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/unwind-c.c".
...



There is no 'admin' user, so I don't know why it happens. It works well with a C
program.

Any idea??.

Thanks.


Vincent






[OMPI users] error while loading shared libraries: libcr.so.0: cannot open shared object file: No such file or directory.

2009-05-04 Thread Kritiraj Sajadah

Dear All,
I have installed openmpi and blcr on my laptop and am trying to
checkpoint an MPI application.

Both openmpi and blcr are installed in /usr/local.

When I try to checkpoint an MPI application, I get the following error:

error while loading shared libraries: libcr.so.0: cannot open shared object 
file: No such file or directory.

Any help would be very much appreciated.

Regards,

Raj





Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Jeff Squyres
I get those as well.  I believe that they are (annoying but) harmless
-- an artifact of how the freeware gcc/gfortran that I use was built.




Re: [OMPI users] mca: base: component_find: unable toopen/usr/local/lib/openmpi/mca_crs_blcr: file not found (ignored)

2009-05-04 Thread Jeff Squyres

On May 4, 2009, at 9:57 AM, Kritiraj Sajadah wrote:

  In fact i am testing it on my laptop before installing it  
on the cluster.


I downloaded BLCR and installed it in /usr/local on my laptop

Then i installed openmpi using the following option:

 ./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread --enable-mpi-threads --with-blcr=/usr/local/lib




The --with-blcr clause doesn't look right -- shouldn't that be
--with-blcr=/usr/local?  Check the output of ompi_info and ensure that you
actually built with BLCR support properly...


So, everything is installed and tested on my laptop for now but i am  
still getting the error.





Does the LD_LIBRARY_PATH environment variable in the shell where  
you're invoking mpirun include /usr/local/lib?  (or whatever directory  
libblcr.so is located)
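(A sketch of the two checks suggested above, assuming BLCR really was installed under /usr/local; the configure line is illustrative, not verified against this particular setup:)

  ./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread \
      --enable-mpi-threads --with-blcr=/usr/local
  ompi_info | grep -i blcr      # the blcr crs component should be listed if BLCR support was built
  export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH   # so libcr.so.0 can be found at run time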


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-04 Thread Ralph Castain
Hmmm...I'm afraid I can't replicate the problem. All seems to be working
just fine on the RHEL systems available to me. The procs indeed bind to the
specified processors in every case.

rhc@odin ~/trunk]$ cat rankfile
rank 0=odin001 slot=0
rank 1=odin002 slot=1

[rhc@odin mpi]$ mpirun -rf ../../../rankfile -n 2 --leave-session-attached
-mca paffinity_base_verbose 5 ./mpi_spin
[odin001.cs.indiana.edu:09297 ]
paffinity slot assignment: slot_list == 0
[odin001.cs.indiana.edu:09297 ]
paffinity slot assignment: rank 0 runs on cpu #0 (#0)
[odin002.cs.indiana.edu:13566] paffinity slot assignment: slot_list == 1
[odin002.cs.indiana.edu:13566] paffinity slot assignment: rank 1 runs on cpu
#1 (#1)

Suspended
[rhc@odin mpi]$ ssh odin001
[rhc@odin001 ~]$ ps axo stat,user,psr,pid,pcpu,comm | grep rhc
Srhc0  9296  0.0 orted
RLl  rhc0  9297  100 mpi_spin

[rhc@odin mpi]$ ssh odin002
[rhc@odin002 ~]$ ps axo stat,user,psr,pid,pcpu,comm | grep rhc
Srhc0 13562  0.0 orted
RLl  rhc1 13566  102 mpi_spin


Not sure where to go from here...perhaps someone else can spot the problem?
Ralph
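(For anyone else trying to reproduce this: besides the psr column above, the affinity actually applied to a running process can be read back directly -- a sketch, assuming util-linux's taskset is available on the node:)

  ps axo stat,user,psr,pid,pcpu,comm | grep a.out   # psr = CPU the task last ran on
  taskset -cp <pid>                                 # prints the CPU list the process is bound to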


On Mon, May 4, 2009 at 8:28 AM, Ralph Castain  wrote:

> Unfortunately, I didn't write any of that code - I was just fixing the
> mapper so it would properly map the procs. From what I can tell, the proper
> things are happening there.
>
> I'll have to dig into the code that specifically deals with parsing the
> results to bind the processes. Afraid that will take awhile longer - pretty
> dark in that hole.
>
>
>
> On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot wrote:
>
>> Hi,
>>
>> So, there are no more crashes with my "crazy" mpirun command. But the
>> paffinity feature seems to be broken. Indeed I am not able to pin my
>> processes.
>>
>> Simple test with a program using your plpa library :
>>
>> r011n006% cat hostf
>> r011n006 slots=4
>>
>> r011n006% cat rankf
>> rank 0=r011n006 slot=0   > bind to CPU 0 , exact ?
>>
>> r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile
>> rankf --wdir /tmp -n 1 a.out
>>  >>> PLPA Number of processors online: 4
>>  >>> PLPA Number of processor sockets: 2
>>  >>> PLPA Socket 0 (ID 0): 2 cores
>>  >>> PLPA Socket 1 (ID 3): 2 cores
>>
>> Ctrl+Z
>> r011n006%bg
>>
>> r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
>> R+   gpignot3  9271 97.8 a.out
>>
>> In fact whatever the slot number I put in my rankfile , a.out always runs
>> on the CPU 3. I was looking for it on CPU 0 accordind to my cpuinfo file
>> (see below)
>> The result is the same if I try another syntax (rank 0=r011n006 slot=0:0
>> bind to socket 0 - core 0  , exact ? )
>>
>> Thanks in advance
>>
>> Geoffroy
>>
>> PS: I run on rhel5
>>
>> r011n006% uname -a
>> Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT
>> 2008 x86_64 x86_64 x86_64 GNU/Linux
>>
>> My configure is :
>>  ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64'
>> --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
>>
>>
>> r011n006% cat /proc/cpuinfo
>> processor   : 0
>> vendor_id   : GenuineIntel
>> cpu family  : 6
>> model   : 15
>> model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
>> stepping: 6
>> cpu MHz : 2660.007
>> cache size  : 4096 KB
>> physical id : 0
>> siblings: 2
>> core id : 0
>> cpu cores   : 2
>> fpu : yes
>> fpu_exception   : yes
>> cpuid level : 10
>> wp  : yes
>> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
>> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>> bogomips: 5323.68
>> clflush size: 64
>> cache_alignment : 64
>> address sizes   : 36 bits physical, 48 bits virtual
>> power management:
>>
>> processor   : 1
>> vendor_id   : GenuineIntel
>> cpu family  : 6
>> model   : 15
>> model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
>> stepping: 6
>> cpu MHz : 2660.007
>> cache size  : 4096 KB
>> physical id : 3
>> siblings: 2
>> core id : 0
>> cpu cores   : 2
>> fpu : yes
>> fpu_exception   : yes
>> cpuid level : 10
>> wp  : yes
>> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
>> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>> bogomips: 5320.03
>> clflush size: 64
>> cache_alignment : 64
>> address sizes   : 36 bits physical, 48 bits virtual
>> power management:
>>
>> processor   : 2
>> vendor_id   : GenuineIntel
>> cpu family  : 6
>> model   : 15
>> model name  : Intel(R) Xeon(R) CPU5150  @ 2.66GHz
>> stepping: 6
>> cpu MHz : 2660.007
>> c

Re: [OMPI users] users Digest, Vol 1217, Issue 2, Message3

2009-05-04 Thread Jeff Squyres

On May 4, 2009, at 9:50 AM, jan wrote:

Thank you Jeff. I have passed the mail to the IB vendor Dell company (the
blade was ordered from Dell Taiwan), but he told me that he didn't
understand "layer 0 diagnostics". Could you help us to get more
information on "layer 0 diagnostics"? Thanks again.



Layer 0 = your physical network layer.  Specifically: ensure that your  
IB network is actually functioning properly at both the physical and  
driver layer.  Cisco was an IB vendor for several years; I can tell  
you from experience that it is *not* enough to just plug everything in  
and run a few trivial tests to ensure that network traffic seems to be  
passed properly.  You need to have your vendor run a full set of layer  
0 diagnostics to ensure that all the cables are good, all the HCAs are  
good, all the drivers are functioning properly, etc.  This involves  
running diagnostic network testing patterns, checking various error  
counters on the HCAs and IB switches, etc.


This is something that Dell should know how to do.

I say all this because the problem that you are seeing *seems* to be a  
network-related problem, not an OMPI-related problem.  One can never  
know for sure, but it is fairly clear that the very first step in your  
case is to verify that the network is functioning 100% properly.   
FWIW: this was standard operating procedure when Cisco was selling IB  
hardware.
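(For what it's worth, a rough sketch of the kind of commands such a check usually involves on an OFED-style stack -- the exact tools and options are vendor-specific, so treat these as illustrative only:)

  ibstat          # HCA/port state: ports should be Active at the expected link width and speed
  ibv_devinfo     # driver-level view of the HCA
  ibdiagnet       # fabric-wide sweep for bad links, cables and switches
  perfquery       # per-port error counters (symbol errors, link downs, etc.)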


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-04 Thread Geoffroy Pignot
Hi Ralph

Thanks for your extra tests.  Before leaving, I noticed a possible problem
coming from running plpa across different RH distribs (i.e. different Linux
kernels). Indeed, I configure and compile openmpi on rhel4, then I run on
rhel5. I think my problem comes from this mismatch. I'll do a few more
tests tomorrow morning (France) and keep you informed.

Regards

Geoffroy









Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
But it doesn't work well.
For example, I am trying to debug a program, "floyd" in this case, and when
I make a breakpoint:

No line 26 in file "../../../gcc-4.2-20060805/libgfortran/fmain.c".

I am getting disappointed and frustrated that I cannot work well with
openmpi on my Mac. There should be a way to make it run in Xcode, uff...


Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
I forgot to say that "../../../gcc-4.2-20060805/libgfortran/fmain.c" is
neither the path nor the program that I am trying to debug.

2009/5/5 Vicente Puig 

> But it doesn't work well.
> For example, I am trying to debug a program, "floyd" in this case, and when
> I set a breakpoint I get:
>
> No line 26 in file "../../../gcc-4.2-20060805/libgfortran/fmain.c".
>
> I am getting disappointed and frustrated that I cannot work well with
> Open MPI on my Mac. There should be a way to make it run in Xcode, uff...
>
> 2009/5/4 Jeff Squyres 
>
>> I get those as well.  I believe that they are (annoying but) harmless --
>> an artifact of how the freeware gcc/gfortran that I use was built.
>>
>>
>>
>> On May 4, 2009, at 1:47 PM, Vicente Puig wrote:
>>
>>> Maybe I should have opened a new thread, but do you have any idea why I get
>>> the following when I use gdb to debug an Open MPI program:
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_umoddi3_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
>>> debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udivmoddi4_s.o" - no
>>> debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2-fde-darwin_s.o"
>>> - no debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2-fde-darwin.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-c_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-c.c".
>>> ...
>>>

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-04 Thread Gus Correa

Hi Ralph and Geffroy

I've been following this thread with a lot of interest.
Setting processor affinity and pinning the processes to cores
was next on my "TODO" list, and I just started on it.

I tried to use three different versions of rankfile,
with OpenMPI 1.3.1 on a dual-socket quad-core
Opteron machine.
In all cases I've got errors similar to Geoffroy's.

The mpiexec command line is:

${MPIEXEC} \
-prefix ${PREFIX} \
-np ${NP} \
-rf my_rankfile \
-mca btl openib,sm,self \
-mca mpi_leave_pinned 0 \
-mca paffinity_base_verbose 5 \
xhpl


I use Torque, and I generate the rankfile programmatically based
on the $PBS_NODEFILE.
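(As an illustration only, not Gus's actual script: a minimal generator along
these lines, assuming whole nodes with eight cores each and one rank per core,
would produce something in the slot=N form of rankfile #2 below:)

  rank=0
  rm -f my_rankfile
  for node in $(sort -u $PBS_NODEFILE); do
      for slot in 0 1 2 3 4 5 6 7; do
          echo "rank ${rank}=${node} slot=${slot}" >> my_rankfile
          rank=$((rank + 1))
      done
  done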

Here are three rank files I used:

#1 rankfile (trying to associate slot=physical_id:core_id from 
/proc/cpuinfo)

[gus@monk hpl]$ more my_rankfile
rank   0=node24  slot=0:0
rank   1=node24  slot=0:1
rank   2=node24  slot=0:2
rank   3=node24  slot=0:3
rank   4=node24  slot=1:0
rank   5=node24  slot=1:1
rank   6=node24  slot=1:2
rank   7=node24  slot=1:3


#2 rankfile (trying to associate slot=processor from /proc/cpuinfo)
[gus@monk hpl]$ more my_rankfile
rank   0=node24  slot=0
rank   1=node24  slot=1
rank   2=node24  slot=2
rank   3=node24  slot=3
rank   4=node24  slot=4
rank   5=node24  slot=5
rank   6=node24  slot=6
rank   7=node24  slot=7


#3 rankfile (Similar to #1 but with "p" that the FAQs say stands for 
"physical")

[gus@monk hpl]$ more my_rankfile
rank   0=node24  slot=p0:0
rank   1=node24  slot=p0:1
rank   2=node24  slot=p0:2
rank   3=node24  slot=p0:3
rank   4=node24  slot=p1:0
rank   5=node24  slot=p1:1
rank   6=node24  slot=p1:2
rank   7=node24  slot=p1:3

***

In all cases I get this error (just like Geoffroy):

**

Rankfile claimed host node24 that was not allocated or oversubscribed 
it's slots

:

--
[node24.cluster:23762] [[59468,0],0] ORTE_ERROR_LOG: Bad parameter in 
file ../..

/../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 108
[node24.cluster:23762] [[59468,0],0] ORTE_ERROR_LOG: Bad parameter in 
file ../..

/../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 87
[node24.cluster:23762] [[59468,0],0] ORTE_ERROR_LOG: Bad parameter in 
file ../..

/../../orte/mca/plm/base/plm_base_launch_support.c at line 77
[node24.cluster:23762] [[59468,0],0] ORTE_ERROR_LOG: Bad parameter in 
file ../..

/../../../orte/mca/plm/tm/plm_tm_module.c at line 167
--
A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
launch so we are aborting.



I confess I am a bit confused about the nomenclature.

What do you call CPU in the rankfile context?
How about slot, core, and socket?

Linux keeps information about these items in /proc/cpuinfo,
in /sys/devices/system/cpu,
and in /sys/devices/system/node.
However, the nomenclature is different from Open MPI's.

How can one use that information to build a correct rankfile?
I read the mpiexec man page and the FAQs but I am still confused.

Questions:
1) In the rankfile notation slot=cpu_num, is cpu-num the same as 
"processor" in /proc/cpuinfo, or is it the same as "physical id" in 
/proc/cpuinfo?


2) In the rankfile notation slot=socket_num:core_num, is socket_num the
same as "physical id" in /proc/cpuinfo, or something else?

3) Is core_num in the rankfile notation the same as "core id" or the 
same as "processor" in /proc/cpuinfo?

Or is it yet another thing?

Geoffroy sent the /proc/cpuinfo of his
Intel dual-socket dual-core machine.
I enclose the one from my AMD dual-socket quad-core below.
The architectures (non-NUMA vs. NUMA) are different and so are the
numbering schemes:

Geoffroy's numbers go like this (each column is one logical CPU):

processor    0  1  2  3
physical id  0  3  0  3   (alternating physical IDs)
core id      0  0  1  1

Whereas my numbers go like this:

processor    0  1  2  3  4  5  6  7
physical id  0  0  0  0  1  1  1  1   (physical IDs don't alternate)
core id      0  1  2  3  0  1  2  3
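(For reference, a mapping like the two above can be pulled straight out of
/proc/cpuinfo; something along these lines prints processor, physical id, and
core id for each logical CPU:)

  grep -E '^(processor|physical id|core id)' /proc/cpuinfo
  # or, one logical CPU per output line:
  awk -F: '/^(processor|physical id|core id)/ {printf "%s", $2} /^$/ {print ""}' \
      /proc/cpuinfo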


So, first, I think a clarification of the nomenclature would
really help us build meaningful rankfiles.
I suggest relating the names in the rankfile to those in /proc/cpuinfo,
if possible (or to /sys/devices/system/cpu or /sys/devices/system/node).
(Other OSs may use different names, though.)
The tables above show that things can get confusing for the user
if the connection between the two is not made.

Second, as Ralph pointed out, there may be a bug to fix as well.



It would be great to have the rankfile functionality working.
However, the good news is that just setting processor affinity
works fine.
This is OK for now, since I am using the whole node.
The mpirun command line I used is :

${MPIEXEC} \
-prefix ${PREFIX} \

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-04 Thread Ralph Castain
Umm...actually, I said there isn't a bug to fix :-) I don't think  
there is a bug. I think it is doing what it should do.


Note that Geoffroy and I are specifically *not* talking about 1.3.1.  
We know that there are bugs in that release (specifically relating to  
multiple app_contexts, though there may be others), and in 1.3.2. We  
have been working on the OMPI trunk to fix the problems, and appear to  
have done so. Geoffroy's remaining observations are most likely due to  
building on one RHEL version and attempting to run on another.


You might try it again with the latest trunk tarball.

As for the nomenclature - that was decided by the folks who originally  
wrote that code. I don't have a personal stake in it, nor much of an  
opinion. However, note that we do differentiate between physical and  
logical CPUs. Your definitions correlate to our "physical" ones,  
while the rankfile mapping (in the absence of the 'P' qualifier)  
defaults to logical definitions. This may be the source of your  
confusion.
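(Purely as an illustration of that distinction -- the socket and core numbers
here are made up -- the same kind of rankfile line can be written in either
form, where the plain form uses Open MPI's logical numbering and the 'p' form
uses the physical IDs the OS reports, e.g. in /proc/cpuinfo:)

  rank 0=node24 slot=1:0    <- logical socket 1, logical core 0
  rank 0=node24 slot=p1:0   <- physical socket 1, physical core 0

On a box like Geoffroy's, where the OS reports physical ids 0 and 3, the
logical socket numbers would presumably be 0 and 1, so the two forms would name
different sockets; on Gus's box (physical ids 0 and 1) they would likely
coincide.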


You might look at the paffinity documentation for a better explanation  
of physical vs logical numbering. If it isn't there or is inadequate,  
we can try to add more words - Jeff is particularly adept at doing  
so! :-)


HTH
Ralph


On May 4, 2009, at 7:49 PM, Gus Correa wrote:


Hi Ralph and Geffroy

I've been following this thread with a lot of interest.
Setting processor affinity and pinning the processes to cores
was next on my "TODO" list, and I just started on it.

I tried to use three different versions of rankfile,
with OpenMPI 1.3.1 on a dual-socket quad-core
Opteron machine.
In all cases I've got errors similar to Geoffroy's.

The mpiexec command line is:

${MPIEXEC} \
   -prefix ${PREFIX} \
   -np ${NP} \
-rf my_rankfile \
   -mca btl openib,sm,self \
-mca mpi_leave_pinned 0 \
-mca paffinity_base_verbose 5 \
   xhpl


I use Torque, and I generate the rankfile programmatically based
on the $PBS_NODEFILE.

Here are three rank files I used:

#1 rankfile (trying to associate slot=physical_id:core_id from /proc/ 
cpuinfo)

[gus@monk hpl]$ more my_rankfile
rank   0=node24  slot=0:0
rank   1=node24  slot=0:1
rank   2=node24  slot=0:2
rank   3=node24  slot=0:3
rank   4=node24  slot=1:0
rank   5=node24  slot=1:1
rank   6=node24  slot=1:2
rank   7=node24  slot=1:3


#2 rankfile (trying to associate slot=processor from /proc/cpuinfo)
[gus@monk hpl]$ more my_rankfile
rank   0=node24  slot=0
rank   1=node24  slot=1
rank   2=node24  slot=2
rank   3=node24  slot=3
rank   4=node24  slot=4
rank   5=node24  slot=5
rank   6=node24  slot=6
rank   7=node24  slot=7


#3 rankfile (Similar to #1 but with "p" that the FAQs say stands for  
"physical")

[gus@monk hpl]$ more my_rankfile
rank   0=node24  slot=p0:0
rank   1=node24  slot=p0:1
rank   2=node24  slot=p0:2
rank   3=node24  slot=p0:3
rank   4=node24  slot=p1:0
rank   5=node24  slot=p1:1
rank   6=node24  slot=p1:2
rank   7=node24  slot=p1:3

***

In all cases I get this error (just like Geoffroy):

**

Rankfile claimed host node24 that was not allocated or  
oversubscribed it's slots

:

--
[node24.cluster:23762] [[59468,0],0] ORTE_ERROR_LOG: Bad parameter  
in file ../..

/../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 108
[node24.cluster:23762] [[59468,0],0] ORTE_ERROR_LOG: Bad parameter  
in file ../..

/../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 87
[node24.cluster:23762] [[59468,0],0] ORTE_ERROR_LOG: Bad parameter  
in file ../..

/../../orte/mca/plm/base/plm_base_launch_support.c at line 77
[node24.cluster:23762] [[59468,0],0] ORTE_ERROR_LOG: Bad parameter  
in file ../..

/../../../orte/mca/plm/tm/plm_tm_module.c at line 167
--
A daemon (pid unknown) died unexpectedly on signal 1  while  
attempting to

launch so we are aborting.



I confess I am a bit confused about the nomenclature.

What do you call CPU in the rankfile context?
How about slot, core, and socket?

Linux keeps information about these items in /proc/cpuinfo,
in /sys/devices/system/cpu,
and in /sys/devices/system/node.
However, the nomenclature is different from Open MPI's.

How can one use that information to build a correct rankfile?
I read the mpiexec man page and the FAQs but I am still confused.

Questions:
1) In the rankfile notation slot=cpu_num, is cpu-num the same as  
"processor" in /proc/cpuinfo, or is it the same as "physical id" in / 
proc/cpuinfo?


2) In the rankfile notation slot=socket_num:core_num, is socket_num  
the

same as "physical id" in /proc/cpuinfo, or something else?

3) Is core_num in the rankfile notation the same as "core id" or the  
same as "proc