Re: [OMPI users] users Digest, Vol 1052, Issue 10

2008-10-31 Thread Allan Menezes
  
Sent by: users-bounces@open-mpi.org
To: Open MPI Users <us...@open-mpi.org>
Date: 10/31/2008 03:38 PM
Subject: [OMPI users] problem running Open MPI on Cells
Please respond to: Open MPI Users <users@open-mpi.org>

Hello,

I'm having problems using Open MPI on a cluster of Mercury Computer's
Cell Accelerator Boards (CABs).

We have an MPI application that is running on multiple CABs.  The
application uses Mercury's MultiCore Framework (MCF) to use the Cell's
SPEs.  Here's the basic problem.  I can log into each CAB and run the
application in serial directly from the command line (i.e. without
using mpirun) without a problem.  I can also launch a serial job onto
each CAB from another machine using mpirun without a problem.

The problem occurs when I try to launch onto multiple CABs using
mpirun.  MCF requires a license file.  After the application
initializes MPI, it tries to initialize MCF on each node.  The
initialization routine loads the MCF license file and checks for valid
license keys.  If the keys are valid, then it continues to initialize
MCF.  If not, it throws an error.

When I run on multiple CABs, most of the time several of the CABs
throw an error saying MCF cannot find a valid license key.  The
strange thing is that this behavior doesn't appear when I launch serial
jobs using MCF, only multiple CABs.  Additionally, the errors are
inconsistent.  Not all the CABs throw an error, sometimes a few of
them error out, sometimes all of them, sometimes none.

I've talked with the Mercury folks and they're just as stumped as I
am.  The only thing we can think of is that OpenMPI is somehow
modifying the environment and is interfering with MCF, but we can't
think of any reason why.

Any ideas out there?  Thanks.

Hahn

--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

 





Re: [OMPI users] problem running Open MPI on Cells

2008-10-31 Thread Mi Yan

Where did you put the environment variables related to the MCF license file and
the MCF shared libraries?
What is your default shell?

Does your test indicate the following?
Suppose you have 4 nodes:
on node 1, "mpirun -np 4 --host node1,node2,node3,node4 hostname" works,
but "mpirun -np 4 --host node1,node2,node3,node4 foocbe" does not work,
where foocbe is the executable generated with MCF.

Is it possible that the MCF license is limited to a few concurrent uses? E.g.,
if the license is limited to 4 concurrent uses, the MPI application will fail
on 8 nodes.

Regards,
Mi


   
 Hahn Kim
 Sent by: users-bounces@open-mpi.org
 To: Open MPI Users
 Date: 10/31/2008 03:38 PM
 Subject: [OMPI users] problem running Open MPI on Cells
 Please respond to: Open MPI Users




Hello,

I'm having problems using Open MPI on a cluster of Mercury Computer's
Cell Accelerator Boards (CABs).

We have an MPI application that is running on multiple CABs.  The
application uses Mercury's MultiCore Framework (MCF) to use the Cell's
SPEs.  Here's the basic problem.  I can log into each CAB and run the
application in serial directly from the command line (i.e. without
using mpirun) without a problem.  I can also launch a serial job onto
each CAB from another machine using mpirun without a problem.

The problem occurs when I try to launch onto multiple CABs using
mpirun.  MCF requires a license file.  After the application
initializes MPI, it tries to initialize MCF on each node.  The
initialization routine loads the MCF license file and checks for valid
license keys.  If the keys are valid, then it continues to initialize
MCF.  If not, it throws an error.

When I run on multiple CABs, most of the time several of the CABs
throw an error saying MCF cannot find a valid license key.  The
strange thing is that this behavior doesn't appear when I launch serial
jobs using MCF, only multiple CABs.  Additionally, the errors are
inconsistent.  Not all the CABs throw an error, sometimes a few of
them error out, sometimes all of them, sometimes none.

I've talked with the Mercury folks and they're just as stumped as I
am.  The only thing we can think of is that OpenMPI is somehow
modifying the environment and is interfering with MCF, but we can't
think of any reason why.

Any ideas out there?  Thanks.

Hahn

--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Problem with openmpi version 1.3b1 beta1

2008-10-31 Thread Ralph Castain
I see you are using IPv6. From what I can tell, we do enable that  
support by default if the underlying system supports it.


My best guess is that either that support is broken (we never test it  
since none of us use IPv6), or our configure system isn't properly  
detecting that it exists.


Can you attach a copy of your config.log? It will tell us what the  
system thinks it should be building.


Thanks
Ralph

On Oct 31, 2008, at 4:54 PM, Allan Menezes wrote:


Date: Fri, 31 Oct 2008 09:34:52 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] users Digest, Vol 1052, Issue 1
To: Open MPI Users 
Message-ID: <0cf28492-b13e-4f82-ac43-c1580f079...@lanl.gov>
Content-Type: text/plain; charset="us-ascii"; Format="flowed";
DelSp="yes"

It looks like the daemon isn't seeing the other interface address  
on  host x2. Can you ssh to x2 and send the contents of ifconfig -a?


Ralph

On Oct 31, 2008, at 9:18 AM, Allan Menezes wrote:



users-requ...@open-mpi.org wrote:


Send users mailing list submissions to
us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-requ...@open-mpi.org

You can reach the person managing the list at
users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

 1. Openmpi ver1.3beta1 (Allan Menezes)
 2. Re: Openmpi ver1.3beta1 (Ralph Castain)
 3. Re: Equivalent .h files (Benjamin Lamptey)
 4. Re: Equivalent .h files (Jeff Squyres)
 5. ompi-checkpoint is hanging (Matthias Hovestadt)
 6. unsubscibe (Bertrand P. S. Russell)
 7. Re: ompi-checkpoint is hanging (Tim Mattox)


--

Message: 1
Date: Fri, 31 Oct 2008 02:06:09 -0400
From: Allan Menezes 
Subject: [OMPI users] Openmpi ver1.3beta1
To: us...@open-mpi.org
Message-ID: 
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,
  I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads
--with-threads=posix --disable-ipv6
I have six nodes x1..6
I distributed the /opt/openmpi13b1 with scp to all other nodes  
from  the

head node
When I run the following command:
mpirun --prefix /opt/openmpi13b1 --host x1 hostname
it works on x1, printing out the hostname of x1.
But when I type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname
it hangs and does not give me any output.
I have a 6-node Intel quad-core cluster with OSCAR and PCI Express
Gigabit Ethernet for eth0.
Can somebody advise?
Thank you very much.
Allan Menezes


--

Message: 2
Date: Fri, 31 Oct 2008 02:41:59 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] Openmpi ver1.3beta1
To: Open MPI Users 
Message-ID: 
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

When you typed the --host x1 command, were you sitting on x1?
Likewise, when you typed the --host x2 command, were you not on   
host x2?


If the answer to both questions is "yes", then my guess is that
something is preventing you from launching a daemon on host x2. Try
adding --leave-session-attached to your cmd line and see if any  
error

messages appear. And check the FAQ for tips on how to setup for ssh
launch (I'm assuming that is what you are using).

http://www.open-mpi.org/faq/?category=rsh

Ralph

On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:




Hi Ralph,
 Yes, that is true. I tried both commands on x1, and ver 1.2.8 works
on the same setup without a problem.

Here is the output with the added
--leave-session-attached
[allan@x1 ~]$ mpiexec --prefix /opt/openmpi13b2 --leave-session-attached -host x2 hostname
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection to lifeline [[1354,0],0] lost

--
A daemon (pid 7665) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.

[OMPI users] Problem with openmpi version 1.3b1 beta1

2008-10-31 Thread Allan Menezes

List-Post: users@lists.open-mpi.org
Date: Fri, 31 Oct 2008 09:34:52 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] users Digest, Vol 1052, Issue 1
To: Open MPI Users 
Message-ID: <0cf28492-b13e-4f82-ac43-c1580f079...@lanl.gov>
Content-Type: text/plain; charset="us-ascii"; Format="flowed";
DelSp="yes"

It looks like the daemon isn't seeing the other interface address on  
host x2. Can you ssh to x2 and send the contents of ifconfig -a?


Ralph

On Oct 31, 2008, at 9:18 AM, Allan Menezes wrote:



users-requ...@open-mpi.org wrote:
 


Send users mailing list submissions to
us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-requ...@open-mpi.org

You can reach the person managing the list at
users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

  1. Openmpi ver1.3beta1 (Allan Menezes)
  2. Re: Openmpi ver1.3beta1 (Ralph Castain)
  3. Re: Equivalent .h files (Benjamin Lamptey)
  4. Re: Equivalent .h files (Jeff Squyres)
  5. ompi-checkpoint is hanging (Matthias Hovestadt)
  6. unsubscibe (Bertrand P. S. Russell)
  7. Re: ompi-checkpoint is hanging (Tim Mattox)


--

Message: 1
Date: Fri, 31 Oct 2008 02:06:09 -0400
From: Allan Menezes 
Subject: [OMPI users] Openmpi ver1.3beta1
To: us...@open-mpi.org
Message-ID: 
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,
   I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads
--with-threads=posix --disable-ipv6
I have six nodes x1..6
I distributed the /opt/openmpi13b1 with scp to all other nodes from  
the

head node
When I run the following command:
mpirun --prefix /opt/openmpi13b1 --host x1 hostname
it works on x1, printing out the hostname of x1.
But when I type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname
it hangs and does not give me any output.
I have a 6-node Intel quad-core cluster with OSCAR and PCI Express
Gigabit Ethernet for eth0.
Can somebody advise?
Thank you very much.
Allan Menezes


--

Message: 2
Date: Fri, 31 Oct 2008 02:41:59 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] Openmpi ver1.3beta1
To: Open MPI Users 
Message-ID: 
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

When you typed the --host x1 command, were you sitting on x1?
Likewise, when you typed the --host x2 command, were you not on  
host x2?


If the answer to both questions is "yes", then my guess is that
something is preventing you from launching a daemon on host x2. Try
adding --leave-session-attached to your cmd line and see if any error
messages appear. And check the FAQ for tips on how to setup for ssh
launch (I'm assuming that is what you are using).

http://www.open-mpi.org/faq/?category=rsh

Ralph

On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:


   


Hi Ralph,
  Yes, that is true. I tried both commands on x1, and ver 1.2.8 works
on the same setup without a problem.

Here is the output with the added
--leave-session-attached
[allan@x1 ~]$ mpiexec --prefix /opt/openmpi13b2 --leave-session-attached -host x2 hostname
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection to lifeline [[1354,0],0] lost

--
A daemon (pid 7665) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpiexec: clean termination accomplished

[allan@x1 ~]$
However my main eth0 IP is 192.168.1.1 and the Internet gateway is 192.168.0.1.

Any solutions?
Allan Menezes



 



Re: [OMPI users] Working with a CellBlade cluster

2008-10-31 Thread Gilbert Grosdidier
OK, thanks to Mi and Jeff for their useful replies anyway.

 Gilbert.

On Fri, 31 Oct 2008, Jeff Squyres wrote:

> AFAIK, there are no parameters available to monitor IB message passing.  The
> majority of it is processed in hardware, and Linux is unaware of it.  We have
> not added any extra instrumentation into the openib BTL to provide auditing
> information, because, among other reasons, that is the performance-critical
> code path and we didn't want to add any latency in there.
> 
> The best you may be able to do is with a PMPI-based library to audit MPI
> function call invocations.
> 
> 
> On Oct 31, 2008, at 4:07 PM, Mi Yan wrote:
> 
> > Gilbert,
> > 
> > I did not know the MCA parameters that can monitor the message passing. I
> > have tried a few MCA verbose parameters and did not identify any helpful ones.
> > 
> > One way to check if the message goes via IB or SM may be to check the
> > counters in /sys/class/infiniband.
> > 
> > Regards,
> > Mi
> > Gilbert Grosdidier
> > Sent by: users-boun...@open-mpi.org
> > Date: 10/29/2008 12:36 PM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] Working with a CellBlade cluster
> > Please respond to: Open MPI Users
> > 
> > 
> > 
> > Thank you very much Mi and Lenny for your detailed replies.
> > 
> > I believe I can summarize the infos to allow for
> > 'Working with a QS22 CellBlade cluster' like this:
> > - Yes, messages are efficiently handled with "-mca btl openib,sm,self"
> > - Better to go to the OMPI-1.3 version ASAP
> > - It is currently more efficient/easy to use numactl to control
> > processor affinity on a QS22.
> > 
> > So far so good.
> > 
> > One question remains: how could I monitor in detail message passing
> > through IB (on one side) and through SM (on the other side) through the use of MCA
> > parameters, please? Additional info about the verbosity level
> > of this monitoring would be highly appreciated ... A lengthy travel
> > through the list of such parameters provided by ompi_info did not
> > enlighten me (there are so many xxx_sm_yyy type params that I don't know
> > which could be the right one ;-)
> > 
> > Thanks in advance for your hints,  Best Regards, Gilbert.
> > 
> > 
> > On Thu, 23 Oct 2008, Mi Yan wrote:
> > 
> > >
> > > 1.  MCA BTL parameters
> > > With "-mca btl openib,self", both message between two Cell processors on
> > > one QS22 and   messages between two QS22s go through IB.
> > >
> > > With "-mca btl openib,sm,slef",  message on one QS22 go through shared
> > > memory,  message between QS22 go through IB,
> > >
> > > Depending on the message size and other MCA parameters,  it does not
> > > guarantee message passing on shared memory is faster than on IB.   E.g.
> > > the bandwidth for 64KB message is 959MB/s on shared-memory and is 694MB/s
> > > on IB;  the bandwidth for 4MB message is 539 MB/s and 1092 MB/s on  IB.
> > > The bandwidth of 4MB message on shared memory may be higher if you tune
> > > some MCA parameter.
> > >
> > > 2.  mpi_paffinity_alone
> > > "mpi_paffinity_alone = 1" is not a good choice for QS22.  There are two
> > > sockets with two physical Cell/B.E. processors on one QS22.  Each Cell/B.E. has two
> > > SMT threads, so there are four logical CPUs on one QS22.  The CBE Linux
> > > kernel maps logical cpus 0 and 1 to socket 1 and logical cpus 2 and 3 to
> > > socket 2.  If mpi_paffinity_alone is set to 1, the two MPI instances
> > > will be assigned to logical cpu 0 and cpu 1 on socket 1.  I believe this is
> > > not what you want.
> > >
> > > A temporary solution to force the affinity on QS22 is to use
> > > "numactl".  E.g., assuming the hostname is "qs22" and the executable is
> > > "foo", the following command can be used:
> > > mpirun -np 1 -H qs22 numactl -c0 -m0 foo : -np 1 -H qs22 numactl -c1 -m1 foo
> > >
> > > In the long run, I wish the CBE kernel exported the CPU topology in /sys and
> > > used PLPA to force the processor affinity.
> > >
> > > Best Regards,
> > > Mi
> > >
> > >
> > >
> > >
> > >  "Lenny
> > >  Verkhovsky"
> > >   > >  @gmail.com>   "Open MPI Users"
> > >  Sent by:  
> > >  users-bounces@ope  cc
> > >  n-mpi.org
> > >Subject
> > >Re: [OMPI users] Working with a
> > >  10/23/2008 05:48  CellBlade cluster
> > >  AM
> > >
> > >
> > >  Please respond to
> > >   Open MPI Users
> > >   > > rg>
> > >
> > >
> > >
> > >
> > >
> > >
> > > Hi,
> > >
> > >
> > > If I understand you correctly the most 

Re: [OMPI users] problem running Open MPI on Cells

2008-10-31 Thread Gilbert Grosdidier
Hi,

 To monitor the environment from inside the application, it could be useful to
issue a 'system("printenv")' call at the very beginning of the main program,
even before (and after, btw) the MPI_Init call, when running in serial job mode
with a single CAB, using mpirun.
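
 For reference, a minimal sketch of such a check could look like the C program
below (this is only an illustration, not code from the original thread; the
variable name MCF_LICENSE_FILE is a hypothetical stand-in for whatever
variable the MCF license mechanism actually reads):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Dump the environment before and after MPI_Init to see whether
 * mpirun/orted altered anything the license check depends on. */
int main(int argc, char **argv)
{
    printf("--- environment before MPI_Init ---\n");
    fflush(stdout);
    system("printenv | sort");

    MPI_Init(&argc, &argv);

    printf("--- environment after MPI_Init ---\n");
    fflush(stdout);
    system("printenv | sort");

    /* Or query a single suspect variable directly (hypothetical name). */
    const char *lic = getenv("MCF_LICENSE_FILE");
    printf("MCF_LICENSE_FILE = %s\n", lic ? lic : "(unset)");

    MPI_Finalize();
    return 0;
}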

 HTH,   Gilbert.

On Fri, 31 Oct 2008, Hahn Kim wrote:

> Hello,
> 
> I'm having problems using Open MPI on a cluster of Mercury Computer's Cell
> Accelerator Boards (CABs).
> 
> We have an MPI application that is running on multiple CABs.  The application
> uses Mercury's MultiCore Framework (MCF) to use the Cell's SPEs.  Here's the
> basic problem.  I can log into each CAB and run the application in serial
> directly from the command line (i.e. without using mpirun) without a problem.
> I can also launch a serial job onto each CAB from another machine using mpirun
> without a problem.
> 
> The problem occurs when I try to launch onto multiple CABs using mpirun.  MCF
> requires a license file.  After the application initializes MPI, it tries to
> initialize MCF on each node.  The initialization routine loads the MCF
> license file and checks for valid license keys.  If the keys are valid, then
> it continues to initialize MCF.  If not, it throws an error.
> 
> When I run on multiple CABs, most of the time several of the CABs throw an
> error saying MCF cannot find a valid license key.  The strange thing is that
> this behavior doesn't appear when I launch serial jobs using MCF, only
> multiple CABs.  Additionally, the errors are inconsistent.  Not all the CABs
> throw an error, sometimes a few of them error out, sometimes all of them,
> sometimes none.
> 
> I've talked with the Mercury folks and they're just as stumped as I am.  The
> only thing we can think of is that OpenMPI is somehow modifying the
> environment and is interfering with MCF, but we can't think of any reason why.
> 
> Any ideas out there?  Thanks.
> 
> Hahn
> 
> --
> Hahn Kim, h...@ll.mit.edu
> MIT Lincoln Laboratory
> 244 Wood St., Lexington, MA 02420
> Tel: 781-981-0940, Fax: 781-981-5255
> 
> 
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
*-*
  Gilbert Grosdidier gilbert.grosdid...@in2p3.fr
  LAL / IN2P3 / CNRS Phone : +33 1 6446 8909
  Faculté des Sciences, Bat. 200 Fax   : +33 1 6446 8546
  B.P. 34, F-91898 Orsay Cedex (FRANCE)
 -


Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-31 Thread Jeff Squyres

On Oct 31, 2008, at 3:20 PM, Gustavo Seabra wrote:

As Jeff mentioned this component is not required on Windows. You can disable
it completely in Open MPI and everything will continue to work correctly.
Please add --enable-mca-no-build=memory_mallopt or maybe the more generic (as
there is no need for any memory manager on Windows)
--enable-mca-no-build=memory.


Tried, doesn't quite work:

If I configure with "--enable-mca-no-build=memory", the config dies  
with:


 *** Final output
 configure: error: conditional "OMPI_WANT_EXTERNAL_PTMALLOC2" was
never defined.
 Usually this means the macro was only invoked conditionally.


Ew, yoinks.  That's definitely a bug -- looks like we used an  
AM_CONDITIONAL inside the main configure.m4 for ptmalloc2; whoops (it  
needs to be inside MCA_memory_ptmalloc2_POST_CONFIG, not  
MCA_memory_ptmalloc2_CONFIG).  You're building up quite the bug list  
-- thanks for your patience!  It's probably unfortunately not that  
surprising, though, since we don't test on Cygwin at all... :-\



Now, if I try with "--enable-mca-no-build=memory_mallopt", the
configuration script runs just fine, but the compilation dies when
compiling "mca/paffinity/windows":

 libtool: compile:  gcc -DHAVE_CONFIG_H -I.
-I../../../../opal/include -I../../../../orte/include -I../../../..
 /ompi/include
-I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../..
-D_REENTRANT -O3
 -DNDEBUG -finline-functions -fno-strict-aliasing -MT
paffinity_windows_module.lo -MD -MP -MF
 .deps/paffinity_windows_module.Tpo -c paffinity_windows_module.c
-DDLL_EXPORT -DPIC -o
 .libs/paffinity_windows_module.o
 paffinity_windows_module.c:44: error: parse error before "sys_info"

 [... and then a bunch of messages after that, all related to
paffinity_windows_module.c, which...]
 [... I think are all related to this first one...]


I do the build system stuff in OMPI, but this part is all George /  
Windows guys...  Perhaps this is a difference compiling between  
"normal" windows and Cygwin...?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-31 Thread George Bosilca

Gustavo,

I guess that if you disable the vt contrib package, this will take you
one step further :) Hopefully to the end of the compile stage ... and
to the beginning of troubles with running Cygwin parallel
applications ...


Meanwhile, there is a special option to disable contrib packages.  
Please add --enable-contrib-no-build=vt to your configure line and  
this should do the trick.


  george.

On Oct 31, 2008, at 3:20 PM, Gustavo Seabra wrote:


On Thu, Oct 30, 2008 at 9:04 AM, George Bosilca wrote:

Hi George,

I'm sorry for taking too long to respond. As you mentioned, config
takes a veeery long time in Cygwin, and then the install itself
takes many times that :-(

As Jeff mentioned this component is not required on Windows. You can disable
it completely in Open MPI and everything will continue to work correctly.
Please add --enable-mca-no-build=memory_mallopt or maybe the more generic (as
there is no need for any memory manager on Windows)
--enable-mca-no-build=memory.


Tried, doesn't quite work:

If I configure with "--enable-mca-no-build=memory", the config dies  
with:


 *** Final output
 configure: error: conditional "OMPI_WANT_EXTERNAL_PTMALLOC2" was
never defined.
 Usually this means the macro was only invoked conditionally.

Now, if I try with "--enable-mca-no-build=memory_mallopt", the
configuration script runs just fine, but the compilation dies when
compiling "mca/paffinity/windows":

 libtool: compile:  gcc -DHAVE_CONFIG_H -I.
-I../../../../opal/include -I../../../../orte/include -I../../../..
 /ompi/include
-I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../..
-D_REENTRANT -O3
 -DNDEBUG -finline-functions -fno-strict-aliasing -MT
paffinity_windows_module.lo -MD -MP -MF
 .deps/paffinity_windows_module.Tpo -c paffinity_windows_module.c
-DDLL_EXPORT -DPIC -o
 .libs/paffinity_windows_module.o
 paffinity_windows_module.c:44: error: parse error before "sys_info"

 [... and then a bunch of messages after that, all related to
paffinity_windows_module.c, which...]
 [... I think are all related to this first one...]

Finally, I thought that I can live without processor affinity or even
memory affinity, so I tried using
" --enable-mca-no-build=memory_mallopt,maffinity,paffinity", and the
configuration went all smoothly. The compilation... You guessed, died
again. But this time it was something that had bit me before:
RTLD_NEXT, which is required by one contributed package (vt). (See my
previous message to Jeff and the list.)

My next attempt will be to remove this package, and see how far I can
get... But I'm getting there :-)

It is possible to have a native version of Open MPI on Windows.  
There are
two ways to achieve this. First, install SFU, and compile there. It  
worked
last time I checked, but it's not the solution I prefer. Second,  
you can
install the express version of the Microsoft Visual Studio (which  
is free),
and set your PATH, LIB and INCLUDE correctly to point to the  
installation,
and then you can use the cl compiler to build Open MPI even on  
Windows.


That is true, but it seems more complicated for the regular user than
installing OpenMPI (assuming I can figure out the correct combination
of options). Also, our program is actually made for Unix, and as a
convenience it *can* be installed in Cygwin, but I'm not sure how it
would work with a native Windows OpenMPI.

Anyways... I feel like I'm getting closer... Will keep trying during
the weekend.


Thanks a lot for all the help! (That goes to Jeff too)

Cheers,
Gustavo.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




smime.p7s
Description: S/MIME cryptographic signature


Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-31 Thread Jeff Squyres
Ok, I'll CC the VT guys on the ticket and let them know.  They'll  
likely slurp in whatever fix we do for OMPI into VT.


FWIW: you can disable the VT package with:

--enable-contrib-no-build=vt


On Oct 31, 2008, at 3:02 PM, Gustavo Seabra wrote:


As I keep trying to install OpenMPI in Cygwin, I found another
instance where RTLD_NEXT is assumed to be present. Will keep trying...

Gustavo.

=
Making all in vtlib
make[5]: Entering directory
`/home/seabra/local/openmpi-1.3b1/ompi/contrib/vt/vt/vtlib'
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_REENTRANT
-DBINDIR=\"/home/seabra/local/openmpi-1.3b1/bin\"
-DDATADIR=\"/home/seabra/local/openmpi-1.3b1/share/vampirtrace\" -DRFG
-DVT_BFD  -DVT_IOWRAP  -O3 -DNDEBUG -finline-functions
-fno-strict-aliasing  -MT vt_comp_gnu.o -MD -MP -MF
.deps/vt_comp_gnu.Tpo -c -o vt_comp_gnu.o vt_comp_gnu.c
mv -f .deps/vt_comp_gnu.Tpo .deps/vt_comp_gnu.Po
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_REENTRANT
-DBINDIR=\"/home/seabra/local/openmpi-1.3b1/bin\"
-DDATADIR=\"/home/seabra/local/openmpi-1.3b1/share/vampirtrace\" -DRFG
-DVT_BFD  -DVT_IOWRAP  -O3 -DNDEBUG -finline-functions
-fno-strict-aliasing  -MT vt_iowrap.o -MD -MP -MF .deps/vt_iowrap.Tpo
-c -o vt_iowrap.o vt_iowrap.c
vt_iowrap.c: In function `vt_iowrap_init':
vt_iowrap.c:105: error: `RTLD_NEXT' undeclared (first use in this  
function)
vt_iowrap.c:105: error: (Each undeclared identifier is reported only  
once

vt_iowrap.c:105: error: for each function it appears in.)
vt_iowrap.c: In function `open':
vt_iowrap.c:188: error: `RTLD_NEXT' undeclared (first use in this  
function)

[...and a bunch of messages just like those last 2 lines...]
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Working with a CellBlade cluster

2008-10-31 Thread Jeff Squyres
AFAIK, there are no parameters available to monitor IB message  
passing.  The majority of it is processed in hardware, and Linux is  
unaware of it.  We have not added any extra instrumentation into the  
openib BTL to provide auditing information, because, among other  
reasons, that is the performance-critical code path and we didn't want  
to add any latency in there.


The best you may be able to do is with a PMPI-based library to audit  
MPI function call invocations.
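
As a rough illustration of that approach (a hedged sketch, nothing shipped
with Open MPI), a PMPI interposition layer that counts MPI_Send traffic per
rank can be as small as the following; adjust the buf qualifier (const or
not) to match the MPI_Send prototype in your mpi.h, and link or preload it
alongside the application:

#include <stdio.h>
#include <mpi.h>

/* Intercept MPI_Send, tally calls and bytes, forward to the real
 * implementation through the PMPI_ entry point. */
static long send_calls = 0;
static long send_bytes = 0;

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int size;
    MPI_Type_size(datatype, &size);
    send_calls++;
    send_bytes += (long)count * size;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

/* Report the tally when the application shuts MPI down. */
int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    fprintf(stderr, "[rank %d] MPI_Send: %ld calls, %ld bytes\n",
            rank, send_calls, send_bytes);
    return PMPI_Finalize();
}

The same pattern extends to any other MPI call you want to audit.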



On Oct 31, 2008, at 4:07 PM, Mi Yan wrote:


Gilbert,

I did not know the MCA parameters that can monitor the message
passing. I have tried a few MCA verbose parameters and did not
identify any helpful ones.


One way to check if the message goes via IB or SM may be to check the
counters in /sys/class/infiniband.


Regards,
Mi
Gilbert Grosdidier
Sent by: users-boun...@open-mpi.org
Date: 10/29/2008 12:36 PM
To: Open MPI Users
Subject: Re: [OMPI users] Working with a CellBlade cluster
Please respond to: Open MPI Users



Thank you very much Mi and Lenny for your detailed replies.

I believe I can summarize the infos to allow for
'Working with a QS22 CellBlade cluster' like this:
- Yes, messages are efficiently handled with "-mca btl openib,sm,self"
- Better to go to the OMPI-1.3 version ASAP
- It is currently more efficient/easy to use numactl to control
processor affinity on a QS22.

So far so good.

One question remains: how could I monitor in detail message passing
through IB (on one side) and through SM (on the other side) through the use of MCA
parameters, please? Additional info about the verbosity level
of this monitoring would be highly appreciated ... A lengthy travel
through the list of such parameters provided by ompi_info did not
enlighten me (there are so many xxx_sm_yyy type params that I don't know which
could be the right one ;-)

Thanks in advance for your hints,  Best Regards, Gilbert.


On Thu, 23 Oct 2008, Mi Yan wrote:

>
> 1.  MCA BTL parameters
> With "-mca btl openib,self", both messages between two Cell processors on
> one QS22 and messages between two QS22s go through IB.
>
> With "-mca btl openib,sm,self", messages on one QS22 go through shared
> memory, and messages between QS22s go through IB.
>
> Depending on the message size and other MCA parameters, it is not
> guaranteed that message passing on shared memory is faster than on IB.  E.g.
> the bandwidth for a 64KB message is 959MB/s on shared memory and 694MB/s
> on IB; the bandwidth for a 4MB message is 539 MB/s on shared memory and 1092 MB/s on IB.
> The bandwidth of a 4MB message on shared memory may be higher if you tune
> some MCA parameters.
>
> 2.  mpi_paffinity_alone
> "mpi_paffinity_alone = 1" is not a good choice for QS22.  There are two
> sockets with two physical Cell/B.E. processors on one QS22.  Each Cell/B.E. has two
> SMT threads, so there are four logical CPUs on one QS22.  The CBE Linux
> kernel maps logical cpus 0 and 1 to socket 1 and logical cpus 2 and 3 to
> socket 2.  If mpi_paffinity_alone is set to 1, the two MPI instances
> will be assigned to logical cpu 0 and cpu 1 on socket 1.  I believe this is
> not what you want.
>
> A temporary solution to force the affinity on QS22 is to use
> "numactl".  E.g., assuming the hostname is "qs22" and the executable is
> "foo", the following command can be used:
> mpirun -np 1 -H qs22 numactl -c0 -m0 foo : -np 1 -H qs22 numactl -c1 -m1 foo
>
> In the long run, I wish the CBE kernel exported the CPU topology in /sys and
> used PLPA to force the processor affinity.
>
> Best Regards,
> Mi
>
>
>
>
>  "Lenny
>  Verkhovsky"
>   

>  @gmail.com>   "Open MPI Users"
>  Sent by:  
>  users- 
bounces@ope  cc

>  n-mpi.org
> 
Subject
>Re: [OMPI users] Working  
with a

>  10/23/2008 05:48  CellBlade cluster
>  AM
>
>
>  Please respond to
>   Open MPI Users
>   rg>
>
>
>
>
>
>
> Hi,
>
>
> If I understand you correctly, the most suitable way to do it is by
> paffinity, which we have in Open MPI 1.3 and the trunk.
> However, the OS usually distributes processes evenly between sockets by
> itself.
>
> There is still no formal FAQ due to multiple reasons, but you can read how to
> use it in the attached scratch (there were a few name changes of the
> params, so check with ompi_info).
>
> Shared memory is used between processes that share the same machine, and openib
> is used between different machines (hostnames); no special MCA params are
> needed.
>
> Best Regards
> 

[OMPI users] problem running Open MPI on Cells

2008-10-31 Thread Hahn Kim

Hello,

I'm having problems using Open MPI on a cluster of Mercury Computer's  
Cell Accelerator Boards (CABs).


We have an MPI application that is running on multiple CABs.  The  
application uses Mercury's MultiCore Framework (MCF) to use the Cell's  
SPEs.  Here's the basic problem.  I can log into each CAB and run the  
application in serial directly from the command line (i.e. without  
using mpirun) without a problem.  I can also launch a serial job onto  
each CAB from another machine using mpirun without a problem.


The problem occurs when I try to launch onto multiple CABs using  
mpirun.  MCF requires a license file.  After the application  
initializes MPI, it tries to initialize MCF on each node.  The
initialization routine loads the MCF license file and checks for valid  
license keys.  If the keys are valid, then it continues to initialize  
MCF.  If not, it throws an error.


When I run on multiple CABs, most of the time several of the CABs  
throw an error saying MCF cannot find a valid license key.  The  
strange thing is that this behavior doesn't appear when I launch serial
jobs using MCF, only multiple CABs.  Additionally, the errors are  
inconsistent.  Not all the CABs throw an error, sometimes a few of  
them error out, sometimes all of them, sometimes none.


I've talked with the Mercury folks and they're just as stumped as I  
am.  The only thing we can think of is that OpenMPI is somehow  
modifying the environment and is interfering with MCF, but we can't  
think of any reason why.


Any ideas out there?  Thanks.

Hahn

--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255








Re: [OMPI users] Working with a CellBlade cluster

2008-10-31 Thread Mi Yan

Gilbert,

  I did not know the MCA parameters that can monitor the message
passing.  I have tried a few MCA verbose parameters and did not identify
any helpful ones.

 One way to check if the message goes via IB or SM may be to check the
counters in /sys/class/infiniband.

Regards,
Mi


   
 Gilbert Grosdidier
 Sent by: users-bounces@open-mpi.org
 To: Open MPI Users
 Date: 10/29/2008 12:36 PM
 Subject: Re: [OMPI users] Working with a CellBlade cluster
 Please respond to: Open MPI Users




Thank you very much Mi and Lenny for your detailed replies.

 I believe I can summarize the infos to allow for
'Working with a QS22 CellBlade cluster' like this:
- Yes, messages are efficiently handled with "-mca btl openib,sm,self"
- Better to go to the OMPI-1.3 version ASAP
- It is currently more efficient/easy to use numactl to control
processor affinity on a QS22.

 So far so good.

 One question remains: how could I monitor in detail message passing
through IB (on one side) and through SM (on the other side) through the use of MCA
parameters, please? Additional info about the verbosity level
of this monitoring would be highly appreciated ... A lengthy travel
through the list of such parameters provided by ompi_info did not
enlighten me (there are so many xxx_sm_yyy type params that I don't know
which could be the right one ;-)

 Thanks in advance for your hints,  Best Regards, Gilbert.


On Thu, 23 Oct 2008, Mi Yan wrote:

>
> 1.  MCA BTL parameters
> With "-mca btl openib,self", both messages between two Cell processors on
> one QS22 and messages between two QS22s go through IB.
>
> With "-mca btl openib,sm,self", messages on one QS22 go through shared
> memory, and messages between QS22s go through IB.
>
> Depending on the message size and other MCA parameters, it is not
> guaranteed that message passing on shared memory is faster than on IB.   E.g.
> the bandwidth for a 64KB message is 959MB/s on shared memory and 694MB/s
> on IB; the bandwidth for a 4MB message is 539 MB/s on shared memory and 1092 MB/s on IB.
> The bandwidth of a 4MB message on shared memory may be higher if you tune
> some MCA parameters.
>
> 2.  mpi_paffinity_alone
> "mpi_paffinity_alone = 1" is not a good choice for QS22.  There are two
> sockets with two physical Cell/B.E. processors on one QS22.  Each Cell/B.E. has two
> SMT threads, so there are four logical CPUs on one QS22.  The CBE Linux
> kernel maps logical cpus 0 and 1 to socket 1 and logical cpus 2 and 3 to
> socket 2.  If mpi_paffinity_alone is set to 1, the two MPI instances
> will be assigned to logical cpu 0 and cpu 1 on socket 1.  I believe this is
> not what you want.
>
> A temporary solution to force the affinity on QS22 is to use
> "numactl".  E.g., assuming the hostname is "qs22" and the executable is
> "foo", the following command can be used:
> mpirun -np 1 -H qs22 numactl -c0 -m0 foo : -np 1 -H qs22 numactl -c1 -m1 foo
>
> In the long run, I wish the CBE kernel exported the CPU topology in /sys and
> used PLPA to force the processor affinity.
>
> Best Regards,
> Mi
>
>
>
>
>  "Lenny
>  Verkhovsky"
>@gmail.com>   "Open MPI Users"
>  Sent by:  
>  users-bounces@ope
cc
>  n-mpi.org
>
Subject
>Re: [OMPI users] Working with a
>  10/23/2008 05:48  CellBlade cluster
>  AM
>
>
>  Please respond to
>   Open MPI Users
>   rg>
>
>
>
>
>
>
> Hi,
>
>
> If I understand you correctly, the most suitable way to do it is by
> paffinity that we have in Open MPI 1.3 and the trunk.
> 

Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)

2008-10-31 Thread Gustavo Seabra
On Fri, Oct 31, 2008 at 3:07 PM, Rajesh Ramaya wrote:

> Actually I am
> not writing any MPI code inside; it's the executable (third party software)
> that does that part.

What are you using for this? We too use Fortran and C routines
combined with no problem at all. I would think that whatever
"third-party" software you are using here is not doing its job right.

-- 
Gustavo Seabra
Postdoctoral Associate
Quantum Theory Project - University of Florida
Gainesville - Florida - USA


Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-31 Thread Gustavo Seabra
On Thu, Oct 30, 2008 at 9:04 AM, George Bosilca wrote:

Hi George,

I'm sorry for taking too long to respond. As you mentioned, config
takes a veeery long time in Cygwin, and then the install itself
takes many times that :-(

> As Jeff mentioned this component is not required on Windows. You can disable
> it completely in Open MPI and everything will continue to work correctly.
> Please add --enable-mca-no-build=memory_mallopt or maybe the more generic (as
> there is no need for any memory manager on Windows
> --enable-mca-no-build=memory.

Tried, doesn't quite work:

If I configure with "--enable-mca-no-build=memory", the config dies with:

  *** Final output
  configure: error: conditional "OMPI_WANT_EXTERNAL_PTMALLOC2" was
never defined.
  Usually this means the macro was only invoked conditionally.

Now, if I try with "--enable-mca-no-build=memory_mallopt", the
configuration script runs just fine, but the compilation dies when
compiling "mca/paffinity/windows":

  libtool: compile:  gcc -DHAVE_CONFIG_H -I.
-I../../../../opal/include -I../../../../orte/include -I../../../..
  /ompi/include
-I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../..
-D_REENTRANT -O3
  -DNDEBUG -finline-functions -fno-strict-aliasing -MT
paffinity_windows_module.lo -MD -MP -MF
  .deps/paffinity_windows_module.Tpo -c paffinity_windows_module.c
-DDLL_EXPORT -DPIC -o
  .libs/paffinity_windows_module.o
  paffinity_windows_module.c:44: error: parse error before "sys_info"

  [... and then a bunch of messages after that, all related to
paffinity_windows_module.c, which...]
  [... I think are all related to this first one...]

Finally, I thought that I can live without processor affinity or even
memory affinity, so I tried using
" --enable-mca-no-build=memory_mallopt,maffinity,paffinity", and the
configuration went all smoothly. The compilation... You guessed, died
again. But this time it was something that had bit me before:
RTLD_NEXT, which is required by one contributed package (vt). (See my
previous message to Jeff and the list.)

My next attempt will be to remove this package, and see how far I can
get... But I'm getting there :-)

> It is possible to have a native version of Open MPI on Windows. There are
> two ways to achieve this. First, install SFU, and compile there. It worked
> last time I checked, but it's not the solution I prefer. Second, you can
> install the express version of the Microsoft Visual Studio (which is free),
> and set your PATH, LIB and INCLUDE correctly to point to the installation,
> and then you can use the cl compiler to build Open MPI even on Windows.

That is true, but it seems more complicated for the regular user than
installing OpenMPI (assuming I can figure out the correct combination
of options). Also, our program is actually made for Unix, and as a
convenience it *can* be installed in Cygwin, but I'm not sure how it
would work with a native Windows OpenMPI.

Anyways... I feel like I'm getting closer... Will keep trying during the weekend.

Thanks a lot for all the help! (That goes to Jeff too)

Cheers,
Gustavo.


Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)

2008-10-31 Thread Rajesh Ramaya
Hello Jeff Squyres,
   Thank you very much for the immediate reply. I am able to successfully
access the data from the common block, but the values are zero. In my
algorithm I even update a common block, but the update made by the shared
library is not taken into account by the executable. Can you please be very
specific about how to make the parallel algorithm aware of the data? Actually I am
not writing any MPI code inside; it's the executable (third party software)
that does that part. All that I am doing is to compile my code with the MPI C
compiler and add it to the LD_LIBRARY_PATH.
In fact I did a simple test by creating a shared library using FORTRAN
code, and the update made to the common block is taken into account by the
executable. Is there any flag or pragma that needs to be activated for mixed
language MPI?
Thank you once again for the reply.

Rajesh   

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: vendredi 31 octobre 2008 18:53
To: Open MPI Users
Subject: Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)

On Oct 31, 2008, at 11:57 AM, Rajesh Ramaya wrote:

> I am completely new to MPI. I have a basic question concerning
> MPI and mixed language coding. I hope any of you could help me out.
> Is it possible to access FORTRAN common blocks in C++ in an MPI
> compiled code? It works without MPI, but as soon as I switch to MPI the
> access of the common block does not work anymore.
> I have a Linux MPI executable which loads a shared library at
> runtime and resolves all undefined symbols etc. The shared library
> is written in C++ and the MPI executable is written in FORTRAN. Some
> of the inputs that the shared library is looking for are in the Fortran
> common blocks. As I access those common blocks during runtime the
> values are not initialized. I would like to know if what I am
> doing is possible? I hope that my problem is clear.


Generally, MPI should not get in the way of sharing common blocks  
between Fortran and C/C++.  Indeed, in Open MPI itself, we share a few  
common blocks between Fortran and the main C Open MPI implementation.

What is the exact symptom that you are seeing?  Is the application  
failing to resolve symbols at run-time, possibly indicating that  
something hasn't instantiated a common block?  Or are you able to  
successfully access the data from the common block, but it doesn't  
have the values you expect (e.g., perhaps you're seeing all zeros)?

If the former, you might want to check your build procedure.  You  
*should* be able to simply replace your C++ / F90 compilers with  
mpicxx and mpif90, respectively, and be able to build an MPI version  
of your app.  If the latter, you might need to make your parallel  
algorithm aware of what data is available in which MPI process --  
perhaps not all the data is filled in on each MPI process...?

-- 
Jeff Squyres
Cisco Systems


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-31 Thread Gustavo Seabra
As I keep trying to install OpenMPI in Cygwin, I found another
instance where RTLD_NEXT is assumed to be present. Will keep trying...

Gustavo.

=
Making all in vtlib
make[5]: Entering directory
`/home/seabra/local/openmpi-1.3b1/ompi/contrib/vt/vt/vtlib'
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_REENTRANT
-DBINDIR=\"/home/seabra/local/openmpi-1.3b1/bin\"
-DDATADIR=\"/home/seabra/local/openmpi-1.3b1/share/vampirtrace\" -DRFG
-DVT_BFD  -DVT_IOWRAP  -O3 -DNDEBUG -finline-functions
-fno-strict-aliasing  -MT vt_comp_gnu.o -MD -MP -MF
.deps/vt_comp_gnu.Tpo -c -o vt_comp_gnu.o vt_comp_gnu.c
mv -f .deps/vt_comp_gnu.Tpo .deps/vt_comp_gnu.Po
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_REENTRANT
-DBINDIR=\"/home/seabra/local/openmpi-1.3b1/bin\"
-DDATADIR=\"/home/seabra/local/openmpi-1.3b1/share/vampirtrace\" -DRFG
-DVT_BFD  -DVT_IOWRAP  -O3 -DNDEBUG -finline-functions
-fno-strict-aliasing  -MT vt_iowrap.o -MD -MP -MF .deps/vt_iowrap.Tpo
-c -o vt_iowrap.o vt_iowrap.c
vt_iowrap.c: In function `vt_iowrap_init':
vt_iowrap.c:105: error: `RTLD_NEXT' undeclared (first use in this function)
vt_iowrap.c:105: error: (Each undeclared identifier is reported only once
vt_iowrap.c:105: error: for each function it appears in.)
vt_iowrap.c: In function `open':
vt_iowrap.c:188: error: `RTLD_NEXT' undeclared (first use in this function)
[...and a bunch of messages just like those last 2 lines...]
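
For context, the pattern vt_iowrap.c relies on looks roughly like the sketch
below (an illustration only, not the actual VampirTrace source). RTLD_NEXT is
a GNU extension enabled by defining _GNU_SOURCE; Cygwin's dlfcn.h (at least at
the time) does not define it, which is exactly the undeclared-identifier error
shown above:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Illustrative I/O wrapper: intercept open(), log the call, then forward
 * to the "next" (real) definition found via dlsym(RTLD_NEXT, ...). */
static int (*real_open)(const char *, int, ...) = NULL;

int open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    if (real_open == NULL)
        real_open = (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open");

    fprintf(stderr, "intercepted open(%s)\n", path);

    if (flags & O_CREAT) {      /* open() only carries a mode with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t) va_arg(ap, int);
        va_end(ap);
    }
    return real_open(path, flags, mode);
}

Without RTLD_NEXT there is no portable way for such a wrapper to find the
original symbol, which is why disabling the vt contrib package is the
practical workaround on Cygwin.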


Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)

2008-10-31 Thread Jeff Squyres

On Oct 31, 2008, at 11:57 AM, Rajesh Ramaya wrote:

I am completely new to MPI. I have a basic question concerning
MPI and mixed language coding. I hope any of you could help me out.
Is it possible to access FORTRAN common blocks in C++ in an MPI
compiled code? It works without MPI, but as soon as I switch to MPI the
access of the common block does not work anymore.
I have a Linux MPI executable which loads a shared library at
runtime and resolves all undefined symbols etc. The shared library
is written in C++ and the MPI executable is written in FORTRAN. Some
of the inputs that the shared library is looking for are in the Fortran
common blocks. As I access those common blocks during runtime the
values are not initialized. I would like to know if what I am
doing is possible? I hope that my problem is clear.



Generally, MPI should not get in the way of sharing common blocks  
between Fortran and C/C++.  Indeed, in Open MPI itself, we share a few  
common blocks between Fortran and the main C Open MPI implementation.


What is the exact symptom that you are seeing?  Is the application  
failing to resolve symbols at run-time, possibly indicating that  
something hasn't instantiated a common block?  Or are you able to  
successfully access the data from the common block, but it doesn't  
have the values you expect (e.g., perhaps you're seeing all zeros)?


If the former, you might want to check your build procedure.  You  
*should* be able to simply replace your C++ / F90 compilers with  
mpicxx and mpif90, respectively, and be able to build an MPI version  
of your app.  If the latter, you might need to make your parallel  
algorithm aware of what data is available in which MPI process --  
perhaps not all the data is filled in on each MPI process...?
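
To make that diagnosis concrete, here is a hedged sketch (not from the
original code) of how a C/C++ shared library typically sees a Fortran common
block; the block name /INPUTS/ and its contents are hypothetical, and the
lowercase-plus-trailing-underscore symbol name is compiler-dependent:

#include <stdio.h>

/* Fortran side (in the MPI executable), assumed for illustration:
 *
 *       REAL*8  A(100)
 *       INTEGER N
 *       COMMON /INPUTS/ A, N
 *
 * With most Linux Fortran compilers this block is exported as the
 * symbol "inputs_", so the C/C++ side declares a matching struct. */
extern struct {
    double a[100];
    int    n;
} inputs_;

void check_inputs(int rank)
{
    /* If this prints zeros under MPI, the likely cause is that this
     * particular process never filled the block in, rather than MPI
     * having clobbered it. */
    printf("[rank %d] n = %d, a[0] = %g\n", rank, inputs_.n, inputs_.a[0]);
}

Printing the values per rank, as above, usually shows quickly whether the
data was only initialized in one process.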


--
Jeff Squyres
Cisco Systems




Re: [OMPI users] ompi-checkpoint is hanging

2008-10-31 Thread Josh Hursey
After some additional testing I believe that I have been able to  
reproduce the problem. I suspect that there is a bug in the  
coordination protocol that is causing an occasional hang in the  
system. Since it only happens occasionally (though slightly more often  
on a fully loaded machine) that is probably how I missed it in my  
testing.


I'll work on a patch, and let you know when it is ready. Unfortunately  
it probably won't be for a couple weeks. :(


You can increase the verbose level for all of the fault tolerance  
frameworks and components through MCA parameters. They are referenced  
in the FT C/R User Doc on the Open MPI wiki, and you can access them  
through 'ompi-info'. You will look for the following frameworks/ 
components:

 - crs/blcr
 - snapc/full
 - crcp/bkmrk
 - opal_cr_verbose
 - orte_cr_verbose
 - ompi_cr_verbose

Thanks for the bug report. I filed a ticket in our bug tracker, and  
CC'ed you on it. The ticket is:

  http://svn.open-mpi.org/trac/ompi/ticket/1619

Cheers,
Josh

On Oct 31, 2008, at 10:51 AM, Matthias Hovestadt wrote:


Hi Tim!

First of all: thanks a lot for answering! :-)



Could you try running your two MPI jobs with fewer procs each,
say 2 or 3 each instead of 4, so that there are a few extra cores  
available.


This problem occurs with any number of procs.

Also, what happens to the checkpointing of one MPI job if you kill  
the

other MPI job
after the first "hangs"?


Nothing, it keeps hanging.

> (It may not be a true hang, but very very slow progress that you
> are observing.)

I already waited for more than 12 hours, but the ompi-checkpoint
did not return. So if it's slow, it must be very slow.


I continued testing and just observed a case where the problem
occurred with only one job running on the compute node:

---
ccs@grid-demo-1:~$ ps auxww | grep mpirun | grep -v grep
ccs   7706  0.4  0.2  63864  2640 ?S15:35   0:00  
mpirun -np 1 -am ft-enable-cr -np 6 /home/ccs/XN-OMPI/testdrive/ 
loop-1/remotedir/mpi-x-povray +I planet.pov -w1600 -h1200 +SP1 +O  
planet.tga

ccs@grid-demo-1:~$
---

The resource management system tried to checkpoint this job using the
command "ompi-checkpoint -v --term 7706". This is the output of that
command:

---
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: Checkpointing...
[grid-demo-1.cit.tu-berlin.de:08178] PID 7706
[grid-demo-1.cit.tu-berlin.de:08178] Connected to Mpirun  
[[3623,0],0]

[grid-demo-1.cit.tu-berlin.de:08178] Terminating after checkpoint
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: notify_hnp:  
Contact Head Node Process PID 7706
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: notify_hnp:  
Requested a checkpoint of jobid [INVALID]
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver:  
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver:  
Status Update.
[grid-demo-1.cit.tu-berlin.de:08178] Requested -  
Global Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver:  
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver:  
Status Update.
[grid-demo-1.cit.tu-berlin.de:08178] Pending (Termination) -  
Global Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver:  
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver:  
Status Update.
[grid-demo-1.cit.tu-berlin.de:08178]   Running -  
Global Snapshot Reference: (null)

---

If I look to the activity on the node, I see that the processes
are still computing:

---
 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
7710 ccs   25   0  327m 6936 4052 R  102  0.7   4:14.17 mpi-x- 
povray
7712 ccs   25   0  327m 6884 4000 R  102  0.7   3:34.06 mpi-x- 
povray
7708 ccs   25   0  327m 6896 4012 R   66  0.7   2:42.10 mpi-x- 
povray
7707 ccs   25   0  331m  10m 3736 R   54  1.0   3:08.62 mpi-x- 
povray
7709 ccs   25   0  327m 6940 4056 R   48  0.7   1:48.24 mpi-x- 
povray
7711 ccs   25   0  327m 6724 4032 R   36  0.7   1:29.34 mpi-x- 
povray

---

Now I killed the hanging ompi-checkpoint operation and tried
to execute a checkpoint manually:

---
ccs@grid-demo-1:~$ ompi-checkpoint -v --term 7706
[grid-demo-1.cit.tu-berlin.de:08224] orte_checkpoint: Checkpointing...
[grid-demo-1.cit.tu-berlin.de:08224] PID 7706
[grid-demo-1.cit.tu-berlin.de:08224] Connected to Mpirun  
[[3623,0],0]

[grid-demo-1.cit.tu-berlin.de:08224] Terminating after checkpoint

[OMPI users] MPI + Mixed language coding(Fortran90 + C++)

2008-10-31 Thread Rajesh Ramaya
Hello MPI Users,

I am completely new to MPI. I have a basic question concerning MPI and
mixed language coding. I hope any of you could help me out. Is it possible
to access FORTRAN common blocks in C++ in an MPI compiled code? It works
without MPI, but as soon as I switch to MPI the access of the common block does not
work anymore.

I have a Linux MPI executable which loads a shared library at runtime and
resolves all undefined symbols etc. The shared library is written in C++ and
the MPI executable is written in FORTRAN. Some of the inputs that the shared
library is looking for are in the Fortran common blocks. As I access those
common blocks during runtime the values are not initialized. I would like
to know if what I am doing is possible? I hope that my problem is
clear.

  Your valuable suggestions are welcome !!!



Thank you,

Rajesh 





Re: [OMPI users] users Digest, Vol 1052, Issue 1

2008-10-31 Thread Ralph Castain
It looks like the daemon isn't seeing the other interface address on  
host x2. Can you ssh to x2 and send the contents of ifconfig -a?


Ralph

On Oct 31, 2008, at 9:18 AM, Allan Menezes wrote:


users-requ...@open-mpi.org wrote:

Send users mailing list submissions to
us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-requ...@open-mpi.org

You can reach the person managing the list at
users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

   1. Openmpi ver1.3beta1 (Allan Menezes)
   2. Re: Openmpi ver1.3beta1 (Ralph Castain)
   3. Re: Equivalent .h files (Benjamin Lamptey)
   4. Re: Equivalent .h files (Jeff Squyres)
   5. ompi-checkpoint is hanging (Matthias Hovestadt)
   6. unsubscibe (Bertrand P. S. Russell)
   7. Re: ompi-checkpoint is hanging (Tim Mattox)


--

Message: 1
Date: Fri, 31 Oct 2008 02:06:09 -0400
From: Allan Menezes 
Subject: [OMPI users] Openmpi ver1.3beta1
To: us...@open-mpi.org
Message-ID: 
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,
I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads
--with-threads=posix --disable-ipv6
I have six nodes x1..6
I distributed the /opt/openmpi13b1 with scp to all other nodes from the
head node
When i run the following command:
mpirun --prefix /opt/openmpi13b1  --host x1 hostname it works on x1
printing out the hostname of x1
But when i type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname it hangs and does
not give me any output
I have a 6 node intel quad core cluster with OSCAR and pci express
gigabit ethernet for eth0
Can somebody advise?
Thank you very much.
Allan Menezes


--

Message: 2
Date: Fri, 31 Oct 2008 02:41:59 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] Openmpi ver1.3beta1
To: Open MPI Users 
Message-ID: 
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

When you typed the --host x1 command, were you sitting on x1?
Likewise, when you typed the --host x2 command, were you not on  
host x2?


If the answer to both questions is "yes", then my guess is that
something is preventing you from launching a daemon on host x2. Try
adding --leave-session-attached to your cmd line and see if any error
messages appear. And check the FAQ for tips on how to setup for ssh
launch (I'm assuming that is what you are using).

http://www.open-mpi.org/faq/?category=rsh

Ralph

On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:



Hi Ralph,
   Yes, that is true. I tried both commands on x1, and ver 1.2.8 works  
on the same setup without a problem.

Here is the output with the added
--leave-session-attached
[allan@x1 ~]$ mpiexec --prefix /opt/openmpi13b2 --leave-session-attached -host x2 hostname
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection to lifeline [[1354,0],0] lost

--
A daemon (pid 7665) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpiexec: clean termination accomplished

[allan@x1 ~]$
However, my main eth0 IP is 192.168.1.1 and the internet gateway is 192.168.0.1.

Any solutions?
Allan Menezes




Hi,
  I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads --with-
threads=posix --disable-ipv6
I have six nodes x1..6
I distributed the /opt/openmpi13b1 with scp to all other nodes from
the head node
When i run the following command:
mpirun --prefix /opt/openmpi13b1  --host x1 
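
For reference, the unreachable addresses in the output above (192.168.0.198
and 192.168.122.1, the latter being the usual libvirt virbr0 address) are not
on the 192.168.1.x cluster network, so one thing worth trying is to pin both
the out-of-band and TCP BTL traffic to the real cluster interface. A hedged
example only, assuming eth0 carries the 192.168.1.x addresses; exact MCA
parameter names vary slightly between releases (older ones spell the first
one oob_tcp_include), so confirm with "ompi_info --param all all":

  mpirun --prefix /opt/openmpi13b1 --mca oob_tcp_if_include eth0 \
         --mca btl_tcp_if_include eth0 --host x2 hostname

Disabling the libvirt bridge on the nodes, or excluding it in the same way,
is the other common fix for this symptom.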

[OMPI users] MPI_Type_create_darray causes MPI_File_set_view to crash when ndims=2, array_of_gsizes[0]>array_of_gsizes[1]

2008-10-31 Thread Antonio Molins

Hi again,

The problem in a nutshell: it looks like, when I use  
MPI_Type_create_darray with an argument array_of_gsizes where  
array_of_gsizes[0]>array_of_gsizes[1], the datatype returned goes  
through MPI_Type_commit() just fine, but then it causes  
MPI_File_set_view to crash!! Any idea as to why this is happening?


A




Antonio Molins, PhD Candidate
Medical Engineering and Medical Physics
Harvard - MIT Division of Health Sciences and Technology
--
"When a traveler reaches a fork in the road,
the ℓ1 -norm tells him to take either one way or the other,
but the ℓ2 -norm instructs him to head off into the bushes. "

John F. Claerbout and Francis Muir, 1973


*** glibc detected *** double free or corruption (!prev):  
0x00cf4130 ***

[login4:26709] *** Process received signal ***
[login4:26708] *** Process received signal ***
[login4:26708] Signal: Aborted (6)
[login4:26708] Signal code:  (-6)
[login4:26709] Signal: Segmentation fault (11)
[login4:26709] Signal code: Address not mapped (1)
[login4:26709] Failing at address: 0x18
[login4:26708] [ 0] /lib64/tls/libpthread.so.0 [0x36ff10c5b0]
[login4:26708] [ 1] /lib64/tls/libc.so.6(gsignal+0x3d) [0x36fe62e26d]
[login4:26708] [ 2] /lib64/tls/libc.so.6(abort+0xfe) [0x36fe62fa6e]
[login4:26708] [ 3] /lib64/tls/libc.so.6 [0x36fe6635f1]
[login4:26708] [ 4] /lib64/tls/libc.so.6 [0x36fe6691fe]
[login4:26708] [ 5] /lib64/tls/libc.so.6(__libc_free+0x76)  
[0x36fe669596]
[login4:26708] [ 6] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0  
[0x2a962cc4ae]
[login4:26708] [ 7] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so. 
0(ompi_ddt_destroy+0x65) [0x2a962cd31d]
[login4:26708] [ 8] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so. 
0(MPI_Type_free+0x5b) [0x2a962f654f]
[login4:26708] [ 9] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(ADIOI_Flatten+0x1804) [0x2aa4603612]
[login4:26708] [10] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(ADIOI_Flatten_datatype+0xe7) [0x2aa46017fd]
[login4:26708] [11] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(ADIO_Set_view+0x14f) [0x2aa45ecb57]
[login4:26708] [12] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x1dd)  
[0x2aa46088a9]
[login4:26708] [13] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so [0x2aa45ec288]
[login4:26708] [14] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so. 
0(MPI_File_set_view+0x53) [0x2a963002ff]
[login4:26708] [15] ./bin/test2(_ZN14pMatCollection3getEiP7pMatrix 
+0xc3) [0x42a50b]

[login4:26708] [16] ./bin/test2(main+0xc2e) [0x43014a]
[login4:26708] [17] /lib64/tls/libc.so.6(__libc_start_main+0xdb)  
[0x36fe61c40b]
[login4:26708] [18] ./bin/test2(_ZNSt8ios_base4InitD1Ev+0x42)  
[0x41563a]

[login4:26708] *** End of error message ***
[login4:26709] [ 0] /lib64/tls/libpthread.so.0 [0x36ff10c5b0]
[login4:26709] [ 1] /lib64/tls/libc.so.6 [0x36fe66882b]
[login4:26709] [ 2] /lib64/tls/libc.so.6 [0x36fe668f8d]
[login4:26709] [ 3] /lib64/tls/libc.so.6(__libc_free+0x76)  
[0x36fe669596]
[login4:26709] [ 4] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0  
[0x2a962cc4ae]
[login4:26709] [ 5] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so. 
0(ompi_ddt_release_args+0x93) [0x2a962d5641]
[login4:26709] [ 6] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0  
[0x2a962cc514]
[login4:26709] [ 7] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so. 
0(ompi_ddt_release_args+0x93) [0x2a962d5641]
[login4:26709] [ 8] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0  
[0x2a962cc514]
[login4:26709] [ 9] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so. 
0(ompi_ddt_destroy+0x65) [0x2a962cd31d]
[login4:26709] [10] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so. 
0(MPI_Type_free+0x5b) [0x2a962f654f]
[login4:26709] [11] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(ADIOI_Flatten+0x147) [0x2aa4601f55]
[login4:26709] [12] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(ADIOI_Flatten+0x1569) [0x2aa4603377]
[login4:26709] [13] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(ADIOI_Flatten_datatype+0xe7) [0x2aa46017fd]
[login4:26709] [14] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(ADIO_Set_view+0x14f) [0x2aa45ecb57]
[login4:26709] [15] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x1dd)  
[0x2aa46088a9]
[login4:26709] [16] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/ 
mca_io_romio.so [0x2aa45ec288]
[login4:26709] [17] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so. 
0(MPI_File_set_view+0x53) [0x2a963002ff]
[login4:26709] [18] ./bin/test2(_ZN14pMatCollection3getEiP7pMatrix 
+0xc3) [0x42a50b]

[login4:26709] [19] ./bin/test2(main+0xc2e) [0x43014a]
[login4:26709] [20] /lib64/tls/libc.so.6(__libc_start_main+0xdb)  
[0x36fe61c40b]
[login4:26709] 

Re: [OMPI users] users Digest, Vol 1052, Issue 1

2008-10-31 Thread Allan Menezes

users-requ...@open-mpi.org wrote:


Send users mailing list submissions to
us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-requ...@open-mpi.org

You can reach the person managing the list at
users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

  1. Openmpi ver1.3beta1 (Allan Menezes)
  2. Re: Openmpi ver1.3beta1 (Ralph Castain)
  3. Re: Equivalent .h files (Benjamin Lamptey)
  4. Re: Equivalent .h files (Jeff Squyres)
  5. ompi-checkpoint is hanging (Matthias Hovestadt)
  6. unsubscibe (Bertrand P. S. Russell)
  7. Re: ompi-checkpoint is hanging (Tim Mattox)


--

Message: 1
Date: Fri, 31 Oct 2008 02:06:09 -0400
From: Allan Menezes 
Subject: [OMPI users] Openmpi ver1.3beta1
To: us...@open-mpi.org
Message-ID: 
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,
   I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads 
--with-threads=posix --disable-ipv6

I have six nodes x1..6
I distributed the /opt/openmpi13b1 with scp to all other nodes from the 
head node

When i run the following command:
mpirun --prefix /opt/openmpi13b1  --host x1 hostname it works on x1 
printing out the hostname of x1

But when i type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname it hangs and does 
not give me any output
I have a 6 node intel quad core cluster with OSCAR and pci express 
gigabit ethernet for eth0

Can somebody advise?
Thank you very much.
Allan Menezes


--

Message: 2
Date: Fri, 31 Oct 2008 02:41:59 -0600
From: Ralph Castain 
Subject: Re: [OMPI users] Openmpi ver1.3beta1
To: Open MPI Users 
Message-ID: 
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

When you typed the --host x1 command, were you sitting on x1?  
Likewise, when you typed the --host x2 command, were you not on host x2?


If the answer to both questions is "yes", then my guess is that  
something is preventing you from launching a daemon on host x2. Try  
adding --leave-session-attached to your cmd line and see if any error  
messages appear. And check the FAQ for tips on how to setup for ssh  
launch (I'm assuming that is what you are using).


http://www.open-mpi.org/faq/?category=rsh

Ralph

On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:

 


Hi Ralph,
  Yes, that is true. I tried both commands on x1, and ver 1.2.8 works on 
the same setup without a problem.
Here is the output with the added 


--leave-session-attached

[allan@x1 ~]$ mpiexec --prefix /opt/openmpi13b2  
--leave-session-attached -host x2 hostname
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] 
mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed: Network 
is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] 
mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed: Network 
is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection to 
lifeline [[1354,0],0] lost

--
A daemon (pid 7665) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpiexec: clean termination accomplished

[allan@x1 ~]$
However, my main eth0 IP is 192.168.1.1 and the internet gateway is 192.168.0.1.
Any solutions?
Allan Menezes




Hi,
 I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads --with- 
threads=posix --disable-ipv6

I have six nodes x1..6
I distributed the /opt/openmpi13b1 with scp to all other nodes from  
the head node

When i run the following command:
mpirun --prefix /opt/openmpi13b1  --host x1 hostname it works on x1  
printing out the hostname of x1

But when i type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname it hangs and  
does not give me any output
I have a 6 node intel quad core 

Re: [OMPI users] ompi-checkpoint is hanging

2008-10-31 Thread Matthias Hovestadt

Hi Tim!

First of all: thanks a lot for answering! :-)



Could you try running your two MPI jobs with fewer procs each,
say 2 or 3 each instead of 4, so that there are a few extra cores available.


This problem occurs with any number of procs.


Also, what happens to the checkpointing of one MPI job if you kill the
other MPI job
after the first "hangs"?


Nothing, it keeps hanging.

> (It may not be a true hang, but very very slow progress that you
> are observing.)

I already waited for more than 12 hours, but the ompi-checkpoint
did not return. So if it's slow, it must be very slow.


I continued testing and just observed a case where the problem
occurred with only one job running on the compute node:

---
ccs@grid-demo-1:~$ ps auxww | grep mpirun | grep -v grep
ccs   7706  0.4  0.2  63864  2640 ?S15:35   0:00 mpirun 
-np 1 -am ft-enable-cr -np 6 
/home/ccs/XN-OMPI/testdrive/loop-1/remotedir/mpi-x-povray +I planet.pov 
-w1600 -h1200 +SP1 +O planet.tga

ccs@grid-demo-1:~$
---

The resource management system tried to checkpoint this job using the
command "ompi-checkpoint -v --term 7706". This is the output of that
command:

---
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: Checkpointing...
[grid-demo-1.cit.tu-berlin.de:08178] PID 7706
[grid-demo-1.cit.tu-berlin.de:08178] Connected to Mpirun [[3623,0],0]
[grid-demo-1.cit.tu-berlin.de:08178] Terminating after checkpoint
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: notify_hnp: 
Contact Head Node Process PID 7706
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: notify_hnp: 
Requested a checkpoint of jobid [INVALID]
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:08178] Requested - Global 
Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:08178] Pending (Termination) - Global 
Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:08178] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:08178]   Running - Global 
Snapshot Reference: (null)

---

If I look to the activity on the node, I see that the processes
are still computing:

---
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7710 ccs   25   0  327m 6936 4052 R  102  0.7   4:14.17 mpi-x-povray
 7712 ccs   25   0  327m 6884 4000 R  102  0.7   3:34.06 mpi-x-povray
 7708 ccs   25   0  327m 6896 4012 R   66  0.7   2:42.10 mpi-x-povray
 7707 ccs   25   0  331m  10m 3736 R   54  1.0   3:08.62 mpi-x-povray
 7709 ccs   25   0  327m 6940 4056 R   48  0.7   1:48.24 mpi-x-povray
 7711 ccs   25   0  327m 6724 4032 R   36  0.7   1:29.34 mpi-x-povray
---

Now I killed the hanging ompi-checkpoint operation and tried
to execute a checkpoint manually:

---
ccs@grid-demo-1:~$ ompi-checkpoint -v --term 7706
[grid-demo-1.cit.tu-berlin.de:08224] orte_checkpoint: Checkpointing...
[grid-demo-1.cit.tu-berlin.de:08224] PID 7706
[grid-demo-1.cit.tu-berlin.de:08224] Connected to Mpirun [[3623,0],0]
[grid-demo-1.cit.tu-berlin.de:08224] Terminating after checkpoint
[grid-demo-1.cit.tu-berlin.de:08224] orte_checkpoint: notify_hnp: 
Contact Head Node Process PID 7706
[grid-demo-1.cit.tu-berlin.de:08224] orte_checkpoint: notify_hnp: 
Requested a checkpoint of jobid [INVALID]

---

Is there perhaps a way of increasing the level of debug output?
Please let me know if I can support you in any way...


Best,
Matthias
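
On the debug-output question: the checkpoint/restart code lives in its own
MCA frameworks, and Open MPI frameworks generally accept a
<framework>_base_verbose parameter. A hedged suggestion only, since parameter
names move around on the SVN trunk ("ompi_info --param all all" lists what a
given build actually has): try starting the job with something like

  mpirun -np 4 -am ft-enable-cr --mca snapc_base_verbose 20 \
         --mca crs_base_verbose 20 mpi-x-povray +I planet.pov ...

and then re-run ompi-checkpoint -v while watching mpirun's output.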


Re: [OMPI users] Issues with MPI_Type_create_darray

2008-10-31 Thread Antonio Molins

Hi again,

Using MPI_Type_get_true_extent(), I changed the way of reporting type  
size and extent to:


int typesize;
long typeextent, typelb;
MPI_Type_size(this->datatype, &typesize);
MPI_Type_get_true_extent(this->datatype, &typelb, &typeextent);
//MPI_Type_lb(this->datatype, &typelb);
//MPI_Type_extent(this->datatype, &typeextent);
printf("\ntype size for process rank (%d,%d) is %d doubles, type extent is %d doubles (up to %d), range is [%d, %d].\n",
       pr, pc, typesize/(int)sizeof(double), (int)(typeextent/sizeof(double)), nx*ny,
       (int)(typelb/sizeof(double)), (int)((typelb+typeextent)/sizeof(double)));


Which now is giving me the correct answers for both situations. For  
the first one (works):


type size for process rank (1,0) is 20 doubles, type extent is 60 doubles (up to 91), range is [28, 88].
type size for process rank (0,0) is 32 doubles, type extent is 81 doubles (up to 91), range is [0, 81].
type size for process rank (0,1) is 24 doubles, type extent is 80 doubles (up to 91), range is [4, 84].
type size for process rank (1,1) is 15 doubles, type extent is 59 doubles (up to 91), range is [32, 91].


For the second one (before getting the same double free error with  
MPI_File_set_view):


type size for process rank (1,0) is 20 doubles, type extent is 48 doubles (up to 91), range is [4, 52].
type size for process rank (0,0) is 32 doubles, type extent is 51 doubles (up to 91), range is [0, 51].
type size for process rank (0,1) is 24 doubles, type extent is 38 doubles (up to 91), range is [52, 90].
type size for process rank (1,1) is 15 doubles, type extent is 35 doubles (up to 91), range is [56, 91].


Can anybody give me a hint here? Is there a bug in  
MPI_Type_create_darray I should be aware of?


Best,
A
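
One possibility worth trying, if the file really is stored column-major: let
MPI_ORDER_FORTRAN describe that layout instead of swapping the global sizes
by hand, and keep the row-major rank mapping from the first (working)
version. A rough sketch only, untested against this code and reusing the
same nx, ny, Pr, Pc, pr, pc and BLOCK_SIZE as above:

    // Sketch: describe the on-disk (column-major) layout of the nx-by-ny
    // matrix with MPI_ORDER_FORTRAN rather than transposing array_of_gsizes.
    int gsizes[2] = { nx, ny };   // logical sizes: nx rows, ny columns
    int distrs[2] = { MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC };
    int dargs[2]  = { BLOCK_SIZE, BLOCK_SIZE };
    int psizes[2] = { Pr, Pc };   // Pr x Pc process grid, as for ScaLAPACK
    int grid_rank = pc + pr*Pc;   // darray assumes a row-major process grid
    MPI_Datatype filetype;
    MPI_Type_create_darray(Pr*Pc, grid_rank, 2, gsizes, distrs, dargs,
                           psizes, MPI_ORDER_FORTRAN, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);
    // then pass 'filetype' to MPI_File_set_view() exactly as before

Independently of that, the double free inside ADIOI_Flatten in the backtrace
looks like something the ROMIO maintainers would want to see in any case.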

On Oct 30, 2008, at 5:21 PM, Antonio Molins wrote:


Hi all,

I am having some trouble with this function. I want to map data to a  
2x2 block-cyclic configuration in C, using the code:


MPI_Barrier(blacs_comm);
// size of each matrix
int *array_of_gsizes = new int[2];
array_of_gsizes[0] = this->nx;
array_of_gsizes[1] = this->ny;
// block-cyclic distribution used by ScaLAPACK
int *array_of_distrs = new int[2];
array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
int *array_of_dargs = new int[2];
array_of_dargs[0] = BLOCK_SIZE;
array_of_dargs[1] = BLOCK_SIZE;
int *array_of_psizes = new int[2];
array_of_psizes[0] = Pr;
array_of_psizes[1] = Pc;
int rank = pc + pr*Pc;
MPI_Type_create_darray(Pr*Pc, rank, 2, array_of_gsizes, array_of_distrs, array_of_dargs,
                       array_of_psizes, MPI_ORDER_C, MPI_DOUBLE, &this->datatype);
MPI_Type_commit(&this->datatype);
int typesize;
long typeextent;
MPI_Type_size(this->datatype, &typesize);
MPI_Type_extent(this->datatype, &typeextent);
printf("type size for process rank (%d,%d) is %d doubles, type extent is %d doubles (up to %d).",
       pr, pc, typesize/(int)sizeof(double), (int)(typeextent/sizeof(double)), nx*ny);
MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR, MPI_INFO_NULL, &this->fid);
MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double), MPI_DOUBLE,
                  this->datatype, "native", MPI_INFO_NULL);



This works well when used like this, but the problem is that the matrix
itself is written on disk in column-major fashion, so I want to use the
code as if I were reading it transposed, that is:


MPI_Barrier(blacs_comm);
// size of each matrix
int *array_of_gsizes = new int[2];
array_of_gsizes[0] = this->ny;
array_of_gsizes[1] = this->nx;
// block-cyclic distribution used by ScaLAPACK
int *array_of_distrs = new int[2];
array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
int *array_of_dargs = new int[2];
array_of_dargs[0] = BLOCK_SIZE;
array_of_dargs[1] = BLOCK_SIZE;
int *array_of_psizes = new int[2];
array_of_psizes[0] = Pr;
array_of_psizes[1] = Pc;
int rank = pr + pc*Pr;
MPI_Type_create_darray(Pr*Pc, rank, 2, array_of_gsizes, array_of_distrs, array_of_dargs,
                       array_of_psizes, MPI_ORDER_C, MPI_DOUBLE, &this->datatype);
MPI_Type_commit(&this->datatype);
MPI_Type_size(this->datatype, &typesize);
MPI_Type_extent(this->datatype, &typeextent);
printf("type size for process rank (%d,%d) is %d doubles, type extent is %d doubles (up to %d).",
       pr, pc, typesize/(int)sizeof(double), (int)(typeextent/sizeof(double)), nx*ny);
MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR, MPI_INFO_NULL, &this->fid);
MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double), MPI_DOUBLE,
                  this->datatype, "native", MPI_INFO_NULL);


To my surprise, this code crashes while calling  
MPI_File_set_view()!!! And before you ask, I did try switching  

Re: [OMPI users] ompi-checkpoint is hanging

2008-10-31 Thread Tim Mattox
Hello Matthias,
Hopefully Josh will chime in shortly.  But I have one suggestion to
help diagnose
this.  Could you try running your two MPI jobs with fewer procs each,
say 2 or 3 each instead of 4, so that there are a few extra cores available.
I know that isn't a solution, but it may help us diagnose what is going on.
(It may not be a true hang, but very very slow progress that you are observing.)

Also, what happens to the checkpointing of one MPI job if you kill the
other MPI job
after the first "hangs"?

On Fri, Oct 31, 2008 at 8:18 AM, Matthias Hovestadt
 wrote:
> Hi!
>
> I'm using the development version of OMPI from SVN (rev. 19857)
> for executing MPI jobs on my cluster system. I'm particularly using
> the checkpoint and restart feature, based on the most current version
> of BLCR.
>
> The checkpointing is working pretty fine as long as I only execute
> a single job on a node. If more than one MPI application is executing
> on a system, ompi-checkpoint sometimes does not return, hanging forever.
>
>
> Example: checkpointing with a single running application
>
> I'm using the MPI-enabled flavor of Povray as demo application. So I'm
> starting it on a node using the following command.
>
>  mpirun -np 4 mpi-x-povray +I planet.pov -w1200 -h1000 +SP1 \
>  +O planet.tga
>
> This gives me 4 MPI processes, all running on the local node.
> checkpointing it with
>
>  ompi-checkpoint -v --term 7022
>
> (where 7022 is the PID of the mpirun process) gives me a checkpoint
> dataset ompi_global_snapshot_7022.ckpt, that can be used for restarting
> the job.
>
> The ompi-checkpoint command gives the following output:
>
> ---
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: Checkpointing...
> [grid-demo-1.cit.tu-berlin.de:07480] PID 7022
> [grid-demo-1.cit.tu-berlin.de:07480] Connected to Mpirun [[2899,0],0]
> [grid-demo-1.cit.tu-berlin.de:07480] Terminating after checkpoint
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: notify_hnp: Contact
> Head Node Process PID 7022
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: notify_hnp: Requested
> a checkpoint of jobid [INVALID]
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Receive
> a command message.
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Status
> Update.
> [grid-demo-1.cit.tu-berlin.de:07480] Requested - Global
> Snapshot Reference: (null)
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Receive
> a command message.
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Status
> Update.
> [grid-demo-1.cit.tu-berlin.de:07480] Pending (Termination) - Global
> Snapshot Reference: (null)
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Receive
> a command message.
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Status
> Update.
> [grid-demo-1.cit.tu-berlin.de:07480]   Running - Global
> Snapshot Reference: (null)
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Receive
> a command message.
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Status
> Update.
> [grid-demo-1.cit.tu-berlin.de:07480] File Transfer - Global
> Snapshot Reference: (null)
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Receive
> a command message.
> [grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: Status
> Update.
> [grid-demo-1.cit.tu-berlin.de:07480]  Finished - Global
> Snapshot Reference: ompi_global_snapshot_7022.ckpt
> Snapshot Ref.:   0 ompi_global_snapshot_7022.ckpt
> ---
>
>
>
> Example: checkpointing with two running applications
>
> Similar to the first example, I'm again using the MPI-enabled flavor
> of Povray as demo application. But now, I'm not only starting a single
> Povray computation, but a second one in parallel. This gives me 8 MPI
> processes (4 processes for each MPI job), so that the 8 cores of my
> system are fully utilized
>
> Without checkpointing, these two processes are executing without any
> problem, each job resulting in a Povray image. However, if I'm using
> the ompi-checkpoint command for checkpointing one of these two jobs,
> this ompi-checkpoint is in danger of not returning.
>
> Again I'm executing
>
>  ompi-checkpoint -v --term 13572
>
> (where 13572 is the PID of the mpirun process). This command gives
> the following output, not returning back to the user:
>
> ---
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: Checkpointing...
> [grid-demo-1.cit.tu-berlin.de:14252] PID 13572
> [grid-demo-1.cit.tu-berlin.de:14252] Connected to Mpirun [[9529,0],0]
> [grid-demo-1.cit.tu-berlin.de:14252] Terminating after checkpoint
> [grid-demo-1.cit.tu-berlin.de:14252] 

[OMPI users] unsubscibe

2008-10-31 Thread Bertrand P. S. Russell
-- 
There is much pleasure to be gained from useless knowledge.
Bertrand. P. S. Russell
TROSY-NMR Lab,
Singapore.


[OMPI users] ompi-checkpoint is hanging

2008-10-31 Thread Matthias Hovestadt

Hi!

I'm using the development version of OMPI from SVN (rev. 19857)
for executing MPI jobs on my cluster system. I'm particularly using
the checkpoint and restart feature, based on the most current version
of BLCR.

The checkpointing is working pretty fine as long as I only execute
a single job on a node. If more than one MPI application is executing
on a system, ompi-checkpoint sometimes does not return, hanging forever.


Example: checkpointing with a single running application

I'm using the MPI-enabled flavor of Povray as demo application. So I'm
starting it on a node using the following command.

  mpirun -np 4 mpi-x-povray +I planet.pov -w1200 -h1000 +SP1 \
  +O planet.tga

This gives me 4 MPI processes, all running on the local node.
checkpointing it with

  ompi-checkpoint -v --term 7022

(where 7022 is the PID of the mpirun process) gives me a checkpoint
dataset ompi_global_snapshot_7022.ckpt, that can be used for restarting
the job.

The ompi-checkpoint command gives the following output:

---
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: Checkpointing...
[grid-demo-1.cit.tu-berlin.de:07480] PID 7022
[grid-demo-1.cit.tu-berlin.de:07480] Connected to Mpirun [[2899,0],0]
[grid-demo-1.cit.tu-berlin.de:07480] Terminating after checkpoint
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: notify_hnp: 
Contact Head Node Process PID 7022
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: notify_hnp: 
Requested a checkpoint of jobid [INVALID]
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:07480] Requested - Global 
Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:07480] Pending (Termination) - Global 
Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:07480]   Running - Global 
Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:07480] File Transfer - Global 
Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:07480] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:07480]  Finished - Global 
Snapshot Reference: ompi_global_snapshot_7022.ckpt

Snapshot Ref.:   0 ompi_global_snapshot_7022.ckpt
---



Example: checkpointing with two running applications

Similar to the first example, I'm again using the MPI-enabled flavor
of Povray as demo application. But now, I'm not only starting a single
Povray computation, but a second one in parallel. This gives me 8 MPI
processes (4 processes for each MPI job), so that the 8 cores of my
system are fully utilized

Without checkpointing, these two processes are executing without any
problem, each job resulting in a Povray image. However, if I'm using
the ompi-checkpoint command for checkpointing one of these two jobs,
this ompi-checkpoint is in danger of not returning.

Again I'm executing

  ompi-checkpoint -v --term 13572

(where 13572 is the PID of the mpirun process). This command gives
the following output, not returning back to the user:

---
[grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: Checkpointing...
[grid-demo-1.cit.tu-berlin.de:14252] PID 13572
[grid-demo-1.cit.tu-berlin.de:14252] Connected to Mpirun [[9529,0],0]
[grid-demo-1.cit.tu-berlin.de:14252] Terminating after checkpoint
[grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: notify_hnp: 
Contact Head Node Process PID 13572
[grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: notify_hnp: 
Requested a checkpoint of jobid [INVALID]
[grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:14252] Requested - Global 
Snapshot Reference: (null)
[grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: 
Receive a command message.
[grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: 
Status Update.
[grid-demo-1.cit.tu-berlin.de:14252] Pending (Termination) - Global 
Snapshot Reference: (null)

Re: [OMPI users] Equivalent .h files

2008-10-31 Thread Jeff Squyres
The Open MPI that ships with Leopard does not include Fortran support
because OS X does not ship with a Fortran compiler (this was Apple's
decision, not ours).  If you have Fortran MPI applications, you'll need
to a) download and install your own Fortran compiler (e.g., http://hpc.sf.net/),
and b) install your own copy of Open MPI that includes Fortran support
(e.g., install it to /opt/openmpi or somesuch -- I do not recommend
installing it over the system-installed Open MPI).

Once you do this, mpif90 should work as expected, and statements like
"use mpi" or include 'mpif.h' should function properly.



On Oct 31, 2008, at 5:48 AM, Benjamin Lamptey wrote:


Hello again,
I have to be more specific with my problem.

1) I am using the Mac OS X (Leopard) operating system.
When I do uname -a, I get Darwin Kernel Version 9.5.0

2) My code is Fortran 90

3) I tried using the mpif90 wrapper and I got the following message

x
mpif90  -c -O3   /Users/lamptey/projectb/src/blag_real_burnmpi.f90
--
Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional.

--
make: *** [blag_real_burnmpi.o] Error 1
x

4) I have the g95 compiler installed. So when I try using the
g95, (with include "mpif.h" or 'mpif.h'), I get the following message:

xx
g95 -fno-pic -c -O3   /Users/lamptey/projectb/src/ 
blag_real_burnmpi.f90

Error: Can't open included file 'mpif.h'
make: *** [blag_real_burnmpi.o] Error 1
xxx

5) What are people's experience in this case?

Thanks
Ben

On Thu, Oct 30, 2008 at 2:33 PM, Benjamin Lamptey  
 wrote:

Hello,
I am new to using Open MPI and would like to know something basic.

What is the equivalent of the "mpif.h" in open-mpi which is normally  
"included" at

the beginning of mpi codes (fortran in this case).

I shall appreciate that for cpp as well.

Thanks
Ben

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Equivalent .h files

2008-10-31 Thread Benjamin Lamptey
Hello again,
I have to be more specific with my problem.

1) I am using the Mac OS X (Leopard) operating system.
When I do uname -a, I get Darwin Kernel Version 9.5.0

2) My code is Fortran 90

3) I tried using the mpif90 wrapper and I got the following message

x
mpif90  -c -O3   /Users/lamptey/projectb/src/blag_real_burnmpi.f90
--
Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional.

--
make: *** [blag_real_burnmpi.o] Error 1
x

4) I have the g95 compiler installed. So when I try using the
g95, (with include "mpif.h" or 'mpif.h'), I get the following message:

xx
g95 -fno-pic -c -O3   /Users/lamptey/projectb/src/blag_real_burnmpi.f90
Error: Can't open included file 'mpif.h'
make: *** [blag_real_burnmpi.o] Error 1
xxx

5) What are people's experience in this case?

Thanks
Ben

On Thu, Oct 30, 2008 at 2:33 PM, Benjamin Lamptey wrote:

> Hello,
> I am new to using Open MPI and would like to know something basic.
>
> What is the equivalent of the "mpif.h" in open-mpi which is normally
> "included" at
> the beginning of mpi codes (fortran in this case).
>
> I shall appreciate that for cpp as well.
>
> Thanks
> Ben
>


Re: [OMPI users] Openmpi ver1.3beta1

2008-10-31 Thread Ralph Castain
When you typed the --host x1 command, were you sitting on x1?  
Likewise, when you typed the --host x2 command, were you not on host x2?


If the answer to both questions is "yes", then my guess is that  
something is preventing you from launching a daemon on host x2. Try  
adding --leave-session-attached to your cmd line and see if any error  
messages appear. And check the FAQ for tips on how to setup for ssh  
launch (I'm assuming that is what you are using).


http://www.open-mpi.org/faq/?category=rsh

Ralph

On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:


Hi,
  I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads --with- 
threads=posix --disable-ipv6

I have six nodes x1..6
I distributed the /opt/openmpi13b1 with scp to all other nodes from  
the head node

When i run the following command:
mpirun --prefix /opt/openmpi13b1  --host x1 hostname it works on x1  
printing out the hostname of x1

But when i type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname it hangs and  
does not give me any output
I have a 6 node intel quad core cluster with OSCAR and pci express  
gigabit ethernet for eth0

Can somebody advise?
Thank you very much.
Allan Menezes
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] Openmpi ver1.3beta1

2008-10-31 Thread Allan Menezes

Hi,
   I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads 
--with-threads=posix --disable-ipv6

I have six nodes x1..6
I distributed the /opt/openmpi13b1 with scp to all other nodes from the 
head node

When i run the following command:
mpirun --prefix /opt/openmpi13b1  --host x1 hostname it works on x1 
printing out the hostname of x1

But when i type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname it hangs and does 
not give me any output
I have a 6 node intel quad core cluster with OSCAR and pci express 
gigabit ethernet for eth0

Can somebody advise?
Thank you very much.
Allan Menezes