Re: [OMPI users] Searching the FAQ

2010-01-25 Thread Jeff Squyres
On Jan 25, 2010, at 5:38 PM, Gus Correa wrote:

> A) Keep the FAQ, please!

No worries -- I am not asking about removing the FAQ.  I was more asking 
whether the FAQ would be more useful in a different *form*.

> B) Add an "ALL FAQ" category, to make keyword search easier
> on web browsers.

Hmm.  Not a bad idea.  Probably not hard to do.

> C) Please write the (long overdue) FAQ set about the
> OpenMPI collectives!
> 
> I asked before, and I beg for it again:
> Please write a set of FAQs about
> OpenMPI collectives, and how to tune them up.

George Bosilca is the owner of this one -- he's the guy with all the 
knowledge...

> Which algorithms are available for each collective?
> What is the rationale behind these algorithms?
> What is the default algorithm used by each collective?
> How do you enforce the use of a certain collective algorithm?
> What are the pros and cons of hardwiring
> a choice of collective algorithm?
> How to tune up the collective algorithms to your application and to
> your hardware?

George -- if you write something up, I can wordsmith it into nice 
FAQ prose...

-- 
Jeff Squyres
jsquy...@cisco.com




[OMPI users] Can I start MPI_Spawn child processes early?

2010-01-25 Thread Jaison Paul

Hi All,

I am trying to use MPI for scientific High Performance Computing (HPC) 
applications. I use MPI_Spawn to create child processes. Is there a way 
to start the child processes earlier than the parent process, using MPI_Spawn?


I want this because my experiments showed that the time the parent takes 
to spawn the children is too long for HPC apps and slows down the whole 
process. If the children are already running when the parent application 
looks for them, that initial delay can be avoided. Is there a way to do that?
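One workaround sometimes used for this (a sketch only, not tested against any particular Open MPI version; the service name and job layout below are illustrative assumptions, not anything from this thread): pre-launch the "children" as a separate MPI job and join the two jobs with the MPI-2 name-publishing and connect/accept calls, so the expensive process launch happens before the parent ever needs the children. With Open MPI this typically also requires an ompi-server instance (or equivalent) so the two jobs share a name service.

```
#include <mpi.h>
#include <stdio.h>

/* Hedged sketch: a pre-launched "worker" job publishes a port; the
 * parent job, started later, connects to it instead of paying the
 * MPI_Comm_spawn cost on its critical path.  The service name
 * "worker-pool" is an arbitrary illustration. */
int main(int argc, char **argv)
{
    MPI_Comm inter;
    char port[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);

    if (argc > 1) {                       /* pre-started worker job */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("worker-pool", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    } else {                              /* parent job, started later */
        MPI_Lookup_name("worker-pool", MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    }

    /* ... communicate over the intercommunicator "inter" ... */

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}
```

Usage would be along the lines of launching the worker job first (e.g. "mpirun -np 8 ./a.out worker") and the parent job whenever it is ready (e.g. "mpirun -np 1 ./a.out").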


Thanks in advance,

Jaison
Australian National University


Re: [OMPI users] Searching the FAQ

2010-01-25 Thread Gus Correa

Hi Jeff

Thanks for your "RFC on FAQs"!

There go my two cents:

A) Keep the FAQ, please!

I am a big fan of the OpenMPI FAQ.
I use them all the time.
I also recommend them to everybody on this list and on other lists.
I've seen a lot of people do the same.
In the absence of more comprehensive documentation,
the FAQ is the resource we all count on to fix mistakes,
look for a forgotten syntax, setup our computers to work properly
with OpenMPI, learn a new concept, etc.

So, whatever you do, please don't do away with the FAQs,
unless you already have more comprehensive documentation
ready to replace the FAQ.

B) Add an "ALL FAQ" category, to make keyword search easier
on web browsers.

Keyword search of the FAQ is a bit cumbersome when one
has 26 different FAQ categories / web pages to search for.

A very simple / minimal effort way to allow web search
of the whole FAQ set would be to add the "ALL FAQ"
category to the current FAQ categories list (maybe on the very top
of the list).
The "ALL FAQ" page would concatenate all of your FAQ HTML files, 
allowing keyword search across all FAQs in any web browser.


One doesn't need to be fancy and stylish to be effective.

C) Please write the (long overdue) FAQ set about the
OpenMPI collectives!

I asked before, and I beg for it again:
Please write a set of FAQs about
OpenMPI collectives, and how to tune them up.

The current resources available for learning about collectives
are several sparse postings in the mailing list archive.
Despite the interesting questions posed and the generous answers
provided about collectives on the mailing list,
they don't form a coherent, elucidating body,
and they are not easy to follow.

Some questions about collectives that in one way
or another have been asked on the list:

Which algorithms are available for each collective?
What is the rationale behind these algorithms?
What is the default algorithm used by each collective?
How do you enforce the use of a certain collective algorithm?
What are the pros and cons of hardwiring
a choice of collective algorithm?
How to tune up the collective algorithms to your application and to
your hardware?
And more ...

Cheers,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


Jeff Squyres wrote:

I have some simple questions for all you users out there about the OMPI FAQ.  I 
ask because we see a LOT of you end up on the OMPI FAQ in our web statistics 
(most users who search either end up on the FAQ and/or on the web archives of 
the mailing list).  Hence, I'd like to know if we can improve the FAQ from a 
usability standpoint.

1. Is the FAQ useful in its current form?  More specifically:

- I personally find it a little difficult to web search for something and then 
end up on a single FAQ page with a LOT of information on it (e.g., the text for 
all the questions/answers in that category).  I.e., if I'm searching for 
something specific, it would be useful to end up on a page with *just that one 
FAQ question/answer*.

- OTOH, if I don't know exactly what I'm looking for, it is useful to see a 
whole page of FAQ questions and answers so that I can scan through them all to 
find what I'm looking for (vs. clicking through a million different individual 
pages).

2. We wrote all the PHP for the OMPI FAQ ourselves (it's not driven by a 
database; the content is all in individual text files).  Back when we started, 
we surveyed the web FAQ systems and found each of them lacking for one reason 
or another (I don't remember the details), and therefore wrote our own PHP 
stuff.  Do people have other FAQ web systems that they'd recommend these days?

3. Are there other features from an FAQ that you would like to see in the OMPI 
FAQ?

I ask these questions because a) the current system has annoyed me a few too 
many times recently for various limitations, and b) I'm wondering if there is 
something better out there -- better searching, more web-2.0-ish, ...whatever.  
We're certainly not tied to the existing FAQ system -- the current set of 
questions and answers is fairly easy to extract from the PHP, so we could move 
it to another system if it would be desirable.





Re: [OMPI users] ABI stabilization/versioning

2010-01-25 Thread Jed Brown
On Mon, 25 Jan 2010 15:10:12 -0500, Jeff Squyres  wrote:
> Indeed.  Our wrapper compilers currently explicitly list all 3
> libraries (-lmpi -lopen-rte -lopen-pal) because we don't know if those
> libraries will be static or shared at link time.

I am suggesting that it is unavoidable for the person doing the linking
to be explicit about whether they want static or dynamic libs when they
invoke mpicc.  Consider the pkg-config model where you might write

  gcc -static -o my-app main.o `pkg-config --libs --static openmpi fftw3`

  gcc -o my-app main.o `pkg-config --libs openmpi fftw3`

In MPI world,

  gcc -static -o my-app main.o `mpicc -showme:link-static` `pkg-config --libs --static fftw3`

  gcc -o my-app main.o `mpicc -showme:link` `pkg-config --libs fftw3`

seems tolerable.  The trick (as you point out) is to get the option
processed when the wrapper is being invoked as the compiler instead of
just for the -showme options.  Possible options are defining an
OMPI_STATIC environment variable or inspecting argv for --link:static
(or some such).  This is one of many reasons why wrappers are a
horrible solution, especially when they are expected to be used in
nontrivial cases.

Ideally, the adopted plan could be done in some coordination with MPICH2
(which lacks a -showme:link analogue) so that it is not so hard to write
portable build systems.

> > On the cited bug report, I just wanted to note that collapsing
> > libopen-rte and libopen-pal (even only in production builds) has the
> > undesirable effect that their ABI cannot change without incrementing
> > the soname of libmpi (i.e. user binaries are coupled just as tightly
> > to these libraries as when they were separate but linked explicitly,
> > so this offers no benefit at all).
> 
> Indeed -- this is exactly the reason we ended up leaving libopen-* .so
> versions at 0:0:0.

But not versioning those libs isn't much of a solution either since it
becomes possible to get an ABI mismatch at runtime (consider someone who
uses them independently, or if they are packaged separately as in a
distribution so that it becomes possible to update these out from
underneath libmpi).

> There's an additional variable -- we had considered collapsing all 3
> libraries into libmpi for production builds,

My point was that this is no solution at all since you have to bump the
soname any time you change libopen-*.  So even users who NEVER call into
libopen-* have to relink any time something happens there, despite their
interface not changing.  And that is exactly the situation if the
wrappers continue to overlink AND libopen-* became versioned, so at
least by keeping them separate, you give users the option of not
overlinking (albeit manually) and the option of using libopen-* without
libmpi.

> Yuck.

It's 2010 and we still don't have a standard way to represent link
dependencies (pkg-config might be the closest thing, but it's bad if you
have multiple versions of the same library, and the granularity is
wrong, e.g. if you want to link some exotic lib statically and the
common ones dynamically).

Jed


Re: [OMPI users] checkpointing multi node and multi process applications

2010-01-25 Thread Josh Hursey
Actually, let me roll that back a bit. I was preparing a custom patch  
for the v1.4 series, and it seems that the code does not have the bug  
I mentioned. Only the v1.5 branch and the trunk were affected by this.  
The v1.4 series should be fine.


I will still ask that the error message fix be brought over to the  
v1.4 branch, but it is unlikely to fix your problem. However it would  
be useful to know if upgrading to the trunk or v1.5 series fixes this  
problem. The v1.4 series has an old version of the file and metadata  
handling mechanisms, so I am encouraging people to move to the v1.5  
series if possible.


-- Josh

On Jan 25, 2010, at 3:33 PM, Josh Hursey wrote:

So while working on the error message, I noticed that the global  
coordinator was using the wrong path to investigate the checkpoint  
metadata. This particular section of code is not often used (which  
is probably why I could not reproduce). I just committed a fix to  
the Open MPI development trunk:

 https://svn.open-mpi.org/trac/ompi/changeset/22479

Additionally, I am asking for this to be brought over to the v1.4  
and v1.5 release branches:

 https://svn.open-mpi.org/trac/ompi/ticket/2195
 https://svn.open-mpi.org/trac/ompi/ticket/2196

It seems to solve the problem as I could reproduce it. Can you try  
the trunk (either SVN checkout or nightly tarball from tonight) and  
check if this solves your problem?


Cheers,
Josh

On Jan 25, 2010, at 12:14 PM, Josh Hursey wrote:

I am not able to reproduce this problem with the 1.4 branch using a  
hostfile, and node configuration like you mentioned.


I suspect that the error is caused by a failed local checkpoint.  
The error message is triggered when the global coordinator (located  
in 'mpirun') tries to read the metadata written by the application  
in the local snapshot. If the global coordinator cannot properly  
read the metadata, then it will print a variety of error messages  
depending on what is going wrong.


If these are the only two errors produced, then this typically  
means that the local metadata file has been found, but is empty/ 
corrupted. Can you send me the contents of the local checkpoint  
metadata file:
shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/0/opal_snapshot_0.ckpt/snapshot_meta.data


It should look something like:
-
#
# PID: 23915
# Component: blcr
# CONTEXT: ompi_blcr_context.23915
-

It may also help to see the following metadata file as well:
shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/global_snapshot_meta.data



If there are other errors printed by the process, that would  
potentially indicate a different problem. So if there are, let me  
know.


This error message should be a bit more specific about which  
process checkpoint is causing the problem, and what this  
usually indicates. I filed a bug to clean up the error:

https://svn.open-mpi.org/trac/ompi/ticket/2190

-- Josh

On Jan 21, 2010, at 8:27 AM, Jean Potsam wrote:


Hi Josh/all,

I have upgraded the openmpi to v 1.4  but still get the same error  
when I try executing the application on multiple nodes:


***
Error: expected_component: PID information unavailable!
Error: expected_component: Component Name information unavailable!
***

I am running my application from the node 'portal11' as follows:

mpirun -am ft-enable-cr -np 2 --hostfile hosts  myapp.

The file 'hosts' contains two host names: portal10, portal11.

I am triggering the checkpoint using ompi-checkpoint -v 'PID' from  
portal11.



I configured open mpi as follows:

#

./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug --enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace --enable-binaries --enable-trace --enable-static=yes --enable-debug --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/blcr/ --with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes

#

Question:

what do you think can be wrong? Please instruct me on how to  
resolve this problem.


Thank you

Jean




--- On Mon, 11/1/10, Josh Hursey  wrote:

From: Josh Hursey 
Subject: Re: [OMPI users] checkpointing multi node and multi  
process applications

To: "Open MPI Users" 
Date: Monday, 11 January, 2010, 21:42


On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

> Hi Everyone,
> I am trying to checkpoint an MPI  
application running on multiple nodes. However, I get some error  
messages when I trigger the checkpointing process.

>
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
>
> I am using  open mpi 1.3 and blcr 0.8.1

Can you try the v1.4 release and see if the problem persists?

>
> I execute my application as follows:
>
> mpirun -am ft-enable-cr -np 3 --hostfile host

Re: [OMPI users] checkpointing multi node and multi process applications

2010-01-25 Thread Josh Hursey
So while working on the error message, I noticed that the global  
coordinator was using the wrong path to investigate the checkpoint  
metadata. This particular section of code is not often used (which is  
probably why I could not reproduce). I just committed a fix to the  
Open MPI development trunk:

  https://svn.open-mpi.org/trac/ompi/changeset/22479

Additionally, I am asking for this to be brought over to the v1.4 and  
v1.5 release branches:

  https://svn.open-mpi.org/trac/ompi/ticket/2195
  https://svn.open-mpi.org/trac/ompi/ticket/2196

It seems to solve the problem as I could reproduce it. Can you try the  
trunk (either SVN checkout or nightly tarball from tonight) and check  
if this solves your problem?


Cheers,
Josh

On Jan 25, 2010, at 12:14 PM, Josh Hursey wrote:

I am not able to reproduce this problem with the 1.4 branch using a  
hostfile, and node configuration like you mentioned.


I suspect that the error is caused by a failed local checkpoint. The  
error message is triggered when the global coordinator (located in  
'mpirun') tries to read the metadata written by the application in  
the local snapshot. If the global coordinator cannot properly read  
the metadata, then it will print a variety of error messages  
depending on what is going wrong.


If these are the only two errors produced, then this typically means  
that the local metadata file has been found, but is empty/corrupted.  
Can you send me the contents of the local checkpoint metadata file:
 shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/0/opal_snapshot_0.ckpt/snapshot_meta.data


It should look something like:
-
#
# PID: 23915
# Component: blcr
# CONTEXT: ompi_blcr_context.23915
-

It may also help to see the following metadata file as well:
shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/global_snapshot_meta.data



If there are other errors printed by the process, that would  
potentially indicate a different problem. So if there are, let me  
know.


This error message should be a bit more specific about which process  
checkpoint is causing the problem, and what this usually  
indicates. I filed a bug to clean up the error:

 https://svn.open-mpi.org/trac/ompi/ticket/2190

-- Josh

On Jan 21, 2010, at 8:27 AM, Jean Potsam wrote:


Hi Josh/all,

I have upgraded the openmpi to v 1.4  but still get the same error  
when I try executing the application on multiple nodes:


***
Error: expected_component: PID information unavailable!
Error: expected_component: Component Name information unavailable!
***

I am running my application from the node 'portal11' as follows:

mpirun -am ft-enable-cr -np 2 --hostfile hosts  myapp.

The file 'hosts' contains two host names: portal10, portal11.

I am triggering the checkpoint using ompi-checkpoint -v 'PID' from  
portal11.



I configured open mpi as follows:

#

./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug --enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace --enable-binaries --enable-trace --enable-static=yes --enable-debug --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/blcr/ --with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes

#

Question:

what do you think can be wrong? Please instruct me on how to  
resolve this problem.


Thank you

Jean




--- On Mon, 11/1/10, Josh Hursey  wrote:

From: Josh Hursey 
Subject: Re: [OMPI users] checkpointing multi node and multi  
process applications

To: "Open MPI Users" 
Date: Monday, 11 January, 2010, 21:42


On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

> Hi Everyone,
> I am trying to checkpoint an MPI  
application running on multiple nodes. However, I get some error  
messages when I trigger the checkpointing process.

>
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
>
> I am using  open mpi 1.3 and blcr 0.8.1

Can you try the v1.4 release and see if the problem persists?

>
> I execute my application as follows:
>
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol.
>
> My question:
>
> Does openmpi with blcr support checkpointing of multi node  
execution of mpi application? If so, can you provide me with some  
information on how to achieve this.


Open MPI is able to checkpoint a multi-node application (that's  
what it was designed to do). There are some examples at the link  
below:

 http://www.osl.iu.edu/research/ft/ompi-cr/examples.php

-- Josh

>
> Cheers,
>
> Jean.
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/

Re: [OMPI users] [ompi-1.4.1] compiling without openib, running with openib + ompi141 and gcc3

2010-01-25 Thread Jeff Squyres
On Jan 25, 2010, at 11:58 AM, Mathieu Gontier wrote:

> I built OpenMPI-1.4.1 without openib support with the following configuration 
> options:
> 
> ./configure 
> --prefix=/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach 
> --enable-static --enable-shared --enable-cxx-exceptions --enable-mpi-f77 
> --disable-mpi-f90 --enable-mpi-cxx --disable-mpi-cxx-seek --enable-dist 
> --enable-mpi-profile --enable-binaries --enable-mpi-threads 
> --enable-memchecker --disable-debug --with-pic --with-threads   --with-sge

Note that you should not use --enable-dist.  --enable-dist is used by the OMPI 
maintainers ONLY when generating official downloadable tarballs.  It is *NOT* 
guaranteed to make sane / correct builds for general purpose runs.  Here's what 
./configure --help says about --enable-dist:

  --enable-dist           guarantee that the "dist" make target will be
                          functional, although may not guarantee that any
                          other make target will be functional.

Specifically: --enable-dist allows some configure tests to "pass" even though 
they shouldn't.  For example, I don't have MX installed on my systems.  But 
with --enable-dist, the MX tests in OMPI's configure script will "pass" just 
enough so that I can "make dist" to generate a tarball and still include all 
the MX plugin source code.  

> On my cluster, I run a small test (a broadcast on a 100 integer array) on 12 
> processes balanced on 3 nodes, but I asked for using openib. It works with 
> the following messages:
> 
> mpirun -np 12 -hostfile /tmp/72936.1.64.q/machines --mca btl openib,sm,self 
> /home/numeca/tmp/gontier/bcast/exe_ompi_cluster -nloop 2 -nbuff 100

Is your PATH and LD_LIBRARY_PATH set correctly such that you'll find the 
"right" ones (i.e., the ones that you just built/installed in 
/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach)?  I.e., is it 
possible that you're finding some other OMPI install that has OpenFabrics 
support?

Further, did you ever previously install Open MPI into that prefix and include 
OpenFabrics support?  I ask because OMPI's OpenFabrics support is in the form 
of a plugin -- if you simply installed another copy of OMPI into the same 
prefix without uninstalling first, the OpenFabrics plugin could still have been 
left in the tree, and therefore used at run time.

Finally, note that you didn't tell Open MPI to *NOT* build OpenFabrics support. 
 In this case, OMPI's configure script looks for OpenFabrics support, and if it 
finds it, builds it.  But if it doesn't find OpenFabrics support (and you 
didn't specifically ask for it), it just skips it and keeps going.  You might 
want to look through the output of OMPI's configure and see if it found 
OpenFabrics support and therefore decided to build it.

> I finally run ompi_info:
> 
> ./ompi_info | grep openib
>  MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.1)
> 
> Openib seems to be supported. That is weird because I did not ask for it...

Yep; see above.

> So, assuming the compilation of OpenMPI which does not support openib here, 
> what happened? Was tcp selected? How can I check which device has been used 
> (or force an explicit message)?

Unfortunately, OMPI currently lacks a good message indicating which device is 
used at run-time (because it's actually a surprisingly complex issue, since 
OMPI chooses a communication device based on which peer it's talking to, among 
other reasons).  We hope to have a good message in sometime in the OMPI 1.5 
series.

> By the way, what is the meaning of this message in my case?

Do you mean this message?

-
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   node005
  Local device: mthca0
-

If so, it means that Open MPI was unable to initialize the InfiniBand HCA known 
as "mthca0" on the server known as node005.  

The RLIMIT messages are likely symptoms of the issue; you likely need to set 
your registered memory limits to "unlimited".  See the OMPI FAQ in the 
OpenFabrics section for questions about registered memory limits for 
instructions how.
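[For reference, the usual fix (an assumption about the cluster setup; the FAQ entry has the authoritative instructions) is to raise the locked-memory limit in /etc/security/limits.conf on each compute node, e.g.:

```
* soft memlock unlimited
* hard memlock unlimited
```

and then restart the affected daemons/sessions so the new limits take effect.]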

> By the way, another thing: must OpenMPI be compiled with 
> gcc-4.1 or later, or can gcc-3.4 (for example) be used?

gcc 3.4 should be fine.

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] Problems building Open MPI 1.4.1 with Pathscale

2010-01-25 Thread Åke Sandgren
1 - Do you have problems with openmpi 1.4 too? (I don't, haven't built
1.4.1 yet)
2 - There is a bug in the pathscale compiler with -fPIC and -g that
generates incorrect dwarf2 data so debuggers get really confused and
will have BIG problems debugging the code. I'm chasing them to get a
fix...
3 - Do you have an example code that have problems?

On Mon, 2010-01-25 at 15:01 -0500, Jeff Squyres wrote:
> I'm afraid I don't have any clues offhand.  We *have* had problems with the 
> Pathscale compiler in the past that were never resolved by their support 
> crew.  However, they were of the "variables weren't initialized and the 
> process generally aborts" kind of failure, not a "persistent hang" kind of 
> failure.
> 
> Can you tell where in MPI_Init the process is hanging?  E.g., can you build 
> Open MPI with debugging enabled (such as by passing CFLAGS=-g to OMPI's 
> configure line) and then attach a debugger to a hung process and see what 
> it's stuck on?
> 
> 
> On Jan 25, 2010, at 7:52 AM, Rafael Arco Arredondo wrote:
> 
> > Hello:
> > 
> > I'm having some issues with Open MPI 1.4.1 and Pathscale compiler
> > (version 3.2). Open MPI builds successfully with the following configure
> > arguments:
> > 
> > ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64
> > --with-sge --enable-static CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90
> > FC=pathf90
> > 
> > (we have OpenFabrics 1.2 Infiniband drivers, by the way)
> > 
> > However, applications hang on MPI_Init (or maybe MPI_Comm_rank or
> > MPI_Comm_size, a basic hello-world anyway doesn't print 'Hello World
> > from node...'). I tried running them with and without SGE. Same result.
> > 
> > This hello-world works flawlessly when I build Open MPI with gcc:
> > 
> > ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64
> > --with-sge --enable-static
> > 
> > This successful execution runs in one machine only, so it shouldn't use
> > Infiniband, and it also works when several nodes are used.
> > 
> > I was able to build previous versions of Open MPI with Pathscale (1.2.6
> > and 1.3.2, particularly). I tried building version 1.4.1 both with
> > Pathscale 3.2 and Pathscale 3.1. No difference.
> > 
> > Any ideas?
> > 
> > Thank you in advance,
> > 
> > Rafa
> > 
> > --
> > Rafael Arco Arredondo
> > Centro de Servicios de Informática y Redes de Comunicaciones
> > Universidad de Granada
> > 
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > 
> 
> 



Re: [OMPI users] ABI stabilization/versioning

2010-01-25 Thread Jeff Squyres
On Jan 25, 2010, at 12:55 PM, Jed Brown wrote:

> > The short version is that the possibility of static linking really
> > fouls up the scheme, and we haven't figured out a good way around this
> > yet.  :-(
> 
> So pkg-config addresses this with its Libs.private field and an
> explicit command-line argument when you want static libs, e.g.
> 
>   $ pkg-config --libs libavcodec
>   -lavcodec 
>   $ pkg-config --libs --static libavcodec
>   -pthread -lavcodec -lz -lbz2 -lfaac -lfaad -lmp3lame -lopencore-amrnb 
> -lopencore-amrwb -ltheoraenc -ltheoradec -lvorbisenc -lvorbis -logg -lx264 
> -lm -lxvidcore -ldl -lasound -lavutil
> 
> There is no way to simultaneously (a) prevent overlinking shared libs
> and (b) correctly link static libs without an explicit statement from
> the user about whether to link *your library* statically or dynamically.

Indeed.  Our wrapper compilers currently explicitly list all 3 libraries (-lmpi 
-lopen-rte -lopen-pal) because we don't know if those libraries will be static 
or shared at link time.  If they're shared, then listing -lmpi should be 
sufficient because its implicit dependencies should be sufficient to pull in 
the other two (and therefore libopen-rte and libopen-pal can have their own, 
independent .so version numbers.  yay!).  But if they're static, then libmpi 
has no implicit dependencies, and you *have* to list all three (-lmpi 
-lopen-rte -lopen-pal).  

We did not want our wrapper compilers to get in the business of:

- attempting to divine whether the link will be static or dynamic (e.g., could 
be as "simple" [read: not really] as parsing argv, but could be as difficult as 
reading compiler config files).

- figuring out shared library filenames (e.g., .so, .dylib, .dll, ...etc.).

Yuck.

> Unfortunately, pkgconfig doesn't work well with multiple builds of a
> package, and doesn't know how to link some libs statically and some
> dynamically.
> 
> On the cited bug report, I just wanted to note that collapsing
> libopen-rte and libopen-pal (even only in production builds) has the
> undesirable effect that their ABI cannot change without incrementing the
> soname of libmpi (i.e. user binaries are coupled just as tightly to
> these libraries as when they were separate but linked explicitly, so
> this offers no benefit at all).

Indeed -- this is exactly the reason we ended up leaving libopen-* .so versions 
at 0:0:0.  

There's an additional variable -- we had considered collapsing all 3 libraries 
into libmpi for production builds, but the problem here is that multiple 
external projects have started using libopen-rte and libopen-pal independently 
of libmpi.  Hence, we can't just make those libraries disappear.  :-\  The 
developers of those external projects don't want a big monolithic library to 
link against, particularly when they have nothing to do with MPI.

Yuck.

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] Problems building Open MPI 1.4.1 with Pathscale

2010-01-25 Thread Jeff Squyres
I'm afraid I don't have any clues offhand.  We *have* had problems with the 
Pathscale compiler in the past that were never resolved by their support crew.  
However, they were of the "variables weren't initialized and the process 
generally aborts" kind of failure, not a "persistent hang" kind of failure.

Can you tell where in MPI_Init the process is hanging?  E.g., can you build 
Open MPI with debugging enabled (such as by passing CFLAGS=-g to OMPI's 
configure line) and then attach a debugger to a hung process and see what it's 
stuck on?


On Jan 25, 2010, at 7:52 AM, Rafael Arco Arredondo wrote:

> Hello:
> 
> I'm having some issues with Open MPI 1.4.1 and Pathscale compiler
> (version 3.2). Open MPI builds successfully with the following configure
> arguments:
> 
> ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64
> --with-sge --enable-static CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90
> FC=pathf90
> 
> (we have OpenFabrics 1.2 Infiniband drivers, by the way)
> 
> However, applications hang on MPI_Init (or maybe MPI_Comm_rank or
> MPI_Comm_size, a basic hello-world anyway doesn't print 'Hello World
> from node...'). I tried running them with and without SGE. Same result.
> 
> This hello-world works flawlessly when I build Open MPI with gcc:
> 
> ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64
> --with-sge --enable-static
> 
> This successful execution runs in one machine only, so it shouldn't use
> Infiniband, and it also works when several nodes are used.
> 
> I was able to build previous versions of Open MPI with Pathscale (1.2.6
> and 1.3.2, particularly). I tried building version 1.4.1 both with
> Pathscale 3.2 and Pathscale 3.1. No difference.
> 
> Any ideas?
> 
> Thank you in advance,
> 
> Rafa
> 
> --
> Rafael Arco Arredondo
> Centro de Servicios de Informática y Redes de Comunicaciones
> Universidad de Granada
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] ABI stabilization/versioning

2010-01-25 Thread Jed Brown
On Mon, 25 Jan 2010 09:09:47 -0500, Jeff Squyres  wrote:
> The short version is that the possibility of static linking really
> fouls up the scheme, and we haven't figured out a good way around this
> yet.  :-(

So pkg-config addresses this with its Libs.private field and an
explicit command-line argument when you want static libs, e.g.

  $ pkg-config --libs libavcodec
  -lavcodec  
  $ pkg-config --libs --static libavcodec
  -pthread -lavcodec -lz -lbz2 -lfaac -lfaad -lmp3lame -lopencore-amrnb 
-lopencore-amrwb -ltheoraenc -ltheoradec -lvorbisenc -lvorbis -logg -lx264 -lm 
-lxvidcore -ldl -lasound -lavutil

There is no way to simultaneously (a) prevent overlinking shared libs
and (b) correctly link static libs without an explicit statement from
the user about whether to link *your library* statically or dynamically.

Unfortunately, pkgconfig doesn't work well with multiple builds of a
package, and doesn't know how to link some libs statically and some
dynamically.
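
For reference, the split between shared and static link lines lives in a
package's .pc file. A minimal sketch of what such a file could look like for
Open MPI (the paths, version, and library lists below are illustrative
assumptions, not Open MPI's actual pkg-config metadata):

```
# hypothetical openmpi.pc -- illustrative only
prefix=/usr/local
libdir=${prefix}/lib
includedir=${prefix}/include

Name: Open MPI
Description: illustrative pkg-config metadata for libmpi
Version: 1.4.1
Cflags: -I${includedir}
# Libs: what shared-linking consumers need (avoids overlinking)
Libs: -L${libdir} -lmpi
# Libs.private: extra dependencies appended only with --static
Libs.private: -lopen-rte -lopen-pal -lutil -lm -ldl
```

With such a file, "pkg-config --libs openmpi" would emit only -lmpi, while
"pkg-config --libs --static openmpi" would append the Libs.private list.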


On the cited bug report, I just wanted to note that collapsing
libopen-rte and libopen-pal (even only in production builds) has the
undesirable effect that their ABI cannot change without incrementing the
soname of libmpi (i.e. user binaries are coupled just as tightly to
these libraries as when they were separate but linked explicitly, so
this offers no benefit at all).

Jed


[OMPI users] Searching the FAQ

2010-01-25 Thread Jeff Squyres
I have some simple questions for all you users out there about the OMPI FAQ.  I 
ask because we see a LOT of you end up on the OMPI FAQ in our web statistics 
(most users who search either end up on the FAQ and/or on the web archives of 
the mailing list).  Hence, I'd like to know if we can improve the FAQ from a 
usability standpoint.

1. Is the FAQ useful in its current form?  More specifically:

- I personally find it a little difficult to web search for something and then 
end up on a single FAQ page with a LOT of information on it (e.g., the text for 
all the questions/answers in that category).  I.e., if I'm searching for 
something specific, it would be useful to end up on a page with *just that one 
FAQ question/answer*.

- OTOH, if I don't know exactly what I'm looking for, it is useful to see a 
whole page of FAQ questions and answers so that I can scan through them all to 
find what I'm looking for (vs. clicking through a million different individual 
pages).

2. We wrote all the PHP for the OMPI FAQ ourselves (it's not driven by a 
database; the content is all in individual text files).  Back when we started, 
we surveyed the web FAQ systems and found each of them lacking for one reason 
or another (I don't remember the details), and therefore wrote our own PHP 
stuff.  Do people have other FAQ web systems that they'd recommend these days?

3. Are there other features from an FAQ that you would like to see in the OMPI 
FAQ?

I ask these questions because a) the current system has annoyed me a few too 
many times recently for various limitations, and b) I'm wondering if there is 
something better out there -- better searching, more web-2.0-ish, ...whatever.  
We're certainly not tied to the existing FAQ system -- the current set of 
questions and answers is fairly easy to extract from the PHP, so we could move 
it to another system if it would be desirable.

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] checkpointing multi node and multi process applications

2010-01-25 Thread Josh Hursey
I am not able to reproduce this problem with the 1.4 branch using a
hostfile and node configuration like you mentioned.


I suspect that the error is caused by a failed local checkpoint. The  
error message is triggered when the global coordinator (located in  
'mpirun') tries to read the metadata written by the application in the  
local snapshot. If the global coordinator cannot properly read the  
metadata, then it will print a variety of error messages depending on  
what is going wrong.


If these are the only two errors produced, then this typically means  
that the local metadata file has been found, but is empty/corrupted.  
Can you send me the contents of the local checkpoint metadata file:
  shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/0/opal_snapshot_0.ckpt/snapshot_meta.data


It should look something like:
-
#
# PID: 23915
# Component: blcr
# CONTEXT: ompi_blcr_context.23915
-

It may also help to see the following metadata file as well:
 shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/global_snapshot_meta.data
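
As a quick way to apply the check described above, here is a small shell
sketch that writes a sample metadata file (contents copied from the example)
and verifies it is non-empty and names a checkpointer component. The
temporary file path is a stand-in, not the real snapshot location:

```shell
# Create a sample local-snapshot metadata file modeled on the example
# above, then test it the way the global coordinator effectively does:
meta=$(mktemp)
printf '# PID: 23915\n# Component: blcr\n# CONTEXT: ompi_blcr_context.23915\n' > "$meta"
# An empty/corrupted file would fail one of these two checks:
if test -s "$meta" && grep -q '^# Component:' "$meta"; then
  echo "metadata looks sane"
else
  echo "metadata empty or corrupted"
fi
rm -f "$meta"
```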



If there are other errors printed by the process, that would  
potentially indicate a different problem. So if there are, let me know.


This error message should be a bit more specific about which process
checkpoint is causing the problem, and what this usually
indicates. I filed a bug to clean up the error:

  https://svn.open-mpi.org/trac/ompi/ticket/2190

-- Josh

On Jan 21, 2010, at 8:27 AM, Jean Potsam wrote:


Hi Josh/all,

I have upgraded the openmpi to v 1.4  but still get the same error  
when I try executing the application on multiple nodes:


***
 Error: expected_component: PID information unavailable!
 Error: expected_component: Component Name information unavailable!
***

I am running my application from the node 'portal11' as follows:

mpirun -am ft-enable-cr -np 2 --hostfile hosts  myapp.

The file 'hosts' contains two host names: portal10, portal11.

I am triggering the checkpoint using ompi-checkpoint -v 'PID' from  
portal11.



I configured open mpi as follows:

#

./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug \
  --enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace \
  --enable-binaries --enable-trace --enable-static=yes --enable-debug \
  --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr \
  --enable-ft-thread --with-blcr=/usr/local/blcr/ \
  --with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes

#

Question:

what do you think can be wrong? Please instruct me on how to resolve  
this problem.


Thank you

Jean




--- On Mon, 11/1/10, Josh Hursey  wrote:

From: Josh Hursey 
Subject: Re: [OMPI users] checkpointing multi node and multi process  
applications

To: "Open MPI Users" 
Date: Monday, 11 January, 2010, 21:42


On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

> Hi Everyone,
> I am trying to checkpoint an MPI
application running on multiple nodes. However, I get some error
messages when I trigger the checkpointing process.

>
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
>
> I am using  open mpi 1.3 and blcr 0.8.1

Can you try the v1.4 release and see if the problem persists?

>
> I execute my application as follows:
>
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol.
>
> My question:
>
> Does openmpi with blcr support checkpointing of multi node  
execution of mpi application? If so, can you provide me with some  
information on how to achieve this.


Open MPI is able to checkpoint a multi-node application (that's what  
it was designed to do). There are some examples at the link below:

  http://www.osl.iu.edu/research/ft/ompi-cr/examples.php

-- Josh

>
> Cheers,
>
> Jean.
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





[OMPI users] [ompi-1.4.1] compiling without openib, running with openib + ompi141 and gcc3

2010-01-25 Thread Mathieu Gontier
Hello,

I built OpenMPI-1.4.1 without openib support with the following
configuration options:

./configure
--prefix=/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach
--enable-static --enable-shared --enable-cxx-exceptions
--enable-mpi-f77 --disable-mpi-f90 --enable-mpi-cxx
--disable-mpi-cxx-seek --enable-dist --enable-mpi-profile
--enable-binaries --enable-mpi-threads --enable-memchecker
--disable-debug --with-pic --with-threads   --with-sge

On my cluster, I run a small test (a broadcast of a 100-integer array)
on 12 processes balanced across 3 nodes, but I asked it to use openib.
It works, with the following messages:

mpirun -np 12 -hostfile /tmp/72936.1.64.q/machines --mca btl openib,sm,self \
  /home/numeca/tmp/gontier/bcast/exe_ompi_cluster -nloop 2 -nbuff 100

libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
--
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   node005
  Local device: mthca0
--
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
processing...
done
[node005:04791] 11 more processes have sent help message
help-mpi-btl-openib.txt / error in device init
[node005:04791] Set MCA parameter "orte_base_help_aggregate" to 0 to
see all help / error messages

I finally run ompi_info:

./ompi_info | grep openib
 MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.1)

Openib seems to be supported. That is weird, because I did not ask for it.
So, given a build of Open MPI that should not support openib, what
happened? Was tcp selected? How can I check which device has been used
(or force an explicit message)?
By the way, what is the meaning of this warning in my case?
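
One way to answer the "which device was used" question (a sketch, assuming
an Open MPI install on the PATH; btl_base_verbose is the BTL framework's
MCA verbosity parameter, and the application name here is a placeholder):

```shell
# List the BTL components this build actually contains, then rerun the
# job with BTL selection logging so each process reports what it opens.
if command -v ompi_info >/dev/null 2>&1; then
  ompi_info | grep -i btl
  mpirun -np 2 --mca btl tcp,sm,self --mca btl_base_verbose 30 ./your_app || true
else
  echo "Open MPI not found on PATH"
fi
```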

One more, unrelated question: must OpenMPI be compiled with
gcc-4.1 or later, or can gcc-3.4 (for example) be used?

Thank you for your help, 
Mathieu.



Re: [OMPI users] Checkpoint/Restart error

2010-01-25 Thread Josh Hursey
I tested the 1.4.1 release, and everything worked fine for me (tested  
a few different configurations of nodes/environments).


The ompi-checkpoint error you cited is usually caused by one of two  
things:
 - The PID specified is wrong (which I don't think that is the case  
here)

 - The session directory cannot be found in /tmp.

So I think the problem is the latter. The session directory looks  
something like:

  /tmp/openmpi-sessions-USERNAME@LOCALHOST_0
Within this directory the mpirun process places its contact  
information. ompi-checkpoint uses this contact information to connect  
to the job. If it cannot find it, then it errors out. (We definitely  
need a better error message here. I filed a ticket [1]).
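
A quick sketch for locating that directory (the /tmp location and the
trailing _0 follow the pattern above, but they depend on your TMPDIR and
job, so treat them as assumptions):

```shell
# Construct the session-directory name ompi-checkpoint expects for the
# current user and host, then see whether it exists.
session_dir="/tmp/openmpi-sessions-$(id -un)@$(uname -n)_0"
echo "$session_dir"
ls -ld "$session_dir" 2>/dev/null || echo "(no session directory: is mpirun running on this node?)"
```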


We usually do not recommend running Open MPI as root, so I would
strongly suggest running as a regular user instead.


With a regular user, check the location of the session directory. Make  
sure that it is in /tmp on the node where 'mpirun' and 'ompi- 
checkpoint' are run.


-- Josh

[1] https://svn.open-mpi.org/trac/ompi/ticket/2189

On Jan 25, 2010, at 5:48 AM, Andreea Costea wrote:


So? anyone? any clue?

Summarize:
- installed OpenMPI 1.4.1 on fresh Centos 5
- mpirun works but ompi-checkpoint throws this error:
ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405
- on another VM I have OpenMPI 1.3.3 installed. Checkpointing works
fine as a guest user but shows the previously mentioned error as root.
Both root and guest show the same output from "ompi_info --param all all"
except for $HOME (which only matters for mca_component_path,
mca_param_files, snapc_base_global_snapshot_dir)



Thanks,
Andreea


On Tue, Jan 19, 2010 at 9:01 PM, Andreea Costea wrote:
I noticed one more thing. As I still have some VMs that have OpenMPI  
version 1.3.3 installed I started to use those machines 'till I fix  
the problem with 1.4.1 And while checkpointing on one of this VMs I  
realized that checkpointing as a guest works fine and checkpointing  
as a root outputs the same error like in 1.4.1. : ORTE_ERROR_LOG:  
Not found in file orte-checkpoint.c at line 405


I logged the outputs of "ompi_info --param all all" which I run for  
root and for another user and the only differences were at these  
parameters:


mca_component_path
mca_param_files
snapc_base_global_snapshot_dir

All 3 params differ because of the $HOME.
One more thing: I don't have the directory $HOME/.openmpi

Ideas?

Thanks,
Andreea





On Tue, Jan 19, 2010 at 12:51 PM, Andreea Costea wrote:
Well... I decided to install a fresh OS to be sure that there is no  
OpenMPI version conflict. So I formatted one of my VMs, did a fresh  
CentOS install, installed BLCR 0.8.2 and OpenMPI 1.4.1 and the  
result: the same. mpirun works but ompi-checkpoint has that error at  
line 405:


[[35906,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at  
line 405


As for the files remaining after uninstalling: Jeff, you were right.
There are no files left, just some empty directories.


Which might be the problem with that ORTE_ERROR_LOG error?

Thanks,
Andreea

On Fri, Jan 15, 2010 at 11:47 PM, Andreea Costea wrote:

It's almost midnight here, so I left home, but I will try it tomorrow.
There were some directories left after "make uninstall". I will give  
more details tomorrow.


Thanks Jeff,
Andreea


On Fri, Jan 15, 2010 at 11:30 PM, Jeff Squyres   
wrote:

On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote:

> - I wanted to update to version 1.4.1 and I uninstalled previous  
version like this: make uninstall, and than manually deleted all the  
left over files. the directory where I installed was /usr/local


I'll let Josh answer your CR questions, but I did want to ask about  
this point.  AFAIK, "make uninstall" removes *all* Open MPI files.   
For example:


-
[7:25] $ cd /path/to/my/OMPI/tree
[7:25] $ make install > /dev/null
[7:26] $ find /tmp/bogus/ -type f | wc
   646 646   28082
[7:26] $ make uninstall > /dev/null
[7:27] $ find /tmp/bogus/ -type f | wc
 0   0   0
[7:27] $
-

I realize that some *directories* are left in $prefix, but there  
should be no *files* left.  Are you seeing something different?


--
Jeff Squyres
jsquy...@cisco.com


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users








Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-25 Thread Shiqing Fan


Yes, it might be necessary. Done in r22473.

Thanks,
Shiqing

Jeff Squyres wrote:

Should this kind of info be added to README.windows?

On Jan 25, 2010, at 4:34 AM,  
 wrote:

  

Thanks, that second part about the wrappers was what I was looking for.
 
Charlie ... 
 
 Original Message 

Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
From: Shiqing Fan 
Date: Mon, January 25, 2010 2:09 am
To: cjohn...@valverdecomputing.com
Cc: Open MPI Users 


Hi Charlie,

Actually, to compile and link your application with Open MPI on Windows 
is similar as on Linux. You have to link your application against the 
generated Open MPI libraries, e.g. libopen-mpi.lib (don't forget the 
suffix 'd' if you build debug version of the OMPI libraries, e.g. 
libopen-mpid.lib).


But according to the information you provided, I assume that you only 
added the search path into the project, that's not enough, you should 
probably add the library names into "Project Property Pages" -> 
"Configuration Properties" -> Linker -> Input -> "Additional 
Dependencies", normally only libopen-mpi.lib (or libopen-mpid.lib) would 
be enough, so that Visual Studio will know which libraries to link to.


Besides, the Open MPI compiler wrappers should also work on Windows, in 
this case you just need to open a "Visual Studio command prompt" with 
the Open MPI path env added (e.g. "set PATH=c:\Program 
Files\OpenMPI_v1.4\bin;%PATH%"), and simply run command like:

  mpicc app.c

and

  mpirun -np 2 app.exe

Please note that, before executing the application, Open MPI has to be
installed somewhere, either by building the "INSTALL" project or by running
the generated installer, so that the correct Open MPI folder structure
can be created.



Regards,
Shiqing


cjohn...@valverdecomputing.com wrote:


OK, so I'm a little farther on and perplexed.

As I said, Visual C++ 2005 (release 8.0.50727.867) build 
of OpenMPI 1.4, using CMake 2.6.4, built everything and it all linked.


Went ahead and built the PACKAGE item in the OpenMPI.sln project,
which made a zip file and an installer (although it was not obvious
where to look for this, what its name was, etc.; I figured it out by
the dates on the files).


Another thing that's not obvious is how to shoehorn your code into a
VCC project that will successfully build.


I created a project from existing files in a place where the include 
on the mpi.h would be found and examples, etc. did compile.


However, they did not find any of the library routines. Link errors.

So, I added in the generated libraries location into the search 
locations for libraries.


No good.

So, I added all of the generated libraries into the VCC project I created.

No good.

How does one do this (aside from rigging up something through CMake, 
cygwin, minGW, or MS SFU)?


Charlie ...


 Original Message 
Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
From: Shiqing Fan 
Date: Fri, January 15, 2010 2:56 am
To: cjohn...@valverdecomputing.com
Cc: Open MPI Users 


Hi Charlie,

Glad to hear that you compiled it successfully.

The error you got with 1.3.4 is a bug that the CMake script didn't set
the SVN information correctly, and it has been fixed in 1.4 and later.


Thanks,
Shiqing


cjohn...@valverdecomputing.com wrote:
  

Yes that was it.

A much improved result now from CMake 2.6.4, no errors from compiling
openmpi-1.4:

1>libopen-pal - 0 error(s), 9 warning(s)
2>libopen-rte - 0 error(s), 7 warning(s)
3>opal-restart - 0 error(s), 0 warning(s)
4>opal-wrapper - 0 error(s), 0 warning(s)
5>libmpi - 0 error(s), 42 warning(s)
6>orte-checkpoint - 0 error(s), 0 warning(s)
7>orte-ps - 0 error(s), 0 warning(s)
8>orted - 0 error(s), 0 warning(s)
9>orte-clean - 0 error(s), 0 warning(s)
10>orterun - 0 error(s), 3 warning(s)
11>ompi_info - 0 error(s), 0 warning(s)
12>ompi-server - 0 error(s), 0 warning(s)
13>libmpi_cxx - 0 error(s), 61 warning(s)
== Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped ==

And only one failure from compiling openmpi-1.3.4 (the ompi_info
project):

1>libopen-pal - 0 error(s), 9 warning(s)
2>libopen-rte - 0 error(s), 7 warning(s)
3>opal-restart - 0 error(s), 0 warning(s)
4>opal-wrapper - 0 error(s), 0 warning(s)
5>orte-checkpoint - 0 error(s), 0 warning(s)
6>libmpi - 0 error(s), 42 warning(s)
7>orte-ps - 0 error(s), 0 warning(s)
8>orted - 0 error(s), 0 warning(s)
9>orte-clean - 0 error(s), 0 warning(s)
10>orterun - 0 error(s), 3 warning(s)
11>ompi_info - 3 error(s), 0 warning(s)
12>ompi-server - 0 error(s), 0 warning(s)
13>libmpi_cxx - 0 error(s), 61 warning(s)
== Rebuild All: 13 succeeded, 1 failed, 0 skipped ==

Here's the listing from the non-linking project:

11>-- Rebuild All started: Project: ompi_info, Configuration:
Debug Win32 --
11>Deleting intermediate and output files for project 'ompi_info',
configuration 'Deb

Re: [OMPI users] ABI stabilization/versioning

2010-01-25 Thread Jeff Squyres
On Jan 25, 2010, at 7:11 AM, Dave Love wrote:

> What's the status of (stabilizing and?) versioning libraries?  If I
> recall correctly, it was supposed to be defined as fixed for some
> release period as of 1.3.something.

Correct.  We started with 1.3.2 or 1.3.3, IIRC...?  I'd have to go back and 
check to be sure.

To be clear, however, we are only versioning the MPI libraries (as you noted, 
libmpi went to 0.0.1).  That is, the hidden sub-libraries (libopen-rte and 
libopen-pal) are still NOT versioned for complex, icky reasons (see 
https://svn.open-mpi.org/trac/ompi/ticket/2092 for more details).  The short 
version is that the possibility of static linking really fouls up the scheme, 
and we haven't figured out a good way around this yet.  :-(

To be absolutely crystal clear: OMPI's MPI shared libraries now have .so 
versioning enabled, but you still can't install two copies of Open MPI into the 
same $prefix (without overriding a bunch of other directory names, that is, 
like $pkglibdir, etc.).  This is because Open MPI has a bunch of files that are 
not named in relation to OMPI's version number (e.g., $includedir/mpi.h, 
$mandir/man3/*, $pkgdir/*, libopen-rte.so, etc.).  That is, the lack of .so 
versioning in libopen-rte and libopen-pal are only two of (unfortunately) many 
reasons that you can't install 2 different versions of Open MPI into the same 
$prefix.

Does that make sense?

> I assumed that the libraries would then be versioned (at least for ELF
> -- I don't know about other formats) and we could remove a major source
> of grief from dynamically linking against the wrong thing, and I think
> Jeff said that would happen.  

Right -- we're using the Libtool shared library versioning scheme.

> However, the current sources don't seem to
> be trying to set libtool version info, though I'm not sure what
> determines them producing .so.0.0.1 instead of .0.0.0 in other binaries
> I have.  

The top-level VERSION file has text fields that set what the version numbers 
will be for each of the so libraries.  These numbers get pasted in to various 
Makefile's in the build process; hence, the LT .so versioning info is included 
down at the level where each .so library is created (by Libtool).  

Check out our wiki page about the shared library version numbering: 
https://svn.open-mpi.org/trac/ompi/wiki/ReleaseProcedures.
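
For readers unfamiliar with Libtool's scheme: its -version-info triple
"current:revision:age" maps on Linux ELF to the filename suffix
(current-age).(age).(revision). A tiny sketch of that mapping (my reading
of the Libtool manual, not code taken from OMPI's build files):

```shell
# Map a Libtool -version-info triple (current:revision:age) to the
# Linux ELF .so filename suffix: (current-age).(age).(revision).
linux_so_suffix() {
  current=$1; revision=$2; age=$3
  echo "$((current - age)).${age}.${revision}"
}

# The libmpi mentioned in this thread shipped as libmpi.so.0.0.1,
# which corresponds to -version-info 0:1:0:
linux_so_suffix 0 1 0   # → 0.0.1
```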

> This doesn't seem to have been addressed in the Debian or
> Fedora packaging, either
> 
> Is that just an oversight or something dropped, so it could be fixed
> (modulo historical mess) if someone did the work?  It isn't covered
> under http://www.open-mpi.org/software/ompi/versions/ or as far as I can
> tell in the FAQ, and seems important (like plenty of other things, I'm
> sure!), given how much of a problem it's been for users and admins doing
> updates.

Good point -- I'll take a to-do to add some text about the shared library 
versioning scheme in the FAQ and the /versions/ page.  Probably not today, but 
I should be able to get to it this week.

Do the links and text I provided above give you enough information / rationale?

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-25 Thread Jeff Squyres
Should this kind of info be added to README.windows?

On Jan 25, 2010, at 4:34 AM,  
 wrote:

> Thanks, that second part about the wrappers was what I was looking for.
>  
> Charlie ... 
>  
>  Original Message 
> Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
> From: Shiqing Fan 
> Date: Mon, January 25, 2010 2:09 am
> To: cjohn...@valverdecomputing.com
> Cc: Open MPI Users 
> 
> 
> Hi Charlie,
> 
> Actually, to compile and link your application with Open MPI on Windows 
> is similar as on Linux. You have to link your application against the 
> generated Open MPI libraries, e.g. libopen-mpi.lib (don't forget the 
> suffix 'd' if you build debug version of the OMPI libraries, e.g. 
> libopen-mpid.lib).
> 
> But according to the information you provided, I assume that you only 
> added the search path into the project, that's not enough, you should 
> probably add the library names into "Project Property Pages" -> 
> "Configuration Properties" -> Linker -> Input -> "Additional 
> Dependencies", normally only libopen-mpi.lib (or libopen-mpid.lib) would 
> be enough, so that Visual Studio will know which libraries to link to.
> 
> Besides, the Open MPI compiler wrappers should also work on Windows, in 
> this case you just need to open a "Visual Studio command prompt" with 
> the Open MPI path env added (e.g. "set PATH=c:\Program 
> Files\OpenMPI_v1.4\bin;%PATH%"), and simply run command like:
> 
> > mpicc app.c
> 
> and
> 
> > mpirun -np 2 app.exe
> 
> 
> Please note that, before executing the application, Open MPI has to be 
> installed somewhere either by build the "INSTALL" project or by running 
> the generated installer, so that the correct Open MPI folder structure 
> could be created.
> 
> 
> Regards,
> Shiqing
> 
> 
> cjohn...@valverdecomputing.com wrote:
> > OK, so I'm a little farther on and perplexed.
> > 
> > As I said, Visual C++ 2005 (release 8.0.50727.867) build 
> > of OpenMPI 1.4, using CMake 2.6.4, built everything and it all linked.
> > 
> > Went ahead and built the PACKAGE item in the OpenMPI.sln project, 
> > which made a zip file and an installer (although it was not obvious 
> > where to look for this , what its name was, etc., I figured it out by 
> > dates on files).
> > 
> > Another thing that's not obvious is how to shoehorn your code into a 
> > VCC project that will successfully build.
> > 
> > I created a project from existing files in a place where the include 
> > on the mpi.h would be found and examples, etc. did compile.
> > 
> > However, they did not find any of the library routines. Link errors.
> > 
> > So, I added in the generated libraries location into the search 
> > locations for libraries.
> > 
> > No good.
> > 
> > So, I added all of the generated libraries into the VCC project I created.
> > 
> > No good.
> > 
> > How does one do this (aside from rigging up something through CMake, 
> > cygwin, minGW, or MS SFU)?
> > 
> > Charlie ...
> > 
> >
> >  Original Message 
> > Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
> > From: Shiqing Fan 
> > Date: Fri, January 15, 2010 2:56 am
> > To: cjohn...@valverdecomputing.com
> > Cc: Open MPI Users 
> >
> >
> > Hi Charlie,
> >
> > Glad to hear that you compiled it successfully.
> >
> > The error you got with 1.3.4 is a bug that the CMake script didn't
> > set
> > the SVN information correctly, and it has been fixed in 1.4 and later.
> >
> >
> > Thanks,
> > Shiqing
> >
> >
> > cjohn...@valverdecomputing.com wrote:
> > > Yes that was it.
> > >
> > > A much improved result now from CMake 2.6.4, no errors from
> > compiling
> > > openmpi-1.4:
> > >
> > > 1>libopen-pal - 0 error(s), 9 warning(s)
> > > 2>libopen-rte - 0 error(s), 7 warning(s)
> > > 3>opal-restart - 0 error(s), 0 warning(s)
> > > 4>opal-wrapper - 0 error(s), 0 warning(s)
> > > 5>libmpi - 0 error(s), 42 warning(s)
> > > 6>orte-checkpoint - 0 error(s), 0 warning(s)
> > > 7>orte-ps - 0 error(s), 0 warning(s)
> > > 8>orted - 0 error(s), 0 warning(s)
> > > 9>orte-clean - 0 error(s), 0 warning(s)
> > > 10>orterun - 0 error(s), 3 warning(s)
> > > 11>ompi_info - 0 error(s), 0 warning(s)
> > > 12>ompi-server - 0 error(s), 0 warning(s)
> > > 13>libmpi_cxx - 0 error(s), 61 warning(s)
> > > == Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped
> > > ==
> > >
> > > And only one failure from compiling openmpi-1.3.4 (the ompi_info
> > project):
> > >
> > > > 1>libopen-pal - 0 error(s), 9 warning(s)
> > > > 2>libopen-rte - 0 error(s), 7 warning(s)
> > > > 3>opal-restart - 0 error(s), 0 warning(s)
> > > > 4>opal-wrapper - 0 error(s), 0 warning(s)
> > > > 5>orte-checkpoint - 0 error(s), 0 warning(s)
> > > > 6>libmpi - 0 error(s), 42 warning(s)
> > > > 7>orte-ps - 0 error(s), 0 warning(s)
> > > > 8>orted - 0 error(s), 0 warning(s)
> > > > 9>orte-clean - 0 error(s), 0 warning(s)
> > > > 10>orterun - 0 error(s), 3 warning(s)
> > > > 11>ompi_info - 3 error(s), 0 warning(s)
> > > > 12>ompi-serve

Re: [OMPI users] ABI stabilization/versioning

2010-01-25 Thread Manuel Prinz
Am Montag, den 25.01.2010, 12:11 + schrieb Dave Love:
> I assumed that the libraries would then be versioned (at least for ELF
> -- I don't know about other formats) and we could remove a major source
> of grief from dynamically linking against the wrong thing, and I think
> Jeff said that would happen.  However, the current sources don't seem to
> be trying to set libtool version info, though I'm not sure what
> determines them producing .so.0.0.1 instead of .0.0.0 in other binaries
> I have.  This doesn't seem to have been addressed in the Debian or
> Fedora packaging, either

The ABI should be stable since 1.3.2. OMPI 1.4.x does set the libtool
version info; versions were bumped to 0.0.1 for libmpi, which has no
effect on dynamic linking.

Could you please elaborate on what needs to be addressed? Debian does
not have 1.4.1 yet though I'm planning to upload it really soon. The ABI
did not change (also not in an incompatible way, AFAICS). If you know of
any issues, I'd be glad if you could tell us, so we can find a solution
before any damage is done. Thanks in advance!

Best regards
Manuel



[OMPI users] Problems building Open MPI 1.4.1 with Pathscale

2010-01-25 Thread Rafael Arco Arredondo
Hello:

I'm having some issues with Open MPI 1.4.1 and Pathscale compiler
(version 3.2). Open MPI builds successfully with the following configure
arguments:

./configure --with-openib=/usr --with-openib-libdir=/usr/lib64
--with-sge --enable-static CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90
FC=pathf90

(we have OpenFabrics 1.2 Infiniband drivers, by the way)

However, applications hang on MPI_Init (or maybe MPI_Comm_rank or
MPI_Comm_size, a basic hello-world anyway doesn't print 'Hello World
from node...'). I tried running them with and without SGE. Same result.

This hello-world works flawlessly when I build Open MPI with gcc:

./configure --with-openib=/usr --with-openib-libdir=/usr/lib64
--with-sge --enable-static

This successful execution runs on one machine only, so it shouldn't use
Infiniband; it also works when several nodes are used.

I was able to build previous versions of Open MPI with Pathscale (1.2.6
and 1.3.2, particularly). I tried building version 1.4.1 both with
Pathscale 3.2 and Pathscale 3.1. No difference.

Any ideas?

Thank you in advance,

Rafa

-- 
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Universidad de Granada



[OMPI users] ABI stabilization/versioning

2010-01-25 Thread Dave Love
What's the status of (stabilizing and?) versioning libraries?  If I
recall correctly, it was supposed to be defined as fixed for some
release period as of 1.3.something.

I assumed that the libraries would then be versioned (at least for ELF
-- I don't know about other formats) and we could remove a major source
of grief from dynamically linking against the wrong thing, and I think
Jeff said that would happen.  However, the current sources don't seem to
be trying to set libtool version info, though I'm not sure what
determines them producing .so.0.0.1 instead of .0.0.0 in other binaries
I have.  This doesn't seem to have been addressed in the Debian or
Fedora packaging, either.

Is that just an oversight or something dropped, so it could be fixed
(modulo historical mess) if someone did the work?  It isn't covered
under http://www.open-mpi.org/software/ompi/versions/ or as far as I can
tell in the FAQ, and seems important (like plenty of other things, I'm
sure!), given how much of a problem it's been for users and admins doing
updates.



Re: [OMPI users] Checkpoint/Restart error

2010-01-25 Thread Andreea Costea
So? anyone? any clue?

Summarize:
- installed OpenMPI 1.4.1 on fresh Centos 5
- mpirun works but ompi-checkpoint throws this error:
ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405
- on another VM I have OpenMPI 1.3.3 installed. Checkpointing works fine as a
guest user but shows the previously mentioned error as root. Both root and
guest show the same output from "ompi_info --param all all" except for $HOME
(which only matters for mca_component_path, mca_param_files,
snapc_base_global_snapshot_dir)


Thanks,
Andreea


On Tue, Jan 19, 2010 at 9:01 PM, Andreea Costea wrote:

> I noticed one more thing. As I still have some VMs that have OpenMPI
> version 1.3.3 installed, I started to use those machines 'till I fix the
> problem with 1.4.1. And while checkpointing on one of these VMs I realized
> that checkpointing as a guest works fine and checkpointing as root outputs
> the same error as in 1.4.1: ORTE_ERROR_LOG: Not found in file
> orte-checkpoint.c at line 405
>
> I logged the outputs of "ompi_info --param all all" which I run for root
> and for another user and the only differences were at these parameters:
>
> mca_component_path
> mca_param_files
> snapc_base_global_snapshot_dir
>
> All 3 params differ because of the $HOME.
> One more thing: I don't have the directory $HOME/.openmpi
>
> Ideas?
>
> Thanks,
> Andreea
>
>
>
>
>
> On Tue, Jan 19, 2010 at 12:51 PM, Andreea Costea 
> wrote:
>
>> Well... I decided to install a fresh OS to be sure that there is no
>> OpenMPI version conflict. So I formatted one of my VMs, did a fresh CentOS
>> install, installed BLCR 0.8.2 and OpenMPI 1.4.1 and the result: the same.
>> mpirun works but ompi-checkpoint has that error at line 405:
>>
>> [[35906,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line
>> 405
>>
>> As for the files remaining after uninstalling: Jeff, you were right. There
>> is no file left, just some empty directories.
>>
>> What might be the problem with that ORTE_ERROR_LOG error?
>>
>> Thanks,
>> Andreea
>>
>> On Fri, Jan 15, 2010 at 11:47 PM, Andreea Costea 
>> wrote:
>>
>>> It's almost midnight here, so I left home, but I will try it tomorrow.
>>> There were some directories left after "make uninstall". I will give more
>>> details tomorrow.
>>>
>>> Thanks Jeff,
>>> Andreea
>>>
>>>
>>> On Fri, Jan 15, 2010 at 11:30 PM, Jeff Squyres wrote:
>>>
 On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote:

 > - I wanted to update to version 1.4.1 and I uninstalled the previous
 version like this: make uninstall, and then manually deleted all the left
 over files. The directory where I installed was /usr/local

 I'll let Josh answer your CR questions, but I did want to ask about this
 point.  AFAIK, "make uninstall" removes *all* Open MPI files.  For example:

 -
 [7:25] $ cd /path/to/my/OMPI/tree
 [7:25] $ make install > /dev/null
 [7:26] $ find /tmp/bogus/ -type f | wc
646 646   28082
 [7:26] $ make uninstall > /dev/null
 [7:27] $ find /tmp/bogus/ -type f | wc
  0   0   0
 [7:27] $
 -

 I realize that some *directories* are left in $prefix, but there should
 be no *files* left.  Are you seeing something different?

 --
 Jeff Squyres
 jsquy...@cisco.com


 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users

>>>
>>>
>>
>


Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-25 Thread cjohnson
Thanks, that second part about the wrappers was what I was looking for.
 
Charlie ... 
 


Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-25 Thread Shiqing Fan


Hi Charlie,

Actually, compiling and linking your application with Open MPI on Windows 
is similar to doing so on Linux. You have to link your application against 
the generated Open MPI libraries, e.g. libopen-mpi.lib (don't forget the 
suffix 'd' if you built the debug version of the OMPI libraries, e.g. 
libopen-mpid.lib).


But according to the information you provided, I assume that you only 
added the search path into the project. That's not enough: you should 
also add the library names under "Project Property Pages" -> 
"Configuration Properties" -> Linker -> Input -> "Additional 
Dependencies". Normally libopen-mpi.lib (or libopen-mpid.lib) alone would 
be enough, so that Visual Studio knows which libraries to link to.


Besides, the Open MPI compiler wrappers should also work on Windows. In 
this case you just need to open a "Visual Studio command prompt" with 
the Open MPI path added to the environment (e.g. "set PATH=c:\Program 
Files\OpenMPI_v1.4\bin;%PATH%"), and simply run commands like:


   > mpicc app.c

and

  > mpirun -np 2 app.exe


Please note that, before executing the application, Open MPI has to be 
installed somewhere, either by building the "INSTALL" project or by running 
the generated installer, so that the correct Open MPI folder structure 
can be created.



Regards,
Shiqing


cjohn...@valverdecomputing.com wrote:

OK, so I'm a little farther on and perplexed.
 
As I said, Visual C++ 2005 (release 8.0.50727.867) build 
of OpenMPI 1.4, using CMake 2.6.4, built everything and it all linked.
 
Went ahead and built the PACKAGE item in the OpenMPI.sln project, 
which made a zip file and an installer (although it was not obvious 
where to look for this, what its name was, etc.; I figured it out by 
dates on files).
 
Another thing that's not obvious is how to shoehorn your code into a 
VCC project that will successfully build.
 
I created a project from existing files in a place where the include 
of mpi.h would be found, and the examples, etc. did compile.
 
However, they did not find any of the library routines. Link errors.
 
So, I added the generated libraries' location into the search 
locations for libraries.
 
No good.
 
So, I added all of the generated libraries into the VCC project I created.
 
No good.
 
How does one do this (aside from rigging up something through CMake, 
cygwin, minGW, or MS SFU)?
 
Charlie ...
 


 Original Message 
Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
From: Shiqing Fan 
Date: Fri, January 15, 2010 2:56 am
To: cjohn...@valverdecomputing.com
Cc: Open MPI Users 


Hi Charlie,

Glad to hear that you compiled it successfully.

The error you got with 1.3.4 is a bug that the CMake script didn't
set
the SVN information correctly, and it has been fixed in 1.4 and later.


Thanks,
Shiqing


cjohn...@valverdecomputing.com wrote:
> Yes that was it.
>
> A much improved result now from CMake 2.6.4, no errors from
compiling
> openmpi-1.4:
>
> 1>libopen-pal - 0 error(s), 9 warning(s)
> 2>libopen-rte - 0 error(s), 7 warning(s)
> 3>opal-restart - 0 error(s), 0 warning(s)
> 4>opal-wrapper - 0 error(s), 0 warning(s)
> 5>libmpi - 0 error(s), 42 warning(s)
> 6>orte-checkpoint - 0 error(s), 0 warning(s)
> 7>orte-ps - 0 error(s), 0 warning(s)
> 8>orted - 0 error(s), 0 warning(s)
> 9>orte-clean - 0 error(s), 0 warning(s)
> 10>orterun - 0 error(s), 3 warning(s)
> 11>ompi_info - 0 error(s), 0 warning(s)
> 12>ompi-server - 0 error(s), 0 warning(s)
> 13>libmpi_cxx - 0 error(s), 61 warning(s)
> == Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped
> ==
>
> And only one failure from compiling openmpi-1.3.4 (the ompi_info
project):
>
> > 1>libopen-pal - 0 error(s), 9 warning(s)
> > 2>libopen-rte - 0 error(s), 7 warning(s)
> > 3>opal-restart - 0 error(s), 0 warning(s)
> > 4>opal-wrapper - 0 error(s), 0 warning(s)
> > 5>orte-checkpoint - 0 error(s), 0 warning(s)
> > 6>libmpi - 0 error(s), 42 warning(s)
> > 7>orte-ps - 0 error(s), 0 warning(s)
> > 8>orted - 0 error(s), 0 warning(s)
> > 9>orte-clean - 0 error(s), 0 warning(s)
> > 10>orterun - 0 error(s), 3 warning(s)
> > 11>ompi_info - 3 error(s), 0 warning(s)
> > 12>ompi-server - 0 error(s), 0 warning(s)
> > 13>libmpi_cxx - 0 error(s), 61 warning(s)
> > == Rebuild All: 13 succeeded, 1 failed, 0 skipped
==
>
> Here's the listing from the non-linking project:
>
> 11>-- Rebuild All started: Project: ompi_info, Configuration:
> Debug Win32 --
> 11>Deleting intermediate and output files for project 'ompi_info',
> configuration 'Debug|Win32'
> 11>Compiling...
> 11>version.cc
> 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(136) :
> error C2059: syntax error : ','
> 11>..\..\..\..\openmpi-1.3.4\ompi\to

Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-25 Thread cjohnson
OK, so I'm a little farther on and perplexed.
 
As I said, Visual C++ 2005 (release 8.0.50727.867) build of OpenMPI 1.4, using CMake 2.6.4, built everything and it all linked.
 
Went ahead and built the PACKAGE item in the OpenMPI.sln project, which made a zip file and an installer (although it was not obvious where to look for this, what its name was, etc.; I figured it out by dates on files).
 
Another thing that's not obvious is how to shoehorn your code into a VCC project that will successfully build.
 
I created a project from existing files in a place where the include of mpi.h would be found, and the examples, etc. did compile.
 
However, they did not find any of the library routines. Link errors.
 
So, I added the generated libraries' location into the search locations for libraries.
 
No good.
 
So, I added all of the generated libraries into the VCC project I created.
 
No good.
 
How does one do this (aside from rigging up something through CMake, cygwin, minGW, or MS SFU)?
 
Charlie ...
 

 Original Message 
Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
From: Shiqing Fan
Date: Fri, January 15, 2010 2:56 am
To: cjohn...@valverdecomputing.com
Cc: Open MPI Users

Hi Charlie,

Glad to hear that you compiled it successfully.

The error you got with 1.3.4 is a bug that the CMake script didn't set
the SVN information correctly, and it has been fixed in 1.4 and later.

Thanks,
Shiqing

cjohn...@valverdecomputing.com wrote:
> Yes that was it.
>
> A much improved result now from CMake 2.6.4, no errors from compiling
> openmpi-1.4:
>
> 1>libopen-pal - 0 error(s), 9 warning(s)
> 2>libopen-rte - 0 error(s), 7 warning(s)
> 3>opal-restart - 0 error(s), 0 warning(s)
> 4>opal-wrapper - 0 error(s), 0 warning(s)
> 5>libmpi - 0 error(s), 42 warning(s)
> 6>orte-checkpoint - 0 error(s), 0 warning(s)
> 7>orte-ps - 0 error(s), 0 warning(s)
> 8>orted - 0 error(s), 0 warning(s)
> 9>orte-clean - 0 error(s), 0 warning(s)
> 10>orterun - 0 error(s), 3 warning(s)
> 11>ompi_info - 0 error(s), 0 warning(s)
> 12>ompi-server - 0 error(s), 0 warning(s)
> 13>libmpi_cxx - 0 error(s), 61 warning(s)
> == Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped ==
>
> And only one failure from compiling openmpi-1.3.4 (the ompi_info project):
>
> > 1>libopen-pal - 0 error(s), 9 warning(s)
> > 2>libopen-rte - 0 error(s), 7 warning(s)
> > 3>opal-restart - 0 error(s), 0 warning(s)
> > 4>opal-wrapper - 0 error(s), 0 warning(s)
> > 5>orte-checkpoint - 0 error(s), 0 warning(s)
> > 6>libmpi - 0 error(s), 42 warning(s)
> > 7>orte-ps - 0 error(s), 0 warning(s)
> > 8>orted - 0 error(s), 0 warning(s)
> > 9>orte-clean - 0 error(s), 0 warning(s)
> > 10>orterun - 0 error(s), 3 warning(s)
> > 11>ompi_info - 3 error(s), 0 warning(s)
> > 12>ompi-server - 0 error(s), 0 warning(s)
> > 13>libmpi_cxx - 0 error(s), 61 warning(s)
> > == Rebuild All: 13 succeeded, 1 failed, 0 skipped ==
>
> Here's the listing from the non-linking project:
>
> 11>-- Rebuild All started: Project: ompi_info, Configuration: Debug Win32 --
> 11>Deleting intermediate and output files for project 'ompi_info', configuration 'Debug|Win32'
> 11>Compiling...
> 11>version.cc
> 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(136) : error C2059: syntax error : ','
> 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(147) : error C2059: syntax error : ','
> 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(158) : error C2059: syntax error : ','
> 11>param.cc
> 11>output.cc
> 11>ompi_info.cc
> 11>components.cc
> 11>Generating Code...
> 11>Build log was saved at "file://c:\prog\mon\ompi\tools\ompi_info\ompi_info.dir\Debug\BuildLog.htm"
> 11>ompi_info - 3 error(s), 0 warning(s)
>
> Thank you Shiqing !
>
> Charlie ...
>
>  Original Message 
> Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
> From: Shiqing Fan
> Date: Thu, January 14, 2010 11:20 am
> To: Open MPI Users, cjohn...@valverdecomputing.com
>
> Hi Charlie,
>
> The problem turns out to be the different behavior of one CMake macro in
> different versions of CMake. And it's fixed in Open MPI trunk with
> r22405. I also created a ticket to move the fix over to the 1.4 branch, see
> #2169: https://svn.open-mpi.org/trac/ompi/ticket/2169 .
>
> So you could either switch to use OMPI trunk or use CMake 2.6 to solve
> the problem. Thanks a lot.
>
> Best Regards,
> Shiqing
>
> cjohn...@valverdecomputing.com wrote:
> > The OpenMPI build problem I'm having occurs in both OpenMPI 1.4 and 1.3.4.
> >
> > I am on a Windows 7 (US) Enterprise (x86) OS on an HP system with
> > Intel core 2 extreme x9000 (4GB RAM), using the 2005 Visual Studio for
> > S/W Architects (release 8.0.50727.867).
> >
> > [That release has everything the platform SDK would have.]
> >
> > I'm using CMake 2.8 to generate code, I used it correctly, pointing at
> > the root directory where the makelists are located for the source side
> > and to an empty directory for the build side: did configure, _*I did
> > not cli