Re: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning

2010-02-18 Thread Ralf Wildenhues
* Jeff Squyres wrote on Wed, Feb 17, 2010 at 11:51:42PM CET:
> On Feb 17, 2010, at 3:05 PM, Ralf Wildenhues wrote:
> 
> > > The issue is that if the user has to specify -static to their linker,
> > > they *also* have to specify --ompi:static, or Bad Things will happen.
> > > Or, if they don't specify -static but *only* specify --ompi:static,
> > > Bad Things will happen.  In short: it seems like adding yet another
> > > wrapper-compiler-specific flag to the MPI ecosystem will cause
> > > confusion, fear, and possibly the death of some cats.
> > 
> > Do you care for omitting -lopen-pal and -lorte only for capable Linux
> > systems?  With new-enough binutils, you should be able to use
> > -Wl,--as-needed -Wl,--no-as-needed around these two libs.
> 
> Mmmm.  Good point.  But I don't think it helps us on Solaris or OS X,
> does it?  (maybe it does on OS X?)  Or do all linkers have some kind of
> option like this?  (this *might* be a way out, but I would probably need
> to be convinced :-) )

No, I think only binutils ld (and gold) have this.  Sorry.

> > I'm not entirely sure I understand your argumentation for why libmpi
> > from 1.5.x has to be binary incompatible, but I haven't fully thought
> > through this yet.
> 
> The context for this issue is so long that much was left out of my mail.
> Here's this particular issue in a nutshell:
> 
> - Open MPI v1.4.1 has libmpi at 0:1:0 and libopen-rte and libopen-pal
>   both at 0:0:0.
> - Open MPI v1.4.1 links MPI apps against -lmpi -lopen-rte -lopen-pal.
> - If we start .so versioning properly in v1.5, it's likely that
>   libopen-rte and libopen-pal will both be 1:0:0.
>   --> Note that these are both internal libraries; there are no symbols
>   in these libraries that are used in the MPI applications.
> - Open MPI v1.5 libmpi *could* be 1:0:1.
> - Hence, an a.out created for OMPI v1.4.1 would work fine with v1.5 libmpi.
> - But that a.out would not work with v1.5 libopen-rte and libopen-pal.

You could probably create fake empty libopen-rte and libopen-pal stub
libraries with 0:0:0 purely for the sake of allowing such an a.out to
still work (on systems with versioned sonames[1]).  Since the a.out
doesn't actually use any of the APIs from those libraries, there is no
problem here, and your 1.5 libmpi will pull in the 1:0:0 versions of the
other two libraries.

I understand if you decide not to go that way, and in that case, I
agree that bumping libmpi to 1:0:0 won't cause much extra pain.

Cheers,
Ralf

[1] This includes many but probably not all systems with shared
libraries.  E.g., I think AIX without runtimelinking (-Wl,-brtl)
would have a problem.
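
For readers less familiar with the current:revision:age triples discussed
in this thread: on Linux/ELF, libtool derives the soname major number as
current minus age, and the installed file suffix as
(current - age).age.revision.  The following is only a sketch of that
arithmetic; it assumes the Linux mapping (other platforms differ), and
`lt_soname` is a hypothetical helper name, not part of libtool.

```shell
# lt_soname is a hypothetical helper, not part of libtool.
# It applies the mapping libtool uses on Linux/ELF:
#   soname major = current - age
#   file suffix  = (current - age).age.revision
lt_soname() {
    lib=$1
    triple=$2
    current=${triple%%:*}      # text before the first ":"
    rest=${triple#*:}          # text after the first ":"
    revision=${rest%%:*}
    age=${rest#*:}
    major=$((current - age))
    printf '%s.so.%s (file: %s.so.%s.%s.%s)\n' \
        "$lib" "$major" "$lib" "$major" "$age" "$revision"
}

lt_soname libmpi 0:1:0        # v1.4.1 libmpi
lt_soname libmpi 1:0:1        # possible v1.5 libmpi
lt_soname libopen-pal 0:0:0   # v1.4.1 internal library
lt_soname libopen-pal 1:0:0   # v1.5 internal library
```

Under that mapping, 0:1:0 and 1:0:1 both yield the soname libmpi.so.0,
which is why an a.out linked against the v1.4.1 libmpi could still load a
1:0:1 libmpi, whereas the internal libraries move from .so.0 to .so.1.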


Re: [OMPI devel] PATCH: remove trailing colon at the end of the generated LD_LIBRARY_PATH

2010-02-18 Thread Nadia Derbey
On Wed, 2010-02-17 at 17:14 -0500, Jeff Squyres wrote:
> Looks good to me!
> 
> Please commit and file CMRs for v1.4 and v1.5 (assuming this patch applies 
> cleanly to both branches).

Not sure I have the rights to do these things?

Regards,
Nadia
> 
> 
> On Feb 16, 2010, at 6:46 AM, Nadia Derbey wrote:
> 
> > Hi,
> > 
> > The mpivars.sh generated in openmpi.spec might in some cases lead to an
> > LD_LIBRARY_PATH that contains a trailing ":". This happens if
> > LD_LIBRARY_PATH is originally unset.
> > This means that the current directory is included in the loader's search
> > path (an empty entry is treated as the current directory), which might
> > not be the desired result.
> > 
> > The following patch proposal fixes this potential issue by adding the
> > ":" only if LD_LIBRARY_PATH is already set.
> > 
> > Regards,
> > Nadia
> > 
> > 
> > diff -r 6609b6ba7637 contrib/dist/linux/openmpi.spec
> > --- a/contrib/dist/linux/openmpi.spec   Mon Feb 15 22:14:59 2010 +
> > +++ b/contrib/dist/linux/openmpi.spec   Tue Feb 16 12:44:41 2010 +0100
> > @@ -505,7 +505,7 @@ fi
> > 
> >  # LD_LIBRARY_PATH
> >  if test -z "\`echo \$LD_LIBRARY_PATH | grep %{_libdir}\`"; then
> > -LD_LIBRARY_PATH=%{_libdir}:\${LD_LIBRARY_PATH}
> > +LD_LIBRARY_PATH=%{_libdir}\${LD_LIBRARY_PATH:+:}\${LD_LIBRARY_PATH}
> >  export LD_LIBRARY_PATH
> >  fi
> > 
> > 
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > 
> 
> 
-- 
Nadia Derbey 
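
The parameter expansion used in the patch quoted above, ${VAR:+word},
expands to word only when VAR is set and non-empty.  A minimal sketch of
the before/after behaviour follows; the directory stands in for
%{_libdir}, and the function names are hypothetical, chosen only for this
illustration.

```shell
# Hypothetical install directory, standing in for %{_libdir}.
libdir=/opt/openmpi/lib

# Old form: always emits the ":" separator, even when the previous
# value is empty, which leaves a trailing colon.
old_form() {
    cur=$1
    printf '%s\n' "$libdir:$cur"
}

# Patched form: ${cur:+:} expands to ":" only if cur is non-empty.
new_form() {
    cur=$1
    printf '%s\n' "$libdir${cur:+:}$cur"
}

old_form ""           # prints /opt/openmpi/lib:
new_form ""           # prints /opt/openmpi/lib
new_form "/usr/lib"   # prints /opt/openmpi/lib:/usr/lib
```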



Re: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning

2010-02-18 Thread Jeff Squyres
On Feb 18, 2010, at 1:53 AM, Ralf Wildenhues wrote:

> You could probably create fake empty libopen-rte and libopen-pal stub
> libraries with 0:0:0 purely for the sake of allowing such an a.out to
> still work (on systems with versioned sonames[1]).  Since this doesn't
> actually use any of the APIs from those libraries, there is no problem
> here, and your 1.5 libmpi will pull in the 1:0:0 versions of the other
> two libraries.

You get 10 "evil genius" points for a nifty-yet-icky solution.  :-)

I don't really want to continue carrying forward empty libraries just to
maintain ABI.  I'm (mostly) ok with breaking ABI at a major series change
(i.e., 1.5.0).

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] RFC: pkg-config(1) files for OMPI

2010-02-18 Thread Jeff Squyres
WHAT: Add pkg-config(1) data files for Open MPI

WHY: pkg-config is a de facto Linux standard.  At least one user asked for it,
and it was easy to do

WHERE: */config/config_files.m4, */tools/wrappers

WHEN: Can be for 1.5 or 1.5.1; I don't really have a strong opinion

TIMEOUT: Next Tuesday teleconf

---

pkg-config(1) is fairly common in Linux and *BSD distributions.  See
http://linux.die.net/man/1/pkg-config for a description of the pkg-config
software.  It's basically an alternate, de facto standard way to get to
OMPI's wrapper compiler flags.  Something like this works, for example:

gcc mpi_ring.c `pkg-config ompi-c --cflags --libs` -o mpi_ring 

All this does is add a few configure-generated files that get installed
under $libdir/pkgconfig.  I did almost all the work while waiting for
other compilers.  MPICH2 provides a pkg-config file.
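
To make the idea of a pkg-config data file concrete, here is a rough
sketch of what one looks like and how pkg-config reads it.  The name
ompi-c comes from the command above; everything else (the prefix, the
version, the flags) is invented for illustration and is not what Open
MPI's configure would actually generate.

```shell
# Everything below is illustrative: the prefix, version and flags are
# made-up values, not what Open MPI's configure would generate.
pcdir=$(mktemp -d)

# Quoted 'EOF' keeps the shell from expanding ${prefix} etc.; those are
# pkg-config variables, resolved by pkg-config itself.
cat > "$pcdir/ompi-c.pc" <<'EOF'
prefix=/opt/openmpi
includedir=${prefix}/include
libdir=${prefix}/lib

Name: Open MPI (C bindings)
Description: Hypothetical example of a pkg-config data file
Version: 0.0.0
Cflags: -I${includedir}
Libs: -L${libdir} -lmpi
EOF

# Query the file if pkg-config is installed; PKG_CONFIG_PATH adds a
# directory to its search path.
if command -v pkg-config >/dev/null 2>&1; then
    PKG_CONFIG_PATH=$pcdir pkg-config --cflags --libs ompi-c
fi
```

An installed copy would live under $libdir/pkgconfig, so that the gcc
command line shown earlier finds it without setting PKG_CONFIG_PATH.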

I think we should include it because it affects almost nothing else, is
easy to do, and at least 1 user asked for it.

I don't expect this to be contentious.  We can discuss next Tuesday, but
feel free to pipe up before then if you have any objections.

-- 
Jeff Squyres
jsquy...@cisco.com





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r22663

2010-02-18 Thread Ethan Mallove
About this change - I have been seeing the below error while trying to
build the trunk recently:

  $ make ...
  cd . && /bin/bash /tmp/config-missing-bug-in-trunk/trunk/config/missing --run aclocal-1.10 -I config
  configure.ac:939: warning: OMPI_CONFIGURE_SETUP is m4_require'd but not m4_defun'd
  config/ompi_mca.m4:37: OMPI_MCA is expanded from...
  configure.ac:939: the top level
  cd . && /bin/bash /tmp/config-missing-bug-in-trunk/trunk/config/missing --run automake-1.10 --foreign
  configure.ac:939: warning: OMPI_CONFIGURE_SETUP is m4_require'd but not m4_defun'd
  ...
  ompi/mca/allocator/Makefile.am:31: WANT_INSTALL_HEADERS does not appear in AM_CONDITIONAL
  ... repeats 49 times ...
  make: *** [configure] Error 1

While fixing ACLOCAL_AMFLAGS gets the build to complete successfully,
the real issue is: why is config/missing getting immediately invoked
by "make"?  This wasn't happening before, and it means configure is
getting run twice per build now.

Any ideas what could be causing this?

-Ethan


On Thu, Feb/18/2010 01:11:23PM, emall...@osl.iu.edu wrote:
> Author: emallove
> Date: 2010-02-18 13:11:23 EST (Thu, 18 Feb 2010)
> New Revision: 22663
> URL: https://svn.open-mpi.org/trac/ompi/changeset/22663
> 
> Log:
> In case `config/missing` gets invoked, ensure that all the OMPI-specific m4
> macros are defined.
> 
> Text files modified: 
>    trunk/Makefile.am | 2 +-
>    1 files changed, 1 insertions(+), 1 deletions(-)
> 
> Modified: trunk/Makefile.am
> ==
> --- trunk/Makefile.am (original)
> +++ trunk/Makefile.am 2010-02-18 13:11:23 EST (Thu, 18 Feb 2010)
> @@ -25,4 +25,4 @@
>  dist-hook:
>   csh "$(top_srcdir)/config/distscript.csh" "$(top_srcdir)" "$(distdir)" "$(OMPI_VERSION)" "$(OMPI_SVN_R)"
>  
> -ACLOCAL_AMFLAGS = -I config
> +ACLOCAL_AMFLAGS = -I config -I opal/config -I ompi/config -I orte/config
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r22663

2010-02-18 Thread Jeff Squyres
On Feb 18, 2010, at 1:12 PM, Ethan Mallove wrote:

> About this change - I have been seeing the below error while trying to
> build the trunk recently:
> 
>   $ make ...
>   cd . && /bin/bash /tmp/config-missing-bug-in-trunk/trunk/config/missing --run aclocal-1.10 -I config
>   configure.ac:939: warning: OMPI_CONFIGURE_SETUP is m4_require'd but not m4_defun'd
>   config/ompi_mca.m4:37: OMPI_MCA is expanded from...
>   configure.ac:939: the top level
>   cd . && /bin/bash /tmp/config-missing-bug-in-trunk/trunk/config/missing --run automake-1.10 --foreign
>   configure.ac:939: warning: OMPI_CONFIGURE_SETUP is m4_require'd but not m4_defun'd
>   ...
> 
> While fixing ACLOCAL_AMFLAGS gets the build to complete successfully,
> the real issue is: why is config/missing getting immediately invoked
> by "make"?  This wasn't happening before, and it means configure is
> getting run twice per build now.
> 
> Any ideas what could be causing this?

No -- it should not be happening.  I'd think that those extra -I's
shouldn't be necessary.

Check the usual suspects, such as time synchronization between NFS
client and server, etc.

You might also want to run "make -d" to see what rules are being invoked
and why.
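
As a quick way to test the timestamp theory, one can check whether any of
the autotools inputs look newer than the generated configure script.  The
sketch below is only an illustration: `newer_than_configure` is a
hypothetical helper, and it assumes configure.ac and aclocal.m4 are the
inputs of interest (the real dependency list in an Automake-generated
Makefile is longer).

```shell
# newer_than_configure is a hypothetical helper: it lists the usual
# autotools inputs in directory $1 that are newer than $1/configure.
newer_than_configure() {
    dir=$1
    for f in configure.ac aclocal.m4; do
        # find -newer prints the file only if it was modified after configure
        find "$dir/$f" -newer "$dir/configure" 2>/dev/null
    done
}

# Self-contained demonstration in a scratch directory.
demo=$(mktemp -d)
touch -t 201002180900 "$demo/configure" "$demo/aclocal.m4"
touch -t 201002181000 "$demo/configure.ac"   # one hour newer than configure
newer_than_configure "$demo"                 # lists only configure.ac
```

In a real source tree, running `newer_than_configure .` right after a
fresh checkout or autogen step and getting any output would suggest the
kind of clock skew mentioned above.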

-- 
Jeff Squyres
jsquy...@cisco.com





[OMPI devel] Modex-less launch

2010-02-18 Thread Ralph Castain
Hi folks

I've had a few recent inquiries about more scalable launch methods for
OMPI. It rapidly became clear that I had never documented the modex-less
launch operations in the OMPI trunk, and that many people were unaware of
their existence.

So...I finally wrote a wiki page on the subject:

https://svn.open-mpi.org/trac/ompi/wiki/ModexlessLaunch

Please feel free to contact me with any questions about this capability.
As noted on the wiki, it was working as of last summer - and scaling very
nicely. However, I haven't tested it since that time.

HTH
Ralph