Re: [OMPI devel] 1.3 Release schedule and contents
Hello Brad,

please note the valgrind memchecker merging; that could go in for 1.3 under the "m. Miscellaneous" section. Also, please note that moving the ORTE merging to 1.3.1 would mean moving m. Miscellaneous point vii., Windows CCP support, there as well. The current/new code does not seem to work on Windows at the moment; Shiqing will propose a patch for that.

Thanks,
Rainer

On Tuesday 12 February 2008 04:09, Brad Benton wrote:
> All:
>
> The latest scrub of the 1.3 release schedule and contents is ready for
> review and comment. Please use the following links:
> 1.3 milestones:
> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
> 1.3.1 milestones:
> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1
>
> In order to try and keep the dates for 1.3 in, I've pushed a bunch of stuff
> (particularly ORTE things) to 1.3.1. Even though there will be new
> functionality slated for 1.3.1, the goal is to not have any interface
> changes between the phases.
>
> Please look over the list and schedules and let me or my fellow
> 1.3 co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any
> issues, errors, suggestions, omissions, heartburn, etc.
>
> Thanks,
> --Brad
>
> Brad Benton
> IBM

-- 
Dipl.-Inf. Rainer Keller    http://www.hlrs.de/people/keller
HLRS                        Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19             Fax: ++49 (0)711-685 6 5832
70550 Stuttgart             email: kel...@hlrs.de
Germany                     AIM/Skype: rusraink
[OMPI devel] [RFC] Remove explicit call to progress() from ob1.
Hi,

I am planning to commit the following patch. Those two progress() calls are responsible for most of our deep recursion troubles, and I also think they are completely unnecessary.

diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index 5899243..641176e 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -704,9 +704,6 @@ int mca_pml_ob1_recv_request_schedule_once(
             mca_bml_base_free(bml_btl,dst);
             continue;
         }
-
-        /* run progress as the prepare (pinning) can take some time */
-        mca_bml.bml_progress();
     }
 
     return OMPI_SUCCESS;

diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
index 0998a05..9d7f3f9 100644
--- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
@@ -968,7 +968,6 @@ cannot_pack:
             mca_bml_base_free(bml_btl,des);
             continue;
         }
-        mca_bml.bml_progress();
     }
 
     return OMPI_SUCCESS;

-- 
Gleb.
Re: [OMPI devel] Fixlet for config/ompi_contrib.m4
Hi Ralf,

thanks for the patch. I've added this to the trunk...

Matthias

On Mo, 2008-02-11 at 21:14 +0100, Ralf Wildenhues wrote:
> Hello,
>
> please apply this patch, to make future contrib integration just a tad
> bit easier. I verified that the generated configure script is
> identical, minus whitespace and comments.
>
> Cheers,
> Ralf
>
> 2008-02-11  Ralf Wildenhues
>
>     * config/ompi_contrib.m4 (OMPI_CONTRIB): Unify listings of
>     contrib software packages.
>
> Index: config/ompi_contrib.m4
> ===
> --- config/ompi_contrib.m4    (Revision 17419)
> +++ config/ompi_contrib.m4    (Arbeitskopie)
> @@ -67,20 +67,13 @@
>  # Cycle through each of the hard-coded software packages and
>  # configure them if not disabled. May someday be expanded to have
>  # autogen find the packages instead of this hard-coded list
> -# (https://svn.open-mpi.org/trac/ompi/ticket/1162). I couldn't
> -# figure out a simple/easy way to have the m4 foreach do the m4
> -# include *and* all the rest of the stuff, so I settled for having
> -# two lists: each contribted software package will need to add its
> -# configure.m4 list here and then add its name to the m4 define
> -# for contrib_software_list. Cope.
> -#dnl m4_include(ompi/contrib/libnbc/configure.m4)
> -m4_include(ompi/contrib/vt/configure.m4)
> -
> -m4_define(contrib_software_list, [vt])
> -#dnl m4_define(contrib_software_list, [libnbc, vt])
> +# (https://svn.open-mpi.org/trac/ompi/ticket/1162).
> +# m4_define([contrib_software_list], [libnbc, vt])
> +m4_define([contrib_software_list], [vt])
>  m4_foreach(software, [contrib_software_list],
> -           [OMPI_CONTRIB_DIST_SUBDIRS="$OMPI_CONTRIB_DIST_SUBDIRS contrib/software"
> -            _OMPI_CONTRIB_CONFIGURE(software)])
> +[m4_include([ompi/contrib/]software[/configure.m4])
> +OMPI_CONTRIB_DIST_SUBDIRS="$OMPI_CONTRIB_DIST_SUBDIRS contrib/software"
> +_OMPI_CONTRIB_CONFIGURE(software)])
>
>  # Setup the top-level glue
>  AC_SUBST(OMPI_CONTRIB_SUBDIRS)
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773

smime.p7s
Description: S/MIME cryptographic signature
Re: [OMPI devel] VT integration: make distclean problem
On Monday 11 February 2008, Josh Hursey wrote: > I've been noticing another problem with the VT integration. If you do > a "./configure --enable-contrib-no-build=vt" a subsequent 'make > distclean' will fail in contrib/vt. The 'make distclean' will succeed > with VT enabled (default). > hm, tricky. I guess it is about the 'make dist' functionality. All others like 'make distclean' etc. are only assisting functionality for 'make dist' after all. And for 'make dist' you need to have everything configured that is going to be part of the distribution. Therefore, VT needs to be part of the tarball, so you can disable it at build time. It would not work the other way around. So in my opinion, the current status is what we want to have. Are there any problems when configuring VT, then building the tarball with VT and disabling it once you build Open MPI from the tarball? Regards, Andreas -- Dipl. Math. Andreas Knuepfer, Center for Information Services and High Performance Computing (ZIH), TU Dresden, Willersbau A114, Zellescher Weg 12, 01062 Dresden phone +49-351-463-38323, fax +49-351-463-37773 signature.asc Description: This is a digitally signed message part.
Re: [OMPI devel] Something wrong with vt?
Hi Gleb,

that's very strange... cause' the corresponding 'Makefile.in' is definitely not empty (checked in to the SVN repository).
Could you reproduce this error after 'make distclean, configure, make'?
Which version of the autotools are you using?

Matthias

On Mo, 2008-02-11 at 11:42 +0200, Gleb Natapov wrote:
> I get the following error while "make install":
>
> make[2]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt'
> Making install in vt
> make[3]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt'
> make[3]: *** No rule to make target `install'. Stop.
> make[3]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt'
> make[2]: *** [install-recursive] Error 1
> make[2]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/home_local/glebn/build_dbg/ompi'
> make: *** [install-recursive] Error 1
>
> ompi/contrib/vt/vt/Makefile is empty!
> --
> Gleb.
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773

smime.p7s
Description: S/MIME cryptographic signature
[OMPI devel] merging new PLPA to the trunk
Hi all,

While coding a new RMAPS component I found strange behavior in PLPA — the same behavior that was described in http://www.open-mpi.org/community/lists/plpa-users/2007/04/0073.php. I believe it was fixed in the new version of PLPA. This new version needs to be merged to the trunk due to bug fixes and changes in the API. If there are no objections, I volunteer to do it.

Best Regards,
Lenny.
Re: [OMPI devel] 1.3 Release schedule and contents
The VampirTrace integration is already in the trunk. It should be mentioned as complete somewhere in the misc section. Andreas signature.asc Description: This is a digitally signed message part.
Re: [OMPI devel] Something wrong with vt?
On Tue, Feb 12, 2008 at 01:08:32PM +0100, Matthias Jurenz wrote: > Hi Gleb, > > that's very strange... cause' the corresponding 'Makefile.in' is > definitely not empty (checked in to the SVN repository). Ah, here is the problem. Makefile.in is empty in my tree. I am building not from SVN checkout, but from the other source tree that is synced with SVN checkout and the sync process consider Makefile.in files as generated and ignores them. Why Makefiles.in is not regenerated by autogen.sh in vt sources? > Could you reproduce this error after 'make distclean, configure, make' ? > Which version of the autotools are you using? > > > Matthias > > On Mo, 2008-02-11 at 11:42 +0200, Gleb Natapov wrote: > > > I get the following error while "make install": > > > > make[2]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt' > > Making install in vt > > make[3]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt' > > make[3]: *** No rule to make target `install'. Stop. > > make[3]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt' > > make[2]: *** [install-recursive] Error 1 > > make[2]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt' > > make[1]: *** [install-recursive] Error 1 > > make[1]: Leaving directory `/home_local/glebn/build_dbg/ompi' > > make: *** [install-recursive] Error 1 > > > > ompi/contrib/vt/vt/Makefile is empty! > > -- > > Gleb. > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > -- > Matthias Jurenz, > Center for Information Services and > High Performance Computing (ZIH), TU Dresden, > Willersbau A106, Zellescher Weg 12, 01062 Dresden > phone +49-351-463-31945, fax +49-351-463-37773 > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Gleb.
Re: [OMPI devel] Something wrong with vt?
In future all Makefile.in's and the configure script of VT will be built from OMPI's autogen.sh. I'm working on a solution, but the autogen.sh script is a little bit unclear for me... :-( Matthias On Di, 2008-02-12 at 14:37 +0200, Gleb Natapov wrote: > On Tue, Feb 12, 2008 at 01:08:32PM +0100, Matthias Jurenz wrote: > > Hi Gleb, > > > > that's very strange... cause' the corresponding 'Makefile.in' is > > definitely not empty (checked in to the SVN repository). > Ah, here is the problem. Makefile.in is empty in my tree. I am building > not from SVN checkout, but from the other source tree that is synced > with SVN checkout and the sync process consider Makefile.in files as > generated and ignores them. Why Makefiles.in is not regenerated by > autogen.sh in vt sources? > > > > Could you reproduce this error after 'make distclean, configure, make' ? > > Which version of the autotools are you using? > > > > > > Matthias > > > > On Mo, 2008-02-11 at 11:42 +0200, Gleb Natapov wrote: > > > > > I get the following error while "make install": > > > > > > make[2]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt' > > > Making install in vt > > > make[3]: Entering directory > > > `/home_local/glebn/build_dbg/ompi/contrib/vt/vt' > > > make[3]: *** No rule to make target `install'. Stop. > > > make[3]: Leaving directory > > > `/home_local/glebn/build_dbg/ompi/contrib/vt/vt' > > > make[2]: *** [install-recursive] Error 1 > > > make[2]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt' > > > make[1]: *** [install-recursive] Error 1 > > > make[1]: Leaving directory `/home_local/glebn/build_dbg/ompi' > > > make: *** [install-recursive] Error 1 > > > > > > ompi/contrib/vt/vt/Makefile is empty! > > > -- > > > Gleb. 
> > > ___ > > > devel mailing list > > > de...@open-mpi.org > > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > > > > -- > > Matthias Jurenz, > > Center for Information Services and > > High Performance Computing (ZIH), TU Dresden, > > Willersbau A106, Zellescher Weg 12, 01062 Dresden > > phone +49-351-463-31945, fax +49-351-463-37773 > > > > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > -- > Gleb. > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel > -- Matthias Jurenz, Center for Information Services and High Performance Computing (ZIH), TU Dresden, Willersbau A106, Zellescher Weg 12, 01062 Dresden phone +49-351-463-31945, fax +49-351-463-37773 smime.p7s Description: S/MIME cryptographic signature
[OMPI devel] C++ build failures
I'm a little concerned about the C++ test build failures from last night: http://www.open-mpi.org/mtt/index.php?do_redir=530 They are likely due to the C++ changes that came in over the weekend, but they *only* showed up at IU, which is somewhat odd. I'm trying to replicate now (doing a fresh build of the trunk and will build the tests that failed for you), but I'm kinda guessing it's going to work fine on my platforms. IU: do you have any idea what caused these failures? Does sif have a newer compiler that is somehow picking up on a latent bug that we missed in the C++ stuff? -- Jeff Squyres Cisco Systems
[OMPI devel] memchecker build broken
To simplify things, I'm going to start filing tickets for all build breaks that I find. Here's the latest:

libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../.. -DOMPI_TV_DLL=\"/home/jsquyres/bogus/lib/openmpi/libompitv.so\" -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -g -MT libdebuggers_la-ompi_totalview.lo -MD -MP -MF .deps/libdebuggers_la-ompi_totalview.Tpo -c ompi_totalview.c -fPIC -DPIC -o .libs/libdebuggers_la-ompi_totalview.o

In file included from ../../ompi/mca/pml/base/pml_base_request.h:28,
                 from ompi_dll.c:71:
../../ompi/include/ompi/memchecker.h:22:31: valgrind/valgrind.h: No such file or directory

In file included from ../../ompi/mca/pml/base/pml_base_request.h:28,
                 from ompi_totalview.c:42:
../../ompi/include/ompi/memchecker.h:22:31: valgrind/valgrind.h: No such file or directory

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] VT integration: make distclean problem
Good points about 'distclean' versus 'clean'. For the make distclean case then I think it is ok if we fail here since it is not a full 'make dist' that I was working with originally. Sorry for the distraction. Cheers, Josh On Feb 12, 2008, at 6:52 AM, Andreas Knüpfer wrote: On Monday 11 February 2008, Josh Hursey wrote: I've been noticing another problem with the VT integration. If you do a "./configure --enable-contrib-no-build=vt" a subsequent 'make distclean' will fail in contrib/vt. The 'make distclean' will succeed with VT enabled (default). hm, tricky. I guess it is about the 'make dist' functionality. All others like 'make distclean' etc. are only assisting functionality for 'make dist' after all. And for 'make dist' you need to have everything configured that is going to be part of the distribution. Therefore, VT needs to be part of the tarball, so you can disable it at build time. It would not work the other way around. So in my opinion, the current status is what we want to have. Are there any problems when configuring VT, then building the tarball with VT and disabling it once you build Open MPI from the tarball? Regards, Andreas -- Dipl. Math. Andreas Knuepfer, Center for Information Services and High Performance Computing (ZIH), TU Dresden, Willersbau A114, Zellescher Weg 12, 01062 Dresden phone +49-351-463-38323, fax +49-351-463-37773 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] C++ build failures
I just talked to Jeff about this. The problem was that on Sif we use --enable-visibility, and apparently the new c++ bindings access ompi_errhandler_create, which was not OMPI_DECLSPEC'd. Jeff will fix this soon. Tim Jeff Squyres wrote: I'm a little concerned about the C++ test build failures from last night: http://www.open-mpi.org/mtt/index.php?do_redir=530 They are likely due to the C++ changes that came in over the weekend, but they *only* showed up at IU, which is somewhat odd. I'm trying to replicate now (doing a fresh build of the trunk and will build the tests that failed for you), but I'm kinda guessing it's going to work fine on my platforms. IU: do you have any idea what caused these failures? Does sif have a newer compiler that is somehow picking up on a latent bug that we missed in the C++ stuff?
Re: [OMPI devel] C++ build failures
I filed a ticket: https://svn.open-mpi.org/trac/ompi/ticket/1213 Am looking into the problem, but ran into the memchecker trunk build breakage first (https://svn.open-mpi.org/trac/ompi/ticket/1211). #$%#@ %#@$% On Feb 12, 2008, at 9:23 AM, Tim Prins wrote: I just talked to Jeff about this. The problem was that on Sif we use --enable-visibility, and apparently the new c++ bindings access ompi_errhandler_create, which was not OMPI_DECLSPEC'd. Jeff will fix this soon. Tim Jeff Squyres wrote: I'm a little concerned about the C++ test build failures from last night: http://www.open-mpi.org/mtt/index.php?do_redir=530 They are likely due to the C++ changes that came in over the weekend, but they *only* showed up at IU, which is somewhat odd. I'm trying to replicate now (doing a fresh build of the trunk and will build the tests that failed for you), but I'm kinda guessing it's going to work fine on my platforms. IU: do you have any idea what caused these failures? Does sif have a newer compiler that is somehow picking up on a latent bug that we missed in the C++ stuff? ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres Cisco Systems
Re: [OMPI devel] 1.3 Release schedule and contents
Ticket #1073 should be associated with the first bullet under "MCA parameters" - "Scope & precedence cleanup" It's unclear if this is "fixed" or not, but I had to look at this ticket to determine what this bullet meant. -- Josh On Feb 11, 2008, at 10:09 PM, Brad Benton wrote: All: The latest scrub of the 1.3 release schedule and contents is ready for review and comment. Please use the following links: 1.3 milestones: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3 1.3.1 milestones: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1 In order to try and keep the dates for 1.3 in, I've pushed a bunch of stuff (particularly ORTE things) to 1.3.1. Even though there will be new functionality slated for 1.3.1, the goal is to not have any interface changes between the phases. Please look over the list and schedules and let me or my fellow 1.3 co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any issues, errors, suggestions, omissions, heartburn, etc. Thanks, --Brad Brad Benton IBM ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] Something wrong with vt?
autogen.sh has some deep mojo in it...

Would it be sufficient to just have our autogen.sh recurse down into your tree? An undocumented feature of our autogen.sh is that you can have an "autogen.subdirs" file in ompi/contrib/vt with a single line in it: "vt". This will make our autogen recurse into the ompi/contrib/vt/vt tree and run all of its magic in there.

I say this with only a *very quick* look elsewhere in the tree; it works in the MCA frameworks (e.g., ompi/mca/io/romio) -- I have not checked to see if it'll work in the contrib area; we may need to expand the logic a bit there (i.e., copy over what was done for autogen.subdirs elsewhere). Can you investigate this possibility?

Matthias: can you please login to trac and set your e-mail address so that I can assign ticket #1212 to you? Use your SVN username and password. Thanks.

On Feb 12, 2008, at 8:04 AM, Matthias Jurenz wrote:

In future all Makefile.in's and the configure script of VT will be built from OMPI's autogen.sh. I'm working on a solution, but the autogen.sh script is a little bit unclear for me... :-(

Matthias

On Di, 2008-02-12 at 14:37 +0200, Gleb Natapov wrote:
On Tue, Feb 12, 2008 at 01:08:32PM +0100, Matthias Jurenz wrote:
> Hi Gleb,
>
> that's very strange... cause' the corresponding 'Makefile.in' is
> definitely not empty (checked in to the SVN repository).
Ah, here is the problem. Makefile.in is empty in my tree. I am building
not from SVN checkout, but from the other source tree that is synced
with SVN checkout and the sync process consider Makefile.in files as
generated and ignores them. Why Makefiles.in is not regenerated by
autogen.sh in vt sources?

> Could you reproduce this error after 'make distclean, configure, make' ?
> Which version of the autotools are you using?
> > > Matthias > > On Mo, 2008-02-11 at 11:42 +0200, Gleb Natapov wrote: > > > I get the following error while "make install": > > > > make[2]: Entering directory `/home_local/glebn/build_dbg/ompi/ contrib/vt' > > Making install in vt > > make[3]: Entering directory `/home_local/glebn/build_dbg/ompi/ contrib/vt/vt' > > make[3]: *** No rule to make target `install'. Stop. > > make[3]: Leaving directory `/home_local/glebn/build_dbg/ompi/ contrib/vt/vt' > > make[2]: *** [install-recursive] Error 1 > > make[2]: Leaving directory `/home_local/glebn/build_dbg/ompi/ contrib/vt' > > make[1]: *** [install-recursive] Error 1 > > make[1]: Leaving directory `/home_local/glebn/build_dbg/ompi' > > make: *** [install-recursive] Error 1 > > > > ompi/contrib/vt/vt/Makefile is empty! > > -- > > Gleb. > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > -- > Matthias Jurenz, > Center for Information Services and > High Performance Computing (ZIH), TU Dresden, > Willersbau A106, Zellescher Weg 12, 01062 Dresden > phone +49-351-463-31945, fax +49-351-463-37773 > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Matthias Jurenz, Center for Information Services and High Performance Computing (ZIH), TU Dresden, Willersbau A106, Zellescher Weg 12, 01062 Dresden phone +49-351-463-31945, fax +49-351-463-37773 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres Cisco Systems
[OMPI devel] Please set svn:ignore properly
Developers --

When you add a new component, framework, or anything that includes one or more new directories: please be sure to set the svn:ignore property on each new directory properly. Here's the SVN docs on the svn:ignore property:

http://svnbook.red-bean.com/en/1.4/svn-book.html#svn.advanced.props.special.ignore

It is proper to ignore all automatically-generated files, such as (but not limited to):

*.la
*.lo
.libs
.deps
.dirstamp
Makefile
Makefile.in
static-components.h
...etc.

Thanks.

-- 
Jeff Squyres
Cisco Systems
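A sketch of how one might apply that list with `svn propset` — the component path below is hypothetical, and the svn commands are shown commented out because they need a real working copy:

```shell
# Collect the ignore patterns for generated files into a file:
cat > /tmp/svn-ignore-patterns.txt <<'EOF'
*.la
*.lo
.libs
.deps
.dirstamp
Makefile
Makefile.in
static-components.h
EOF

# Apply them to the new directory, verify, and commit (run inside an SVN
# working copy; the component path is made up for illustration):
#   svn propset svn:ignore -F /tmp/svn-ignore-patterns.txt ompi/mca/btl/newbtl
#   svn propget svn:ignore ompi/mca/btl/newbtl
#   svn commit -m "Set svn:ignore on new component directory"
wc -l < /tmp/svn-ignore-patterns.txt
```

Using `-F file` keeps the multi-line property value out of shell quoting trouble; `svn propedit svn:ignore DIR` is the interactive alternative.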
[OMPI devel] more memchecker q's
Why is memchecker.h included like this:

#include "ompi/include/ompi/memchecker.h"

Shouldn't it be:

#include "ompi/memchecker.h"

Using the former will work in an SVN checkout, but won't work in a --with-devel-headers installation (the latter should). Can this be fixed?

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] more vt woes
I keep getting some warnings when I compile with gcc-4.2 on MAC OS X.

tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you should not override it;
tools/compwrap/Makefile.am:38: use `AM_CXXFLAGS' instead.
tools/compwrap/Makefile.am:40: `CPPFLAGS' is a user variable, you should not override it;
tools/compwrap/Makefile.am:40: use `AM_CPPFLAGS' instead.
tools/compwrap/Makefile.am:41: `LDFLAGS' is a user variable, you should not override it;
tools/compwrap/Makefile.am:41: use `AM_LDFLAGS' instead.
tools/opari/tool/Makefile.am:8: `CXXFLAGS' is a user variable, you should not override it;
tools/opari/tool/Makefile.am:8: use `AM_CXXFLAGS' instead.
tools/opari/tool/Makefile.am:10: `CPPFLAGS' is a user variable, you should not override it;
tools/opari/tool/Makefile.am:10: use `AM_CPPFLAGS' instead.
tools/opari/tool/Makefile.am:11: `LDFLAGS' is a user variable, you should not override it;
tools/opari/tool/Makefile.am:11: use `AM_LDFLAGS' instead.

Thanks,
george.

smime.p7s
Description: S/MIME cryptographic signature
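The fix automake is asking for is mechanical: switch the per-Makefile flag assignments to the AM_-prefixed variables, which automake combines with whatever the user passes at configure/make time instead of overriding it. A sketch (the actual flag values in the VT Makefiles are not shown in the thread, so these are placeholders):

```makefile
# Before (overrides the user's variables -- what automake warns about):
#   CXXFLAGS = -O2 -Wall
#   CPPFLAGS = -I$(top_srcdir)/util
#   LDFLAGS  = -L$(top_builddir)/lib

# After: per-Makefile flags go in the AM_ variables; the user's
# CXXFLAGS/CPPFLAGS/LDFLAGS are appended separately by automake.
AM_CXXFLAGS = -O2 -Wall
AM_CPPFLAGS = -I$(top_srcdir)/util
AM_LDFLAGS  = -L$(top_builddir)/lib
```

This keeps `./configure CXXFLAGS=...` working as users expect, since automake emits both sets of flags on the compile line.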
Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk
Ralph -- We talked about this on the OMPI con call today and everyone agrees that this seems to be a good plan. Just as a safety net: if the merge goes disastrously wrong and you're unavailable Thu/Fri this week, we can just back it out and try again later. Thanks! On Feb 11, 2008, at 11:37 PM, Ralph Castain wrote: Hello all Per last week's telecon, we planned the merge of the latest ORTE devel branch to the OMPI trunk for after Sun had committed its C++ changes. That happened over the weekend. Therefore, based on the requests at the telecon, I will be merging the current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit around 4:30pm Eastern time - will send out warning shortly before the commit to let you know it is coming. I'll advise of any delays. This will be a snapshot of that devel branch - it will include the upgraded launch system, remove the GPR, add the new tool communication library, allow arbitrary mpiruns to interconnect, supports the revamped hostfile and dash-host behaviors per the wiki, etc. However, it is incomplete and contains some known flaws. For example, totalview support has not been enabled yet. Comm_spawn, which is currently broken on the OMPI trunk, is fixed - but singleton comm_spawn remains broken. I am in the process of establishing support for direct and standalone launch capabilities, but those won't be in the merge. I have updated all of the launchers, but can only certify the SLURM, TM, and RSH ones to work - the Xgrid launcher is known to not compile, so if you have Xgrid on your Mac, you need to tell the build system to not build that component. This will give you a chance to look over the new arch, though, and I understand that people would like to begin having a chance to test and review the revised code. Hopefully, you will find most of the bugs to be minor. Please advise of any concerns about this merge. 
The schedule is totally driven by the requests of the MPI team members (delaying the merge has no impact on ORTE development), so requests to shift the schedule should be discussed amongst the community. Thanks Ralph ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres Cisco Systems
Re: [OMPI devel] more vt woes
Ew. I've filed a ticket: https://svn.open-mpi.org/trac/ompi/ticket/1214 On Feb 12, 2008, at 11:27 AM, George Bosilca wrote: I keep getting some warnings when I compile with gcc-4.2 on MAC OS X. tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you should not override it; tools/compwrap/Makefile.am:38: use `AM_CXXFLAGS' instead. tools/compwrap/Makefile.am:40: `CPPFLAGS' is a user variable, you should not override it; tools/compwrap/Makefile.am:40: use `AM_CPPFLAGS' instead. tools/compwrap/Makefile.am:41: `LDFLAGS' is a user variable, you should not override it; tools/compwrap/Makefile.am:41: use `AM_LDFLAGS' instead. tools/opari/tool/Makefile.am:8: `CXXFLAGS' is a user variable, you should not override it; tools/opari/tool/Makefile.am:8: use `AM_CXXFLAGS' instead. tools/opari/tool/Makefile.am:10: `CPPFLAGS' is a user variable, you should not override it; tools/opari/tool/Makefile.am:10: use `AM_CPPFLAGS' instead. tools/opari/tool/Makefile.am:11: `LDFLAGS' is a user variable, you should not override it; tools/opari/tool/Makefile.am:11: use `AM_LDFLAGS' instead. Thanks, george. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres Cisco Systems
Re: [OMPI devel] more memchecker q's
Hi Jeff, Sorry for that, I didn't know it before. Now it's fixed. Thanks a lot. :) Shiqing Jeff Squyres wrote: Why is memchecker.h included like this: #include "ompi/include/ompi/memchecker.h" Shouldn't it be #include "ompi/memchecker.h" Using the former will work in an SVN checkout, but won't work in a -- with-devel-headers installation (the latter should). Can this be fixed? -- -- Shiqing Fan http://www.hlrs.de/people/fan High Performance ComputingTel.: +49 711 685 87234 Center Stuttgart (HLRS)Fax.: +49 711 685 65832 POSTAL:Nobelstrasse 19email: f...@hlrs.de ACTUAL:Allmandring 30 70569 Stuttgart
Re: [OMPI devel] more memchecker q's
Excellent; thanks! Sometimes we have weird reasons for what we do, but there's [usually :-)] a reason. On Feb 12, 2008, at 1:00 PM, Shiqing Fan wrote: Hi Jeff, Sorry for that, I didn't know it before. Now it's fixed. Thanks a lot. :) Shiqing Jeff Squyres wrote: Why is memchecker.h included like this: #include "ompi/include/ompi/memchecker.h" Shouldn't it be #include "ompi/memchecker.h" Using the former will work in an SVN checkout, but won't work in a -- with-devel-headers installation (the latter should). Can this be fixed? -- -- Shiqing Fan http://www.hlrs.de/people/fan High Performance ComputingTel.: +49 711 685 87234 Center Stuttgart (HLRS)Fax.: +49 711 685 65832 POSTAL:Nobelstrasse 19email: f...@hlrs.de ACTUAL:Allmandring 30 70569 Stuttgart ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres Cisco Systems
[OMPI devel] ROMIO updates
I just committed two patches to OMPI's ROMIO that I discussed this morning on the teleconf. They remove two things from OMPI's bundled ROMIO:

- function renaming (foo -> io_romio_foo)
- file sym linking (foo.c -> io_romio_foo.c)

Although these features were added for a good reason (to abide by OMPI's component prefix rule), they make it much more difficult to track upstream ROMIO releases. This tracking ability has been judged to be more important than the prefix rule in this case. Additionally, since other MPI implementations include ROMIO without symbol/file renaming, we should be ok for all real-world MPI applications.

When you update to >=r17437, it will *not* require a new autogen/configure, but you will see automake update a few makefiles when it gets to building the ROMIO component. Additionally, I updated the svn:ignores so that all the sym links that were previously created (e.g., io_romio_foo.c) will no longer be ignored. You'll likely want to remove them yourself:

shell$ cd ompi/mca/io/romio/romio
shell$ rm mpi-io/io_romio_*.c
shell$ rm adio/*/io_romio_*.c

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] New Driver BTL
On Feb 11, 2008, at 7:33 PM, George Bosilca wrote: But if you do all this internally in NewMadeleine, I guess you don't need the Open MPI PML support. s/PML/OB1/, since OB1 is the specific PML in Open MPI that does all that stuff (striping, etc.). :-) -- Jeff Squyres Cisco Systems
Re: [OMPI devel] status of LSF integration work?
There are two issues: - You must have a recent enough version of LSF. I'm afraid I don't remember the LSF version number offhand, but we both (OMPI and LSF) had to make some changes/fixes to achieve compatibility. - LSF compatibility in OMPI is scheduled for v1.3 (i.e., it doesn't exist in the v1.2 series). As Ralph indicated, we're aware that it's currently broken in the trunk -- it'll be fixed by the v1.3 release, but I don't know exactly when. To be blunt: I wouldn't count on it in a production environment until v1.3 is officially released. Betas may become available before v1.3 goes gold that would be suitable for testing, though. Here's the OMPI v1.3 roadmap document -- it's more-or-less continually updated: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3 On Feb 11, 2008, at 10:36 PM, Ralph Castain wrote: Jeff and I chatted about this today, in fact. We know the LSF support is borked, but neither of us had time right now to fix it. We plan to do so, though, before the 1.3 release - just can't promise when. Ralph On 2/11/08 8:00 AM, "Eric Jones" wrote: Greetings, MPI mavens, Perhaps this belongs on users@, but since it's about development status I thought I start here. I've fairly recently gotten involved in getting an MPI environment configured for our institute. We have an existing LSF cluster because most of our work is more High-Throughput than High-Performance, so if I can use LSF to underlie our MPI environment, that'd be administratively easiest. I tried to compile the LSF support in the public SVN repo and noticed it was, er, broken. I'll include the trivial changes we made below. But the behavior is still fairly unpredictable, mostly involving mpirun never spinning up daemons on other nodes. I saw mention that work was being suspended on LSF support pending technical improvements on the LSF side (mentioning that Platform had provided a patch or try.) 
Can I assume, based on the inactivity in the repo, that Platform hasn't resolved the issue? Thanks, Eric

Here're the diffs to get LSF support to compile. We also made a change so it would report the LSF failure code instead of an uninitialized variable when it fails:

Index: pls_lsf_module.c
===================================================================
--- pls_lsf_module.c    (revision 17234)
+++ pls_lsf_module.c    (working copy)
@@ -304,7 +304,7 @@
      */
     if (lsb_launch(nodelist_argv, argv, LSF_DJOB_NOWAIT, env) < 0) {
         ORTE_ERROR_LOG(ORTE_ERR_FAILED_TO_START);
-        opal_output(0, "lsb_launch failed: %d", rc);
+        opal_output(0, "lsb_launch failed: %d", lsberrno);
         rc = ORTE_ERR_FAILED_TO_START;
         goto cleanup;
     }
@@ -356,7 +356,7 @@
     /* check for failed launch - if so, force terminate */
     if (failed_launch) {
-        if (ORTE_SUCCESS !=
+        /*if (ORTE_SUCCESS != */
             orte_pls_base_daemon_failed(jobid, false, -1, 0,
                 ORTE_JOB_STATE_FAILED_TO_START);
     }

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] merging new PLPA to the trunk
On Feb 12, 2008, at 7:11 AM, Lenny Verkhovsky wrote:

While coding the new RMAPS component I found strange behavior in PLPA -- the same behavior that was described in http://www.open-mpi.org/community/lists/plpa-users/2007/04/0073.php I believe it was fixed in the new version of PLPA. This new version needs to be merged to the trunk due to bug fixes and changes in the API. If there is no objection, I volunteer to do it.

That would be great. Please use the official SVN "3rd party import" guidelines. There's a /vendor/plpa branch that *may* be in good shape for this, but may not (I don't think I fully grokked the SVN 3rd party import procedures when I was using that branch before). :-\ In a worst-case scenario, we can "reset the clock" in the /vendor/plpa branch and make the new PLPA version be the "first" version in that tree (i.e., as if it were the first version we imported).

What's your timeframe? I ask because it would probably be best if I finally get around to releasing a stable version of PLPA. The last version is technically still a beta.

-- Jeff Squyres Cisco Systems
[OMPI devel] New address selection for btl-tcp (was Re: [OMPI svn] svn:open-mpi r17307)
On Fri, Feb 01, 2008 at 11:40:20AM -0500, Tim Prins wrote:
> Adrian,

Hi! Sorry for the late reply and thanks for your testing.

> 1. There are some warnings when compiling:

I've fixed these issues.

> 2. If I exclude all my tcp interfaces, the connection fails properly,
> but I do get a malloc request for 0 bytes:
> [tprins@odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_exclude
> eth0,ib0,lo -np 2 ./ring_c
> malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)
> malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)

Not my fault, but I guess we could fix it anyway. Should we?

> 3. If the exclude list does not contain 'lo', or the include list
> contains 'lo', the job hangs when using multiple nodes:

That's weird. Loopback interfaces should automatically be excluded right from the beginning. See opal/util/if.c. I neither know nor have checked where things go wrong. Do you want to investigate? As already mentioned, this should not happen. Can you post the output of "ip a s" or "ifconfig -a"?

> However, the great news about this patch is that it appears to fix
> https://svn.open-mpi.org/trac/ompi/ticket/1027 for me.

It also fixes my #1206. I'd like to merge tmp-public/btl-tcp into the trunk, especially before the 1.3 code freeze. Any objections?

-- Cluster and Metacomputing Working Group Friedrich-Schiller-Universität Jena, Germany private: http://adi.thur.de
Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk
Hi Ralph - How extensive are the changes involved in removing the GPR? How hard would it be for someone to maintain an enhanced version of this as an addon or compile-time optional module? Thanks. - Doug On Mon, 11 Feb 2008, Ralph Castain wrote: > Hello all > > Per last week's telecon, we planned the merge of the latest ORTE devel > branch to the OMPI trunk for after Sun had committed its C++ changes. That > happened over the weekend. > > Therefore, based on the requests at the telecon, I will be merging the > current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit > around 4:30pm Eastern time - will send out warning shortly before the commit > to let you know it is coming. I'll advise of any delays. > > This will be a snapshot of that devel branch - it will include the upgraded > launch system, remove the GPR, add the new tool communication library, allow > arbitrary mpiruns to interconnect, supports the revamped hostfile and > dash-host behaviors per the wiki, etc. > > However, it is incomplete and contains some known flaws. For example, > totalview support has not been enabled yet. Comm_spawn, which is currently > broken on the OMPI trunk, is fixed - but singleton comm_spawn remains > broken. I am in the process of establishing support for direct and > standalone launch capabilities, but those won't be in the merge. I have > updated all of the launchers, but can only certify the SLURM, TM, and RSH > ones to work - the Xgrid launcher is known to not compile, so if you have > Xgrid on your Mac, you need to tell the build system to not build that > component. > > This will give you a chance to look over the new arch, though, and I > understand that people would like to begin having a chance to test and > review the revised code. Hopefully, you will find most of the bugs to be > minor. > > Please advise of any concerns about this merge. 
The schedule is totally > driven by the requests of the MPI team members (delaying the merge has no > impact on ORTE development), so requests to shift the schedule should be > discussed amongst the community. > > Thanks > Ralph > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
Re: [OMPI devel] status of LSF integration work?
Thanks for the response, Jeff. I'll definitely plan an upgrade to the latest LSF release (7.0 update 2), then. Given the roadmap, I think I'm way better off forging ahead with MPI on LSF than implementing a separate solution. I didn't really expect production-ready code at this point. Just checking whether it was still planned for 1.3, really (the last thing I saw in the mailing lists was fairly discouraging). I'm willing to dedicate some time to testing code if you think it would be helpful. Cheers, Eric

Jeff Squyres wrote:

There are two issues: - You must have a recent enough version of LSF. I'm afraid I don't remember the LSF version number offhand, but we both (OMPI and LSF) had to make some changes/fixes to achieve compatibility. - LSF compatibility in OMPI is scheduled for v1.3 (i.e., it doesn't exist in the v1.2 series). As Ralph indicated, we're aware that it's currently broken in the trunk -- it'll be fixed by the v1.3 release, but I don't know exactly when. To be blunt: I wouldn't count on it in a production environment until v1.3 is officially released. Betas may become available before v1.3 goes gold that would be suitable for testing, though. Here's the OMPI v1.3 roadmap document -- it's more-or-less continually updated: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3

On Feb 11, 2008, at 10:36 PM, Ralph Castain wrote:

Jeff and I chatted about this today, in fact. We know the LSF support is borked, but neither of us had time right now to fix it. We plan to do so, though, before the 1.3 release - just can't promise when. Ralph

On 2/11/08 8:00 AM, "Eric Jones" wrote:

Greetings, MPI mavens, Perhaps this belongs on users@, but since it's about development status I thought I'd start here. I've fairly recently gotten involved in getting an MPI environment configured for our institute.
We have an existing LSF cluster because most of our work is more High-Throughput than High-Performance, so if I can use LSF to underlie our MPI environment, that'd be administratively easiest. I tried to compile the LSF support in the public SVN repo and noticed it was, er, broken. I'll include the trivial changes we made below. But the behavior is still fairly unpredictable, mostly involving mpirun never spinning up daemons on other nodes. I saw mention that work was being suspended on LSF support pending technical improvements on the LSF side (mentioning that Platform had provided a patch to try.) Can I assume, based on the inactivity in the repo, that Platform hasn't resolved the issue? Thanks, Eric

[... quoted pls_lsf_module.c diffs snipped -- identical to the patch in Eric's original message earlier in this thread ...]
Re: [OMPI devel] status of LSF integration work?
On Feb 12, 2008, at 4:09 PM, ejon wrote:

I'll definitely plan an upgrade to the latest LSF release (7.0 update 2), then. Given the roadmap, I think I'm way better off forging ahead with MPI on LSF than implementing a separate solution. I didn't really expect production-ready code at this point. Just checking whether it was still planned for 1.3, really (the last thing I saw in the mailing lists was fairly discouraging). I'm willing to dedicate some time to testing code if you think it would be helpful.

That would be great, thanks. Note that there is a fairly major change coming in to our run-time portion of the trunk tomorrow afternoon (a snapshot from a long-standing run-time development branch) which will overhaul just about everything -- including the LSF support. The LSF stuff probably hasn't been [fully] updated and definitely has not yet been tested under the overhaul. Never fear -- LSF support is a feature we've committed to, so it will definitely be there for v1.3.

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] status of LSF integration work?
I joined this list just in time to see the discussion of the merge, so I'm expecting the update, but thanks for the heads up. Until I saw the mail about that, I hadn't realized the ORTE stuff was developed separately...now I understand why the trunk version was left uncompilable so long :-). What we have now works well enough that we can probably get along with it, but I can run the new code in parallel. Hopefully I'll be able to offer some useful feedback. E

Jeff Squyres wrote:

That would be great, thanks. Note that there is a fairly major change coming in to our run-time portion of the trunk tomorrow afternoon (a snapshot from a long-standing run-time development branch) which will overhaul just about everything -- including the LSF support. The LSF stuff probably hasn't been [fully] updated and definitely has not yet been tested under the overhaul. Never fear -- LSF support is a feature we've committed to, so it will definitely be there for v1.3.
[OMPI devel] btl_openib_rnr_retry MCA param
I see that in the OOB CPC for the openib BTL, when setting up the send side of the QP, we set the rnr_retry value depending on whether the remote receive queue is a per-peer or SRQ:

- SRQ: btl_openib_rnr_retry MCA param value
- PP: 0

The rationale given in a comment is that setting the RNR to 0 is a good way to find bugs in our flow control. Do we really want this in production builds? Or do we want 0 for developer builds and the same btl_openib_rnr_retry value for PP queues? Or should we offer a finer-grained control, such as:

- btl_openib_rnr_retry_pp: value to use for per-peer q's, -1=use the default
- btl_openib_rnr_retry_srq: value to use for srq's, -1=use the default
- btl_openib_rnr_retry: value to use as the default for _pp and _srq

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] VT integration: make distclean problem
Hmm; I'm not sure. distclean will fail in a tarball or SVN checkout if you do --enable-contrib-no-build=vt. So it's not a developer-only artifact. I don't know what the Right solution is, though. :-\

On Feb 12, 2008, at 9:22 AM, Josh Hursey wrote:

Good points about 'distclean' versus 'clean'. For the make distclean case then I think it is ok if we fail here since it is not a full 'make dist' that I was working with originally. Sorry for the distraction. Cheers, Josh

On Feb 12, 2008, at 6:52 AM, Andreas Knüpfer wrote:

On Monday 11 February 2008, Josh Hursey wrote:

I've been noticing another problem with the VT integration. If you do a "./configure --enable-contrib-no-build=vt" a subsequent 'make distclean' will fail in contrib/vt. The 'make distclean' will succeed with VT enabled (default).

hm, tricky. I guess it is about the 'make dist' functionality. All others like 'make distclean' etc. are only assisting functionality for 'make dist' after all. And for 'make dist' you need to have everything configured that is going to be part of the distribution. Therefore, VT needs to be part of the tarball, so you can disable it at build time. It would not work the other way around. So in my opinion, the current status is what we want to have. Are there any problems when configuring VT, then building the tarball with VT and disabling it once you build Open MPI from the tarball? Regards, Andreas

-- Dipl. Math. Andreas Knuepfer, Center for Information Services and High Performance Computing (ZIH), TU Dresden, Willersbau A114, Zellescher Weg 12, 01062 Dresden phone +49-351-463-38323, fax +49-351-463-37773

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] [RFC] Remove explicit call to progress() from ob1.
Were these supposed to cover the time required for pinning and unpinning? Can you explain why you think they're unnecessary?

On Feb 12, 2008, at 5:27 AM, Gleb Natapov wrote:

Hi, I am planning to commit the following patch. Those two progress() calls are responsible for most of our deep recursion troubles. And I also think they are completely unnecessary.

diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index 5899243..641176e 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -704,9 +704,6 @@ int mca_pml_ob1_recv_request_schedule_once(
             mca_bml_base_free(bml_btl,dst);
             continue;
         }
-
-        /* run progress as the prepare (pinning) can take some time */
-        mca_bml.bml_progress();
     }
     return OMPI_SUCCESS;

diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
index 0998a05..9d7f3f9 100644
--- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
@@ -968,7 +968,6 @@ cannot_pack:
             mca_bml_base_free(bml_btl,des);
             continue;
         }
-        mca_bml.bml_progress();
     }
     return OMPI_SUCCESS;

-- Gleb.

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk
Well...best laid plans of mice and men, as they say. We were just having -way- too much fun here at IBM today going over the new ORTE design, planning for future scalability changes, etc., so American decided to cancel my flight back home! So thoughtful! I will be spending my Wed (hopefully!) enroute back home. It looks like it will be Thurs before I can do the merge. My apologies to all - but I would really rather not try to do it from a notebook computer in an airline terminal!

Ralph

On 2/12/08 9:54 AM, "Jeff Squyres" wrote: > Ralph -- > > We talked about this on the OMPI con call today and everyone agrees > that this seems to be a good plan. Just as a safety net: if the merge > goes disastrously wrong and you're unavailable Thu/Fri this week, we > can just back it out and try again later. > > Thanks! > > > On Feb 11, 2008, at 11:37 PM, Ralph Castain wrote: > >> Hello all >> >> Per last week's telecon, we planned the merge of the latest ORTE devel >> branch to the OMPI trunk for after Sun had committed its C++ >> changes. That >> happened over the weekend. >> >> Therefore, based on the requests at the telecon, I will be merging the >> current ORTE devel branch to the trunk on Wed 2/13. I'll make the >> commit >> around 4:30pm Eastern time - will send out warning shortly before >> the commit >> to let you know it is coming. I'll advise of any delays. >> >> This will be a snapshot of that devel branch - it will include the >> upgraded >> launch system, remove the GPR, add the new tool communication >> library, allow >> arbitrary mpiruns to interconnect, supports the revamped hostfile and >> dash-host behaviors per the wiki, etc. >> >> However, it is incomplete and contains some known flaws. For example, >> totalview support has not been enabled yet. Comm_spawn, which is >> currently >> broken on the OMPI trunk, is fixed - but singleton comm_spawn remains >> broken.
I am in the process of establishing support for direct and >> standalone launch capabilities, but those won't be in the merge. I >> have >> updated all of the launchers, but can only certify the SLURM, TM, >> and RSH >> ones to work - the Xgrid launcher is known to not compile, so if you >> have >> Xgrid on your Mac, you need to tell the build system to not build that >> component. >> >> This will give you a chance to look over the new arch, though, and I >> understand that people would like to begin having a chance to test and >> review the revised code. Hopefully, you will find most of the bugs >> to be >> minor. >> >> Please advise of any concerns about this merge. The schedule is >> totally >> driven by the requests of the MPI team members (delaying the merge >> has no >> impact on ORTE development), so requests to shift the schedule >> should be >> discussed amongst the community. >> >> Thanks >> Ralph >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel >
Re: [OMPI devel] === CREATE FAILURE ===
r17440 fixes this issue. It's too easy :)

george.

On Feb 12, 2008, at 9:15 PM, MPI Team wrote:

ERROR: Command returned a non-zero exist status
make -j 4 distcheck
Start time: Tue Feb 12 21:00:13 EST 2008
End time: Tue Feb 12 21:15:34 EST 2008
[... previous lines snipped ...]
[... libtool/gcc compile output for the ompi/datatype sources (dt_match_size.c, position.c, convertor.c, copy_functions.c, copy_functions_heterogeneous.c, dt_get_count.c) snipped; the quoted log was truncated mid-line ...]
Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk
Hi Doug The changes are rather far-reaching. We essentially revamped the entire RTE to switch from an event-driven architecture to one based on sequential logic. This had large benefits, but the GPR was the casualty. Remember, the aim for the past year has been to create a dedicated "lean, mean OMPI machine"! That said, it would be relatively simple to add an extension that provided a level of data storage that user-level programs could access. It would not provide any subscription or trigger capabilities, however - we need to leave those out of the system to avoid reintroducing the event-driven problems again. But if you just wanted to store and retrieve data for sharing it across processes, that could be provided with minimal effort or impact. Probably best done as a compile-time optional module, though, to avoid adding to the memory footprint for everyone. Another alternative: there is a separate "ORTE" project in Europe that is building extensions to our ORTE - they are tracking these code changes, but adding "bolt-ons" such as a GPR-like central data store, hooks for workflow management and the grid, multi-cluster operations, etc. I'm working with them on those efforts - if there is interest in such capabilities, I can probably look into architecting things so that some of the "bolt-ons" could be dynamically picked up by OMPI as binary modules or something. For now, though, there will be no GPR-like storage in the new system. Ralph On 2/12/08 1:43 PM, "Doug Tody" wrote: > Hi Ralph - > > How extensive are the changes involved in removing the GPR? How hard would > it be for someone to maintain an enhanced version of this as an addon or > compile-time optional module? Thanks. > > - Doug > > > On Mon, 11 Feb 2008, Ralph Castain wrote: > >> Hello all >> >> Per last week's telecon, we planned the merge of the latest ORTE devel >> branch to the OMPI trunk for after Sun had committed its C++ changes. That >> happened over the weekend. 
>> >> Therefore, based on the requests at the telecon, I will be merging the >> current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit >> around 4:30pm Eastern time - will send out warning shortly before the commit >> to let you know it is coming. I'll advise of any delays. >> >> This will be a snapshot of that devel branch - it will include the upgraded >> launch system, remove the GPR, add the new tool communication library, allow >> arbitrary mpiruns to interconnect, supports the revamped hostfile and >> dash-host behaviors per the wiki, etc. >> >> However, it is incomplete and contains some known flaws. For example, >> totalview support has not been enabled yet. Comm_spawn, which is currently >> broken on the OMPI trunk, is fixed - but singleton comm_spawn remains >> broken. I am in the process of establishing support for direct and >> standalone launch capabilities, but those won't be in the merge. I have >> updated all of the launchers, but can only certify the SLURM, TM, and RSH >> ones to work - the Xgrid launcher is known to not compile, so if you have >> Xgrid on your Mac, you need to tell the build system to not build that >> component. >> >> This will give you a chance to look over the new arch, though, and I >> understand that people would like to begin having a chance to test and >> review the revised code. Hopefully, you will find most of the bugs to be >> minor. >> >> Please advise of any concerns about this merge. The schedule is totally >> driven by the requests of the MPI team members (delaying the merge has no >> impact on ORTE development), so requests to shift the schedule should be >> discussed amongst the community. >> >> Thanks >> Ralph >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel