Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-12 Thread Rainer Keller
Hello Brad,
please note the valgrind memchecker merge, which could go into 1.3
under the "m. Miscellaneous" section.

Also, please note that moving the ORTE merge to 1.3.1 would mean moving
item vii. of "m. Miscellaneous", Windows CCP support, there as well. The
current/new code does not seem to work on Windows at the moment; Shiqing
will propose a patch for that.

Thanks,
Rainer



On Tuesday 12 February 2008 04:09, Brad Benton wrote:
> All:
>
> The latest scrub of the 1.3 release schedule and contents is ready for
> review and comment.  Please use the following links:
>   1.3 milestones:
> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
>   1.3.1 milestones:
> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1
>
> In order to try and keep the dates for 1.3 in, I've pushed a bunch of stuff
> (particularly ORTE things) to 1.3.1.  Even though there will be new
> functionality slated for 1.3.1, the goal is to not have any interface
> changes between the phases.
>
> Please look over the list and schedules and let me or my fellow 1.3
> co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any
> issues, errors, suggestions, omissions, heartburn, etc.
>
> Thanks,
> --Brad
>
> Brad Benton
> IBM

-- 

Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
 HLRS                      Tel: ++49 (0)711-685 6 5858
 Nobelstrasse 19           Fax: ++49 (0)711-685 6 5832
 70550 Stuttgart           email: kel...@hlrs.de
 Germany                   AIM/Skype: rusraink


[OMPI devel] [RFC] Remove explicit call to progress() from ob1.

2008-02-12 Thread Gleb Natapov
Hi,

I am planning to commit the following patch. Those two progress() calls
are responsible for most of our deep recursion troubles. And I also
think they are completely unnecessary.

diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index 5899243..641176e 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -704,9 +704,6 @@ int mca_pml_ob1_recv_request_schedule_once(
 mca_bml_base_free(bml_btl,dst);
 continue;
 }
-
-/* run progress as the prepare (pinning) can take some time */
-mca_bml.bml_progress();
 }

 return OMPI_SUCCESS;
diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
index 0998a05..9d7f3f9 100644
--- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
@@ -968,7 +968,6 @@ cannot_pack:
 mca_bml_base_free(bml_btl,des);
 continue;
 }
-mca_bml.bml_progress();
 }

 return OMPI_SUCCESS;
--
Gleb.


Re: [OMPI devel] Fixlet for config/ompi_contrib.m4

2008-02-12 Thread Matthias Jurenz
Hi Ralf,

thanks for the patch. I've added this to the trunk...


Matthias

On Mo, 2008-02-11 at 21:14 +0100, Ralf Wildenhues wrote:

> Hello,
> 
> please apply this patch, to make future contrib integration just a tad
> bit easier.  I verified that the generated configure script is
> identical, minus whitespace and comments.
> 
> Cheers,
> Ralf
> 
> 2008-02-11  Ralf Wildenhues  
> 
>   * config/ompi_contrib.m4 (OMPI_CONTRIB): Unify listings of
>   contrib software packages.
> 
> Index: config/ompi_contrib.m4
> ===
> --- config/ompi_contrib.m4(Revision 17419)
> +++ config/ompi_contrib.m4(Arbeitskopie)
> @@ -67,20 +67,13 @@
>  # Cycle through each of the hard-coded software packages and
>  # configure them if not disabled.  May someday be expanded to have
>  # autogen find the packages instead of this hard-coded list
> -# (https://svn.open-mpi.org/trac/ompi/ticket/1162).  I couldn't
> -# figure out a simple/easy way to have the m4 foreach do the m4
> -# include *and* all the rest of the stuff, so I settled for having
> -# two lists: each contribted software package will need to add its
> -# configure.m4 list here and then add its name to the m4 define
> -# for contrib_software_list.  Cope.
> -#dnlm4_include(ompi/contrib/libnbc/configure.m4)
> -m4_include(ompi/contrib/vt/configure.m4)
> -
> -m4_define(contrib_software_list, [vt])
> -#dnlm4_define(contrib_software_list, [libnbc, vt])
> +# (https://svn.open-mpi.org/trac/ompi/ticket/1162).
> +# m4_define([contrib_software_list], [libnbc, vt])
> +m4_define([contrib_software_list], [vt])
>  m4_foreach(software, [contrib_software_list],
> -   [OMPI_CONTRIB_DIST_SUBDIRS="$OMPI_CONTRIB_DIST_SUBDIRS contrib/software"
> -   _OMPI_CONTRIB_CONFIGURE(software)])
> +[m4_include([ompi/contrib/]software[/configure.m4])
> +OMPI_CONTRIB_DIST_SUBDIRS="$OMPI_CONTRIB_DIST_SUBDIRS contrib/software"
> +_OMPI_CONTRIB_CONFIGURE(software)])
>  
>  # Setup the top-level glue
>  AC_SUBST(OMPI_CONTRIB_SUBDIRS)
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

--
Matthias Jurenz,
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773




Re: [OMPI devel] VT integration: make distclean problem

2008-02-12 Thread Andreas Knüpfer
On Monday 11 February 2008, Josh Hursey wrote:
> I've been noticing another problem with the VT integration. If you do
> a "./configure --enable-contrib-no-build=vt" a subsequent 'make
> distclean' will fail in contrib/vt. The 'make distclean' will succeed
> with VT enabled (default).
>

hm, tricky. I guess it is about the 'make dist' functionality. All others 
like 'make distclean' etc. are only assisting functionality for 'make dist' 
after all.

And for 'make dist' you need to have everything configured that is going to be 
part of the distribution. Therefore, VT needs to be part of the tarball, so 
you can disable it at build time. It would not work the other way around.

So in my opinion, the current status is what we want to have. Are there any 
problems when configuring VT, then building the tarball with VT and disabling 
it once you build Open MPI from the tarball?

Regards, Andreas

-- 
Dipl. Math. Andreas Knuepfer, 
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A114, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-38323, fax +49-351-463-37773




Re: [OMPI devel] Something wrong with vt?

2008-02-12 Thread Matthias Jurenz
Hi Gleb,

That's very strange, because the corresponding 'Makefile.in' is
definitely not empty (it is checked in to the SVN repository).
Can you reproduce this error after 'make distclean', 'configure', 'make'?
Which version of the autotools are you using?


Matthias

On Mo, 2008-02-11 at 11:42 +0200, Gleb Natapov wrote:

> I get the following error while "make install":
> 
> make[2]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt'
> Making install in vt
> make[3]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt'
> make[3]: *** No rule to make target `install'.  Stop.
> make[3]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt'
> make[2]: *** [install-recursive] Error 1
> make[2]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/home_local/glebn/build_dbg/ompi'
> make: *** [install-recursive] Error 1
> 
> ompi/contrib/vt/vt/Makefile is empty!
> --
>   Gleb.

--
Matthias Jurenz,
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773




[OMPI devel] merging new PLPA to the trunk

2008-02-12 Thread Lenny Verkhovsky

Hi all,

While coding a new RMAPS component I found strange behavior in PLPA, the
same behavior that was described in
http://www.open-mpi.org/community/lists/plpa-users/2007/04/0073.php

I believe that it was fixed in the new version of PLPA.

This new version needs to be merged to the trunk because of bug fixes and
changes in the API.

If there are no objections, I volunteer to do it.


Best Regards,
Lenny.




Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-12 Thread Andreas Knüpfer
The VampirTrace integration is already in the trunk. It should be mentioned as 
complete somewhere in the misc section.

Andreas




Re: [OMPI devel] Something wrong with vt?

2008-02-12 Thread Gleb Natapov
On Tue, Feb 12, 2008 at 01:08:32PM +0100, Matthias Jurenz wrote:
> Hi Gleb,
> 
> that's very strange... cause' the corresponding 'Makefile.in' is
> definitely not empty (checked in to the SVN repository).
Ah, here is the problem. Makefile.in is empty in my tree. I am not building
from an SVN checkout, but from another source tree that is synced with the
SVN checkout, and the sync process considers Makefile.in files to be
generated and ignores them. Why is Makefile.in not regenerated by
autogen.sh in the vt sources?


> Could you reproduce this error after 'make distclean, configure, make' ?
> Which version of the autotools are you using?

--
Gleb.


Re: [OMPI devel] Something wrong with vt?

2008-02-12 Thread Matthias Jurenz
In the future, all Makefile.in files and the configure script of VT will be
built by OMPI's autogen.sh.
I'm working on a solution, but the autogen.sh script is a little bit
unclear to me... :-(

Matthias


On Di, 2008-02-12 at 14:37 +0200, Gleb Natapov wrote:

> On Tue, Feb 12, 2008 at 01:08:32PM +0100, Matthias Jurenz wrote:
> > Hi Gleb,
> > 
> > that's very strange... cause' the corresponding 'Makefile.in' is
> > definitely not empty (checked in to the SVN repository).
> Ah, here is the problem. Makefile.in is empty in my tree. I am building
> not from SVN checkout, but from the other source tree that is synced
> with SVN checkout and the sync process consider Makefile.in files as
> generated and ignores them. Why  Makefiles.in is not regenerated by
> autogen.sh in vt sources?

--
Matthias Jurenz,
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773




[OMPI devel] C++ build failures

2008-02-12 Thread Jeff Squyres
I'm a little concerned about the C++ test build failures from last  
night:


http://www.open-mpi.org/mtt/index.php?do_redir=530

They are likely due to the C++ changes that came in over the weekend,  
but they *only* showed up at IU, which is somewhat odd.  I'm trying to  
replicate now (doing a fresh build of the trunk and will build the  
tests that failed for you), but I'm kinda guessing it's going to work  
fine on my platforms.


IU: do you have any idea what caused these failures?  Does sif have a  
newer compiler that is somehow picking up on a latent bug that we  
missed in the C++ stuff?


--
Jeff Squyres
Cisco Systems



[OMPI devel] memchecker build broken

2008-02-12 Thread Jeff Squyres
To simplify things, I'm going to start filing tickets for all build  
breaks that I find.  Here's the latest:


libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../.. -DOMPI_TV_DLL=\"/home/jsquyres/bogus/lib/openmpi/libompitv.so\" -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -g -MT libdebuggers_la-ompi_totalview.lo -MD -MP -MF .deps/libdebuggers_la-ompi_totalview.Tpo -c ompi_totalview.c  -fPIC -DPIC -o .libs/libdebuggers_la-ompi_totalview.o

In file included from ../../ompi/mca/pml/base/pml_base_request.h:28,
                 from ompi_dll.c:71:
../../ompi/include/ompi/memchecker.h:22:31: valgrind/valgrind.h: No such file or directory

In file included from ../../ompi/mca/pml/base/pml_base_request.h:28,
                 from ompi_totalview.c:42:
../../ompi/include/ompi/memchecker.h:22:31: valgrind/valgrind.h: No such file or directory


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] VT integration: make distclean problem

2008-02-12 Thread Josh Hursey
Good points about 'distclean' versus 'clean'. For the 'make distclean'
case, I think it is OK if we fail here, since what I was working with
originally was not a full 'make dist'.


Sorry for the distraction.

Cheers,
Josh

On Feb 12, 2008, at 6:52 AM, Andreas Knüpfer wrote:


> On Monday 11 February 2008, Josh Hursey wrote:
> > I've been noticing another problem with the VT integration. If you do
> > a "./configure --enable-contrib-no-build=vt" a subsequent 'make
> > distclean' will fail in contrib/vt. The 'make distclean' will succeed
> > with VT enabled (default).
>
> hm, tricky. I guess it is about the 'make dist' functionality. All others
> like 'make distclean' etc. are only assisting functionality for 'make dist'
> after all.
>
> And for 'make dist' you need to have everything configured that is going to be
> part of the distribution. Therefore, VT needs to be part of the tarball, so
> you can disable it at build time. It would not work the other way around.
>
> So in my opinion, the current status is what we want to have. Are there any
> problems when configuring VT, then building the tarball with VT and disabling
> it once you build Open MPI from the tarball?
>
> Regards, Andreas
>
> --
> Dipl. Math. Andreas Knuepfer,
> Center for Information Services and
> High Performance Computing (ZIH), TU Dresden,
> Willersbau A114, Zellescher Weg 12, 01062 Dresden
> phone +49-351-463-38323, fax +49-351-463-37773





Re: [OMPI devel] C++ build failures

2008-02-12 Thread Tim Prins
I just talked to Jeff about this. The problem was that on Sif we use 
--enable-visibility, and apparently the new c++ bindings access 
ompi_errhandler_create, which was not OMPI_DECLSPEC'd. Jeff will fix 
this soon.


Tim

Jeff Squyres wrote:
> I'm a little concerned about the C++ test build failures from last
> night:
>
> http://www.open-mpi.org/mtt/index.php?do_redir=530
>
> They are likely due to the C++ changes that came in over the weekend,
> but they *only* showed up at IU, which is somewhat odd.  I'm trying to
> replicate now (doing a fresh build of the trunk and will build the
> tests that failed for you), but I'm kinda guessing it's going to work
> fine on my platforms.
>
> IU: do you have any idea what caused these failures?  Does sif have a
> newer compiler that is somehow picking up on a latent bug that we
> missed in the C++ stuff?






Re: [OMPI devel] C++ build failures

2008-02-12 Thread Jeff Squyres

I filed a ticket: https://svn.open-mpi.org/trac/ompi/ticket/1213

Am looking into the problem, but ran into the memchecker trunk build
breakage first (https://svn.open-mpi.org/trac/ompi/ticket/1211).  #$%#@%#@$%




On Feb 12, 2008, at 9:23 AM, Tim Prins wrote:


> I just talked to Jeff about this. The problem was that on Sif we use
> --enable-visibility, and apparently the new c++ bindings access
> ompi_errhandler_create, which was not OMPI_DECLSPEC'd. Jeff will fix
> this soon.
>
> Tim
>
> Jeff Squyres wrote:
> > I'm a little concerned about the C++ test build failures from last
> > night:
> >
> > http://www.open-mpi.org/mtt/index.php?do_redir=530
> >
> > They are likely due to the C++ changes that came in over the weekend,
> > but they *only* showed up at IU, which is somewhat odd.  I'm trying to
> > replicate now (doing a fresh build of the trunk and will build the
> > tests that failed for you), but I'm kinda guessing it's going to work
> > fine on my platforms.
> >
> > IU: do you have any idea what caused these failures?  Does sif have a
> > newer compiler that is somehow picking up on a latent bug that we
> > missed in the C++ stuff?






--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-12 Thread Josh Hursey
Ticket #1073 should be associated with the first bullet under "MCA  
parameters" - "Scope & precedence cleanup"


It's unclear if this is "fixed" or not, but I had to look at this  
ticket to determine what this bullet meant.


-- Josh

On Feb 11, 2008, at 10:09 PM, Brad Benton wrote:


> All:
>
> The latest scrub of the 1.3 release schedule and contents is ready for
> review and comment.  Please use the following links:
>   1.3 milestones:
> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
>   1.3.1 milestones:
> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1
>
> In order to try and keep the dates for 1.3 in, I've pushed a bunch of stuff
> (particularly ORTE things) to 1.3.1.  Even though there will be new
> functionality slated for 1.3.1, the goal is to not have any interface
> changes between the phases.
>
> Please look over the list and schedules and let me or my fellow 1.3
> co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any
> issues, errors, suggestions, omissions, heartburn, etc.
>
> Thanks,
> --Brad
>
> Brad Benton
> IBM




Re: [OMPI devel] Something wrong with vt?

2008-02-12 Thread Jeff Squyres

autogen.sh has some deep mojo in it...

Would it be sufficient to just have our autogen.sh recurse down into
your tree?  An undocumented feature of our autogen.sh is that you can
have an "autogen.subdirs" file in ompi/contrib/vt with a single line in
it: "vt".  This will make our autogen recurse into the
ompi/contrib/vt/vt tree and run all of its magic in there.


I say this with only a *very quick* look elsewhere in the tree; it  
works in the MCA frameworks (e.g., ompi/mca/io/romio) -- I have not  
checked to see if it'll work in the contrib area; we may need to  
expand the logic a bit there (i.e., copy over what was done for  
autogen.subdirs elsewhere).  Can you investigate this possibility?
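
The autogen.subdirs suggestion above might be tried along these lines. This is only a sketch: the file name and its single-line content come from the description above, while OMPI_TOP is a hypothetical variable that defaults to a scratch directory, and whether autogen.sh actually honors the file in the contrib area is exactly the open question raised above.

```shell
#!/bin/sh
# Sketch only: create the one-line autogen.subdirs file described above.
# OMPI_TOP is a hypothetical variable naming the top of a source tree; it
# defaults to a scratch directory so that no real checkout is touched.
OMPI_TOP="${OMPI_TOP:-$(mktemp -d)}"
vt_dir="$OMPI_TOP/ompi/contrib/vt"
mkdir -p "$vt_dir"

# A single line naming the subdirectory autogen.sh should recurse into.
printf 'vt\n' > "$vt_dir/autogen.subdirs"

echo "wrote $vt_dir/autogen.subdirs:"
cat "$vt_dir/autogen.subdirs"
```

Setting OMPI_TOP to a real checkout and re-running autogen.sh afterwards would show whether the recursion happens there.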


Matthias: can you please log in to trac and set your e-mail address so
that I can assign ticket #1212 to you?  Use your SVN username and
password.  Thanks.



On Feb 12, 2008, at 8:04 AM, Matthias Jurenz wrote:

> In future all Makefile.in's and the configure script of VT will be built
> from OMPI's autogen.sh.
> I'm working on a solution, but the autogen.sh script is a little bit
> unclear for me... :-(
>
> Matthias





--
Jeff Squyres
Cisco Systems



[OMPI devel] Please set svn:ignore properly

2008-02-12 Thread Jeff Squyres

Developers --

When you add a new component, framework, or anything that includes one  
or more new directories: please be sure to set the svn:ignore property  
on each new directory properly.  Here's the SVN docs on the svn:ignore  
property:


http://svnbook.red-bean.com/en/1.4/svn-book.html#svn.advanced.props.special.ignore

It is proper to ignore all automatically-generated files, such as (but  
not limited to):


*.la
*.lo
.libs
.deps
.dirstamp
Makefile
Makefile.in
static-components.h
...etc.
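
One possible way to apply such a list is sketched below, assuming the patterns are collected in a file and handed to `svn propset` with `-F`. NEW_DIR is a hypothetical name for the newly added directory; it defaults to a scratch directory, in which case the script only prints what it would set.

```shell
#!/bin/sh
# Sketch: collect ignore patterns in a file and apply them with svn propset.
# NEW_DIR is a hypothetical directory name; it defaults to a scratch directory.
NEW_DIR="${NEW_DIR:-$(mktemp -d)}"
patterns="$(mktemp)"
cat > "$patterns" <<'EOF'
*.la
*.lo
.libs
.deps
.dirstamp
Makefile
Makefile.in
static-components.h
EOF

# Only touch the property when NEW_DIR really is inside an SVN working copy.
if command -v svn >/dev/null 2>&1 && svn info "$NEW_DIR" >/dev/null 2>&1; then
    svn propset svn:ignore -F "$patterns" "$NEW_DIR"
    svn propget svn:ignore "$NEW_DIR"
else
    echo "not an SVN working copy; would set svn:ignore on $NEW_DIR to:"
    cat "$patterns"
fi
```

The property change still has to be committed like any other modification.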

Thanks.

--
Jeff Squyres
Cisco Systems



[OMPI devel] more memchecker q's

2008-02-12 Thread Jeff Squyres

Why is memchecker.h included like this:

#include "ompi/include/ompi/memchecker.h"

Shouldn't it be

#include "ompi/memchecker.h"

Using the former will work in an SVN checkout, but won't work in a
--with-devel-headers installation (the latter should).


Can this be fixed?

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] more vt woes

2008-02-12 Thread George Bosilca

I keep getting some warnings when I compile with gcc-4.2 on MAC OS X.

tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you should not override it;
tools/compwrap/Makefile.am:38: use `AM_CXXFLAGS' instead.
tools/compwrap/Makefile.am:40: `CPPFLAGS' is a user variable, you should not override it;
tools/compwrap/Makefile.am:40: use `AM_CPPFLAGS' instead.
tools/compwrap/Makefile.am:41: `LDFLAGS' is a user variable, you should not override it;
tools/compwrap/Makefile.am:41: use `AM_LDFLAGS' instead.
tools/opari/tool/Makefile.am:8: `CXXFLAGS' is a user variable, you should not override it;
tools/opari/tool/Makefile.am:8: use `AM_CXXFLAGS' instead.
tools/opari/tool/Makefile.am:10: `CPPFLAGS' is a user variable, you should not override it;
tools/opari/tool/Makefile.am:10: use `AM_CPPFLAGS' instead.
tools/opari/tool/Makefile.am:11: `LDFLAGS' is a user variable, you should not override it;
tools/opari/tool/Makefile.am:11: use `AM_LDFLAGS' instead.
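
For anyone wanting to locate the offending assignments, a small helper along these lines may be enough. It is a sketch, not part of the build system: the function name find_user_var_overrides is made up for illustration, and the pattern only catches plain "VAR =" and "VAR +=" assignments at the start of a line.

```shell
#!/bin/sh
# Sketch: list assignments to user variables that automake warns about.
# find_user_var_overrides is a hypothetical helper name.
find_user_var_overrides() {
    # Matches e.g. "CXXFLAGS = ..." or "LDFLAGS += ...", but not
    # "AM_CXXFLAGS = ..." or per-target "foo_LDFLAGS = ...".
    grep -nE '^[[:space:]]*(CFLAGS|CXXFLAGS|CPPFLAGS|LDFLAGS)[[:space:]]*\+?=' "$1"
}

# Check the two files named in the warnings, if present below the
# current directory.
for mf in tools/compwrap/Makefile.am tools/opari/tool/Makefile.am; do
    if [ -f "$mf" ]; then
        echo "== $mf"
        find_user_var_overrides "$mf" || echo "(no overrides found)"
    fi
done
```

Each reported line is a candidate for renaming to the corresponding AM_ variable, as the warnings suggest.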

  Thanks,
george.





Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk

2008-02-12 Thread Jeff Squyres

Ralph --

We talked about this on the OMPI con call today and everyone agrees  
that this seems to be a good plan.  Just as a safety net: if the merge  
goes disastrously wrong and you're unavailable Thu/Fri this week, we  
can just back it out and try again later.


Thanks!


On Feb 11, 2008, at 11:37 PM, Ralph Castain wrote:


> Hello all
>
> Per last week's telecon, we planned the merge of the latest ORTE devel
> branch to the OMPI trunk for after Sun had committed its C++ changes. That
> happened over the weekend.
>
> Therefore, based on the requests at the telecon, I will be merging the
> current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit
> around 4:30pm Eastern time - will send out warning shortly before the commit
> to let you know it is coming. I'll advise of any delays.
>
> This will be a snapshot of that devel branch - it will include the upgraded
> launch system, remove the GPR, add the new tool communication library, allow
> arbitrary mpiruns to interconnect, supports the revamped hostfile and
> dash-host behaviors per the wiki, etc.
>
> However, it is incomplete and contains some known flaws. For example,
> totalview support has not been enabled yet. Comm_spawn, which is currently
> broken on the OMPI trunk, is fixed - but singleton comm_spawn remains
> broken. I am in the process of establishing support for direct and
> standalone launch capabilities, but those won't be in the merge. I have
> updated all of the launchers, but can only certify the SLURM, TM, and RSH
> ones to work - the Xgrid launcher is known to not compile, so if you have
> Xgrid on your Mac, you need to tell the build system to not build that
> component.
>
> This will give you a chance to look over the new arch, though, and I
> understand that people would like to begin having a chance to test and
> review the revised code. Hopefully, you will find most of the bugs to be
> minor.
>
> Please advise of any concerns about this merge. The schedule is totally
> driven by the requests of the MPI team members (delaying the merge has no
> impact on ORTE development), so requests to shift the schedule should be
> discussed amongst the community.
>
> Thanks
> Ralph





--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] more vt woes

2008-02-12 Thread Jeff Squyres

Ew.  I've filed a ticket:

https://svn.open-mpi.org/trac/ompi/ticket/1214


On Feb 12, 2008, at 11:27 AM, George Bosilca wrote:


> I keep getting some warnings when I compile with gcc-4.2 on MAC OS X.
>
> tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you should not override it;
> tools/compwrap/Makefile.am:38: use `AM_CXXFLAGS' instead.
> tools/compwrap/Makefile.am:40: `CPPFLAGS' is a user variable, you should not override it;
> tools/compwrap/Makefile.am:40: use `AM_CPPFLAGS' instead.
> tools/compwrap/Makefile.am:41: `LDFLAGS' is a user variable, you should not override it;
> tools/compwrap/Makefile.am:41: use `AM_LDFLAGS' instead.
> tools/opari/tool/Makefile.am:8: `CXXFLAGS' is a user variable, you should not override it;
> tools/opari/tool/Makefile.am:8: use `AM_CXXFLAGS' instead.
> tools/opari/tool/Makefile.am:10: `CPPFLAGS' is a user variable, you should not override it;
> tools/opari/tool/Makefile.am:10: use `AM_CPPFLAGS' instead.
> tools/opari/tool/Makefile.am:11: `LDFLAGS' is a user variable, you should not override it;
> tools/opari/tool/Makefile.am:11: use `AM_LDFLAGS' instead.
>
>   Thanks,
>     george.




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] more memchecker q's

2008-02-12 Thread Shiqing Fan

Hi Jeff,

Sorry for that, I didn't know it before. Now it's fixed. Thanks a lot. :)


Shiqing

Jeff Squyres wrote:

> Why is memchecker.h included like this:
>
>  #include "ompi/include/ompi/memchecker.h"
>
> Shouldn't it be
>
>  #include "ompi/memchecker.h"
>
> Using the former will work in an SVN checkout, but won't work in a
> --with-devel-headers installation (the latter should).
>
> Can this be fixed?

  



--
Shiqing Fan                          http://www.hlrs.de/people/fan
High Performance Computing           Tel.: +49 711 685 87234
  Center Stuttgart (HLRS)            Fax.: +49 711 685 65832
POSTAL: Nobelstrasse 19              email: f...@hlrs.de
ACTUAL: Allmandring 30
70569 Stuttgart



Re: [OMPI devel] more memchecker q's

2008-02-12 Thread Jeff Squyres

Excellent; thanks!

Sometimes we have weird reasons for what we do, but there's  
[usually :-)] a reason.




On Feb 12, 2008, at 1:00 PM, Shiqing Fan wrote:


> Hi Jeff,
>
> Sorry for that, I didn't know it before. Now it's fixed. Thanks a lot. :)
>
> Shiqing




--
Jeff Squyres
Cisco Systems



[OMPI devel] ROMIO updates

2008-02-12 Thread Jeff Squyres
I just committed two patches to OMPI's ROMIO that I discussed this  
morning on the teleconf.  They remove two things from OMPI's bundled  
ROMIO:


- function renaming (foo -> io_romio_foo)
- file sym linking (foo.c -> io_romio_foo.c)

Although these features were added for a good reason (to abide by
OMPI's component prefix rule), they make it much more difficult to
track upstream ROMIO releases.  This tracking ability has been judged
to be more important than the prefix rule, in this case.
Additionally, since other MPI implementations include ROMIO without
symbol/file renaming, we should be ok for all real-world MPI
applications.


When you update to >=r17437, it will *not* require a new
autogen/configure, but you will see automake update a few makefiles when
it gets to building the ROMIO component.


Additionally, I updated the svn:ignores so that all the sym links that  
were previously created (e.g., io_romio_foo.c) will no longer be  
ignored.  You'll likely want to remove them yourself:


shell$ cd ompi/mca/io/romio/romio
shell$ rm mpi-io/io_romio_*.c
shell$ rm adio/*/io_romio_*.c

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] New Driver BTL

2008-02-12 Thread Jeff Squyres

On Feb 11, 2008, at 7:33 PM, George Bosilca wrote:

But if you do all this internally in NewMadeleine, I guess you don't  
need the Open MPI PML support.


s/PML/OB1/, since OB1 is the specific PML in Open MPI that does all  
that stuff (striping, etc.).


:-)

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] status of LSF integration work?

2008-02-12 Thread Jeff Squyres

There are two issues:

- You must have a recent enough version of LSF.  I'm afraid I don't  
remember the LSF version number offhand, but we both (OMPI and LSF)  
had to make some changes/fixes to achieve compatibility.


- LSF compatibility in OMPI is scheduled for v1.3 (i.e., it doesn't  
exist in the v1.2 series).  As Ralph indicated, we're aware that it's  
currently broken in the trunk -- it'll be fixed by the v1.3 release,  
but I don't know exactly when.  To be blunt: I wouldn't count on it in  
a production environment until v1.3 is officially released.  Betas may  
become available before v1.3 goes gold that would be suitable for  
testing, though.


Here's the OMPI v1.3 roadmap document -- it's more-or-less continually  
updated:


https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3


On Feb 11, 2008, at 10:36 PM, Ralph Castain wrote:

Jeff and I chatted about this today, in fact. We know the LSF support is
borked, but neither of us had time right now to fix it. We plan to do so,
though, before the 1.3 release - just can't promise when.

Ralph



On 2/11/08 8:00 AM, "Eric Jones"  wrote:


Greetings, MPI mavens,

Perhaps this belongs on users@, but since it's about development status
I thought I'd start here.  I've fairly recently gotten involved in getting
an MPI environment configured for our institute.  We have an existing
LSF cluster because most of our work is more High-Throughput than
High-Performance, so if I can use LSF to underlie our MPI environment,
that'd be administratively easiest.

I tried to compile the LSF support in the public SVN repo and noticed it
was, er, broken.  I'll include the trivial changes we made below.  But
the behavior is still fairly unpredictable, mostly involving mpirun
never spinning up daemons on other nodes.

I saw mention that work was being suspended on LSF support pending
technical improvements on the LSF side (mentioning that Platform had
provided a patch to try.)

Can I assume, based on the inactivity in the repo, that Platform hasn't
resolved the issue?

Thanks,
Eric


Here're the diffs to get LSF support to compile.  We also made a change
so it would report the LSF failure code instead of an uninitialized
variable when it fails:

Index: pls_lsf_module.c
===
--- pls_lsf_module.c(revision 17234)
+++ pls_lsf_module.c(working copy)
@@ -304,7 +304,7 @@
  */
 if (lsb_launch(nodelist_argv, argv, LSF_DJOB_NOWAIT, env) < 0) {
 ORTE_ERROR_LOG(ORTE_ERR_FAILED_TO_START);
-opal_output(0, "lsb_launch failed: %d", rc);
+opal_output(0, "lsb_launch failed: %d", lsberrno);
 rc = ORTE_ERR_FAILED_TO_START;
 goto cleanup;
 }
@@ -356,7 +356,7 @@

 /* check for failed launch - if so, force terminate */
 if (failed_launch) {
-if (ORTE_SUCCESS !=
+/*if (ORTE_SUCCESS != */
 orte_pls_base_daemon_failed(jobid, false, -1, 0,
ORTE_JOB_STATE_FAILED_TO_START);
 }
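
The first hunk above follows a common pattern: when a library call signals failure through its return value and records the reason in a global, log that global rather than a local variable that has not been assigned yet. A minimal sketch of the pattern follows; fake_launch and fake_liberrno are hypothetical stand-ins in the style of lsb_launch()/lsberrno, not the LSF API.

```c
#include <stdio.h>

/* Hypothetical stand-in for a library-wide error code (like lsberrno). */
static int fake_liberrno = 0;

/* Hypothetical stand-in for a launch call: returns < 0 on failure and
 * records the reason in fake_liberrno. */
static int fake_launch(int should_fail)
{
    if (should_fail) {
        fake_liberrno = 42;   /* arbitrary library-specific failure code */
        return -1;
    }
    return 0;
}

/* Returns 0 on success, or the library's error code on failure. */
int launch_and_report(int should_fail)
{
    if (fake_launch(should_fail) < 0) {
        /* Report the library's error code, not a local status variable
         * that has not been set at this point. */
        fprintf(stderr, "launch failed: %d\n", fake_liberrno);
        return fake_liberrno;
    }
    return 0;
}
```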



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] merging new PLPA to the trunk

2008-02-12 Thread Jeff Squyres

On Feb 12, 2008, at 7:11 AM, Lenny Verkhovsky wrote:

While coding a new RMAPS component I found strange behavior in PLPA, the same
behavior that was described in
http://www.open-mpi.org/community/lists/plpa-users/2007/04/0073.php

I believe that it was fixed in the new version of PLPA.

This new version needs to be merged to the trunk because of its bug fixes and
API changes.

If there is no objection, I volunteer to do it.



That would be great.  Please use the official SVN "3rd party import"  
guidelines.  There's a /vendor/plpa branch that *may* be in good shape  
for this, but may not (I don't think I fully grokked the SVN 3rd party  
import procedures when I was using that branch before).  :-\  In a  
worst-case scenario, we can "reset the clock" in the /vendor/plpa  
branch and make the new PLPA version be the "first" version in that  
tree (i.e., as if it were the first version we imported).


What's your timeframe?

I ask because it would probably be best if I finally get around to  
releasing a stable version of PLPA.  The last version is technically  
still a beta.


--
Jeff Squyres
Cisco Systems



[OMPI devel] New address selection for btl-tcp (was Re: [OMPI svn] svn:open-mpi r17307)

2008-02-12 Thread Adrian Knoth
On Fri, Feb 01, 2008 at 11:40:20AM -0500, Tim Prins wrote:

> Adrian,

Hi!

Sorry for the late reply and thanks for your testing.

> 1. There are some warnings when compiling:

I've fixed these issues.

> 2. If I exclude all my tcp interfaces, the connection fails properly, 
> but I do get a malloc request for 0 bytes:
> tprins@odin examples]$ mpirun -mca btl tcp,self  -mca btl_tcp_if_exclude 
> eth0,ib0,lo -np 2 ./ring_c
> malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)
> malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)
> 

Not my fault, but I guess we could fix it anyway. Should we?
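
If we do fix it, a guard along these lines would be enough. This is only a sketch under the assumption that the zero-byte request comes from allocating an array sized by the number of usable interfaces, which is zero when every interface is excluded; alloc_entries is a hypothetical helper, not a function in the tree.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an array of `count` entries of
 * `entry_size` bytes each.  Returns NULL without calling malloc() when
 * there is nothing to allocate, so no zero-byte request is issued. */
void *alloc_entries(size_t count, size_t entry_size)
{
    if (count == 0 || entry_size == 0) {
        return NULL;
    }
    /* Refuse sizes whose product would overflow size_t. */
    if (count > (size_t)-1 / entry_size) {
        return NULL;
    }
    return malloc(count * entry_size);
}
```

The caller then has to treat NULL with a zero count as "no interfaces" rather than as an out-of-memory error.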

> 3. If the exclude list does not contain 'lo', or the include list 
> contains 'lo', the job hangs when using multiple nodes:

That's weird. Loopback interfaces should automatically be excluded right
from the beginning. See opal/util/if.c.

I neither know nor have checked where things go wrong. Do you want to
investigate? As already mentioned, this should not happen.

Can you post the output of "ip a s" or "ifconfig -a"?

> However, the great news about this patch is that it appears to fix 
> https://svn.open-mpi.org/trac/ompi/ticket/1027 for me.

It also fixes my #1206. I'd like to merge tmp-public/btl-tcp into the
trunk, especially before the 1.3 code freeze. Any objections?


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk

2008-02-12 Thread Doug Tody
Hi Ralph -

How extensive are the changes involved in removing the GPR?  How hard would
it be for someone to maintain an enhanced version of this as an addon or
compile-time optional module?  Thanks.

- Doug


On Mon, 11 Feb 2008, Ralph Castain wrote:

> Hello all
> 
> Per last week's telecon, we planned the merge of the latest ORTE devel
> branch to the OMPI trunk for after Sun had committed its C++ changes. That
> happened over the weekend.
> 
> Therefore, based on the requests at the telecon, I will be merging the
> current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit
> around 4:30pm Eastern time - will send out warning shortly before the commit
> to let you know it is coming. I'll advise of any delays.
> 
> This will be a snapshot of that devel branch - it will include the upgraded
> launch system, remove the GPR, add the new tool communication library, allow
> arbitrary mpiruns to interconnect, supports the revamped hostfile and
> dash-host behaviors per the wiki, etc.
> 
> However, it is incomplete and contains some known flaws. For example,
> totalview support has not been enabled yet. Comm_spawn, which is currently
> broken on the OMPI trunk, is fixed - but singleton comm_spawn remains
> broken. I am in the process of establishing support for direct and
> standalone launch capabilities, but those won't be in the merge. I have
> updated all of the launchers, but can only certify the SLURM, TM, and RSH
> ones to work - the Xgrid launcher is known to not compile, so if you have
> Xgrid on your Mac, you need to tell the build system to not build that
> component.
> 
> This will give you a chance to look over the new arch, though, and I
> understand that people would like to begin having a chance to test and
> review the revised code. Hopefully, you will find most of the bugs to be
> minor.
> 
> Please advise of any concerns about this merge. The schedule is totally
> driven by the requests of the MPI team members (delaying the merge has no
> impact on ORTE development), so requests to shift the schedule should be
> discussed amongst the community.
> 
> Thanks
> Ralph
> 
> 
> 


Re: [OMPI devel] status of LSF integration work?

2008-02-12 Thread ejon

Thanks for the response, Jeff.

I'll definitely plan an upgrade to the latest LSF release (7.0 update 2), 
then.  Given the roadmap, I think I'm way better off forging ahead with MPI 
on LSF than implementing a separate solution.  I didn't really expect 
production-ready code at this point.  Just checking whether it was still 
planned for 1.3, really (the last thing I saw in the mailing lists was fairly 
discouraging).


I'm willing to dedicate some time to testing code if you think it would be 
helpful.


Cheers,
Eric

Jeff Squyres wrote:

There are two issues:

- You must have a recent enough version of LSF.  I'm afraid I don't  
remember the LSF version number offhand, but we both (OMPI and LSF)  
had to make some changes/fixes to achieve compatibility.


- LSF compatibility in OMPI is scheduled for v1.3 (i.e., it doesn't  
exist in the v1.2 series).  As Ralph indicated, we're aware that it's  
currently broken in the trunk -- it'll be fixed by the v1.3 release,  
but I don't know exactly when.  To be blunt: I wouldn't count on it in  
a production environment until v1.3 is officially released.  Betas may  
become available before v1.3 goes gold that would be suitable for  
testing, though.


Here's the OMPI v1.3 roadmap document -- it's more-or-less continually  
updated:


 https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3


On Feb 11, 2008, at 10:36 PM, Ralph Castain wrote:

Jeff and I chatted about this today, in fact. We know the LSF support is
borked, but neither of us had time right now to fix it. We plan to do so,
though, before the 1.3 release - just can't promise when.

Ralph



On 2/11/08 8:00 AM, "Eric Jones"  wrote:


Greetings, MPI mavens,

Perhaps this belongs on users@, but since it's about development status
I thought I'd start here.  I've fairly recently gotten involved in getting
an MPI environment configured for our institute.  We have an existing
LSF cluster because most of our work is more High-Throughput than
High-Performance, so if I can use LSF to underlie our MPI environment,
that'd be administratively easiest.

I tried to compile the LSF support in the public SVN repo and noticed it
was, er, broken.  I'll include the trivial changes we made below.  But
the behavior is still fairly unpredictable, mostly involving mpirun
never spinning up daemons on other nodes.

I saw mention that work was being suspended on LSF support pending
technical improvements on the LSF side (mentioning that Platform had
provided a patch to try.)

Can I assume, based on the inactivity in the repo, that Platform hasn't
resolved the issue?

Thanks,
Eric


Here're the diffs to get LSF support to compile.  We also made a change
so it would report the LSF failure code instead of an uninitialized
variable when it fails:

Index: pls_lsf_module.c
===
--- pls_lsf_module.c(revision 17234)
+++ pls_lsf_module.c(working copy)
@@ -304,7 +304,7 @@
  */
 if (lsb_launch(nodelist_argv, argv, LSF_DJOB_NOWAIT, env) < 0) {
 ORTE_ERROR_LOG(ORTE_ERR_FAILED_TO_START);
-opal_output(0, "lsb_launch failed: %d", rc);
+opal_output(0, "lsb_launch failed: %d", lsberrno);
 rc = ORTE_ERR_FAILED_TO_START;
 goto cleanup;
 }
@@ -356,7 +356,7 @@

 /* check for failed launch - if so, force terminate */
 if (failed_launch) {
-if (ORTE_SUCCESS !=
+/*if (ORTE_SUCCESS != */
 orte_pls_base_daemon_failed(jobid, false, -1, 0,
ORTE_JOB_STATE_FAILED_TO_START);
 }





Re: [OMPI devel] status of LSF integration work?

2008-02-12 Thread Jeff Squyres

On Feb 12, 2008, at 4:09 PM, ejon wrote:

I'll definitely plan an upgrade to the latest LSF release (7.0 update 2),
then.  Given the roadmap, I think I'm way better off forging ahead with MPI
on LSF than implementing a separate solution.  I didn't really expect
production-ready code at this point.  Just checking whether it was still
planned for 1.3, really (the last thing I saw in the mailing lists was fairly
discouraging).

I'm willing to dedicate some time to testing code if you think it would be
helpful.



That would be great, thanks.

Note that there is a fairly major change coming in to our run-time  
portion of the trunk tomorrow afternoon (a snapshot from a
long-standing run-time development branch) which will overhaul just about  
everything -- including the LSF support.  The LSF stuff probably  
hasn't been [fully] updated and definitely has not yet been tested  
under the overhaul.


Never fear -- LSF support is a feature we've committed to, so it will  
definitely be there for v1.3.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] status of LSF integration work?

2008-02-12 Thread ejon
I joined this list in time to see the discussion of the merge, so I'm 
expecting the update, but thanks for the heads up.  Until I saw the mail 
about that, I hadn't realized the ORTE stuff was developed separately...now I 
understand why the trunk version was left uncompilable so long :-).


What we have now works well enough that we can probably get along with it, 
but I can run the new code in parallel.  Hopefully I'll be able to offer some 
useful feedback.


E

Jeff Squyres wrote:



That would be great, thanks.

Note that there is a fairly major change coming in to our run-time  
portion of the trunk tomorrow afternoon (a snapshot from a
long-standing run-time development branch) which will overhaul just about  
everything -- including the LSF support.  The LSF stuff probably  
hasn't been [fully] updated and definitely has not yet been tested  
under the overhaul.


Never fear -- LSF support is a feature we've committed to, so it will  
definitely be there for v1.3.




[OMPI devel] btl_openib_rnr_retry MCA param

2008-02-12 Thread Jeff Squyres
I see that in the OOB CPC for the openib BTL, when setting up the send  
side of the QP, we set the rnr_retry value depending on whether the  
remote receive queue is a per-peer or SRQ:


- SRQ: btl_openib_rnr_retry MCA param value
- PP: 0

The rationale given in a comment is that setting the RNR to 0 is a  
good way to find bugs in our flow control.


Do we really want this in production builds?  Or do we want 0 only for
developer builds, with PP queues otherwise using the same
btl_openib_rnr_retry value as SRQs?


Or should we offer a finer-grained control, such as:

- btl_openib_rnr_retry_pp: value to use for per-peer q's, -1=use the default
- btl_openib_rnr_retry_srq: value to use for srq's, -1=use the default
- btl_openib_rnr_retry: value to use as the default for _pp and _srq
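
The fallback rule in that last proposal could be sketched as below. This is only an illustration of the "-1 means use the default" logic; the function and macro names are hypothetical and no such MCA parameters exist yet.

```c
/* Sentinel meaning "fall back to the general btl_openib_rnr_retry value". */
#define RNR_RETRY_USE_DEFAULT (-1)

/* Hypothetical helper: pick the value to program into the QP.
 * `specific` is the _pp or _srq parameter, `general_default` is the
 * plain btl_openib_rnr_retry parameter. */
int effective_rnr_retry(int specific, int general_default)
{
    return (specific == RNR_RETRY_USE_DEFAULT) ? general_default : specific;
}
```

With this rule, setting the per-peer parameter to 0 keeps today's behavior for PP queues, while leaving it at -1 makes PP queues follow the general value.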

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] VT integration: make distclean problem

2008-02-12 Thread Jeff Squyres

Hmm; I'm not sure.

distclean will fail in a tarball or SVN checkout if you do
--enable-contrib-no-build=vt.  So it's not a developer-only artifact.


I don't know what the Right solution is, though.  :-\



On Feb 12, 2008, at 9:22 AM, Josh Hursey wrote:


Good points about 'distclean' versus 'clean'. For the make distclean
case then I think it is ok if we fail here since it is not a full
'make dist' that I was working with originally.

Sorry for the distraction.

Cheers,
Josh

On Feb 12, 2008, at 6:52 AM, Andreas Knüpfer wrote:


On Monday 11 February 2008, Josh Hursey wrote:
I've been noticing another problem with the VT integration. If you do
a "./configure --enable-contrib-no-build=vt" a subsequent 'make
distclean' will fail in contrib/vt. The 'make distclean' will succeed
with VT enabled (default).



hm, tricky. I guess it is about the 'make dist' functionality. All others
like 'make distclean' etc. are only assisting functionality for 'make dist'
after all.

And for 'make dist' you need to have everything configured that is going to be
part of the distribution. Therefore, VT needs to be part of the tarball, so
you can disable it at build time. It would not work the other way around.

So in my opinion, the current status is what we want to have. Are there any
problems when configuring VT, then building the tarball with VT and disabling
it once you build Open MPI from the tarball?

Regards, Andreas

--
Dipl. Math. Andreas Knuepfer,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A114, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-38323, fax +49-351-463-37773



--
Jeff Squyres
Cisco Systems




Re: [OMPI devel] [RFC] Remove explicit call to progress() from ob1.

2008-02-12 Thread Jeff Squyres
Were these supposed to cover the time required for pinning and  
unpinning?


Can you explain why you think they're unnecessary?


On Feb 12, 2008, at 5:27 AM, Gleb Natapov wrote:


Hi,

I am planning to commit the following patch. Those two progress() calls
are responsible for most of our deep recursion troubles. And I also
think they are completely unnecessary.

diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index 5899243..641176e 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -704,9 +704,6 @@ int mca_pml_ob1_recv_request_schedule_once(
mca_bml_base_free(bml_btl,dst);
continue;
}
-
-/* run progress as the prepare (pinning) can take some time */
-mca_bml.bml_progress();
}

return OMPI_SUCCESS;
diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
index 0998a05..9d7f3f9 100644
--- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
@@ -968,7 +968,6 @@ cannot_pack:
mca_bml_base_free(bml_btl,des);
continue;
}
-mca_bml.bml_progress();
}

return OMPI_SUCCESS;
--
Gleb.
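
To show why a progress call inside a scheduling loop can nest deeply, here is a toy model. It assumes that progress may complete a fragment whose handler schedules more work, and that the scheduling path then calls progress again; all names are hypothetical and nothing here is ob1 code.

```c
/* Toy model only: hypothetical names, not the ob1 implementation. */
static int pending = 0;     /* fragments still waiting to be scheduled */
static int depth = 0;       /* current nesting depth of progress() */
static int max_depth = 0;   /* deepest nesting seen during a run */

static void schedule_one(int call_progress);

/* Handles one pending fragment, whose handler schedules more work. */
static void progress(int call_progress)
{
    depth++;
    if (depth > max_depth) {
        max_depth = depth;
    }
    if (pending > 0) {
        pending--;
        schedule_one(call_progress);
    }
    depth--;
}

/* Scheduling path; optionally re-enters progress() like the removed calls. */
static void schedule_one(int call_progress)
{
    if (call_progress) {
        progress(call_progress);
    }
}

/* Drains `fragments` pending items and returns the deepest nesting seen. */
int run(int fragments, int call_progress)
{
    pending = fragments;
    depth = 0;
    max_depth = 0;
    while (pending > 0) {
        progress(call_progress);
    }
    return max_depth;
}
```

In this model the nesting depth grows with the number of pending fragments when the scheduling path re-enters progress, and stays at one when it does not.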



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk

2008-02-12 Thread Ralph Castain
Well... best-laid plans of mice and men, as they say.

We were just having -way- too much fun here at IBM today going over the new
ORTE design, planning for future scalability changes, etc., so American
decided to cancel my flight back home! So thoughtful!

I will be spending my Wed (hopefully!) enroute back home. It looks like it
will be Thurs before I can do the merge. My apologies to all - but I would
really rather not try to do it from a notebook computer in an airline
terminal!

Ralph


On 2/12/08 9:54 AM, "Jeff Squyres"  wrote:

> Ralph --
> 
> We talked about this on the OMPI con call today and everyone agrees
> that this seems to be a good plan.  Just as a safety net: if the merge
> goes disastrously wrong and you're unavailable Thu/Fri this week, we
> can just back it out and try again later.
> 
> Thanks!
> 
> 
> On Feb 11, 2008, at 11:37 PM, Ralph Castain wrote:
> 
>> Hello all
>> 
>> Per last week's telecon, we planned the merge of the latest ORTE devel
>> branch to the OMPI trunk for after Sun had committed its C++
>> changes. That
>> happened over the weekend.
>> 
>> Therefore, based on the requests at the telecon, I will be merging the
>> current ORTE devel branch to the trunk on Wed 2/13. I'll make the
>> commit
>> around 4:30pm Eastern time - will send out warning shortly before
>> the commit
>> to let you know it is coming. I'll advise of any delays.
>> 
>> This will be a snapshot of that devel branch - it will include the
>> upgraded
>> launch system, remove the GPR, add the new tool communication
>> library, allow
>> arbitrary mpiruns to interconnect, supports the revamped hostfile and
>> dash-host behaviors per the wiki, etc.
>> 
>> However, it is incomplete and contains some known flaws. For example,
>> totalview support has not been enabled yet. Comm_spawn, which is
>> currently
>> broken on the OMPI trunk, is fixed - but singleton comm_spawn remains
>> broken. I am in the process of establishing support for direct and
>> standalone launch capabilities, but those won't be in the merge. I
>> have
>> updated all of the launchers, but can only certify the SLURM, TM,
>> and RSH
>> ones to work - the Xgrid launcher is known to not compile, so if you
>> have
>> Xgrid on your Mac, you need to tell the build system to not build that
>> component.
>> 
>> This will give you a chance to look over the new arch, though, and I
>> understand that people would like to begin having a chance to test and
>> review the revised code. Hopefully, you will find most of the bugs
>> to be
>> minor.
>> 
>> Please advise of any concerns about this merge. The schedule is
>> totally
>> driven by the requests of the MPI team members (delaying the merge
>> has no
>> impact on ORTE development), so requests to shift the schedule
>> should be
>> discussed amongst the community.
>> 
>> Thanks
>> Ralph
>> 
>> 
> 




Re: [OMPI devel] === CREATE FAILURE ===

2008-02-12 Thread George Bosilca

r17440 fixes this issue. It's too easy :)

  george.


On Feb 12, 2008, at 9:15 PM, MPI Team wrote:



ERROR: Command returned a non-zero exist status
  make -j 4 distcheck

Start time: Tue Feb 12 21:00:13 EST 2008
End time:   Tue Feb 12 21:15:34 EST 2008

=
==
[... previous lines snipped ...]
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../opal/include - 
I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/ 
linux/plpa/src/libplpa -I../../../ompi/datatype -I../../.. -I../.. - 
I../../../opal/include -I../../../orte/include -I../../../ompi/ 
include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing - 
pthread -MT dt_match_size.lo -MD -MP -MF .deps/dt_match_size.Tpo - 
c ../../../ompi/datatype/dt_match_size.c  -fPIC -DPIC -o .libs/ 
dt_match_size.o

depbase=`echo position.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H - 
I. -I../../opal/include -I../../orte/include -I../../ompi/include - 
I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../ompi/ 
datatype   -I../../.. -I../.. -I../../../opal/include -I../../../ 
orte/include -I../../../ompi/include-O3 -DNDEBUG -finline- 
functions -fno-strict-aliasing -pthread -MT position.lo -MD -MP -MF  
$depbase.Tpo -c -o position.lo ../../../ompi/datatype/position.c &&\

mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../opal/include - 
I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/ 
linux/plpa/src/libplpa -I../../../ompi/datatype -I../../.. -I../.. - 
I../../../opal/include -I../../../orte/include -I../../../ompi/ 
include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing - 
pthread -MT convertor.lo -MD -MP -MF .deps/convertor.Tpo -c ../../../ 
ompi/datatype/convertor.c  -fPIC -DPIC -o .libs/convertor.o
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../opal/include - 
I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/ 
linux/plpa/src/libplpa -I../../../ompi/datatype -I../../.. -I../.. - 
I../../../opal/include -I../../../orte/include -I../../../ompi/ 
include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing - 
pthread -MT position.lo -MD -MP -MF .deps/position.Tpo -c ../../../ 
ompi/datatype/position.c  -fPIC -DPIC -o .libs/position.o

depbase=`echo copy_functions.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H - 
I. -I../../opal/include -I../../orte/include -I../../ompi/include - 
I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../ompi/ 
datatype   -I../../.. -I../.. -I../../../opal/include -I../../../ 
orte/include -I../../../ompi/include-O3 -DNDEBUG -finline- 
functions -fno-strict-aliasing -pthread -MT copy_functions.lo -MD - 
MP -MF $depbase.Tpo -c -o copy_functions.lo ../../../ompi/datatype/ 
copy_functions.c &&\

mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../opal/include - 
I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/ 
linux/plpa/src/libplpa -I../../../ompi/datatype -I../../.. -I../.. - 
I../../../opal/include -I../../../orte/include -I../../../ompi/ 
include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing - 
pthread -MT copy_functions.lo -MD -MP -MF .deps/copy_functions.Tpo - 
c ../../../ompi/datatype/copy_functions.c  -fPIC -DPIC -o .libs/ 
copy_functions.o
depbase=`echo copy_functions_heterogeneous.lo | sed 's|[^/]*$|.deps/ 
&|;s|\.lo$||'`;\
/bin/sh ../../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H - 
I. -I../../opal/include -I../../orte/include -I../../ompi/include - 
I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../ompi/ 
datatype   -I../../.. -I../.. -I../../../opal/include -I../../../ 
orte/include -I../../../ompi/include-O3 -DNDEBUG -finline- 
functions -fno-strict-aliasing -pthread -MT  
copy_functions_heterogeneous.lo -MD -MP -MF $depbase.Tpo -c -o  
copy_functions_heterogeneous.lo ../../../ompi/datatype/ 
copy_functions_heterogeneous.c &&\

mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../opal/include - 
I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/ 
linux/plpa/src/libplpa -I../../../ompi/datatype -I../../.. -I../.. - 
I../../../opal/include -I../../../orte/include -I../../../ompi/ 
include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing - 
pthread -MT copy_functions_heterogeneous.lo -MD -MP -MF .deps/ 
copy_functions_heterogeneous.Tpo -c ../../../ompi/datatype/ 
copy_functions_heterogeneous.c  -fPIC -DPIC -o .libs/ 
copy_functions_heterogeneous.o

depbase=`echo dt_get_count.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H - 
I. -I../../opal/include -I../../orte/include -I../../ompi/include - 
I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../ompi/ 
datatype   -I../../.. -I../.. -I../../../opal/include -I../../../ 

Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk

2008-02-12 Thread Ralph Castain
Hi Doug

The changes are rather far-reaching. We essentially revamped the entire RTE
to switch from an event-driven architecture to one based on sequential
logic. This had large benefits, but the GPR was the casualty. Remember, the
aim for the past year has been to create a dedicated "lean, mean OMPI
machine"!

That said, it would be relatively simple to add an extension that provided a
level of data storage that user-level programs could access. It would not
provide any subscription or trigger capabilities, however - we need to leave
those out of the system to avoid reintroducing the event-driven problems
again. But if you just wanted to store and retrieve data for sharing it
across processes, that could be provided with minimal effort or impact.
Probably best done as a compile-time optional module, though, to avoid
adding to the memory footprint for everyone.
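
To give a feel for how small such an extension could be, here is a rough sketch of a store-and-retrieve-only table with no subscriptions or triggers. It is a single-process, fixed-size illustration; the names, limits, and string-only values are assumptions made for the example, and a real version would sit behind the RTE's messaging layer.

```c
#include <string.h>

#define STORE_CAPACITY 16   /* illustrative limits, chosen arbitrarily */
#define KEY_MAX 32
#define VALUE_MAX 64

struct store_entry {
    char key[KEY_MAX];
    char value[VALUE_MAX];
    int used;
};

static struct store_entry store[STORE_CAPACITY];

/* Stores or overwrites a value.  Returns 0 on success, -1 if the key or
 * value is too long or the table is full. */
int store_put(const char *key, const char *value)
{
    int free_slot = -1;

    if (strlen(key) >= KEY_MAX || strlen(value) >= VALUE_MAX) {
        return -1;
    }
    for (int i = 0; i < STORE_CAPACITY; i++) {
        if (store[i].used && strcmp(store[i].key, key) == 0) {
            strcpy(store[i].value, value);   /* overwrite existing key */
            return 0;
        }
        if (!store[i].used && free_slot < 0) {
            free_slot = i;                   /* remember first free slot */
        }
    }
    if (free_slot < 0) {
        return -1;
    }
    strcpy(store[free_slot].key, key);
    strcpy(store[free_slot].value, value);
    store[free_slot].used = 1;
    return 0;
}

/* Returns the stored value, or NULL if the key is not present. */
const char *store_get(const char *key)
{
    for (int i = 0; i < STORE_CAPACITY; i++) {
        if (store[i].used && strcmp(store[i].key, key) == 0) {
            return store[i].value;
        }
    }
    return NULL;
}
```

Nothing in this interface notifies anyone when a value changes, which is the point: readers poll with store_get when they need the data.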

Another alternative: there is a separate "ORTE" project in Europe that is
building extensions to our ORTE - they are tracking these code changes, but
adding "bolt-ons" such as a GPR-like central data store, hooks for workflow
management and the grid, multi-cluster operations, etc. I'm working with
them on those efforts - if there is interest in such capabilities, I can
probably look into architecting things so that some of the "bolt-ons" could
be dynamically picked up by OMPI as binary modules or something.

For now, though, there will be no GPR-like storage in the new system.
Ralph



On 2/12/08 1:43 PM, "Doug Tody"  wrote:

> Hi Ralph -
> 
> How extensive are the changes involved in removing the GPR?  How hard would
> it be for someone to maintain an enhanced version of this as an addon or
> compile-time optional module?  Thanks.
> 
> - Doug
> 
> 
> On Mon, 11 Feb 2008, Ralph Castain wrote:
> 
>> Hello all
>> 
>> Per last week's telecon, we planned the merge of the latest ORTE devel
>> branch to the OMPI trunk for after Sun had committed its C++ changes. That
>> happened over the weekend.
>> 
>> Therefore, based on the requests at the telecon, I will be merging the
>> current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit
>> around 4:30pm Eastern time - will send out warning shortly before the commit
>> to let you know it is coming. I'll advise of any delays.
>> 
>> This will be a snapshot of that devel branch - it will include the upgraded
>> launch system, remove the GPR, add the new tool communication library, allow
>> arbitrary mpiruns to interconnect, supports the revamped hostfile and
>> dash-host behaviors per the wiki, etc.
>> 
>> However, it is incomplete and contains some known flaws. For example,
>> totalview support has not been enabled yet. Comm_spawn, which is currently
>> broken on the OMPI trunk, is fixed - but singleton comm_spawn remains
>> broken. I am in the process of establishing support for direct and
>> standalone launch capabilities, but those won't be in the merge. I have
>> updated all of the launchers, but can only certify the SLURM, TM, and RSH
>> ones to work - the Xgrid launcher is known to not compile, so if you have
>> Xgrid on your Mac, you need to tell the build system to not build that
>> component.
>> 
>> This will give you a chance to look over the new arch, though, and I
>> understand that people would like to begin having a chance to test and
>> review the revised code. Hopefully, you will find most of the bugs to be
>> minor.
>> 
>> Please advise of any concerns about this merge. The schedule is totally
>> driven by the requests of the MPI team members (delaying the merge has no
>> impact on ORTE development), so requests to shift the schedule should be
>> discussed amongst the community.
>> 
>> Thanks
>> Ralph
>> 
>> 