[OMPI devel] Something wrong with vt?
I get the following error while running "make install":

make[2]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt'
Making install in vt
make[3]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt'
make[3]: *** No rule to make target `install'.  Stop.
make[3]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt'
make[2]: *** [install-recursive] Error 1
make[2]: Leaving directory `/home_local/glebn/build_dbg/ompi/contrib/vt'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/home_local/glebn/build_dbg/ompi'
make: *** [install-recursive] Error 1

ompi/contrib/vt/vt/Makefile is empty!

--
Gleb.
[OMPI devel] status of LSF integration work?
Greetings, MPI mavens,

Perhaps this belongs on users@, but since it's about development status I thought I'd start here. I've fairly recently gotten involved in setting up an MPI environment for our institute. We have an existing LSF cluster because most of our work is more High-Throughput than High-Performance, so if I can use LSF to underlie our MPI environment, that would be administratively easiest.

I tried to compile the LSF support in the public SVN repo and noticed it was, er, broken. I'll include the trivial changes we made below. But the behavior is still fairly unpredictable, mostly involving mpirun never spinning up daemons on other nodes.

I saw mention that work was being suspended on LSF support pending technical improvements on the LSF side (mentioning that Platform had provided a patch to try).

Can I assume, based on the inactivity in the repo, that Platform hasn't resolved the issue?

Thanks,
Eric


Here are the diffs to get LSF support to compile. We also made a change so it reports the LSF failure code instead of an uninitialized variable when lsb_launch fails:

Index: pls_lsf_module.c
===
--- pls_lsf_module.c    (revision 17234)
+++ pls_lsf_module.c    (working copy)
@@ -304,7 +304,7 @@
      */
     if (lsb_launch(nodelist_argv, argv, LSF_DJOB_NOWAIT, env) < 0) {
         ORTE_ERROR_LOG(ORTE_ERR_FAILED_TO_START);
-        opal_output(0, "lsb_launch failed: %d", rc);
+        opal_output(0, "lsb_launch failed: %d", lsberrno);
         rc = ORTE_ERR_FAILED_TO_START;
         goto cleanup;
     }
@@ -356,7 +356,7 @@

     /* check for failed launch - if so, force terminate */
     if (failed_launch) {
-        if (ORTE_SUCCESS !=
+        /*if (ORTE_SUCCESS != */
             orte_pls_base_daemon_failed(jobid, false, -1, 0,
                                         ORTE_JOB_STATE_FAILED_TO_START);
     }
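For anyone skimming the diff: the substance of the first hunk is simply "log lsberrno, not an unrelated local, when lsb_launch fails." Below is a minimal, self-contained sketch of that error path outside the ORTE module. The wrapper function and its arguments are hypothetical; lsb_launch(), lsberrno, lsb_sysmsg(), and LSF_DJOB_NOWAIT are assumed to come from LSF's <lsf/lsbatch.h>, so treat the exact calls as an assumption to verify against your LSF installation.

    /* Hypothetical illustration only -- not the actual ORTE pls_lsf module. */
    #include <cstdio>
    #include <lsf/lsbatch.h>   /* assumed: lsb_launch(), lsberrno, lsb_sysmsg(), LSF_DJOB_NOWAIT */

    /* Launch argv on the hosts in nodelist without waiting for completion. */
    static int launch_daemons(char **nodelist, char **argv, char **env)
    {
        if (lsb_launch(nodelist, argv, LSF_DJOB_NOWAIT, env) < 0) {
            /* Report the LSF error code (and its message), not an uninitialized local. */
            std::fprintf(stderr, "lsb_launch failed: %d (%s)\n", lsberrno, lsb_sysmsg());
            return -1;
        }
        return 0;
    }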
Re: [OMPI devel] more vt woes
This problem should be fixed now. Thanks for the hint.

On Sa, 2008-02-09 at 08:47 -0500, Jeff Squyres wrote:
> While doing some pathscale compiler testing on the trunk (r17407), I
> ran into this compile problem (the first is a warning, the second is
> an error):
>
> pathCC -DHAVE_CONFIG_H -I. -I../.. -I../../extlib/otf/otflib -I../../extlib/otf/otflib -I../../vtlib/ -I../../vtlib -openmp -DVT_OMP -g -Wall -Wundef -Wno-long-long -finline-functions -pthread -MT vtfilter-vt_tracefilter.o -MD -MP -MF .deps/vtfilter-vt_tracefilter.Tpo -c -o vtfilter-vt_tracefilter.o `test -f 'vt_tracefilter.cc' || echo './'`vt_tracefilter.cc
> mv -f .deps/vtfilter-vt_otfhandler.Tpo .deps/vtfilter-vt_otfhandler.Po
> mv -f .deps/vtfilter-vt_filthandler.Tpo .deps/vtfilter-vt_filthandler.Po
> "vt_tracefilter.cc", line 451: Warning: Referenced scalar variable _ZZ4mainE5retev is SHARED by default
> "vt_tracefilter.cc", line 921: Warning: Referenced scalar variable _ZZ4mainE5retev is SHARED by default
> "vt_tracefilter.cc", line 950: Warning: Referenced scalar variable _ZZ4mainE5retst is SHARED by default
> "vt_tracefilter.cc", line 977: Warning: Referenced scalar variable _ZZ4mainE5retsn is SHARED by default
> mv -f .deps/vtfilter-vt_filter.Tpo .deps/vtfilter-vt_filter.Po
> mv -f .deps/vtfilter-vt_tracefilter.Tpo .deps/vtfilter-vt_tracefilter.Po
> pathCC -openmp -DVT_OMP -g -Wall -Wundef -Wno-long-long -finline-functions -pthread -openmp -o vtfilter vtfilter-vt_filter.o vtfilter-vt_filthandler.o vtfilter-vt_otfhandler.o vtfilter-vt_tracefilter.o -L../../extlib/otf/otflib -L../../extlib/otf/otflib/.libs -lotf -lz -lnsl -lutil -lm
> vtfilter-vt_tracefilter.o(.text+0x309b): In function `main':
> /home/jsquyres/svn/ompi2/ompi/contrib/vt/vt/tools/vtfilter/vt_tracefilter.cc:794: undefined reference to `FiltHandlerArgument::FiltHandlerArgument(FiltHandlerArgument const&)'
> vtfilter-vt_tracefilter.o(.text+0x312f):/home/jsquyres/svn/ompi2/ompi/contrib/vt/vt/tools/vtfilter/vt_tracefilter.cc:802: undefined reference to `FiltHandlerArgument::FiltHandlerArgument(FiltHandlerArgument const&)'
> vtfilter-vt_tracefilter.o(.text+0x577b): In function `__ompdo_main2':
> /home/jsquyres/svn/ompi2/ompi/contrib/vt/vt/tools/vtfilter/vt_tracefilter.cc:802: undefined reference to `FiltHandlerArgument::FiltHandlerArgument(FiltHandlerArgument const&)'
> collect2: ld returned 1 exit status
> make[6]: *** [vtfilter] Error 1
> make[6]: Leaving directory `/home/jsquyres/svn/ompi2/ompi/contrib/vt/vt/tools/vtfilter'
>
> This is with the pathscale v3.0 compilers.

--
Matthias Jurenz,
Center for Information Services and High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773
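For the archive: an undefined reference to a copy constructor coming from OpenMP-outlined code (note the __ompdo_main2 frame above) is the usual signature of a copy constructor that is declared but never defined being pulled in by a privatization clause. Whether that was the exact cause of this vtfilter failure is an assumption; the following hypothetical reduction (not the actual vtfilter source) merely reproduces that class of link error when built with an OpenMP flag such as -openmp/-fopenmp:

    // Hypothetical reduction of the link failure, not the vtfilter code itself.
    struct FiltHandlerArgument {
        int value;
        FiltHandlerArgument() : value(0) {}
        FiltHandlerArgument(const FiltHandlerArgument&);   // declared but never defined
    };

    int main()
    {
        FiltHandlerArgument arg;
        // firstprivate makes each thread copy-construct its own 'arg', so the
        // compiler emits calls to the (missing) copy constructor definition.
        #pragma omp parallel firstprivate(arg)
        {
            (void)arg.value;
        }
        return 0;   // compiles, but linking fails with an undefined reference
    }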
[OMPI devel] VT integration: make distclean problem
I've been noticing another problem with the VT integration. If you do a "./configure --enable-contrib-no-build=vt", a subsequent 'make distclean' will fail in contrib/vt. The 'make distclean' will succeed with VT enabled (default).

---
Making distclean in contrib/vt
make[2]: Entering directory `/san/homedirs/jjhursey/svn/ompi/ompi/contrib/vt'
make[2]: *** No rule to make target `distclean'.  Stop.
make[2]: Leaving directory `/san/homedirs/jjhursey/svn/ompi/ompi/contrib/vt'
make[1]: *** [distclean-recursive] Error 1
make[1]: Leaving directory `/san/homedirs/jjhursey/svn/ompi/ompi'
make: *** [distclean-recursive] Error 1
---

I haven't looked at how to fix this, but maybe it is as simple as adding a flag to the Makefile.am in that directory.

-- Josh
[OMPI devel] New Driver BTL
Hello!

I don't know if this is the right way to ask for help with Open MPI development. We are four French students and we have a project: we have to write a new driver (a new BTL) between Open MPI and NewMadeleine (see the web page, http://pm2.gforge.inria.fr/newmadeleine/doc/html/).

With NewMadeleine, we use a send/receive interface; we only need the part of the BTL that handles this. Do you have any documentation about structures like mca_btl_base_module and its friends? I can't find where the function mca_btl_tcp_send is used. Do you know?

PLEASE HELP US!

Team
Re: [OMPI devel] VT integration: make distclean problem
* Josh Hursey wrote on Mon, Feb 11, 2008 at 07:31:25PM CET:
> I've been noticing another problem with the VT integration. If you do
> a "./configure --enable-contrib-no-build=vt" a subsequent 'make
> distclean' will fail in contrib/vt. The 'make distclean' will succeed
> with VT enabled (default).

ATM the toplevel configury does not run configure in contrib/vt/vt if that contrib is disabled. I think that's intended. But it also means that a distribution built from such a build tree cannot be complete, i.e., contain all contribs, because the Makefiles that contain the respective dist rules do not exist. The same applies to distclean and maintainer-clean.

I suppose distclean could be worked around uglily (please speak up if you want me to take a shot at it), but if you want all of these to work out of the box even with --enable-contrib-no-build=vt, then you need to run configure for vt every time. Sorry 'bout that.

Cheers,
Ralf
[OMPI devel] Fixlet for config/ompi_contrib.m4
Hello,

please apply this patch, to make future contrib integration just a tad bit easier. I verified that the generated configure script is identical, minus whitespace and comments.

Cheers,
Ralf

2008-02-11  Ralf Wildenhues

	* config/ompi_contrib.m4 (OMPI_CONTRIB): Unify listings of
	contrib software packages.

Index: config/ompi_contrib.m4
===
--- config/ompi_contrib.m4    (Revision 17419)
+++ config/ompi_contrib.m4    (Arbeitskopie)
@@ -67,20 +67,13 @@
 # Cycle through each of the hard-coded software packages and
 # configure them if not disabled.  May someday be expanded to have
 # autogen find the packages instead of this hard-coded list
-# (https://svn.open-mpi.org/trac/ompi/ticket/1162).  I couldn't
-# figure out a simple/easy way to have the m4 foreach do the m4
-# include *and* all the rest of the stuff, so I settled for having
-# two lists: each contribted software package will need to add its
-# configure.m4 list here and then add its name to the m4 define
-# for contrib_software_list.  Cope.
-#dnl m4_include(ompi/contrib/libnbc/configure.m4)
-m4_include(ompi/contrib/vt/configure.m4)
-
-m4_define(contrib_software_list, [vt])
-#dnl m4_define(contrib_software_list, [libnbc, vt])
+# (https://svn.open-mpi.org/trac/ompi/ticket/1162).
+# m4_define([contrib_software_list], [libnbc, vt])
+m4_define([contrib_software_list], [vt])
 m4_foreach(software, [contrib_software_list],
-          [OMPI_CONTRIB_DIST_SUBDIRS="$OMPI_CONTRIB_DIST_SUBDIRS contrib/software"
-           _OMPI_CONTRIB_CONFIGURE(software)])
+          [m4_include([ompi/contrib/]software[/configure.m4])
+           OMPI_CONTRIB_DIST_SUBDIRS="$OMPI_CONTRIB_DIST_SUBDIRS contrib/software"
+           _OMPI_CONTRIB_CONFIGURE(software)])

 # Setup the top-level glue
 AC_SUBST(OMPI_CONTRIB_SUBDIRS)
[OMPI devel] Leopard problems
Hi,

Since I upgraded to MacOS X 10.5.1, I've been having problems running MPI programs (using both 1.2.4 and 1.2.5). The symptoms are intermittent (i.e. sometimes the application runs fine), and appear as follows:

1. One or more of the application processes die (I've seen both one and two processes die).

2. It appears that the orteds associated with these application processes then spin continually.

Here is what I see when I run "mpirun -np 4 ./mpitest":

12467 ?? Rs 1:26.52 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0 --nodename node0 --universe greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
12468 ?? Rs 1:26.63 orted --bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0 --nodename node1 --universe greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
12469 ?? Ss 0:00.04 orted --bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0 --nodename node2 --universe greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
12470 ?? Ss 0:00.04 orted --bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename node3 --universe greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
12471 ?? S 0:00.05 ./mpitest
12472 ?? S 0:00.05 ./mpitest

Killing the mpirun results in:

$ mpirun -np 4 ./mpitest
^Cmpirun: killing job...

^C
--------------------------------------------------------------------------
WARNING: mpirun is in the process of killing a job, but has detected an
interruption (probably control-C).

It is dangerous to interrupt mpirun while it is killing a job (proper
termination may not be guaranteed). Hit control-C again within 1
second if you really want to kill mpirun immediately.
--------------------------------------------------------------------------
^Cmpirun: forcibly killing job...
--------------------------------------------------------------------------
WARNING: mpirun has exited before it received notification that all
started processes had terminated. You should double check and ensure
that there are no runaway processes still executing.
--------------------------------------------------------------------------

At this point, the two spinning orteds are left running, and the only way to kill them is with -9.

Is anyone else seeing this problem?

Greg
Re: [OMPI devel] New Driver BTL
Cedric,

There is not much documentation about this subject. However, we have some templates. Look in ompi/mca/btl/template to see how a new driver is supposed to be written.

I have a question. As far as I understand, NewMadeleine already supports multiple devices, so I guess the matching is done internally. In that case, the best approach for Open MPI would be to create an MTL instead of a BTL. The MTL interface is much simpler, basically a one-to-one wrapper for the point-to-point MPI functions. However, if you take this approach, there are a few things that will be left out: for example, no data resilience, no striping, no pipelining. But if you do all of this internally in NewMadeleine, I guess you don't need the Open MPI PML support.

Thanks,
george.

On Feb 11, 2008, at 1:52 PM, Cedric Desmoulin wrote:

> Hello!
>
> I don't know if this is the right way to ask for help with Open MPI development. We are four French students and we have a project: we have to write a new driver (a new BTL) between Open MPI and NewMadeleine (see the web page, http://pm2.gforge.inria.fr/newmadeleine/doc/html/).
>
> With NewMadeleine, we use a send/receive interface; we only need the part of the BTL that handles this. Do you have any documentation about structures like mca_btl_base_module and its friends? I can't find where the function mca_btl_tcp_send is used. Do you know?
>
> PLEASE HELP US!
>
> Team
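On the earlier question of where mca_btl_tcp_send is "used": BTL entry points are generally not called by name. Each BTL fills in a module structure with function pointers, and the upper layer (the PML) invokes them through those pointers, which is why a plain grep for a direct call site comes up empty. The following is a simplified, hypothetical sketch of that dispatch pattern; the names are made up for illustration, and the real declarations should be read from ompi/mca/btl/btl.h and the template component mentioned above.

    /* Simplified illustration of the BTL dispatch pattern -- names invented;
     * see ompi/mca/btl/btl.h for the real mca_btl_base_module_t. */
    #include <cstdio>

    struct my_btl_module;

    typedef int (*btl_send_fn)(my_btl_module *btl, void *endpoint,
                               void *descriptor, int tag);

    struct my_btl_module {
        btl_send_fn btl_send;   /* each BTL (tcp, openib, ...) installs its own send here */
    };

    /* What a concrete BTL such as tcp would provide: */
    static int my_btl_tcp_send(my_btl_module *, void *, void *, int tag)
    {
        std::printf("sending fragment with tag %d\n", tag);
        return 0;
    }

    static my_btl_module my_btl_tcp_module = { my_btl_tcp_send };

    /* The upper layer only ever goes through the pointer: */
    int main()
    {
        my_btl_module *btl = &my_btl_tcp_module;
        return btl->btl_send(btl, /*endpoint*/ 0, /*descriptor*/ 0, /*tag*/ 65);
    }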
[OMPI devel] 1.3 Release schedule and contents
All:

The latest scrub of the 1.3 release schedule and contents is ready for review and comment. Please use the following links:

1.3 milestones: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
1.3.1 milestones: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1

In order to try to keep the 1.3 dates, I've pushed a bunch of stuff (particularly ORTE things) to 1.3.1. Even though there will be new functionality slated for 1.3.1, the goal is to not have any interface changes between the phases.

Please look over the list and schedules and let me or my fellow 1.3 co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any issues, errors, suggestions, omissions, heartburn, etc.

Thanks,
--Brad

Brad Benton
IBM
Re: [OMPI devel] 1.3 Release schedule and contents
Out of curiosity, why is the one-sided RDMA component struck from 1.3? As far as I'm aware, the code is in the trunk and ready for release.

Brian

On Mon, 11 Feb 2008, Brad Benton wrote:

> All:
>
> The latest scrub of the 1.3 release schedule and contents is ready for
> review and comment. Please use the following links:
> 1.3 milestones: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
> 1.3.1 milestones: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1
>
> In order to try to keep the 1.3 dates, I've pushed a bunch of stuff
> (particularly ORTE things) to 1.3.1. Even though there will be new
> functionality slated for 1.3.1, the goal is to not have any interface
> changes between the phases.
>
> Please look over the list and schedules and let me or my fellow 1.3
> co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any
> issues, errors, suggestions, omissions, heartburn, etc.
>
> Thanks,
> --Brad
>
> Brad Benton
> IBM
Re: [OMPI devel] 1.3 Release schedule and contents
Yo Brian

The line through that item means it has already been completed and is ready to go. There should also be a line through item 1.3.a.vi - it has also been fixed.

On 2/11/08 8:29 PM, "Brian W. Barrett" wrote:

> Out of curiosity, why is the one-sided RDMA component struck from 1.3? As
> far as I'm aware, the code is in the trunk and ready for release.
>
> Brian
>
> On Mon, 11 Feb 2008, Brad Benton wrote:
>
>> All:
>>
>> The latest scrub of the 1.3 release schedule and contents is ready for
>> review and comment. Please use the following links:
>> 1.3 milestones:
>> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
>> 1.3.1 milestones:
>> https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1
>>
>> In order to try to keep the 1.3 dates, I've pushed a bunch of stuff
>> (particularly ORTE things) to 1.3.1. Even though there will be new
>> functionality slated for 1.3.1, the goal is to not have any interface
>> changes between the phases.
>>
>> Please look over the list and schedules and let me or my fellow 1.3
>> co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any
>> issues, errors, suggestions, omissions, heartburn, etc.
>>
>> Thanks,
>> --Brad
>>
>> Brad Benton
>> IBM
Re: [OMPI devel] Leopard problems
There is a known problem with Leopard and Open MPI of all versions. We haven't had time to chase it down yet - probably still a few weeks away.

Ralph

On 2/11/08 1:39 PM, "Greg Watson" wrote:

> Hi,
>
> Since I upgraded to MacOS X 10.5.1, I've been having problems running
> MPI programs (using both 1.2.4 and 1.2.5). The symptoms are
> intermittent (i.e. sometimes the application runs fine), and appear as
> follows:
>
> 1. One or more of the application processes die (I've seen both one and
> two processes die).
>
> 2. It appears that the orteds associated with these application
> processes then spin continually.
>
> Here is what I see when I run "mpirun -np 4 ./mpitest":
>
> 12467 ?? Rs 1:26.52 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0 --nodename node0 --universe greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12468 ?? Rs 1:26.63 orted --bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0 --nodename node1 --universe greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12469 ?? Ss 0:00.04 orted --bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0 --nodename node2 --universe greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12470 ?? Ss 0:00.04 orted --bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename node3 --universe greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12471 ?? S 0:00.05 ./mpitest
> 12472 ?? S 0:00.05 ./mpitest
>
> Killing the mpirun results in:
>
> $ mpirun -np 4 ./mpitest
> ^Cmpirun: killing job...
>
> ^C
> --------------------------------------------------------------------------
> WARNING: mpirun is in the process of killing a job, but has detected an
> interruption (probably control-C).
>
> It is dangerous to interrupt mpirun while it is killing a job (proper
> termination may not be guaranteed). Hit control-C again within 1
> second if you really want to kill mpirun immediately.
> --------------------------------------------------------------------------
> ^Cmpirun: forcibly killing job...
> --------------------------------------------------------------------------
> WARNING: mpirun has exited before it received notification that all
> started processes had terminated. You should double check and ensure
> that there are no runaway processes still executing.
> --------------------------------------------------------------------------
>
> At this point, the two spinning orteds are left running, and the only
> way to kill them is with -9.
>
> Is anyone else seeing this problem?
>
> Greg
Re: [OMPI devel] status of LSF integration work?
Jeff and I chatted about this today, in fact. We know the LSF support is borked, but neither of us has time right now to fix it. We plan to do so, though, before the 1.3 release - we just can't promise when.

Ralph

On 2/11/08 8:00 AM, "Eric Jones" wrote:

> Greetings, MPI mavens,
>
> Perhaps this belongs on users@, but since it's about development status
> I thought I'd start here. I've fairly recently gotten involved in setting
> up an MPI environment for our institute. We have an existing LSF cluster
> because most of our work is more High-Throughput than High-Performance,
> so if I can use LSF to underlie our MPI environment, that would be
> administratively easiest.
>
> I tried to compile the LSF support in the public SVN repo and noticed it
> was, er, broken. I'll include the trivial changes we made below. But
> the behavior is still fairly unpredictable, mostly involving mpirun
> never spinning up daemons on other nodes.
>
> I saw mention that work was being suspended on LSF support pending
> technical improvements on the LSF side (mentioning that Platform had
> provided a patch to try).
>
> Can I assume, based on the inactivity in the repo, that Platform hasn't
> resolved the issue?
>
> Thanks,
> Eric
>
>
> Here are the diffs to get LSF support to compile. We also made a change
> so it reports the LSF failure code instead of an uninitialized variable
> when lsb_launch fails:
>
> Index: pls_lsf_module.c
> ===
> --- pls_lsf_module.c    (revision 17234)
> +++ pls_lsf_module.c    (working copy)
> @@ -304,7 +304,7 @@
>      */
>     if (lsb_launch(nodelist_argv, argv, LSF_DJOB_NOWAIT, env) < 0) {
>         ORTE_ERROR_LOG(ORTE_ERR_FAILED_TO_START);
> -        opal_output(0, "lsb_launch failed: %d", rc);
> +        opal_output(0, "lsb_launch failed: %d", lsberrno);
>         rc = ORTE_ERR_FAILED_TO_START;
>         goto cleanup;
>     }
> @@ -356,7 +356,7 @@
>
>     /* check for failed launch - if so, force terminate */
>     if (failed_launch) {
> -        if (ORTE_SUCCESS !=
> +        /*if (ORTE_SUCCESS != */
>             orte_pls_base_daemon_failed(jobid, false, -1, 0,
>                                         ORTE_JOB_STATE_FAILED_TO_START);
>     }
[OMPI devel] Scheduled merge of ORTE devel branch to trunk
Hello all

Per last week's telecon, we planned the merge of the latest ORTE devel branch to the OMPI trunk for after Sun had committed its C++ changes. That happened over the weekend. Therefore, based on the requests at the telecon, I will be merging the current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit around 4:30pm Eastern time and will send out a warning shortly before the commit to let you know it is coming. I'll advise of any delays.

This will be a snapshot of that devel branch: it will include the upgraded launch system, remove the GPR, add the new tool communication library, allow arbitrary mpiruns to interconnect, support the revamped hostfile and dash-host behaviors per the wiki, etc. However, it is incomplete and contains some known flaws. For example, totalview support has not been enabled yet. Comm_spawn, which is currently broken on the OMPI trunk, is fixed - but singleton comm_spawn remains broken. I am in the process of establishing support for direct and standalone launch capabilities, but those won't be in the merge. I have updated all of the launchers, but can only certify the SLURM, TM, and RSH ones to work - the Xgrid launcher is known to not compile, so if you have Xgrid on your Mac, you need to tell the build system not to build that component.

This will give you a chance to look over the new arch, though, and I understand that people would like to begin having a chance to test and review the revised code. Hopefully, you will find most of the bugs to be minor.

Please advise of any concerns about this merge. The schedule is totally driven by the requests of the MPI team members (delaying the merge has no impact on ORTE development), so requests to shift the schedule should be discussed amongst the community.

Thanks
Ralph