Re: [OMPI devel] btl_openib_rnr_retry MCA param
On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
> I see that in the OOB CPC for the openib BTL, when setting up the send
> side of the QP, we set the rnr_retry value depending on whether the
> remote receive queue is a per-peer or SRQ:
>
> - SRQ: btl_openib_rnr_retry MCA param value
> - PP: 0
>
> The rationale given in a comment is that setting the RNR to 0 is a
> good way to find bugs in our flow control.
>
> Do we really want this in production builds? Or do we want 0 for
> developer builds and the same btl_openib_rnr_retry value for PP queues?
>
The comment is mine and IMO it should stay that way for production
builds. SW flow control either works or it doesn't, and if it doesn't I
prefer to know about it immediately. Setting PP to some value greater
than 0 just delays the manifestation of the problem, and in the case of
iWarp such a possibility doesn't even exist.

--
Gleb.
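A minimal sketch of the policy discussed above, not the actual openib BTL code. The type rq_kind_t and the function select_rnr_retry() are hypothetical names invented for illustration; mca_rnr_retry stands in for the value of the btl_openib_rnr_retry MCA param. The 0..7 range check assumes the 3-bit RNR retry field used by InfiniBand verbs.

```c
#include <assert.h>

/* Hypothetical receive-queue kinds; not the real openib BTL types. */
typedef enum { RQ_PER_PEER, RQ_SHARED } rq_kind_t;

/*
 * Pick the RNR retry count for the send side of a QP.
 * mca_rnr_retry is a stand-in for the btl_openib_rnr_retry MCA param.
 */
int select_rnr_retry(rq_kind_t remote_rq, int mca_rnr_retry)
{
    /* Assumption: the verbs rnr_retry field is 3 bits wide (0..7). */
    assert(mca_rnr_retry >= 0 && mca_rnr_retry <= 7);

    if (remote_rq == RQ_SHARED) {
        /* SRQ: senders cannot track receive credits, so allow retries. */
        return mca_rnr_retry;
    }

    /* Per-peer: software flow control should make RNR impossible,
     * so 0 turns any RNR into an immediate, visible error. */
    return 0;
}
```

With this shape, select_rnr_retry(RQ_PER_PEER, 7) yields 0 regardless of the configured value, which is the "fail fast" behavior argued for in the reply.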
Re: [OMPI devel] [RFC] Remove explicit call to progress() from ob1.
On Tue, Feb 12, 2008 at 05:57:22PM -0500, Jeff Squyres wrote:
> Were these supposed to cover the time required for pinning and
> unpinning?
That's what the comment says, but the CPU executes code, not comments :)
Memory pinning happens inside prepare_dst(); after prepare_dst() returns,
the memory is already pinned. If you want to call progress after each
call to prepare_dst(), you can still do it by setting
recv_pipeline_depth to 1. Unpinning happens in an entirely different
place, after RDMA completion is acknowledged.

>
> Can you explain why you think they're unnecessary?
>
The much better question is "Why are they necessary?" If there is no
good answer to this question then they should be removed, since they
are harmful: they cause uncontrollable recursion.

>
> On Feb 12, 2008, at 5:27 AM, Gleb Natapov wrote:
>
> > Hi,
> >
> > I am planning to commit the following patch. Those two progress()
> > calls are responsible for most of our deep recursion troubles. And I
> > also think they are completely unnecessary.
> >
> > diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > index 5899243..641176e 100644
> > --- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > +++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > @@ -704,9 +704,6 @@ int mca_pml_ob1_recv_request_schedule_once(
> >              mca_bml_base_free(bml_btl,dst);
> >              continue;
> >          }
> > -
> > -        /* run progress as the prepare (pinning) can take some time */
> > -        mca_bml.bml_progress();
> >      }
> >
> >      return OMPI_SUCCESS;
> > diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > index 0998a05..9d7f3f9 100644
> > --- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > +++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > @@ -968,7 +968,6 @@ cannot_pack:
> >              mca_bml_base_free(bml_btl,des);
> >              continue;
> >          }
> > -        mca_bml.bml_progress();
> >      }
> >
> >      return OMPI_SUCCESS;
> > --
> > Gleb.
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Jeff Squyres
> Cisco Systems

--
Gleb.
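The recursion concern in this thread can be illustrated with a toy model. This is not Open MPI code: toy_schedule(), toy_progress() and toy_max_depth() are invented names, and the assumption is simply that a progress call made from inside a scheduling routine can fire a completion callback that re-enters the scheduler. The sketch records the deepest nesting reached with and without the nested progress call.

```c
/* Toy model only: all names here are hypothetical, not ob1/BML symbols. */
static int depth;      /* current nesting of toy_schedule()            */
static int max_depth;  /* deepest nesting observed                     */
static int pending;    /* completions waiting to be handled            */

void toy_schedule(int call_progress);

/* Handles one pending completion; its callback schedules more work. */
void toy_progress(void)
{
    if (pending > 0) {
        pending--;
        toy_schedule(1);
    }
}

/* Stand-in for a schedule_once() style routine. */
void toy_schedule(int call_progress)
{
    depth++;
    if (depth > max_depth) {
        max_depth = depth;
    }
    if (call_progress) {
        toy_progress();   /* nested progress: may re-enter toy_schedule() */
    }
    depth--;
}

/* Returns the deepest nesting reached while draining n_pending completions. */
int toy_max_depth(int n_pending, int call_progress)
{
    depth = 0;
    max_depth = 0;
    pending = n_pending;

    toy_schedule(call_progress);

    /* Without nested progress, completions are drained from the top level. */
    while (pending > 0) {
        pending--;
        toy_schedule(0);
    }
    return max_depth;
}
```

In this model the nesting grows with the number of pending completions when progress is called from inside the scheduler, and stays at one level when it is not.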
Re: [OMPI devel] merging new PLPA to the trunk
> -----Original Message-----
> From: Jeff Squyres [mailto:jsquy...@cisco.com]
> Sent: Tuesday, February 12, 2008 10:34 PM
> To: Lenny Verkhovsky
> Cc: PLPA users list; Open MPI Developers; Sharon Melamed; Ralph Castain;
> Pak Lui
> Subject: Re: merging new PLPA to the trunk
>
> On Feb 12, 2008, at 7:11 AM, Lenny Verkhovsky wrote:
>
> > While coding a new RMAPS component I found strange behavior in PLPA,
> > the same behavior that was described in
> > http://www.open-mpi.org/community/lists/plpa-users/2007/04/0073.php
> >
> > I believe that it was fixed in the new version of PLPA.
> >
> > This new version needs to be merged to the trunk due to bug fixes and
> > changes in the API.
> >
> > If there is no objection I volunteer to do it.
>
>
> That would be great. Please use the official SVN "3rd party import"
> guidelines. There's a /vendor/plpa branch that *may* be in good shape
> for this, but may not (I don't think I fully grokked the SVN 3rd party
> import procedures when I was using that branch before). :-\ In a
> worst-case scenario, we can "reset the clock" in the /vendor/plpa
> branch and make the new PLPA version be the "first" version in that
> tree (i.e., as if it were the first version we imported).
>
> What's your timeframe?
>
> I ask because it would probably be best if I finally get around to
> releasing a stable version of PLPA. The last version is technically
> still a beta.

We are working on the newest version from the trunk
http://svn.open-mpi.org/svn/plpa/trunk/ right now.
It's newer than the /vendor/plpa branch.

>
> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI devel] more vt woes
Hi George,

I'm not sure whether you are able to see my reply on ticket 1214...

...
For building VT on cross-platforms it's possible to build the compiler
wrappers (vtcc, vtcxx, vtf77, and vtf90) and the OPARI binary for the
front-end. Therefore the user should set the variable CXX_FOR_BUILD to
the 'native' compiler on the front-end. That means that the compiler
wrappers and OPARI will be built with CXX_FOR_BUILD instead of the
cross-compiler (CXX). Furthermore, the user can set compiler and linker
flags for the front-end compiler (e.g. CXXFLAGS_FOR_BUILD).

The Makefile.am's for the compiler wrappers (tools/compwrap) and OPARI
(tools/opari) overwrite the user variables (e.g. CXXFLAGS) with the
*_FOR_BUILD values. Unfortunately, the variables AM_CXXFLAGS,
AM_CPPFLAGS, and AM_LDFLAGS cannot be used for that, because these
variables don't overwrite the user variables; they are appended to
them. This could mean that unsupported compiler flags are passed to
the front-end compiler.

Example:

configure CXX_FOR_BUILD=g++ CXXFLAGS_FOR_BUILD=-m64 CC=cross-xlc CXX=cross-xlC CFLAGS=-q64 CXXFLAGS=-q64 ...

In this case the compiler flag -q64 is not supported by g++, so
CXXFLAGS_FOR_BUILD should be used instead of CXXFLAGS.

So, please ignore the warnings from Automake... Currently, I see no
better solution ;-)
...

Regards,
Matthias

On Tue, 2008-02-12 at 11:27 -0500, George Bosilca wrote:
> I keep getting some warnings when I compile with gcc-4.2 on MAC OS X.
>
> tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you
> should not override it;
> tools/compwrap/Makefile.am:38: use `AM_CXXFLAGS' instead.
> tools/compwrap/Makefile.am:40: `CPPFLAGS' is a user variable, you
> should not override it;
> tools/compwrap/Makefile.am:40: use `AM_CPPFLAGS' instead.
> tools/compwrap/Makefile.am:41: `LDFLAGS' is a user variable, you
> should not override it;
> tools/compwrap/Makefile.am:41: use `AM_LDFLAGS' instead.
> tools/opari/tool/Makefile.am:8: `CXXFLAGS' is a user variable, you
> should not override it;
> tools/opari/tool/Makefile.am:8: use `AM_CXXFLAGS' instead.
> tools/opari/tool/Makefile.am:10: `CPPFLAGS' is a user variable, you
> should not override it;
> tools/opari/tool/Makefile.am:10: use `AM_CPPFLAGS' instead.
> tools/opari/tool/Makefile.am:11: `LDFLAGS' is a user variable, you
> should not override it;
> tools/opari/tool/Makefile.am:11: use `AM_LDFLAGS' instead.
>
> Thanks,
> george.

--
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773
Re: [OMPI devel] Please set svn:ignore properly
Yo Jeff

I sympathize with your request. However, we should note that those of
us not using subversion for our work (e.g., using Hg or GIT) may not
see this problem despite best intentions. Those systems set "ignore" on
a global basis, not on a per-directory basis like svn. So (a) we just
don't see any warning about this, and (b) we don't have a way to set
those properties in our repositories.

When we merge the work from our repository over to an svn checkout, we
typically do not build it there. This helps when we are transitioning
back and forth between the official svn repository and our local
repository. So we again won't see an svn:ignore issue.

I know that doesn't help any, but I think it probably explains the
majority of what you are seeing. I'm not sure there is a good answer,
unfortunately.

Ralph

On 2/12/08 7:46 AM, "Jeff Squyres" wrote:
> Developers --
>
> When you add a new component, framework, or anything that includes one
> or more new directories: please be sure to set the svn:ignore property
> on each new directory properly. Here's the SVN docs on the svn:ignore
> property:
>
> http://svnbook.red-bean.com/en/1.4/svn-book.html#svn.advanced.props.special.ignore
>
> It is proper to ignore all automatically-generated files, such as (but
> not limited to):
>
> *.la
> *.lo
> .libs
> .deps
> .dirstamp
> Makefile
> Makefile.in
> static-components.h
> ...etc.
>
> Thanks.
Re: [OMPI devel] more vt woes
Hello Matthias,

* Matthias Jurenz wrote on Wed, Feb 13, 2008 at 01:49:41PM CET:
> On Tue, 2008-02-12 at 11:27 -0500, George Bosilca wrote:
>
> > I keep getting some warnings when I compile with gcc-4.2 on MAC OS X.
> >
> > tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you
> > should not override it;
[...]
> So, please ignore the warnings from Automake... Currently, I see no
> better solution ;-)

You can put
  AUTOMAKE_OPTIONS = -Wno-gnu

in tools/compwrap/Makefile.am to avoid the warnings from automake.

Cheers,
Ralf
Re: [OMPI devel] more vt woes
Thanks for the hint, Ralf! I will give it a try...

--
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773
Re: [OMPI devel] Please set svn:ignore properly
Understood; I, too, have started to use hg internally at Cisco. But I
still take care to set svn:ignore properly when I commit back to the
main repository, for a few reasons:

- SVN is the official SCM for OMPI; it's a choice to *not* use it
- there is still a good chunk of developers using SVN exclusively
- the svn:ignore information can be mined and used in other SCM
  systems, such as hg and git (hg has some internal "ignore" problems,
  but that's a different issue)

So I still think that everyone should be setting svn:ignore properly.

My $0.02...

--
Jeff Squyres
Cisco Systems
[OMPI devel] Newest PLPA
Hey! This was *just* discussed on the list yesterday and I said that
we needed to use the official 3rd party import SVN procedures for PLPA.
This was *NOT* done here!

I also said that I would do an actual PLPA release before it was
imported into Open MPI so that we could have an official drop rather
than someone grabbing an arbitrary PLPA release.

Even worse, the new PLPA was mixed in with other code in a single SVN
commit. Bad, bad, bad!

r17443 should be backed out immediately and done properly.

On Feb 13, 2008, at 8:09 AM, shar...@osl.iu.edu wrote:

Author: sharonm
Date: 2008-02-13 08:09:11 EST (Wed, 13 Feb 2008)
New Revision: 17443
URL: https://svn.open-mpi.org/trac/ompi/changeset/17443

Log:
Replaced PLPA to the latest PLPA (plpa-1.1a3r123)

Text files modified:
   trunk/ompi/mca/btl/openib/btl_openib_component.c                       |   4
   trunk/opal/mca/paffinity/base/base.h                                   |   6
   trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c                |  13 +-
   trunk/opal/mca/paffinity/linux/paffinity_linux_module.c                |  24 ++--
   trunk/opal/mca/paffinity/linux/plpa/src/libplpa/plpa_bottom.h          |   9
   trunk/opal/mca/paffinity/linux/plpa/src/libplpa/plpa_map.c             | 218 +++
   trunk/opal/mca/paffinity/linux/plpa/src/plpa-info/plpa-info.c          |  23 ++-
   trunk/opal/mca/paffinity/linux/plpa/src/plpa-taskset/plpa-taskset.c    |  15 +-
   trunk/opal/mca/paffinity/paffinity.h                                   |  12 +-
   trunk/opal/mca/paffinity/solaris/paffinity_solaris_module.c            |  18 +-
   trunk/opal/mca/paffinity/windows/paffinity_windows_module.c            |  18 +-
   11 files changed, 224 insertions(+), 136 deletions(-)

Modified: trunk/ompi/mca/btl/openib/btl_openib_component.c
==============================================================================
--- trunk/ompi/mca/btl/openib/btl_openib_component.c	(original)
+++ trunk/ompi/mca/btl/openib/btl_openib_component.c	2008-02-13 08:09:11 EST (Wed, 13 Feb 2008)
@@ -1175,10 +1175,10 @@
 {
     opal_paffinity_base_cpu_set_t cpus;
     opal_carto_base_node_t *hca_node;
-    int min_distance = -1, i, max_proc_id;
+    int min_distance = -1, i, max_proc_id, num_processors;
     const char *hca = ibv_get_device_name(dev);

-    if(opal_paffinity_base_max_processor_id(&max_proc_id) != OMPI_SUCCESS)
+    if(opal_paffinity_base_get_processor_info(&num_processors, &max_proc_id) != OMPI_SUCCESS)
         max_proc_id = 100; /* Choose something big enough */

     hca_node = carto_base_find_node(host_topo, hca);

Modified: trunk/opal/mca/paffinity/base/base.h
==============================================================================
--- trunk/opal/mca/paffinity/base/base.h	(original)
+++ trunk/opal/mca/paffinity/base/base.h	2008-02-13 08:09:11 EST (Wed, 13 Feb 2008)
@@ -167,7 +167,7 @@
      * @return int - OPAL_SUCCESS or OPAL_ERR_NOT_SUPPORTED if not
      * supported
      */
-OPAL_DECLSPEC int opal_paffinity_base_max_processor_id(int *max_processor_id);
+OPAL_DECLSPEC int opal_paffinity_base_get_processor_info(int *num_processors, int *max_processor_id);

     /**
      * Return the max socket number
@@ -177,7 +177,7 @@
      * @return int - OPAL_SUCCESS or OPAL_ERR_NOT_SUPPORTED if not
      * supported
      */
-OPAL_DECLSPEC int opal_paffinity_base_max_socket(int *max_socket);
+OPAL_DECLSPEC int opal_paffinity_base_get_socket_info(int *num_sockets, int *max_socket_num);

     /**
      * Return the max core number for a given socket
@@ -188,7 +188,7 @@
      * @return int - OPAL_SUCCESS or OPAL_ERR_NOT_SUPPORTED if not
      * supported
      */
-OPAL_DECLSPEC int opal_paffinity_base_max_core(int socket, int *max_core);
+OPAL_DECLSPEC int opal_paffinity_base_get_core_info(int socket, int *num_cores, int *max_core_num);

     /**
      * Indication of whether a component was successfully selected or

Modified: trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c
==============================================================================
--- trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c	(original)
+++ trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c	2008-02-13 08:09:11 EST (Wed, 13 Feb 2008)
@@ -63,27 +63,28 @@
     return opal_paffinity_base_module->paff_map_to_socket_core(processor_id, socket, core);
 }

-int opal_paffinity_base_max_processor_id(int *max_processor_id)
+
+int opal_paffinity_base_get_processor_info(int *num_processors, int *max_processor_id)
 {
     if (!opal_paffinity_base_selected) {
         return OPAL_ERR_NOT_FOUND;
     }
-    return opal_paffinity_base_module->paff_max_processor_id(max_processor_id);
+    return opal_paffinity_base_module->paff_get_processor_info(num_processors, max_processor_id);
 }

-int opal_paffinity_base_max_socke
Re: [OMPI devel] btl_openib_rnr_retry MCA param
Ok. I'll clean up the description of that MCA param to state that it
only applies to SRQs.

Thanks.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] btl_openib_rnr_retry MCA param
Actually, we should then also print out a different error message when
RNR occurs in PP QP's, too. It should be something along the lines of
"flow control problem occurred; this shouldn't happen..." (right now
it says RNR happened, and goes into detail about what that means -- but
that's not the real problem).

I'll do that as well.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] btl_openib_rnr_retry MCA param
On Wed, Feb 13, 2008 at 09:05:24AM -0500, Jeff Squyres wrote:
> Actually, we should then also print out a different error message when
> RNR occurs in PP QP's, too. It should be something along the lines of
> "flow control problem occurred; this shouldn't happen..." (right now
> it says RNR happened, and goes into detail into what that means -- but
> that's not the real problem).
>
Good point.

> I'll do that as well.
Thanks!

--
Gleb.
Re: [OMPI devel] [RFC] Remove explicit call to progress() from ob1.
Good enough for me. I'd also say that the comments should be
fixed. :-)

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk
Hi Ralph -

Eliminating the dependence of OMPI on the GPR is in some ways actually
a plus, as it should make it much easier to enhance the GPR as an
optional advanced capability. In general, it would be great if
OMPI/ORTE could make it easier to support this sort of extension
mechanism, for example by evolving the framework mechanism to a general
plugin mechanism supporting dynamic components as well as statically
compiled in ones. Probably this is what you meant by dynamic binary
modules below.

> That said, it would be relatively simple to add an extension that provided a
> level of data storage that user-level programs could access. It would not
> provide any subscription or trigger capabilities, however - we need to leave
> those out of the system to avoid reintroducing the event-driven problems
> again. But if you just wanted to store and retrieve data for sharing it
> across processes, that could be provided with minimal effort or impact.

Yes, this is what I had in mind. I do not understand the problem
with event-driven capabilities however; so long as these are only used
in some applications and not used for OMPI they should not compromise
OMPI. Even given a storage-only GPR, it should be possible for an
application to use the RML to accomplish much the same thing. Also,
whether there are problems (such as deadlock) with asynchronous, event
driven interactions is largely an issue of the interaction patterns
employed, and can be managed by careful design of the higher level
applications and their interactions.

> Another alternative: there is a separate "ORTE" project in Europe that is
> building extensions to our ORTE - they are tracking these code changes,

Sounds interesting - how would one find out more about this?

- Doug

On Tue, 12 Feb 2008, Ralph Castain wrote:

> Hi Doug
>
> The changes are rather far-reaching. We essentially revamped the entire RTE
> to switch from an event-driven architecture to one based on sequential
> logic. This had large benefits, but the GPR was the casualty. Remember, the
> aim for the past year has been to create a dedicated "lean, mean OMPI
> machine"!
>
> That said, it would be relatively simple to add an extension that provided a
> level of data storage that user-level programs could access. It would not
> provide any subscription or trigger capabilities, however - we need to leave
> those out of the system to avoid reintroducing the event-driven problems
> again. But if you just wanted to store and retrieve data for sharing it
> across processes, that could be provided with minimal effort or impact.
> Probably best done as a compile-time optional module, though, to avoid
> adding to the memory footprint for everyone.
>
> Another alternative: there is a separate "ORTE" project in Europe that is
> building extensions to our ORTE - they are tracking these code changes, but
> adding "bolt-ons" such as a GPR-like central data store, hooks for workflow
> management and the grid, multi-cluster operations, etc. I'm working with
> them on those efforts - if there is interest in such capabilities, I can
> probably look into architecting things so that some of the "bolt-ons" could
> be dynamically picked up by OMPI as binary modules or something.
>
> For now, though, there will be no GPR-like storage in the new system.
> Ralph
>
>
>
> On 2/12/08 1:43 PM, "Doug Tody" wrote:
>
> > Hi Ralph -
> >
> > How extensive are the changes involved in removing the GPR? How hard would
> > it be for someone to maintain an enhanced version of this as an addon or
> > compile-time optional module? Thanks.
> >
> > - Doug
> >
> >
> > On Mon, 11 Feb 2008, Ralph Castain wrote:
> >
> >> Hello all
> >>
> >> Per last week's telecon, we planned the merge of the latest ORTE devel
> >> branch to the OMPI trunk for after Sun had committed its C++ changes. That
> >> happened over the weekend.
> >>
> >> Therefore, based on the requests at the telecon, I will be merging the
> >> current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit
> >> around 4:30pm Eastern time - will send out warning shortly before the
> >> commit to let you know it is coming. I'll advise of any delays.
> >>
> >> This will be a snapshot of that devel branch - it will include the upgraded
> >> launch system, remove the GPR, add the new tool communication library,
> >> allow arbitrary mpiruns to interconnect, supports the revamped hostfile and
> >> dash-host behaviors per the wiki, etc.
> >>
> >> However, it is incomplete and contains some known flaws. For example,
> >> totalview support has not been enabled yet. Comm_spawn, which is currently
> >> broken on the OMPI trunk, is fixed - but singleton comm_spawn remains
> >> broken. I am in the process of establishing support for direct and
> >> standalone launch capabilities, but those won't be in the merge. I have
> >> updated all of the launchers, but can only certify the SLURM, TM, and RSH
> >> ones to work - the Xgrid
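The "store and retrieve only" idea discussed in this thread can be sketched as a plain key-value table with no subscriptions or triggers. This is a hypothetical illustration, not ORTE or GPR code: the kv_* names, the fixed sizes, and the in-memory, single-process design are all assumptions made to keep the example self-contained. Sharing across processes would need a transport on top, which is not shown.

```c
#include <string.h>

#define KV_MAX_ENTRIES 16   /* assumed capacity, for illustration only */
#define KV_MAX_LEN     64   /* assumed max key/value length incl. NUL  */

typedef struct {
    char key[KV_MAX_LEN];
    char value[KV_MAX_LEN];
    int  used;
} kv_entry_t;

typedef struct {
    kv_entry_t entries[KV_MAX_ENTRIES];
} kv_store_t;

/* Reset the store to empty. */
void kv_init(kv_store_t *s)
{
    memset(s, 0, sizeof(*s));
}

/* Store or overwrite a value. Returns 0 on success, -1 if the store is
 * full or the key/value does not fit. No callbacks are ever invoked. */
int kv_put(kv_store_t *s, const char *key, const char *value)
{
    int free_slot = -1;

    if (strlen(key) >= KV_MAX_LEN || strlen(value) >= KV_MAX_LEN) {
        return -1;
    }
    for (int i = 0; i < KV_MAX_ENTRIES; i++) {
        if (s->entries[i].used && strcmp(s->entries[i].key, key) == 0) {
            strcpy(s->entries[i].value, value);   /* overwrite in place */
            return 0;
        }
        if (!s->entries[i].used && free_slot < 0) {
            free_slot = i;                        /* remember first gap */
        }
    }
    if (free_slot < 0) {
        return -1;
    }
    strcpy(s->entries[free_slot].key, key);
    strcpy(s->entries[free_slot].value, value);
    s->entries[free_slot].used = 1;
    return 0;
}

/* Look up a value. Returns NULL if the key is absent. */
const char *kv_get(const kv_store_t *s, const char *key)
{
    for (int i = 0; i < KV_MAX_ENTRIES; i++) {
        if (s->entries[i].used && strcmp(s->entries[i].key, key) == 0) {
            return s->entries[i].value;
        }
    }
    return NULL;
}
```

Because kv_put() never notifies anyone, a reader only sees a new value when it calls kv_get() itself, which is the sequential (non-event-driven) behavior described in the quoted message.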
[OMPI devel] --with-visibility
Just curious -- is there a reason we don't have --with-visibility enabled by default on platforms that support it? It seems like a useful mechanism. Also, I notice that we don't have an output line in configure that shows if visibility was enabled or not. Can it be added? -- Jeff Squyres Cisco Systems
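For readers unfamiliar with the mechanism, here is a minimal sketch of compiler symbol visibility, assuming that is what --with-visibility controls. It is not Open MPI's actual macro definition: DEMO_EXPORT, DEMO_HIDDEN and both functions are hypothetical names, and the attribute syntax assumes GCC 4 or later (or a compatible compiler such as Clang).

```c
/* Hypothetical macros; on unsupported compilers they expand to nothing. */
#if defined(__GNUC__) && __GNUC__ >= 4
#  define DEMO_EXPORT __attribute__((visibility("default")))
#  define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
#  define DEMO_EXPORT
#  define DEMO_HIDDEN
#endif

/* Internal helper: usable inside the library, but not exported from a
 * shared object when the attribute is supported. */
DEMO_HIDDEN int demo_internal_helper(int x)
{
    return x * 2;
}

/* Public entry point: stays visible to users of the shared library. */
DEMO_EXPORT int demo_public_api(int x)
{
    return demo_internal_helper(x) + 1;
}
```

A common way to use such macros is to build the shared library with -fvisibility=hidden and mark only the public entry points as exported; listing the dynamic symbols (for example with nm -D on Linux) then shows which names remain visible.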