Re: [OMPI devel] OMPI 1.3.4 ETA ? (TLAs FTW)
FWIW, here's the v1.3.x bug report we review every week:

https://svn.open-mpi.org/trac/ompi/report/14

I still have one "blocker" bug (coll sm) that seems to creep asymptotically close to completion but never seems to get all the way there. :-(

On Sep 28, 2009, at 8:50 AM, Terry Dontje wrote:

Ralph Castain wrote:
> I am not one of the 1.3 release managers, but do serve as gatekeeper.
> From what I see in the automated nightly tests, we are certainly no
> earlier than 3-4 weeks from release.
>
> Lots of errors in the nightly tests, and no visible high-priority
> effort under way to identify root causes and fix them. So it looks
> like this will be a little while.
>
Well, I am working on identifying root causes for the Sun failures. Definitely in regards to paffinity but I am working my way beyond that. Though I might be invisible.

--td
> Ralph
>
>
> On Sep 27, 2009, at 5:31 PM, Chris Samuel wrote:
>
>> Hi folks,
>>
>> Just wondered if there was any idea of when OMPI 1.3.4
>> might be released ? I know the correct answer is "when
>> it's ready" (:-)) but was curious if there was any thoughts
>> on a timeframe ?
>>
>> The cpuset aware CPU affinity code would be very useful
>> to us to fix up some codes that sometimes get stuck sharing
>> cores with others free (presumably a kernel process scheduler
>> bug)..
>>
>> cheers!
>> Chris
>> --
>> Christopher Samuel - (03) 9925 4751 - Systems Manager
>> The Victorian Partnership for Advanced Computing
>> P.O. Box 201, Carlton South, VIC 3053, Australia
>> VPAC is a not-for-profit Registered Research Agency
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
jsquy...@cisco.com
[OMPI devel] RFC: IPv6 support ***REMINDER***
On Sep 16, 2009, at 9:53 PM, Ralph Castain wrote:

WHAT: change the IPv6 configuration option to enable IPv6 if and only if specifically requested

WHY: IPv6 support is only marginally maintained, and is currently broken yet again. The current default setting is causing user systems to break if (a) their kernel has support for IPv6, but (b) the system administrator has not actually configured the interfaces to use IPv6.

TIMEOUT: end of Sept

SCOPE: OMPI trunk + 1.3.4

DETAIL:
There appears to have been an unfortunate change in the way OMPI supports IPv6. Early on, we had collectively agreed to disable IPv6 support unless specifically instructed to build it. This was decided because IPv6 support was shaky, at best, and used by only a small portion of the community. Given the lack of committed resources to maintain it, we felt at that time that enabling it by default would cause an inordinate amount of trouble.

Unfortunately, at some point someone changed this default behavior. We now enable IPv6 support by default if the system has the required header files. This test is inadequate as it in no way determines that the support is active. The current result of this test is to not only cause all the IPv6-related code to compile, but to actually require that every TCP interface provide an IPv6 socket. This latter requirement causes OMPI to abort on any system where the header files exist, but the system admin has not configured every TCP interface to have an IPv6 address...a situation which is proving fairly common.

The proposed change will heal the current breakage, and can be reversed at some future time if adequate IPv6 maintenance commitment exists. In the meantime, it will allow me to quit the continual litany of telling users to manually --disable-ipv6, and allow OMPI to run out-of-the-box again.

Ralph
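
The RFC's central point is that finding the IPv6 header files at configure time says nothing about whether any interface actually has an IPv6 address at run time. As an illustration only, the sketch below shows one way such a run-time check could look on a system that provides getifaddrs(). The helper name any_interface_has_ipv6 is hypothetical and is not taken from the Open MPI sources; it is not a description of how OMPI's TCP code works.

```c
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not an Open MPI function): returns true if at
 * least one network interface currently has an IPv6 address assigned.
 * Header presence alone cannot answer this question. */
static bool any_interface_has_ipv6(void)
{
    struct ifaddrs *list = NULL;
    bool found = false;

    if (getifaddrs(&list) != 0) {
        /* Could not enumerate interfaces: treat as "no IPv6". */
        return false;
    }

    for (struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* ifa_addr can be NULL for interfaces without an address. */
        if (ifa->ifa_addr != NULL && ifa->ifa_addr->sa_family == AF_INET6) {
            found = true;
            break;
        }
    }

    freeifaddrs(list);
    return found;
}
```

A caller could use the result to fall back to IPv4-only behavior instead of aborting when the check returns false. Whether that is appropriate for OMPI is exactly the kind of maintenance question the RFC raises.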
Re: [OMPI devel] mca_pml_ob1_rdma_btls and leave_pinned logic
Thanks for your explanation, George. However:

Suppose we have leave_pinned = FALSE. Then we go to mca_mpool_rdma_find. There we try to find a suitable registration in the cache. Suppose we cannot (the cache is empty). Then a NULL registration is returned, the BTL is skipped, and 0 is returned from mca_pml_ob1_rdma_btls. This way RDMA never happens.

(To make it clear: this is not some theoretical situation, this is what I'm actually observing with OpenMPI 1.2.9.)

Can somebody explain, please?

2009/9/29 George Bosilca :
> Roman,
>
> Before going into explaining the logic, let me state that the memory is
> registered (if required/supported) by the BTLs. However, this is done only
> at the moment when the memory segment is involved in any kind of
> communication.
>
> We do not want to replicate this at the PML level, in order to make sure
> that the amount of memory registered at any moment is minimal. In other
> words, the PML leaves the decision on when to register and when to unregister
> to the BTLs. However, in order to speed up the code a little bit (and to keep
> things tidy), the PML will help the BTLs to work around the memory
> registration issue. And the code you pointed out is exactly the place where
> we do it.
>
> We need to register the memory if leave_pinned is TRUE, as registering will
> leave a trace. If leave_pinned is FALSE then we only check if somehow this
> memory is not already registered (by some BTL). In this case, there is no
> need to create a registration in the PML (if required the BTL will do it
> when needed).
>
> george.
>
> On Sep 28, 2009, at 13:44 , Roman Cheplyaka wrote:
>
>> Hi,
>> I'm trying to dig into OpenMPI sources but have some problems. Can
>> anyone explain the logic of the following code from
>> mca/pml/ob1/pml_ob1_rdma.c please?
>>
>> if(!mca_pml_ob1.leave_pinned) {
>>     /* look through existing registrations */
>>     btl_mpool->mpool_find(btl_mpool, base, size, &reg);
>> } else {
>>     /* register the memory */
>>     btl_mpool->mpool_register(btl_mpool, base, size, 0, &reg);
>> }
>>
>> It seems to me that we should register new memory if leave_pinned is
>> FALSE (i.e. no existing registrations available) and search through
>> existing registrations otherwise, but the logic is inverted here.
>>
>> --
>> Roman I. Cheplyaka

--
Roman I. Cheplyaka
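
The behavior Roman describes can be reproduced with a small stand-alone model of the quoted branch. This is only a sketch: every toy_* name is hypothetical, the "cache" holds a single entry, and nothing here is taken from the real mpool or BTL interfaces. It only mirrors the find-versus-register decision to show why an empty cache with leave_pinned = FALSE yields no usable registration.

```c
#include <stdbool.h>
#include <stddef.h>

/* All toy_* names are hypothetical stand-ins, not Open MPI types. */
typedef struct {
    const void *base;  /* start of the registered region */
    size_t size;       /* length of the registered region */
    bool valid;        /* true once a registration has been cached */
} toy_registration_t;

typedef struct {
    toy_registration_t cached;  /* one-entry "registration cache" */
} toy_mpool_t;

/* Look for an existing registration covering the buffer; NULL on a miss. */
static toy_registration_t *toy_find(toy_mpool_t *mpool,
                                    const void *base, size_t size)
{
    toy_registration_t *reg = &mpool->cached;
    if (reg->valid && reg->base == base && size <= reg->size) {
        return reg;
    }
    return NULL;
}

/* Create a registration and leave it in the cache (the "trace"). */
static toy_registration_t *toy_register(toy_mpool_t *mpool,
                                        const void *base, size_t size)
{
    mpool->cached.base = base;
    mpool->cached.size = size;
    mpool->cached.valid = true;
    return &mpool->cached;
}

/* Same shape as the quoted branch: find when leave_pinned is false,
 * register when it is true. Returns 1 if a registration is available
 * (the pool would be counted), 0 if it would be skipped. */
static int toy_rdma_usable(toy_mpool_t *mpool, bool leave_pinned,
                           const void *base, size_t size)
{
    toy_registration_t *reg = leave_pinned
        ? toy_register(mpool, base, size)
        : toy_find(mpool, base, size);
    return reg != NULL ? 1 : 0;
}
```

In this model, calling toy_rdma_usable with leave_pinned false on a fresh pool returns 0, which matches the observation in the message. After one call with leave_pinned true, the same lookup succeeds because the registration was left in the cache.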
Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014
On Mon, Sep/28/2009 03:11:46PM, Ethan Mallove wrote:
> On Mon, Sep/28/2009 02:05:14PM, Jeff Squyres wrote:
> > Try a newer compiler than gcc 3.4 -- it's pretty ancient.
>
> I don't get the warning with 4.1.2 either.

To get the warning I needed to enable some developer configure options (e.g., mkdir .svn && configure). The below patch gets rid of the warning, but is it the right way?

--- ompi/debuggers/debuggers.h
+++ ompi/debuggers/debuggers.h
@@ -40,6 +40,11 @@
  */
 OMPI_DECLSPEC void ompi_debugger_notify_abort(char *string);

+/**
+ * Breakpoint function for parallel debuggers.
+ */
+OMPI_DECLSPEC void *MPIR_Breakpoint(void);
+
 END_C_DECLS

 #endif /* OMPI_DEBUGGERS_H */

-Ethan

>
> -Ethan
>
> >
> > On Sep 28, 2009, at 2:03 PM, Ethan Mallove wrote:
> >
> >> On Fri, Sep/25/2009 09:31:51PM, Ralph Castain wrote:
> >> > I think there is a problem with this change - here is a warning I get when
> >> > compiling on Mac and Linux:
> >> >
> >> > ompi_debuggers.c:265: warning: no previous prototype for ‘MPIR_Breakpoint’
> >> >
> >> > Can you please take a look?
> >>
> >> Can you send me your config.log file? I can't reproduce the warning
> >> using GCC (3.4.6) on RHEL 4.
> >>
> >> -Ethan
> >>
> >> >
> >> > Thanks
> >> > Ralph
> >> >
> >> > On Sep 25, 2009, at 1:14 PM, emall...@osl.iu.edu wrote:
> >> >
> >> >> Author: emallove
> >> >> Date: 2009-09-25 15:14:19 EDT (Fri, 25 Sep 2009)
> >> >> New Revision: 22014
> >> >> URL: https://svn.open-mpi.org/trac/ompi/changeset/22014
> >> >>
> >> >> Log:
> >> >> Remove `static` from `MPIR_Breakpoint` so Intel compilers will not inline it
> >> >>
> >> >> Text files modified:
> >> >>    trunk/ompi/debuggers/ompi_debuggers.c | 2 +-
> >> >>    1 files changed, 1 insertions(+), 1 deletions(-)
> >> >>
> >> >> Modified: trunk/ompi/debuggers/ompi_debuggers.c
> >> >> ==
> >> >> --- trunk/ompi/debuggers/ompi_debuggers.c (original)
> >> >> +++ trunk/ompi/debuggers/ompi_debuggers.c 2009-09-25 15:14:19 EDT (Fri, 25 Sep 2009)
> >> >> @@ -261,7 +261,7 @@
> >> >>   * defined in orterun for the starter. It should never conflict with
> >> >>   * this one, but we'll make it static, just to be sure.
> >> >>   */
> >> >> -static void *MPIR_Breakpoint(void)
> >> >> +void *MPIR_Breakpoint(void)
> >> >>  {
> >> >>      return NULL;
> >> >>  }
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014
I don't think we need to DECLSPEC it, do we? We don't need (or want) this symbol to be visible at the link level when user apps link against libmpi. You might want to put in a comment about why it's not static so that we don't repeat this conversation again next year. ;-)

I think not having it DECLSPEC'ed should still work for the debugger (since it worked before when it was static), but if you could test it to be sure, that would be great...

On Sep 29, 2009, at 4:03 PM, Ethan Mallove wrote:

To get the warning I needed to enable some developer configure options (e.g., mkdir .svn && configure). The below patch gets rid of the warning, but is it the right way?

--- ompi/debuggers/debuggers.h
+++ ompi/debuggers/debuggers.h
@@ -40,6 +40,11 @@
  */
 OMPI_DECLSPEC void ompi_debugger_notify_abort(char *string);

+/**
+ * Breakpoint function for parallel debuggers.
+ */
+OMPI_DECLSPEC void *MPIR_Breakpoint(void);
+
 END_C_DECLS

 #endif /* OMPI_DEBUGGERS_H */

-Ethan

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014
The issue isn't why or why not static, Jeff - the issue is that we get a compiler warning whenever we do a developer build.

On Sep 29, 2009, at 2:32 PM, Jeff Squyres wrote:

I don't think we need to DECLSPEC it, do we? We don't need (or want) this symbol to be visible at the link level when user apps link against libmpi. You might want to put in a comment about why it's not static so that we don't repeat this conversation again next year. ;-)

I think not having it DECLSPEC'ed should still work for the debugger (since it worked before when it was static), but if you could test it to be sure, that would be great...
Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014
On Sep 29, 2009, at 5:30 PM, Ralph Castain wrote:

The issue isn't why or why not static, Jeff - the issue is that we get a compiler warning whenever we do a developer build.

Right. The initial issue was the static-ness, though -- Ethan removed the static because some compilers were effectively inlining the function (and therefore removing the symbol from the library, making the parallel debugger attach stuff not work) presumably because a) the function was static, b) the function was short with no side effects, and c) the function was only called once within that .c file. Removing the "static" from the function definition violated those assumptions so that it could no longer be inlined (and therefore the symbol definitely appears in the library).

But then we ran across the "must be prototyped" warning. That's where all this came from. :-)

So -- I still don't think we need to DECLSPEC the prototype. :-)

On Sep 29, 2009, at 2:32 PM, Jeff Squyres wrote:

> I don't think we need to DECLSPEC it, do we? We don't need (or
> want) this symbol to be visible at the link level when user apps
> link against libmpi. You might want to put in a comment about why
> it's not static so that we don't repeat this conversation again next
> year. ;-)
>
> I think not having it DECLSPEC'ed should still work for the debugger
> (since it worked before when it was static), but if you could test
> it to be sure, that would be great...

--
Jeff Squyres
jsquy...@cisco.com
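
The outcome argued for in this thread (a non-static function, a plain prototype without DECLSPEC, and a comment explaining why) can be sketched as below. This is a minimal illustration and not the committed Open MPI code. Placing the prototype in the same file is an assumption made to keep the example self-contained; in the thread's patch it lives in ompi/debuggers/debuggers.h.

```c
#include <stddef.h>

/* A prior prototype is what silences GCC's "no previous prototype"
 * warning (-Wmissing-prototypes). Shown inline here for the sketch;
 * normally it would sit in a header. No visibility decoration is
 * applied, following the suggestion in the thread. */
void *MPIR_Breakpoint(void);

/* Deliberately NOT static: with external linkage the compiler has to
 * emit an out-of-line definition, so the symbol exists for a parallel
 * debugger to set a breakpoint on. As a static, short, side-effect-free
 * function called once, some compilers inlined it and dropped the
 * symbol. */
void *MPIR_Breakpoint(void)
{
    return NULL;
}
```

Note that external linkage only guarantees that the definition is emitted. A compiler may still inline individual calls within the same file, so whether a debugger actually stops at the call site is worth testing, as Jeff asks.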