Re: [OMPI devel] OMPI 1.3.4 ETA ? (TLAs FTW)

2009-09-29 Thread Jeff Squyres

FWIW, here's the v1.3.x bug report we review every week:

   https://svn.open-mpi.org/trac/ompi/report/14

I still have one "blocker" bug (coll sm) that seems to creep  
asymptotically close to completion but never seems to get all the way  
there.  :-(



On Sep 28, 2009, at 8:50 AM, Terry Dontje wrote:


Ralph Castain wrote:
> I am not one of the 1.3 release managers, but do serve as gatekeeper.
> From what I see in the automated nightly tests, we are certainly no
> earlier than 3-4 weeks from release.
>
> Lots of errors in the nightly tests, and no visible high-priority
> effort under way to identify root causes and fix them. So it looks
> like this will be a little while.
>
Well, I am working on identifying root causes for the Sun failures.
Definitely the ones related to paffinity, but I am working my way
beyond that.

Though I might be invisible.

--td
> Ralph
>
>
> On Sep 27, 2009, at 5:31 PM, Chris Samuel wrote:
>
>> Hi folks,
>>
>> Just wondered if there was any idea of when OMPI 1.3.4
>> might be released?  I know the correct answer is "when
>> it's ready" (:-)) but was curious if there were any thoughts
>> on a timeframe?
>>
>> The cpuset-aware CPU affinity code would be very useful
>> to us to fix up some codes that sometimes get stuck sharing
>> cores while others sit free (presumably a kernel process
>> scheduler bug).
>>
>> cheers!
>> Chris
>> --
>> Christopher Samuel - (03) 9925 4751 - Systems Manager
>> The Victorian Partnership for Advanced Computing
>> P.O. Box 201, Carlton South, VIC 3053, Australia
>> VPAC is a not-for-profit Registered Research Agency
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>




--
Jeff Squyres
jsquy...@cisco.com



[OMPI devel] RFC: IPv6 support ***REMINDER***

2009-09-29 Thread Ralph Castain

On Sep 16, 2009, at 9:53 PM, Ralph Castain wrote:

WHAT: change the IPv6 configuration option to enable IPv6 if and  
only if specifically requested


WHY: IPv6 support is only marginally maintained, and is currently  
broken yet again. The current default setting is causing user  
systems to break if (a) their kernel has support for IPv6, but (b)  
the system administrator has not actually configured the interfaces  
to use IPv6.


TIMEOUT: end of Sept

SCOPE: OMPI trunk + 1.3.4

DETAIL:
There appears to have been an unfortunate change in the way OMPI  
supports IPv6. Early on, we had collectively agreed to disable IPv6  
support unless specifically instructed to build it. This was decided  
because IPv6 support was shaky, at best, and used by only a small  
portion of the community. Given the lack of committed resources to  
maintain it, we felt at that time that enabling it by default would  
cause an inordinate amount of trouble.


Unfortunately, at some point someone changed this default behavior.  
We now enable IPv6 support by default if the system has the required  
header files. This test is inadequate as it in no way determines  
that the support is active. The current result of this test is to  
not only cause all the IPv6-related code to compile, but to actually  
require that every TCP interface provide an IPv6 socket.


This latter requirement causes OMPI to abort on any system where the  
header files exist, but the system admin has not configured every  
TCP interface to have an IPv6 address...a situation which is proving  
fairly common.


The proposed change will heal the current breakage, and can be  
reversed at some future time if adequate IPv6 maintenance commitment  
exists. In the meantime, it will allow me to quit the continual  
litany of telling users to manually --disable-ipv6, and allow OMPI  
to run out-of-the-box again.


Ralph
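
To illustrate the gap described in the DETAIL section (headers present at build time versus IPv6 actually configured on the interfaces), here is a minimal sketch. It is not OMPI code: count_addrs() and ipv6_enabled() are hypothetical names, and the policy function only paraphrases the proposal that IPv6 is built on explicit request and never merely because the headers exist.

```c
#include <ifaddrs.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of OMPI): count the interface addresses
 * of one address family, e.g. AF_INET or AF_INET6.
 * Returns -1 if the interface list cannot be read. */
int count_addrs(int family)
{
    struct ifaddrs *list = NULL;
    int n = 0;

    if (getifaddrs(&list) != 0) {
        return -1;
    }
    for (struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* ifa_addr may be NULL for interfaces without an address */
        if (ifa->ifa_addr != NULL && ifa->ifa_addr->sa_family == family) {
            n++;
        }
    }
    freeifaddrs(list);
    return n;
}

/* Assumed reading of the proposed default: IPv6 support is built only
 * when explicitly requested; header presence alone never enables it. */
bool ipv6_enabled(bool requested, bool headers_present)
{
    return requested && headers_present;
}
```

On a host whose kernel supports IPv6 but whose interfaces were never given IPv6 addresses, count_addrs(AF_INET6) would be expected to come back as 0 even though the headers are installed, which is exactly the case a header-only configure test cannot detect.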





Re: [OMPI devel] mca_pml_ob1_rdma_btls and leave_pinned logic

2009-09-29 Thread Roman Cheplyaka
Thanks for your explanation, George. However,
suppose we have leave_pinned = FALSE. Then we go to
mca_mpool_rdma_find, where we try to find a suitable registration in the
cache. Suppose we cannot (the cache is empty). Then a NULL registration is
returned, the BTL is skipped, and 0 is returned from mca_pml_ob1_rdma_btls.
This way RDMA never happens.

(To be clear: this is not some theoretical situation; this is
what I'm actually observing with OpenMPI 1.2.9.) Can somebody explain,
please?
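
The path described above can be reproduced with a toy model. This is only a sketch under stated assumptions: toy_reg_t, toy_mpool_t, toy_find, toy_register and toy_rdma_btls are made-up stand-ins, not the real mpool or PML interfaces, and the cache is a fixed array rather than the real registration cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are real OMPI types. */
typedef struct {
    uintptr_t base;   /* start address of the registered region */
    size_t    size;   /* length of the region in bytes */
} toy_reg_t;

typedef struct {
    toy_reg_t cache[8];
    size_t    used;
} toy_mpool_t;

/* Look for an existing registration covering [base, base + size).
 * Returns NULL when the cache holds no such entry. */
toy_reg_t *toy_find(toy_mpool_t *mp, const void *base, size_t size)
{
    uintptr_t lo = (uintptr_t)base;

    for (size_t i = 0; i < mp->used; i++) {
        toy_reg_t *r = &mp->cache[i];
        if (r->base <= lo && lo + size <= r->base + r->size) {
            return r;
        }
    }
    return NULL;
}

/* Reuse a cached registration or create a new one and leave it cached.
 * Returns NULL only when the toy cache is full. */
toy_reg_t *toy_register(toy_mpool_t *mp, const void *base, size_t size)
{
    toy_reg_t *hit = toy_find(mp, base, size);

    if (hit != NULL) {
        return hit;
    }
    if (mp->used == sizeof mp->cache / sizeof mp->cache[0]) {
        return NULL;
    }
    mp->cache[mp->used].base = (uintptr_t)base;
    mp->cache[mp->used].size = size;
    return &mp->cache[mp->used++];
}

/* Mirrors the branch under discussion: 1 if the BTL can be used for
 * RDMA on this buffer, 0 if it is skipped. */
int toy_rdma_btls(toy_mpool_t *mp, bool leave_pinned,
                  const void *base, size_t size)
{
    toy_reg_t *reg = leave_pinned ? toy_register(mp, base, size)
                                  : toy_find(mp, base, size);
    return reg != NULL ? 1 : 0;
}
```

With an empty cache and leave_pinned false, toy_rdma_btls() returns 0; it only returns 1 once something else has registered the buffer. In this model, at least, that is the behaviour I am asking about.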

2009/9/29 George Bosilca :
> Roman,
>
> Before going into explaining the logic, let me state that the memory is
> registered (if required/supported) by the BTLs. However, this is done only
> at the moment when the memory segment is involved in any kind of
> communication.
>
> We do not want to replicate this at the PML level, in order to make sure
> that the amount of memory registered at any moment is minimal. In other
> words, the PML leaves the decision on when to register and when to unregister
> to the BTLs. However, in order to speed up the code a little bit (and to keep
> things tidy), the PML will help the BTLs to work around the memory
> registration issue. And the code you pointed out is exactly the place where
> we do it.
>
> We need to register the memory if leave_pinned is TRUE, as registering will
> leave a trace. If leave_pinned is FALSE then we only check whether this
> memory is already registered (by some BTL). In this case, there is no
> need to create a registration in the PML (if required the BTL will do it
> when needed).
>
>  george.
>
> On Sep 28, 2009, at 13:44 , Roman Cheplyaka wrote:
>
>> Hi,
>> I'm trying to dig into OpenMPI sources but have some problems. Can
>> anyone explain the logic of the following code from
>> mca/pml/ob1/pml_ob1_rdma.c please?
>>
>>           if(!mca_pml_ob1.leave_pinned) {
>>               /* look through existing registrations */
>>               btl_mpool->mpool_find(btl_mpool, base, size, &reg);
>>           } else {
>>               /* register the memory */
>>               btl_mpool->mpool_register(btl_mpool, base, size, 0, &reg);
>>           }
>>
>> It seems to me that we should register new memory if leave_pinned is
>> FALSE (i.e. no existing registrations available) and search through
>> existing registrations otherwise, but the logic is inverted here.
>>
>> --
>> Roman I. Cheplyaka
>



-- 
Roman I. Cheplyaka



Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014

2009-09-29 Thread Ethan Mallove
On Mon, Sep/28/2009 03:11:46PM, Ethan Mallove wrote:
> On Mon, Sep/28/2009 02:05:14PM, Jeff Squyres wrote:
> > Try a newer compiler than gcc 3.4 -- it's pretty ancient.
> 
> I don't get the warning with 4.1.2 either.

To get the warning I needed to enable some developer configure options (e.g.,
mkdir .svn && configure). 

The below patch gets rid of the warning, but is it the right way?

--- ompi/debuggers/debuggers.h
+++ ompi/debuggers/debuggers.h
@@ -40,6 +40,11 @@
  */
 OMPI_DECLSPEC void ompi_debugger_notify_abort(char *string);

+/**
+ * Breakpoint function for parallel debuggers.
+ */
+OMPI_DECLSPEC void *MPIR_Breakpoint(void);
+
 END_C_DECLS

 #endif /* OMPI_DEBUGGERS_H */

-Ethan


> 
> -Ethan
> 
> >
> >
> > On Sep 28, 2009, at 2:03 PM, Ethan Mallove wrote:
> >
> >> On Fri, Sep/25/2009 09:31:51PM, Ralph Castain wrote:
> >> > I think there is a problem with this change - here is a warning I get when
> >> > compiling on Mac and Linux:
> >> >
> >> > ompi_debuggers.c:265: warning: no previous prototype for ‘MPIR_Breakpoint’
> >> >
> >> > Can you please take a look?
> >>
> >> Can you send me your config.log file? I can't reproduce the warning
> >> using GCC (3.4.6) on RHEL 4.
> >>
> >> -Ethan
> >>
> >> >
> >> > Thanks
> >> > Ralph
> >> >
> >> > On Sep 25, 2009, at 1:14 PM, emall...@osl.iu.edu wrote:
> >> >
> >> >> Author: emallove
> >> >> Date: 2009-09-25 15:14:19 EDT (Fri, 25 Sep 2009)
> >> >> New Revision: 22014
> >> >> URL: https://svn.open-mpi.org/trac/ompi/changeset/22014
> >> >>
> >> >> Log:
> >> >> Remove `static` from `MPIR_Breakpoint` so Intel compilers will not inline it
> >> >>
> >> >> Text files modified:
> >> >>   trunk/ompi/debuggers/ompi_debuggers.c | 2 +-
> >> >>   1 files changed, 1 insertions(+), 1 deletions(-)
> >> >>
> >> >> Modified: trunk/ompi/debuggers/ompi_debuggers.c
> >> >> ==============================================================================
> >> >> --- trunk/ompi/debuggers/ompi_debuggers.c (original)
> >> >> +++ trunk/ompi/debuggers/ompi_debuggers.c 2009-09-25 15:14:19 EDT (Fri, 25 Sep 2009)
> >> >> @@ -261,7 +261,7 @@
> >> >>   * defined in orterun for the starter.  It should never conflict with
> >> >>   * this one, but we'll make it static, just to be sure.
> >> >>   */
> >> >> -static void *MPIR_Breakpoint(void)
> >> >> +void *MPIR_Breakpoint(void)
> >> >>  {
> >> >>      return NULL;
> >> >>  }
> >> >> ___
> >> >> svn mailing list
> >> >> s...@open-mpi.org
> >> >> http://www.open-mpi.org/mailman/listinfo.cgi/svn
> >> >
> >> >
> >>
> >> 
> >
> >
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com
> >
> >


Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014

2009-09-29 Thread Jeff Squyres
I don't think we need to DECLSPEC it, do we?  We don't need (or want)  
this symbol to be visible at the link level when user apps link  
against libmpi.  You might want to put in a comment about why it's not  
static so that we don't repeat this conversation again next year.  ;-)


I think not having it DECLSPEC'ed should still work for the debugger  
(since it worked before when it was static), but if you could test it  
to be sure, that would be great...







--
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014

2009-09-29 Thread Ralph Castain
The issue isn't why or why not static, Jeff - the issue is that we get  
a compiler warning whenever we do a developer build.







Re: [OMPI devel] [OMPI svn] svn:open-mpi r22014

2009-09-29 Thread Jeff Squyres

On Sep 29, 2009, at 5:30 PM, Ralph Castain wrote:


The issue isn't why or why not static, Jeff - the issue is that we get
a compiler warning whenever we do a developer build.



Right.  The initial issue was the static-ness, though -- Ethan removed  
the static because some compilers were effectively inlining the  
function (and therefore removing the symbol from the library, making  
the parallel debugger attach stuff not work) presumably because a) the  
function was static, b) the function was short with no side effects,  
and c) the function was only called once within that .c file.


Removing the "static" from the function definition violated those  
assumptions so that it could no longer be inlined away (and therefore the  
symbol definitely appears in the library).  But then we ran across the  
"no previous prototype" warning.


That's where all this came from.  :-)

So -- I still don't think we need to DECLSPEC the prototype.  :-)
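
A minimal sketch of what I have in mind, under a few assumptions: SKETCH_EXPORT is a made-up stand-in for an export macro in the style of OMPI_DECLSPEC, sketch_public_api() is a hypothetical caller, and the library is assumed to be built with hidden default visibility so that only marked symbols are exported. GCC's -Wmissing-prototypes is the option that produces the "no previous prototype" message.

```c
#include <stddef.h>

/* Hypothetical stand-in for an export macro such as OMPI_DECLSPEC. */
#if defined(__GNUC__)
#  define SKETCH_EXPORT __attribute__((visibility("default")))
#else
#  define SKETCH_EXPORT
#endif

/* Exported on purpose: applications are expected to call this. */
SKETCH_EXPORT int sketch_public_api(void);

/* Plain prototype with no export macro.  It satisfies
 * -Wmissing-prototypes without asking for the symbol to be visible
 * to applications at link time. */
void *MPIR_Breakpoint(void);

/* Deliberately NOT static: with external linkage the compiler has to
 * emit an out-of-line definition, so the symbol stays in the object
 * file even if the single call below gets inlined. */
void *MPIR_Breakpoint(void)
{
    return NULL;
}

int sketch_public_api(void)
{
    /* The only call site, as in the case discussed above. */
    return MPIR_Breakpoint() == NULL ? 0 : 1;
}
```

Whether a debugger can still find the breakpoint symbol when it is not exported is the part that needs testing; the sketch only shows that the prototype and the export marking are separate decisions.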


On Sep 29, 2009, at 2:32 PM, Jeff Squyres wrote:

> I don't think we need to DECLSPEC it, do we?  We don't need (or
> want) this symbol to be visible at the link level when user apps
> link against libmpi.  You might want to put in a comment about why
> it's not static so that we don't repeat this conversation again next
> year.  ;-)
>
> I think not having it DECLSPEC'ed should still work for the debugger
> (since it worked before when it was static), but if you could test
> it to be sure, that would be great...




--
Jeff Squyres
jsquy...@cisco.com