Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-02-02 Thread Terry Dontje
I tend to agree with Brian's comments about the weekly telecon.  There 
should be some way to see the direction of the project and provide 
feedback/objections without having to attend every meeting.  I 
understand this does clog things up, because now the project faces an 
indeterminate amount of time and process before it can get an approval. 
It seems we need a better, fixed process that we can agree on: one that 
gives members a way to provide feedback, without a project being stalled 
indefinitely for an allgather of the community.


The RFC process was supposed to supply the framework for this, but 
it seems like now one needs to attend meetings to be able to rebut ideas 
that are going to be committed.


--td


Brian Barrett wrote:
While I would love to be involved in this change, as I believe it's 
critical it get done right and have some reservations based on the 
work we did while a bunch of us were still at LANL, I just don't have 
time for yet another weekly telecon (particularly since 2:00 MST is 
the same as an existing weekly telecon).


I still think my objections stand, however.  A weekly telecon to 
discuss the issues is no replacement for a detailed explanation of how 
things are going to work, as well as some proof of concept code.  We 
should hold this change up to the same standard we hold all major 
changes to -- which means a working temp branch with negligible 
performance impact.


Brian

On Feb 1, 2009, at 12:14 PM, Graham, Richard L. wrote:


Brian,
 Just FYI, there is a weekly call, Thursdays at 4 EST, where we have 
been discussing these issues.

 Let's touch base at the forum.

Rich

- Original Message -
From: devel-boun...@open-mpi.org 
To: Open MPI Developers 
Sent: Sun Feb 01 10:36:33 2009
Subject: Re: [OMPI devel] RFC: Move of ompi_bitmap_t

In that case, I remove my objection to this particular RFC.  It
still stands for all other RFCs that move any of the BTL-move code
to the trunk before the critical issues with the BTL move have been
sorted out in a temporary branch.  This includes renaming functions
and such.  Perhaps we should have a discussion about those issues
during the Forum in a couple weeks?

Brian

On Feb 1, 2009, at 5:37 AM, Jeff Squyres wrote:


I just looked through both opal_bitmap_t and ompi_bitmap_t and I
think that the only real difference is that in the ompi version, we
check (in various places) that the size of the bitmap never grows
beyond OMPI_FORTRAN_HANDLE_MAX; the opal version doesn't do these
kinds of size checks.

I think it would be fairly straightforward to:

- add generic checks into the opal version, perhaps by adding a new
API call (opal_bitmap_set_max_size())
- if the max size has been set, then ensure that the bitmap never
grows beyond that size, otherwise let it have the same behavior as
today (grow without bound -- presumably until malloc() fails)

It'll take a little care to merge the functionality
correctly, but it is possible.  Once that is done, you can:

- remove the ompi_bitmap_t class
- s/ompi_bitmap/opal_bitmap/g in the OMPI layer
- add new calls to opal_bitmap_set_max_size(&bitmap,
OMPI_FORTRAN_HANDLE_MAX) in the OMPI layer (should only be in a few
places -- probably one for each MPI handle type...?  It's been so
long since I've looked at that code that I don't remember offhand)

I'd generally be in favor of this because, although this is not a
lot of repeated code, it *is* repeated code -- so cleaning it up and
consolidating the non-Fortran stuff down in opal is not a Bad Thing.
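To make the proposed max-size semantics concrete, here is a minimal C sketch of a growable bitmap with an optional ceiling. It is not the real opal_bitmap_t code: the type, the error codes, and every function name (sketch_bitmap_t, sketch_bitmap_set_max_size(), and so on) are hypothetical, and the convention that a max size of 0 means "no limit" is an assumption. Without a limit the bitmap grows until realloc() fails, as today; once a limit is set, out-of-range indices are rejected instead of growing the storage.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic bitmap; NOT the actual opal_bitmap_t implementation. */
typedef struct {
    unsigned char *bits;  /* backing storage, one bit per index */
    size_t nbytes;        /* bytes currently allocated */
    size_t max_size;      /* assumption: 0 = unlimited, else max number of bits */
} sketch_bitmap_t;

enum { SKETCH_OK = 0, SKETCH_ERR_OUT_OF_RANGE = -1, SKETCH_ERR_NOMEM = -2 };

void sketch_bitmap_init(sketch_bitmap_t *bm)
{
    bm->bits = NULL;
    bm->nbytes = 0;
    bm->max_size = 0;  /* default: grow without bound */
}

/* Hypothetical analogue of the proposed opal_bitmap_set_max_size(). */
void sketch_bitmap_set_max_size(sketch_bitmap_t *bm, size_t max_bits)
{
    bm->max_size = max_bits;
}

int sketch_bitmap_set_bit(sketch_bitmap_t *bm, size_t bit)
{
    /* Enforce the ceiling only if one has been set. */
    if (bm->max_size != 0 && bit >= bm->max_size) {
        return SKETCH_ERR_OUT_OF_RANGE;
    }
    size_t needed = bit / CHAR_BIT + 1;
    if (needed > bm->nbytes) {
        /* Grow by doubling, zero-filling the newly added bytes. */
        size_t newsize = bm->nbytes ? bm->nbytes : 1;
        while (newsize < needed) {
            newsize *= 2;
        }
        unsigned char *p = realloc(bm->bits, newsize);
        if (p == NULL) {
            return SKETCH_ERR_NOMEM;
        }
        memset(p + bm->nbytes, 0, newsize - bm->nbytes);
        bm->bits = p;
        bm->nbytes = newsize;
    }
    bm->bits[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
    return SKETCH_OK;
}

int sketch_bitmap_is_set(const sketch_bitmap_t *bm, size_t bit)
{
    if (bit / CHAR_BIT >= bm->nbytes) {
        return 0;  /* never allocated, so never set */
    }
    return (bm->bits[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1u;
}

void sketch_bitmap_fini(sketch_bitmap_t *bm)
{
    free(bm->bits);
    sketch_bitmap_init(bm);
}
```

Under this sketch, the OMPI layer would set the limit once, right after constructing each handle bitmap (the role the proposed opal_bitmap_set_max_size(&bitmap, OMPI_FORTRAN_HANDLE_MAX) call would play), while OPAL/ORTE users that never set a limit keep today's unbounded behavior.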


On Jan 30, 2009, at 4:59 PM, Ralph Castain wrote:


The history is simple. Originally, there was one bitmap_t in orte
that was also used in ompi. Then the folks working on Fortran found
that they had to put a limit in the bitmap code to avoid getting
values outside of Fortran's range. However, this introduced a
problem - if we had the limit in the orte version, then we limited
ourselves unnecessarily, and introduced some abstraction questions
since orte knows nothing about Fortran.

So two were created. Then the orte_bitmap_t was blown away at a
later time when we removed the GPR as George felt it wasn't
necessary (which was true). It was later reborn when we needed it
in the routed system, but this time it was done in opal as others
indicated a potentially more general use for that capability.

The problem with uniting the two is that you either have to
introduce Fortran-based limits into opal (which messes up the non-
ompi uses), or deal with the Fortran limits in some other fashion.
Neither is particularly pleasant, though it could be done.

I think it primarily is a question for the Fortran folks to address
- can they deal with Fortran limits in some other manner without
making the code unmanageable and/or taking a performance hit?

Ralph


On Jan 30, 2009, at 2:40 PM, Richard Graham wrote:


This should really be viewed as a code maintenance RFC.  The reason this
came up in the first place is because we are investigating the b
Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-02-02 Thread Graham, Richard L.
Let's take the comment about a meeting in context: it was just a reminder that 
this is going on.  The wiki page is still up and being updated.  What has been 
described as the goal has not changed, and the commitment to providing a temp 
branch with the changes has not changed.

The first RFC for changing constant names is likely to be out this week, with 
changes expected to go into the trunk a week later.  If people want an advance 
look, take a look at the wiki page.  This has been discussed quite a bit 
outside of the group working on this, and is in the category of general cleanup 
that also sets the code base up for where we want to go.

We are still trying to figure out all the details for actually getting the BTL 
move done.

Rich

- Original Message -
From: devel-boun...@open-mpi.org 
To: Open MPI Developers 
Sent: Mon Feb 02 06:48:22 2009
Subject: Re: [OMPI devel] RFC: Move of ompi_bitmap_t

I tend to agree with Brian's comments about the weekly telecon.  There 
should be some way to see the direction of the project and provide 
feedback/objections without having to attend every meeting.  I 
understand this does clog things up, because now the project faces an 
indeterminate amount of time and process before it can get an approval. 
It seems we need a better, fixed process that we can agree on: one that 
gives members a way to provide feedback, without a project being stalled 
indefinitely for an allgather of the community.

The RFC process was supposed to supply the framework for this, but 
it seems like now one needs to attend meetings to be able to rebut ideas 
that are going to be committed.

--td


Brian Barrett wrote:
> While I would love to be involved in this change, as I believe it's 
> critical it get done right and have some reservations based on the 
> work we did while a bunch of us were still at LANL, I just don't have 
> time for yet another weekly telecon (particularly since 2:00 MST is 
> the same as an existing weekly telecon).
>
> I still think my objections stand, however.  A weekly telecon to 
> discuss the issues is no replacement for a detailed explanation of how 
> things are going to work, as well as some proof of concept code.  We 
> should hold this change up to the same standard we hold all major 
> changes to -- which means a working temp branch with negligible 
> performance impact.
>
> Brian
>
> On Feb 1, 2009, at 12:14 PM, Graham, Richard L. wrote:
>
>> Brian,
>>  Just FYI, there is a weekly call, Thursdays at 4 EST, where we have 
>> been discussing these issues.
>>  Let's touch base at the forum.
>>
>> Rich
>>
>> - Original Message -
>> From: devel-boun...@open-mpi.org 
>> To: Open MPI Developers 
>> Sent: Sun Feb 01 10:36:33 2009
>> Subject: Re: [OMPI devel] RFC: Move of ompi_bitmap_t
>>
>> In that case, I remove my objection to this particular RFC.  It
>> still stands for all other RFCs that move any of the BTL-move code
>> to the trunk before the critical issues with the BTL move have been
>> sorted out in a temporary branch.  This includes renaming functions
>> and such.  Perhaps we should have a discussion about those issues
>> during the Forum in a couple weeks?
>>
>> Brian
>>
>> On Feb 1, 2009, at 5:37 AM, Jeff Squyres wrote:
>>
>>> I just looked through both opal_bitmap_t and ompi_bitmap_t and I
>>> think that the only real difference is that in the ompi version, we
>>> check (in various places) that the size of the bitmap never grows
>>> beyond OMPI_FORTRAN_HANDLE_MAX; the opal version doesn't do these
>>> kinds of size checks.
>>>
>>> I think it would be fairly straightforward to:
>>>
>>> - add generic checks into the opal version, perhaps by adding a new
>>> API call (opal_bitmap_set_max_size())
>>> - if the max size has been set, then ensure that the bitmap never
>>> grows beyond that size, otherwise let it have the same behavior as
>>> today (grow without bound -- presumably until malloc() fails)
>>>
>>> It'll take a little care to merge the functionality
>>> correctly, but it is possible.  Once that is done, you can:
>>>
>>> - remove the ompi_bitmap_t class
>>> - s/ompi_bitmap/opal_bitmap/g in the OMPI layer
>>> - add new calls to opal_bitmap_set_max_size(&bitmap,
>>> OMPI_FORTRAN_HANDLE_MAX) in the OMPI layer (should only be in a few
>>> places -- probably one for each MPI handle type...?  It's been so
>>> long since I've looked at that code that I don't remember offhand)
>>>
>>> I'd generally be in favor of this because, although this is not a
>>> lot of repeated code, it *is* repeated code -- so cleaning it up and
>>> consolidating the non-Fortran stuff down in opal is not a Bad Thing.
>>>
>>>
>>> On Jan 30, 2009, at 4:59 PM, Ralph Castain wrote:
>>>
 The history is simple. Originally, there was one bitmap_t in orte
 that was also used in ompi. Then the folks working on Fortran found
 that they had to put a limit in the bitmap code to avoid getting
 values outside of Fortran's range. However, this introduced

Re: [OMPI devel] RFC: Adding OMPI_CHECK_WITHDIR checks

2009-02-02 Thread Matney Sr, Kenneth D.
Assuming that this update went through, there were no negative
side-effects for portals.
-- 
Ken


-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Tuesday, January 27, 2009 11:51 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Adding OMPI_CHECK_WITHDIR checks

There was no disagreement about this issue on the teleconf today.  So  
if no one disagrees here on the list, I'll commit this stuff to the  
trunk at COB on Thursday 29 Jan 2009 (i.e., the RFC timeout).


On Jan 20, 2009, at 7:44 PM, Jeff Squyres wrote:

> What: Adding OMPI_CHECK_WITHDIR checks in various .m4 files
>
> Why: Help prevent user errors via --with-=DIR configure options
>
> Where: config/*m4 and */mca/*/*/configure.m4 files, affecting the  
> following environments:
> - bproc (***)
> - gm (***)
> - loadleveler (***)
> - lsf
> - mx (***)
> - open fabrics
> - portals (***)
> - psm (***)
> - tm
> - udapl
> - elan (***)
> - sctp
> - blcr (***)
> - libnuma
> - valgrind
> ===> I could not easily test the (***) environments
>
> When: For OMPI v1.4 (could be convinced to make it for v1.3.1)
>
> Timeout: COB Thursday, Jan 29, 2009
>
> 
>
> The intent for OMPI v1.3's new OMPI_CHECK_WITHDIR m4 macro was to  
> fix a case where a user was doing the following:
>
>  ./configure --with-openib=/path/to/nonexistent/OFED/installation
>
> ...but configure succeeded anyway because the sysadmins had  
> installed OFED into /usr.  Hence, the user was getting something  
> unexpected.
>
> OMPI_CHECK_WITHDIR does a very basic sanity check on directories  
> provided by --with-=DIR configure options.  Specifically, it  
> checks if the directory exists and if a token file exists in that  
> directory (it calls "ls ", so wildcards  
> are acceptable).  If either of those tests fails, configure aborts  
> with an appropriate error message.  This macro was used in the  
> openib BTL configure stuff, but we didn't add it anywhere else.  I'm  
> now adding it everywhere we have a --with-=DIR, which are in  
> various .m4 files in the environments described above.
>
> Here's the hg where I added OMPI_CHECK_WITHDIR to all the  
> environments listed above, but was unable to test the (***)  
> environments:
>
> http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/ompi_check_withdir/
>
> We could bring this patch to v1.3.1 or we could wait until v1.4.  I  
> don't really care either way.
>
> I plan to bring this work into the trunk next Thursday COB; it would  
> be great if those who have the (***) environments could pull down the  
> hg tree before then and give it a whirl so we can fix any problems  
> beforehand.
>
> -- 
> Jeff Squyres
> Cisco Systems
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
Cisco Systems
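The two sanity checks described for OMPI_CHECK_WITHDIR (the directory exists, and a token file matching a pattern exists inside it) can be sketched in C as follows. This only illustrates the logic; it is not the m4 macro itself. check_withdir() and its return codes are hypothetical, and POSIX glob() stands in for the macro's "ls" call so that wildcards in the token pattern still match.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical result codes for this sketch. */
enum { WITHDIR_OK = 0, WITHDIR_NO_DIR = 1, WITHDIR_NO_TOKEN = 2 };

/* Hypothetical C analogue of the checks OMPI_CHECK_WITHDIR performs
 * on a --with-...=DIR argument. */
int check_withdir(const char *dir, const char *token_pattern)
{
    struct stat st;

    /* Check 1: DIR must exist and be a directory. */
    if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode)) {
        return WITHDIR_NO_DIR;
    }

    /* Check 2: something matching DIR/token_pattern must exist.
     * glob() plays the role of "ls", so wildcards are honored. */
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, token_pattern);
    if (n < 0 || (size_t)n >= sizeof path) {
        return WITHDIR_NO_TOKEN;  /* path too long to check */
    }

    glob_t g = {0};
    int rc = glob(path, 0, NULL, &g);
    int found = (rc == 0 && g.gl_pathc > 0);
    globfree(&g);  /* safe on a zero-initialized or failed glob_t */

    return found ? WITHDIR_OK : WITHDIR_NO_TOKEN;
}
```

In the openib case described above, a call such as check_withdir("/path/to/nonexistent/OFED/installation", ...) would return WITHDIR_NO_DIR, and configure would abort instead of silently falling back to the copy of OFED installed in /usr.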




Re: [OMPI devel] RFC: Adding OMPI_CHECK_WITHDIR checks

2009-02-02 Thread Jeff Squyres

On Feb 2, 2009, at 8:10 AM, Matney Sr, Kenneth D. wrote:


Assuming that this update went through, there were no negative
side-effects for portals.


Good to hear -- thanks!

(yes, it went through :-) )

--
Jeff Squyres
Cisco Systems