Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer

Jeff Squyres Sat, 14 Mar 2009 09:00:45 -0400

Brian --

Thanks for such a detailed answer!  This helps clarify many things.



On Mar 11, 2009, at 1:31 PM, Brian W. Barrett wrote:

On Wed, 11 Mar 2009, Richard Graham wrote:

> Brian,
> Going back over the e-mail trail it seems like you have raised two
> concerns:
> - BTL performance after the change, which I would take to be
>   - btl latency
>   - btl bandwidth
> - Code maintainability
>   - repeated code changes that impact a large number of files
> - A demonstration that the changes actually achieve their goal. As we
>   discussed after you got off the call, there are two separate goals here
>   - being able to use the btl's outside the context of mpi, but
>     within the ompi code base
>   - ability to use the btl's in the context of a run-time other than
>     orte
> Another concern I have heard raised by others is
>   - mpi startup time
>
> Has anything else been missed here? I would like to make sure that we
> address all the issues raised in the next version of the RFC.

I think the umbrella concerns for the final success of the change are btl
performance (in particular, latency and message rates for cache-unfriendly
applications/benchmarks) and code maintainability.  In addition, there are
some intermediate change issues I have, in that this project is working
differently than other large changes.  In particular, there is/was the
appearance of being asked to accept changes which only make sense if the
btl move is going to move forward, without any way to judge the
performance or code impact because critical technical issues still remain.

The latency/message rate issues are fairly straightforward from an end
measure point-of-view.  My concerns on latency/message rate come not from
the movement of the BTL to another library (for most operating systems /
shared library systems that should be negligible), but from the code
changes which surround moving the BTLs.  The BTLs are tightly intertwined
with a number of pieces of the OMPI layer, in particular the BML and MPool
frameworks and the ompi proc structure.  I had a productive conversation
with Rainer this morning explaining why I'm so concerned about the bml and
ompi proc structures.  The ompi proc structure currently acts not only as
the identifier for a remote endpoint, but also stores endpoint specific data
for both the PML and BML.  The BML structure actually contains each BTL's
per process endpoint information, in the form of the base_endpoint_t*
structures returned from add_procs().  Moving these structures around must
be done with care, as some of the proposals Jeff, Rainer, and I came up
with this morning either induced spaghetti code or greatly increased the
spread of information needed for the critical send path through the memory
space (thereby likely increasing cache misses on send for real
applications).
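
To make the cache concern concrete, here is a rough sketch in C of two
ways the per-peer data could be laid out.  Every name in it (sketch_proc_*,
sketch_endpoint_*, lookup_*) is hypothetical and chosen for illustration;
none of it is the actual ompi_proc_t, BML, or BTL code.  It only shows why
separating the identifier from the endpoint data can add dependent loads
to the send path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the real Open MPI types. */
typedef struct sketch_endpoint {
    int btl_id;                       /* which transport owns this endpoint */
} sketch_endpoint_t;

/* Layout A: the proc is both the identifier and the home of the
 * per-peer data, so the send path goes proc -> endpoint. */
typedef struct sketch_proc_inline {
    uint64_t           name;          /* peer identifier */
    void              *pml_data;      /* upper-layer per-peer state */
    sketch_endpoint_t *bml_endpoint;  /* per-peer endpoint data */
} sketch_proc_inline_t;

/* Layout B: the proc is only an identifier; endpoints live in a
 * separate table owned by another layer. */
typedef struct sketch_proc_split {
    uint64_t name;                    /* peer identifier */
    size_t   index;                   /* slot in the endpoint table */
} sketch_proc_split_t;

typedef struct sketch_endpoint_table {
    sketch_endpoint_t **slots;        /* one entry per known peer */
    size_t              count;
} sketch_endpoint_table_t;

/* Layout A lookup: reads the proc, then the endpoint it points to. */
static sketch_endpoint_t *lookup_inline(const sketch_proc_inline_t *proc)
{
    assert(proc != NULL);
    return proc->bml_endpoint;
}

/* Layout B lookup: reads the proc, the table header, and the slot
 * array before reaching the endpoint.  Each of those can sit on a
 * different cache line. */
static sketch_endpoint_t *lookup_split(const sketch_proc_split_t *proc,
                                       const sketch_endpoint_table_t *table)
{
    assert(proc != NULL && table != NULL);
    if (proc->index >= table->count) {
        return NULL;                  /* unknown peer */
    }
    return table->slots[proc->index];
}
```

Both lookups return the same endpoint for the same peer.  The difference
is only in how many separate memory objects are touched on the way there,
which is what would matter for a cache-unfriendly application.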

The code maintainability issue comes from three separate and independent
issues.  First, there is the issue of how the pieces of the OMPI layer
will interact after the move.  The BML/BTL/MPool/Rcache dance is already
complicated, and care should be taken to minimize that change.  Start-up
is also already quite complex, and moving the BTLs to make them
independent of starting other pieces of Open MPI can be done well or can
be done poorly.  We need to ensure it's done well, obviously.  Second,
there is the issue of wire-up.  My impression from conversations with
everyone at ORNL was that this move of BTLs would include changes to allow
BTLs to wire up without the RML.  I understand that Rich said this was not
the case during the part of the admin meeting I missed yesterday, so
that may no longer be a concern.  Finally, there has been some discussion,
mainly second hand in my case, about the mechanisms by which the trunk
would be modified to allow for using OMPI without ORTE.  I have concerns
that we'd add complexity to the BTLs to achieve that, and again that can
be done poorly if we're not careful.  Talking with Jeff and Rainer this
morning helped reduce my concern in this area, but I think it also added
to the technical issues which must be solved to consider this project ready
for movement to the trunk.

There are a couple of technical issues which I believe prevent a
reasonable discussion of the performance and maintainability issues based
on the current branch.  I talked about some of them in the previous two
paragraphs, but so that we have a short bullet list, they are:

   - How will the ompi_proc_t be handled?  In particular,
     where will PML/BML data be stored, and how will we
     avoid adding new cache misses.
   - How will the BML and MPool be handled?  The BML holds
     the BTL endpoint data, so changes have to be made if
     it continues to live in OMPI.
   - How will the modex and the intricate dance with adding
     new procs from dynamic processes be handled?
   - How will we handle the progress mechanisms in cases where
     the MTLs are used and the BTLs aren't needed by the RTE?
   - If there are users outside of OMPI, but who want to also use
     OMPI, how will the library versioning / conflict problem be
     solved?

> As was mentioned before, our time frame for this is measured in weeks,
> and not in months.  I believe the date of May 1st was mentioned to
> coincide with the next feature release.

While I understand your deadline, we have in the past been very
conservative with such large changes.  The C/R work was delayed for over a
year because people were concerned with the impact to performance and
maintainability.  ORTE work is consistently delayed in the name of code
stability.  I believe that changing our desire for high quality code in
the trunk because of an organization's deadline (particularly when other
organizations are successfully using branches to meet their deadlines)
sets a poor precedent and goes against previous precedents.

Similarly, my concern with the intermediate changes which have been
proposed or occurred comes from the slippery-slope argument.  Changes which
are really only necessary for the btl move (even general code cleanups)
should only occur once we're all sure the btl move will work.  Otherwise,
we're impacting other developers (many of whom are working on temp branches
attempting to get a feature to completion, as our normal process
dictates) in order to reach an end point which may not be achievable.  In
talking to Rainer this morning with Jeff, I think we came up with a number
of ideas on how to mitigate this impact and find a better balance which
allows ORNL to answer the critical technical questions (which are not just
mine, but are shared by others and are critical to the "make it work" part
of the process) and allows the rest of the community some belief that we
can avoid any permanent harm if the move doesn't work out.

> One thing that should help when the naming changes are applied is that
> this is scripted, and the script can be made available for others that
> are working on temp branches - which includes us, also.

That unfortunately doesn't help other developers, if they're trying to
strictly follow the version control changes to the trunk.  The problem is
that we're going to get all those moves (hopefully the script now svn moves
instead of svn rm / svn add) through the version control system.  The
script would then cause all the changes to occur a second time, and that
could be very problematic.  The problem with the version control changes
filtering down is that it is not all-encompassing.  For example, svn will
have problems if the btl directory moves but I have my own private special
BTL.  Yes, I might be able to use your scripts to handle that, but if they
aren't written with that scenario in mind, they won't help.  It also won't
help if I've added a particular file to an existing BTL and the BTL then
moves.

I think these cases are worth the pain to non-ORNL developers *IF* all the
other issues are addressed.  Otherwise, we're unfairly asking them to deal
with a radically changing code base for an incomplete project, a situation
we've worked to avoid in the past.

Hopefully this explains my thoughts on the btl move.  I'm not opposed to
the move itself (although I reserve the right to become opposed, based on
performance and maintainability issues).  I have a problem with the change
in process from previous large, invasive changes.

Hope this helps,

Brian



--
Jeff Squyres
Cisco Systems
