I agree with the goal - we'll have to work this out at a later time. One key will be maintaining a memory-efficient mapping of opal_identifier to an RTE identifier, which typically requires some notion of launch grouping and rank within that grouping.
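
For reference, here is a minimal sketch of the masking Jeff describes in the thread below. It assumes the identifier is a plain 64-bit integer with the job family in the uppermost 16 bits, the sub-family in the next 16, and the VPID in the lower 32. The sketch_* names are made up for illustration and are not OPAL API, and the real layout may well differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit opaque process identifier.
 * Assumption: the real OPAL type and its bit layout may differ. */
typedef uint64_t sketch_identifier_t;

static_assert(sizeof(sketch_identifier_t) == 8,
              "sketch assumes a 64-bit identifier");

/* Lower 32 bits: the VPID. */
static inline uint32_t sketch_vpid(sketch_identifier_t id)
{
    return (uint32_t)(id & 0xFFFFFFFFu);
}

/* Uppermost 16 bits (bits 48..63): the job family. */
static inline uint16_t sketch_job_family(sketch_identifier_t id)
{
    return (uint16_t)(id >> 48);
}

/* Next 16 bits (bits 32..47): the sub-family. */
static inline uint16_t sketch_sub_family(sketch_identifier_t id)
{
    return (uint16_t)((id >> 32) & 0xFFFFu);
}

/* Inverse helper (hypothetical), only useful for building test values. */
static inline sketch_identifier_t sketch_pack(uint16_t family,
                                              uint16_t sub,
                                              uint32_t vpid)
{
    return ((sketch_identifier_t)family << 48) |
           ((sketch_identifier_t)sub << 32) |
           (sketch_identifier_t)vpid;
}
```

With these helpers, sketch_vpid(sketch_pack(0x1234, 1, 7)) gives back 7. If the name becomes a handle with accessors, as suggested below, code like this should go away.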
On Jul 23, 2014, at 7:36 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> A BTL should be completely agnostic to the notions of vpid and jobid.
> Unfortunately, as you mentioned, some of the BTLs are relying on this
> information in diverse ways.
>
> - If they rely on it for output purposes, this is a trivial matter, as a BTL
> is supposed to relay any error upward and some upper layer will decide how
> to handle it. As the callers are in the OMPI layer, they can output a
> meaningful message (including rank and whatnot).
>
> - Some other BTLs use this information to create connections. Clearly not the
> best decision, as it bit us for quite some time (for example, it was the
> major reason preventing SM support across different MPI worlds). Moreover,
> other programming paradigms that can use the BTLs are not subject to a
> rank-based concept. Thus, this usage should be banned and replaced by a more
> sensible approach (to be defined). Until then, the current solution provides
> an acceptable band-aid.
>
> George.
>
> PS: The PML and MTL remaining at the OMPI layer do not create any issues with
> accessing the local or the MPI rank.
>
> On Jul 23, 2014, at 22:19 , Ralph Castain <r...@open-mpi.org> wrote:
>
>> Sounds reasonable. However, keep in mind that some BTLs actually require the
>> notion of a jobid and rank-within-that-job. If the current ones don't, I
>> assure you that at least one off-trunk one definitely does.
>>
>> Some of the MTLs, of course, definitely rely on those fields.
>>
>>
>> On Jul 23, 2014, at 7:15 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>>> I was struggling with a similar issue while trying to fix the OpenIB
>>> compilation, and I chose to implement a different approach, which does not
>>> require knowledge of what’s inside opal_process_name_t.
>>>
>>> Look in opal/util/proc.h. You should be able to use: opal_process_name_vpid
>>> and opal_process_name_jobid.
>>> They will remain there until we figure out a
>>> nice way to get rid of them completely.
>>>
>>> HINT: I personally prefer to get rid of vpid and jobid completely. As long
>>> as we need the info only for a visual clue, the output of OPAL_NAME_PRINT
>>> might be enough.
>>>
>>> George.
>>>
>>> On Jul 23, 2014, at 22:11 , Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>>> wrote:
>>>
>>>> Ralph and I chatted in IM.
>>>>
>>>> For the moment, I'm masking off the lower 32 bits to get the VPID, the
>>>> uppermost 16 as the job family, and the next 16 as the sub-family.
>>>>
>>>> If George makes the name be a handle with accessors to get the parts, we
>>>> can switch to using that.
>>>>
>>>>
>>>> On Jul 23, 2014, at 9:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> You should be able to memcpy it to an ompi_process_name_t and then
>>>>> extract it as usual
>>>>>
>>>>>
>>>>> On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>>>>> wrote:
>>>>>
>>>>>> George --
>>>>>>
>>>>>> Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
>>>>>>
>>>>>> I am currently outputting some information about peer processes in the
>>>>>> usnic BTL to include the peer's VPID, which is the MCW rank. I'll be
>>>>>> sad if that goes away...
>>>>>>
>>>>>>
>>>>>> On Jul 15, 2014, at 2:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> There are two reasons that prevent me from pushing this RFC forward.
>>>>>>>
>>>>>>> 1. Minor: The code has some minor issues related to the last set of
>>>>>>> BTL/PML changes, and I didn't find the time to fix them.
>>>>>>>
>>>>>>> 2. Major: Not all BTLs have been updated and validated. What we need at
>>>>>>> this point from their respective developers is a little help with the
>>>>>>> validation process. We need to validate that the new code works as
>>>>>>> expected and passes all tests.
>>>>>>>
>>>>>>> The move will be ready to go as soon as all BTL developers raise the
>>>>>>> green flag. I got it from Jeff (but the last USNIC commit broke
>>>>>>> something), and myself. In other words, TCP, self, SM and USNIC are
>>>>>>> good to go. For the others, as I didn't hear back from their
>>>>>>> developers/maintainers, I assume they are not yet ready. Here I am
>>>>>>> referring to OpenIB, Portals4, Scif, smcuda, ugni, usnic and vader.
>>>>>>>
>>>>>>> George.
>>>>>>>
>>>>>>> PS: As a reminder the code is available at
>>>>>>> https://bitbucket.org/bosilca/ompi-btl
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P <howa...@lanl.gov>
>>>>>>> wrote:
>>>>>>> Hi Folks,
>>>>>>>
>>>>>>> No work is planned for the uGNI BTL at this time either.
>>>>>>>
>>>>>>> Howard
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff
>>>>>>> Squyres (jsquyres)
>>>>>>> Sent: Thursday, July 10, 2014 5:04 PM
>>>>>>> To: Open MPI Developers List
>>>>>>> Subject: Re: [OMPI devel] RFC: Move the Open MPI communication
>>>>>>> infrastructure in OPAL
>>>>>>>
>>>>>>> FWIW: I can't speak for other BTL maintainers, but I'm out of the
>>>>>>> office for the next week, and the usnic BTL will be standing still
>>>>>>> during that time. Once I return, I will be making additional changes
>>>>>>> in the usnic BTL (new features, updates, ...etc.).
>>>>>>>
>>>>>>> So if you have the cycles, doing it in the next week or so would be
>>>>>>> good because at least there will be no conflicts with usnic BTL
>>>>>>> concurrent development. :-)
>>>>>>>
>>>>>>>
>>>>>>> On Jul 10, 2014, at 2:56 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>> George: any update on when this will happen?
>>>>>>>>
>>>>>>>> On Jun 4, 2014, at 9:14 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>>
>>>>>>>>> WHAT: Open our low-level communication infrastructure by moving all
>>>>>>>>> necessary components (btl/rcache/allocator/mpool) down in OPAL
>>>>>>>>>
>>>>>>>>> WHY: All the components required for inter-process communications are
>>>>>>>>> currently deeply integrated in the OMPI layer. Several
>>>>>>>>> groups/institutions have expressed interest in having a more generic
>>>>>>>>> communication infrastructure, without all the OMPI layer dependencies.
>>>>>>>>> This communication layer should be made available at a different
>>>>>>>>> software level, available to all layers in the Open MPI software
>>>>>>>>> stack. As an example, our ORTE layer could replace the current OOB and
>>>>>>>>> instead use the BTL directly, gaining access to more reactive network
>>>>>>>>> interfaces than TCP. Similarly, external software libraries could take
>>>>>>>>> advantage of our highly optimized AM (active message) communication
>>>>>>>>> layer for their own purpose.
>>>>>>>>>
>>>>>>>>> UTK, with support from Sandia, developed a version of Open MPI where
>>>>>>>>> the entire communication infrastructure has been moved down to OPAL
>>>>>>>>> (btl/rcache/allocator/mpool). Most of the moved components have been
>>>>>>>>> updated to match the new schema, with few exceptions (mainly BTLs
>>>>>>>>> where I have no way of compiling/testing them). Thus, the completion
>>>>>>>>> of this RFC is tied to being able to complete this move for all BTLs.
>>>>>>>>> For this we need help from the rest of the Open MPI community,
>>>>>>>>> especially those supporting some of the BTLs. A non-exhaustive list of
>>>>>>>>> BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
>>>>>>>>>
>>>>>>>>> WHERE: bitbucket.org/bosilca/ompi-btl (updated today with respect to
>>>>>>>>> trunk r31952)
>>>>>>>>>
>>>>>>>>> TIMEOUT: After all the BTLs have been amended to match the new
>>>>>>>>> location and usage. We will discuss the last bits regarding this RFC
>>>>>>>>> at the Open MPI developers meeting in Chicago, June 24-26. The RFC
>>>>>>>>> will become final only after the meeting.
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> de...@open-mpi.org
>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>> Link to this post:
>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/06/14974.php
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Squyres
>>>>>>> jsquy...@cisco.com
>>>>>>> For corporate legal information go to:
>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/