I agree with the goal - we'll have to work this out at a later time. One key 
will be maintaining a memory-efficient mapping of opal_identifier to an RTE 
identifier, which typically requires some notion of launch grouping and rank 
within that grouping.
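
One possible shape for such a mapping, purely as a sketch: if the opaque identifiers handed out within one launch group happen to be contiguous, a table with one entry per group (rather than one per process) is enough, and the rank falls out as an offset. Everything below is an assumption made for illustration: the example_* names, the contiguity of identifiers, and the 32-bit group/rank fields are hypothetical and do not come from OPAL or any RTE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an opaque 64-bit process identifier. */
typedef uint64_t example_opaque_id_t;

/* Hypothetical RTE-side name: launch grouping plus rank within it. */
typedef struct {
    uint32_t group;
    uint32_t rank;
} example_rte_name_t;

/* One entry per launch group, not one per process (assumes the
 * identifiers of a group are contiguous, starting at first_id). */
typedef struct {
    example_opaque_id_t first_id; /* identifier of rank 0 in the group */
    uint32_t nprocs;              /* number of processes in the group */
    uint32_t group;               /* RTE-side group number */
} example_group_range_t;

/* Binary search over ranges sorted by first_id (non-overlapping).
 * Returns 0 and fills *out on success, -1 if id is in no group. */
static int example_lookup(const example_group_range_t *ranges, size_t nranges,
                          example_opaque_id_t id, example_rte_name_t *out)
{
    size_t lo = 0, hi = nranges;

    assert(ranges != NULL || nranges == 0);
    assert(out != NULL);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const example_group_range_t *r = &ranges[mid];

        if (id < r->first_id) {
            hi = mid;                      /* id lies before this group */
        } else if (id - r->first_id >= r->nprocs) {
            lo = mid + 1;                  /* id lies after this group */
        } else {
            out->group = r->group;
            out->rank = (uint32_t)(id - r->first_id);
            return 0;
        }
    }
    return -1;
}
```

With this layout the memory cost grows with the number of launch groups rather than the number of processes, at the price of an O(log groups) lookup. If identifiers are not contiguous within a group, a different structure would be needed.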


On Jul 23, 2014, at 7:36 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> A BTL should be completely agnostic to the notions of vpid and jobid.
> Unfortunately, as you mentioned, some of the BTLs rely on this
> information in diverse ways.
> 
> - If they rely on it for output purposes, this is a trivial matter, as a BTL
> is supposed to relay any error upward and some upper layer will decide how
> to handle it. As the callers are in the OMPI layer, they can output a
> meaningful message (including rank and whatnot).
> 
> - Some other BTLs use this information to create connections. This was
> clearly not the best decision, as it has bitten us for quite some time (for
> example, it is the major reason preventing SM support across different MPI
> worlds). Moreover, other programming paradigms that can use the BTLs are not
> subject to a rank-based concept. Thus, this usage should be banned and
> replaced by a more sensible approach (to be defined). Until then, the
> current solution provides an acceptable band-aid.
> 
> George. 
> 
> PS: The PML and MTL, which remain at the OMPI layer, do not create any
> issues with accessing the local or the MPI rank.
> 
> On Jul 23, 2014, at 22:19 , Ralph Castain <r...@open-mpi.org> wrote:
> 
>> Sounds reasonable. However, keep in mind that some BTLs actually require the 
>> notion of a jobid and rank-within-that-job. If the current ones don't, I 
>> assure you that at least one off-trunk one definitely does.
>> 
>> Some of the MTLs, of course, definitely rely on those fields.
>> 
>> 
>> On Jul 23, 2014, at 7:15 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> 
>>> I was struggling with a similar issue while trying to fix the OpenIB
>>> compilation, and I chose to implement a different approach, which does not
>>> require knowledge of what’s inside opal_process_name_t.
>>> 
>>> Look in opal/util/proc.h. You should be able to use: opal_process_name_vpid 
>>> and opal_process_name_jobid. They will remain there until we figure out a 
>>> nice way to get rid of them completely.
>>> 
>>> HINT: I personally prefer to get rid of vpid and jobid completely. As long
>>> as we need the info only for a visual clue, the output of OPAL_NAME_PRINT
>>> might be enough.
>>> 
>>> George.
>>> 
>>> On Jul 23, 2014, at 22:11 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>> wrote:
>>> 
>>>> Ralph and I chatted in IM.
>>>> 
>>>> For the moment, I'm masking off the lower 32 bits to get the VPID, the 
>>>> uppermost 16 as the job family, and the next 16 as the sub-family.
>>>> 
>>>> If George makes the name be a handle with accessors to get the parts, we 
>>>> can switch to using that.
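
As a rough illustration of the masking described above, here is a minimal sketch. It assumes the name can be treated as a single 64-bit integer with the job family in the uppermost 16 bits, the sub-family in the next 16, and the VPID in the lower 32. The example_* names are hypothetical and are not OPAL accessors; the real layout of opal_process_name_t may differ, which is the reason accessors are preferable once they exist.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit process name. */
typedef uint64_t example_identifier_t;

/* Lower 32 bits: the VPID. */
static uint32_t example_vpid(example_identifier_t id)
{
    return (uint32_t)(id & UINT64_C(0xFFFFFFFF));
}

/* Uppermost 16 bits: the job family. */
static uint16_t example_job_family(example_identifier_t id)
{
    return (uint16_t)(id >> 48);
}

/* Next 16 bits below the job family: the sub-family. */
static uint16_t example_sub_family(example_identifier_t id)
{
    return (uint16_t)((id >> 32) & UINT64_C(0xFFFF));
}

/* Inverse operation, used here only to build values for checking. */
static example_identifier_t example_pack(uint16_t family, uint16_t sub,
                                         uint32_t vpid)
{
    example_identifier_t id = ((example_identifier_t)family << 48) |
                              ((example_identifier_t)sub << 32) |
                              (example_identifier_t)vpid;

    /* Sanity check: unpacking must give back what was packed. */
    assert(example_job_family(id) == family);
    assert(example_sub_family(id) == sub);
    assert(example_vpid(id) == vpid);
    return id;
}
```

For instance, example_vpid(example_pack(0x1234, 1, 7)) yields 7, which under the assumption above would be the value printed as the peer's MCW rank.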
>>>> 
>>>> 
>>>> 
>>>> On Jul 23, 2014, at 9:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>>> You should be able to memcpy it to an ompi_process_name_t and then 
>>>>> extract it as usual
>>>>> 
>>>>> 
>>>>> On Jul 23, 2014, at 6:51 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>>>> wrote:
>>>>> 
>>>>>> George --
>>>>>> 
>>>>>> Is there a way to get the MPI_COMM_WORLD rank of an opal_process_name_t?
>>>>>> 
>>>>>> I am currently outputting some information about peer processes in the 
>>>>>> usnic BTL to include the peer's VPID, which is the MCW rank.  I'll be 
>>>>>> sad if that goes away...
>>>>>> 
>>>>>> 
>>>>>> On Jul 15, 2014, at 2:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>> 
>>>>>>> Ralph,
>>>>>>> 
>>>>>>> There are two reasons that prevent me from pushing this RFC forward.
>>>>>>> 
>>>>>>> 1. Minor: The code has some minor issues related to the last set of 
>>>>>>> BTL/PML changes, and I haven't found the time to fix them.
>>>>>>> 
>>>>>>> 2. Major: Not all BTLs have been updated and validated. What we need at 
>>>>>>> this point from their respective developers is a little help with the 
>>>>>>> validation process. We need to validate that the new code works as 
>>>>>>> expected and passes all tests.
>>>>>>> 
>>>>>>> The move will be ready to go as soon as all BTL developers raise the 
>>>>>>> green flag. I got it from Jeff (but the last USNIC commit broke 
>>>>>>> something), and myself. In other words, TCP, self, SM and USNIC are 
>>>>>>> good to go. For the others, as I haven't heard back from their 
>>>>>>> developers/maintainers, I assume they are not yet ready. Here I am 
>>>>>>> referring to OpenIB, Portals4, Scif, smcuda, ugni, usnic and vader.
>>>>>>> 
>>>>>>> George.
>>>>>>> 
>>>>>>> PS: As a reminder the code is available at 
>>>>>>> https://bitbucket.org/bosilca/ompi-btl
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Jul 11, 2014 at 3:17 PM, Pritchard, Howard P <howa...@lanl.gov> 
>>>>>>> wrote:
>>>>>>> Hi Folks,
>>>>>>> 
>>>>>>> No work is planned for the uGNI BTL at this time either.
>>>>>>> 
>>>>>>> Howard
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff 
>>>>>>> Squyres (jsquyres)
>>>>>>> Sent: Thursday, July 10, 2014 5:04 PM
>>>>>>> To: Open MPI Developers List
>>>>>>> Subject: Re: [OMPI devel] RFC: Move the Open MPI communication 
>>>>>>> infrastructure in OPAL
>>>>>>> 
>>>>>>> FWIW: I can't speak for other BTL maintainers, but I'm out of the 
>>>>>>> office for the next week, and the usnic BTL will be standing still 
>>>>>>> during that time.  Once I return, I will be making additional changes 
>>>>>>> in the usnic BTL (new features, updates, ...etc.).
>>>>>>> 
>>>>>>> So if you have the cycles, doing it in the next week or so would be 
>>>>>>> good because at least there will be no conflicts with usnic BTL 
>>>>>>> concurrent development.  :-)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Jul 10, 2014, at 2:56 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> 
>>>>>>>> George: any update on when this will happen?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jun 4, 2014, at 9:14 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>> 
>>>>>>>>> WHAT:    Open our low-level communication infrastructure by moving all
>>>>>>>>>       necessary components (btl/rcache/allocator/mpool) down in OPAL
>>>>>>>>> 
>>>>>>>>> WHY: All the components required for inter-process communications are
>>>>>>>>>       currently deeply integrated in the OMPI layer. Several
>>>>>>>>>       groups/institutions have expressed interest in having a more
>>>>>>>>>       generic communication infrastructure, without all the OMPI layer
>>>>>>>>>       dependencies. This communication layer should be made available
>>>>>>>>>       at a different software level, available to all layers in the
>>>>>>>>>       Open MPI software stack. As an example, our ORTE layer could
>>>>>>>>>       replace the current OOB and instead use the BTL directly, gaining
>>>>>>>>>       access to more reactive network interfaces than TCP. Similarly,
>>>>>>>>>       external software libraries could take advantage of our highly
>>>>>>>>>       optimized AM (active message) communication layer for their own
>>>>>>>>>       purposes.
>>>>>>>>> 
>>>>>>>>>       UTK, with support from Sandia, developed a version of Open MPI
>>>>>>>>>       where the entire communication infrastructure has been moved down
>>>>>>>>>       to OPAL (btl/rcache/allocator/mpool). Most of the moved
>>>>>>>>>       components have been updated to match the new schema, with few
>>>>>>>>>       exceptions (mainly BTLs that I have no way of compiling or
>>>>>>>>>       testing). Thus, the completion of this RFC is tied to being able
>>>>>>>>>       to complete this move for all BTLs. For this we need help from
>>>>>>>>>       the rest of the Open MPI community, especially those supporting
>>>>>>>>>       some of the BTLs. A non-exhaustive list of BTLs that qualify here
>>>>>>>>>       is: mx, portals4, scif, udapl, ugni, usnic.
>>>>>>>>> 
>>>>>>>>> WHERE:  bitbucket.org/bosilca/ompi-btl (updated today with respect to
>>>>>>>>>       trunk r31952)
>>>>>>>>> 
>>>>>>>>> TIMEOUT: After all the BTLs have been amended to match the new location
>>>>>>>>>       and usage. We will discuss the last bits regarding this RFC at
>>>>>>>>>       the Open MPI developers meeting in Chicago, June 24-26. The RFC
>>>>>>>>>       will become final only after the meeting.
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> de...@open-mpi.org
>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>> Link to this post:
>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/06/14974.php
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jeff Squyres
>>>>>>> jsquy...@cisco.com
>>>>>>> For corporate legal information go to: 
>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
