Just to report back to the list: the three of us discussed this at some length 
and decided we like George's proposed solution. It looks like a good, clean 
approach that provides flexibility for the future. So we will introduce it when 
the BTLs move down to OPAL, since (a) George already has it implemented there, and 
(b) we don't really need it before then.

Thanks George!
Ralph


On May 1, 2014, at 9:40 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Done!
> 
> On May 1, 2014, at 11:22 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> Apparently we are all good for today at 2PM EST. Fire up the WebEx ;)
>> 
>> George.
>> 
>> On May 1, 2014, at 10:35, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>> wrote:
>> 
>>> http://doodle.com/hhm4yyr76ipcxgk2
>>> 
>>> 
>>> On May 1, 2014, at 10:25 AM, Ralph Castain <r...@open-mpi.org>
>>> wrote:
>>> 
>>>> sure - might be faster that way :-)
>>>> 
>>>> On May 1, 2014, at 6:59 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>>> wrote:
>>>> 
>>>>> Want to have a phone call/webex to discuss?
>>>>> 
>>>>> 
>>>>> On May 1, 2014, at 9:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> 
>>>>>> The problem we'll have with BTLs in opal is going to revolve around that 
>>>>>> ompi_process_name_t and will occur in a number of places. I've been 
>>>>>> trying to grok George's statement about accessors and can't figure out a 
>>>>>> clean way to make that work IF every RTE gets to define the process name 
>>>>>> a different way.
>>>>>> 
>>>>>> For example, suppose I define ompi_process_name_t to be a string. I can 
>>>>>> hash the string down to an opal_identifier_t, but that is a 
>>>>>> structureless 64-bit value - there is no concept of a jobid or vpid in 
>>>>>> it. So if you now want to extract a jobid from that identifier, the only 
>>>>>> way you can do it is to "up-call" back to the RTE to parse it.
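To make the problem concrete, here is a minimal C sketch, assuming an RTE whose process name is a string and using FNV-1a as an arbitrary stand-in hash. `example_identifier_t` and `example_name_to_identifier` are hypothetical names, not Open MPI code. The result is a perfectly usable 64-bit key, but nothing in it marks where a jobid or vpid would be.

```c
#include <stdint.h>

/* Hypothetical stand-in for opal_identifier_t: an opaque 64-bit value. */
typedef uint64_t example_identifier_t;

/* Hash an RTE-defined string process name down to 64 bits with FNV-1a
 * (an arbitrary hash chosen for illustration). The result carries no
 * structure: no bits are reserved for a jobid or a vpid, so neither can
 * be extracted from it without asking whoever produced the string. */
static example_identifier_t example_name_to_identifier(const char *name)
{
    uint64_t hash = 0xcbf29ce484222325ULL;      /* FNV-1a 64-bit offset basis */
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; ++p) {
        hash ^= (uint64_t)*p;                   /* mix in one byte */
        hash *= 0x100000001b3ULL;               /* FNV-1a 64-bit prime */
    }
    return hash;
}
```

Two names such as "job7.rank3" and "job7.rank4" yield different 64-bit values, and neither value says which job it belongs to.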
>>>>>> 
>>>>>> This means that every RTE would have to initialize OPAL with a 
>>>>>> registration of its opal_identifier parser function(s), which seems like 
>>>>>> a really ugly solution.
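The registration this would require might look roughly like the following sketch. Every name in it (`example_register_jobid_parser`, `example_get_jobid`, and the sample parser that assumes the jobid sits in the upper 32 bits) is hypothetical and only illustrates the up-call pattern; it is not an existing OPAL interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for opal_identifier_t. */
typedef uint64_t example_identifier_t;

/* Hypothetical callback type: the RTE decodes its own identifier into a jobid.
 * Returns 0 on success, nonzero if the identifier cannot be parsed. */
typedef int (*example_jobid_parser_fn_t)(example_identifier_t id, uint32_t *jobid);

/* The single parser registered by whichever RTE initialized the lower layer. */
static example_jobid_parser_fn_t example_registered_parser = NULL;

/* Called by the RTE during initialization to register its parser. */
static void example_register_jobid_parser(example_jobid_parser_fn_t fn)
{
    example_registered_parser = fn;
}

/* Lower-layer code (e.g., a BTL) has to up-call into the RTE for a jobid. */
static int example_get_jobid(example_identifier_t id, uint32_t *jobid)
{
    if (example_registered_parser == NULL) {
        return -1;  /* no RTE registered: the identifier stays opaque */
    }
    return example_registered_parser(id, jobid);
}

/* One possible RTE-side parser, assuming this particular RTE happens to
 * keep the jobid in the upper 32 bits. Another RTE could do anything else. */
static int example_rte_parse_jobid(example_identifier_t id, uint32_t *jobid)
{
    *jobid = (uint32_t)(id >> 32);
    return 0;
}
```

Every lower-layer query for a jobid then costs an indirect call back into the RTE, and each RTE must register before any OPAL code needs the answer.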
>>>>>> 
>>>>>> Maybe it is time to shift the process identifier down to the opal layer? 
>>>>>> We could define opal_identifier_t to include the required jobid/vpid, 
>>>>>> perhaps adding a void* so someone can put whatever they want in it.
>>>>>> 
>>>>>> Note that I'm not wild about extending the identifier size beyond 
>>>>>> 64 bits, as the memory footprint is a growing concern, and I still 
>>>>>> haven't seen any real use case proposed for extending it.
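A structured identifier of the kind proposed here could be sketched as follows, assuming 32-bit jobid and vpid fields; the field widths, type names, and accessors are assumptions for illustration. It shows that jobid/vpid alone stays at 64 bits, while the extra `void*` grows each entry (to 16 bytes on a typical LP64 system).

```c
#include <stdint.h>

/* Hypothetical structured identifier: jobid and vpid as two 32-bit
 * fields, so the whole name still fits in exactly 64 bits. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_structured_id_t;

/* The same name plus an opaque pointer for RTE-specific data. On a
 * typical LP64 platform this grows each identifier to 16 bytes. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
    void *rte_data;
} example_structured_id_with_ptr_t;

_Static_assert(sizeof(example_structured_id_t) == 8,
               "jobid + vpid should fit in 64 bits");

/* Accessors: lower layers read the fields directly, with no up-call. */
static uint32_t example_id_jobid(example_structured_id_t id) { return id.jobid; }
static uint32_t example_id_vpid(example_structured_id_t id) { return id.vpid; }

/* Bytes needed to keep one identifier per peer for nprocs peers. */
static uint64_t example_peer_table_bytes(uint64_t nprocs, uint64_t id_size)
{
    return nprocs * id_size;
}
```

For example, with 1,000,000 peers a per-process table of these identifiers needs 8,000,000 bytes at 8 bytes per entry, versus 16,000,000 bytes if each entry grows to 16 bytes.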
>>>>>> 
>>>>>> 
>>>>>> On May 1, 2014, at 3:41 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>>>>> wrote:
>>>>>> 
>>>>>>> On Apr 30, 2014, at 10:01 PM, George Bosilca <bosi...@icl.utk.edu> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t 
>>>>>>>> enough to dig the peer's info out of the opal_db?
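The lookup pattern suggested here can be sketched as a table keyed by the 64-bit identifier plus a string key. This is a hypothetical stand-in written only to show the shape of the idea; it is not the opal_db API, and `example_db_store`, `example_db_fetch`, and the `usnic.addr` key are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for opal_identifier_t. */
typedef uint64_t example_identifier_t;

#define EXAMPLE_DB_MAX 64

/* One (identifier, key) -> value entry. Stores the caller's pointers,
 * so the strings must outlive the entry. */
typedef struct {
    example_identifier_t id;
    const char *key;
    const char *value;
} example_db_entry_t;

static example_db_entry_t example_db[EXAMPLE_DB_MAX];
static size_t example_db_count = 0;

/* Record a value published by (or about) a peer. */
static int example_db_store(example_identifier_t id, const char *key, const char *value)
{
    if (example_db_count == EXAMPLE_DB_MAX) {
        return -1;  /* table full */
    }
    example_db[example_db_count].id = id;
    example_db[example_db_count].key = key;
    example_db[example_db_count].value = value;
    example_db_count++;
    return 0;
}

/* Look up a peer's value by identifier alone; returns NULL if absent. */
static const char *example_db_fetch(example_identifier_t id, const char *key)
{
    for (size_t i = 0; i < example_db_count; ++i) {
        if (example_db[i].id == id && strcmp(example_db[i].key, key) == 0) {
            return example_db[i].value;
        }
    }
    return NULL;
}
```

The caller only ever holds the 64-bit identifier, so it never needs to know how the RTE defines a process name.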
>>>>>>> 
>>>>>>> 
>>>>>>> At the moment, I use the ompi_process_name_t for RML sends/receives in 
>>>>>>> the usnic BTL.  I know this will have to change when the BTLs move down 
>>>>>>> to OPAL (when is that going to happen, BTW?).  So my future use case 
>>>>>>> may be somewhat moot.
>>>>>>> 
>>>>>>> More detail
>>>>>>> ===========
>>>>>>> 
>>>>>>> "Why does the usnic BTL use RML sends/receives?", you ask.
>>>>>>> 
>>>>>>> The reason is rooted in the fact that the usnic BTL uses an unreliable, 
>>>>>>> connectionless transport under the covers.  Some of our customers had 
>>>>>>> network misconfigurations that resulted in usnic traffic not flowing 
>>>>>>> properly (e.g., MTU mismatches in the network).  But since we don't 
>>>>>>> have a connection-oriented underlying API that will eventually 
>>>>>>> time out/fail to connect/etc. when there's a problem with the network 
>>>>>>> configuration, we added a "connection validation" service to the usnic 
>>>>>>> BTL that runs in a thread in local rank 0 on each server.  This 
>>>>>>> thread provides service to all the MPI processes on its server.
>>>>>>> 
>>>>>>> In short: the service thread sends UDP pings and ACKs to peer service 
>>>>>>> threads on other servers (on demand, upon the first send between servers) 
>>>>>>> to verify network connectivity.  If the pings eventually fail/time out 
>>>>>>> (i.e., no ACKs come back), the service thread does a show_help and 
>>>>>>> kills the job. 
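The retry-then-abort decision described above can be sketched as a small per-peer-server state machine. This assumes a fixed retry budget and leaves out the UDP sockets, timers, and show_help call entirely; the type and function names are hypothetical, and this is not the usnic BTL's actual code.

```c
#include <stdbool.h>

/* Hypothetical state of one server-to-server connectivity check. */
typedef enum {
    EXAMPLE_CHECK_PENDING,
    EXAMPLE_CHECK_OK,
    EXAMPLE_CHECK_FAILED
} example_check_state_t;

typedef struct {
    int pings_sent;
    int max_pings;                 /* assumed retry budget before giving up */
    example_check_state_t state;
} example_peer_check_t;

static void example_check_init(example_peer_check_t *c, int max_pings)
{
    c->pings_sent = 0;
    c->max_pings = max_pings;
    c->state = EXAMPLE_CHECK_PENDING;
}

/* Called when the check starts and each time the ping timer expires
 * without an ACK. Returns true if another ping should be sent. */
static bool example_check_on_timeout(example_peer_check_t *c)
{
    if (c->state != EXAMPLE_CHECK_PENDING) {
        return false;              /* already resolved: nothing to send */
    }
    if (c->pings_sent >= c->max_pings) {
        c->state = EXAMPLE_CHECK_FAILED;  /* here the job would be killed */
        return false;
    }
    c->pings_sent++;
    return true;
}

/* Called when an ACK arrives from the peer server's service thread. */
static void example_check_on_ack(example_peer_check_t *c)
{
    if (c->state == EXAMPLE_CHECK_PENDING) {
        c->state = EXAMPLE_CHECK_OK;
    }
}
```

With a budget of 3, three unanswered pings move the check to `EXAMPLE_CHECK_FAILED`, while a single ACK at any point settles it as `EXAMPLE_CHECK_OK`.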
>>>>>>> 
>>>>>>> There are more details, but that's the gist of it.
>>>>>>> 
>>>>>>> This basically gives us the ability to highlight problems in the 
>>>>>>> network and kill the MPI job rather than spin forever trying to 
>>>>>>> deliver MPI/BTL messages that will never reach the peer.
>>>>>>> 
>>>>>>> Since this is really a server-to-server network connectivity issue (vs. 
>>>>>>> an MPI peer-to-peer connectivity issue), we only need to have one 
>>>>>>> service thread for a whole server.  The other MPI procs on the server 
>>>>>>> use RML to talk to it.  E.g., "Please ping the server where MPI proc X 
>>>>>>> lives," and so on.  This seemed better than having a service thread in 
>>>>>>> each MPI process.
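With a single service thread per server, requests from several local processes about the same remote server can plausibly share one check. The following sketch assumes processes are laid out in blocks of `procs_per_server` consecutive ranks and that the service thread only tracks whether a check toward each server has started; both assumptions and all names are hypothetical.

```c
#include <stdbool.h>

#define EXAMPLE_MAX_SERVERS 16

/* Hypothetical state held by the one service thread on a server. */
typedef struct {
    bool check_started[EXAMPLE_MAX_SERVERS];
    int checks_started;
} example_service_t;

/* Hypothetical mapping from an MPI proc to the server it runs on,
 * assuming a fixed number of procs per server laid out in order. */
static int example_server_of(int proc, int procs_per_server)
{
    return proc / procs_per_server;
}

/* Handle a local proc's request "please ping the server where proc X lives".
 * Returns true only if this request starts a new server-to-server check;
 * later requests for the same server reuse the existing one. */
static bool example_service_request(example_service_t *svc, int proc, int procs_per_server)
{
    int server = example_server_of(proc, procs_per_server);
    if (server < 0 || server >= EXAMPLE_MAX_SERVERS) {
        return false;  /* outside this sketch's fixed-size table */
    }
    if (svc->check_started[server]) {
        return false;  /* a check toward that server already exists */
    }
    svc->check_started[server] = true;
    svc->checks_started++;
    return true;
}
```

With 4 procs per server, requests about procs 8 and 9 both map to server 2 and trigger only one check, which is the saving over running a checker in every MPI process.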
>>>>>>> 
>>>>>>> We've thought a bit about what to do when the BTLs move down to OPAL 
>>>>>>> (since they won't be able to use RML any more), but don't have a final 
>>>>>>> solution yet...  We do still want to be able to utilize this capability 
>>>>>>> even after the BTL move.
>>>>>>> 
>>>>>>> -- 
>>>>>>> Jeff Squyres
>>>>>>> jsquy...@cisco.com
>>>>>>> For corporate legal information go to: 
>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> de...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>> Link to this post: 
>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/05/14673.php
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/05/14674.php
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>> 
>>>>> _______________________________________________
>>>>> Link to this post: 
>>>>> http://www.open-mpi.org/community/lists/devel/2014/05/14675.php
>>>> 
>>>> _______________________________________________
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2014/05/14676.php
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> 
>>> _______________________________________________
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/05/14677.php
>> 
>> _______________________________________________
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/05/14678.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> _______________________________________________
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14680.php
