A somewhat simpler solution (since we do use the jobid and vpid in various 
places) might be to just define an OMPI_PROCESS_NAME_T data type in the rte.h 
files for each rte, and then let opal_dstore simply store and retrieve that 
type. Problem solved - except you still need to have a jobid and vpid defined 
in a struct format in the result, or else we are back to providing accessors - 
which I remind everyone was rejected by this community years ago when it was 
implemented in the 1.2 series (you all yelled at me for it).

Even simpler solution: revert back to the initial agreement made when we 
created the RTE abstraction that the process name is 64-bits and the problem 
goes away. If/when someone comes along who truly needs something else, and can 
make a compelling case for the change, then we can worry about complicating 
things. Why make life harder in the meantime? I think we all have higher 
priority work on our plate.


Jeff and others are using opal_dstore to retrieve the ompi_process_name_t of 
another process - in Jeff's case, it is local_rank=0 for this node. So he asks 
for the local "leader" for his proc, and gets the other proc's name back.


On Apr 30, 2014, at 7:06 PM, George Bosilca <[email protected]> wrote:

> Another solution will be to get rid of ompi_process_name_t and use everywhere 
> the opal_identifier_t. We can shift the responsibility to convert between 
> this unique identifier (and we have 64 bits to ensure uniqueness) and the 
> orte_process_name_t to the RTE. Similar to the opal_help we can implement:
> - a naming function converting an opal_identifier_t into a const string
> - a comparaison function comparing two opal_identifier_t for identical jobid, 
> and so on
> - maybe an accessor for the jobid/vpid.
> 
> These function will default to some trivial local information, but will be 
> changed by the RTE ASAP, allowing simple and flexible access to the RTE 
> naming scheme. Moreover, this information will only be used sporadically 
> (during connection setup and error reporting), so using functions will not 
> have a significant overhead. Btw, this is the approach I implemented into the 
> OPAL BTL branch, and seems to work pretty well so far.
> 
>  George.
> 
> 
> 
> On Apr 30, 2014, at 22:01 , George Bosilca <[email protected]> wrote:
> 
>> On Apr 30, 2014, at 20:04 , Jeff Squyres (jsquyres) <[email protected]> 
>> wrote:
>> 
>>> All we need in usnic is the ompi_process_name_t of the local process 
>>> leader.  However that happens is fine with me.  :-)
>> 
>> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t enough 
>> to dig for the info of the peer into the opal_db?
>> 
>> George.
>> 
>> 
>> 
>>> 
>>> 
>>> On Apr 30, 2014, at 6:46 PM, Ralph Castain <[email protected]> wrote:
>>> 
>>>> George makes a fair point - I was unaware they were hashing down to the 
>>>> opal_identifier_t. Only real requirement is that there be some way to 
>>>> reduce to 64-bits when accessing the common data.
>>>> 
>>>> I think the usnic BTL may have an issue with that approach, so maybe some 
>>>> way of "unhashing" will be required?
>>>> 
>>>> 
>>>> On Apr 30, 2014, at 3:42 PM, Jeff Squyres (jsquyres) <[email protected]> 
>>>> wrote:
>>>> 
>>>>> On Apr 30, 2014, at 6:35 PM, George Bosilca <[email protected]> wrote:
>>>>> 
>>>>>> Puzzling. We survived so far without such a requirement.
>>>>> 
>>>>> Ralph tells me that this is a requirement.  So I figured we should check 
>>>>> for it.
>>>>> 
>>>>>> In the BTLs where we needed a 64 bits entity corresponding to the 
>>>>>> ompi_process_name_t we took advantage of the ompi_rte_hash_name 
>>>>>> function. This function is supposed to convert from an 
>>>>>> ompi_process_name_t to a uint64_t (in fact an opal_identifier_t) which 
>>>>>> can be then used to handle hash tables.
>>>>> 
>>>>> ...I don't really care, either way.  I'll do whatever you guys tell me to 
>>>>> do here.  :-)
>>>>> 
>>>>> I put that assert there because Ralph told me it was a requirement, and I 
>>>>> now have code in the usnic BTL that is doing a memcpy from a union 
>>>>> uint64_t member to an ompi_process_name_t, and it assumes that the sizes 
>>>>> are guaranteed to be the same.
>>>>> 
>>>>> If we want to do it some other way, that's fine with me, too.
>>>>> 
>>>>> 
>>>>>> George.
>>>>>> 
>>>>>> 
>>>>>> On Apr 30, 2014, at 18:12 , [email protected] wrote:
>>>>>> 
>>>>>>> Author: jsquyres (Jeff Squyres)
>>>>>>> Date: 2014-04-30 18:12:54 EDT (Wed, 30 Apr 2014)
>>>>>>> New Revision: 31577
>>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/31577
>>>>>>> 
>>>>>>> Log:
>>>>>>> rte_base_frame.c: add sanity check to ensure proper sizes
>>>>>>> 
>>>>>>> There's a requirement in several places (e.g., opal dstore) that
>>>>>>> sizeof(ompi_process_name_t) -- which comes from the compile-time
>>>>>>> selected ompi/mca/rte component -- is equal to sizeof(uint64_t).  If
>>>>>>> it's not, Bad Things will happen.
>>>>>>> 
>>>>>>> So put an assert here to catch that case.
>>>>>>> 
>>>>>>> Text files modified: 
>>>>>>> trunk/ompi/mca/rte/base/rte_base_frame.c |    10 +++++++++-             
>>>>>>>                  
>>>>>>> 1 files changed, 9 insertions(+), 1 deletions(-)
>>>>>>> 
>>>>>>> Modified: trunk/ompi/mca/rte/base/rte_base_frame.c
>>>>>>> ==============================================================================
>>>>>>> --- trunk/ompi/mca/rte/base/rte_base_frame.c    Wed Apr 30 18:10:30 
>>>>>>> 2014        (r31576)
>>>>>>> +++ trunk/ompi/mca/rte/base/rte_base_frame.c    2014-04-30 18:12:54 EDT 
>>>>>>> (Wed, 30 Apr 2014)      (r31577)
>>>>>>> @@ -1,6 +1,7 @@
>>>>>>> /*
>>>>>>> * Copyright (c) 2012-2013 Los Alamos National Security, LLC.
>>>>>>> *                         All rights reserved.
>>>>>>> + * Copyright (c) 2014 Cisco Systems, Inc.  All rights reserved.
>>>>>>> * $COPYRIGHT$
>>>>>>> * 
>>>>>>> * Additional copyrights may follow
>>>>>>> @@ -12,6 +13,7 @@
>>>>>>> #include "ompi_config.h"
>>>>>>> #include "ompi/constants.h"
>>>>>>> 
>>>>>>> +#include "opal_stdint.h"
>>>>>>> #include "opal/util/output.h"
>>>>>>> #include "opal/mca/mca.h"
>>>>>>> #include "opal/mca/base/base.h"
>>>>>>> @@ -36,7 +38,13 @@
>>>>>>> static int ompi_rte_base_open(mca_base_open_flag_t flags)
>>>>>>> {
>>>>>>> /* Open up all available components */
>>>>>>> -    return 
>>>>>>> mca_base_framework_components_open(&ompi_rte_base_framework, flags);
>>>>>>> +    int ret = 
>>>>>>> mca_base_framework_components_open(&ompi_rte_base_framework, flags);
>>>>>>> +
>>>>>>> +    /* Sanity check.  Many things will break if this is not true
>>>>>>> +       (e.g., opal dstore needs this to be true). */
>>>>>>> +    assert(sizeof(ompi_process_name_t) == sizeof(uint64_t));
>>>>>>> +
>>>>>>> +    return ret;
>>>>>>> }
>>>>>>> 
>>>>>>> MCA_BASE_FRAMEWORK_DECLARE(ompi, rte, "OMPI Run-Time Environment 
>>>>>>> Interface", NULL,
>>>>>>> _______________________________________________
>>>>>>> svn mailing list
>>>>>>> [email protected]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> [email protected]
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/04/14663.php
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Jeff Squyres
>>>>> [email protected]
>>>>> For corporate legal information go to: 
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>> 
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> [email protected]
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: 
>>>>> http://www.open-mpi.org/community/lists/devel/2014/04/14664.php
>>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> [email protected]
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2014/04/14667.php
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> [email protected]
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> [email protected]
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/04/14668.php
>> 
> 
> _______________________________________________
> devel mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14670.php

Reply via email to