A somewhat simpler solution (since we do use the jobid and vpid in various places) might be to just define an OMPI_PROCESS_NAME_T data type in the rte.h files for each rte, and then let opal_dstore simply store and retrieve that type. Problem solved - except you still need to have a jobid and vpid defined in a struct format in the result, or else we are back to providing accessors - which I remind everyone was rejected by this community years ago when it was implemented in the 1.2 series (you all yelled at me for it).
Even simpler solution: revert back to the initial agreement made when we created the RTE abstraction that the process name is 64-bits and the problem goes away. If/when someone comes along who truly needs something else, and can make a compelling case for the change, then we can worry about complicating things. Why make life harder in the meantime? I think we all have higher priority work on our plate. Jeff and others are using opal_dstore to retrieve the ompi_process_name_t of another process - in Jeff's case, it is local_rank=0 for this node. So he asks for the local "leader" for his proc, and gets the other proc's name back. On Apr 30, 2014, at 7:06 PM, George Bosilca <[email protected]> wrote: > Another solution will be to get rid of ompi_process_name_t and use everywhere > the opal_identifier_t. We can shift the responsibility to convert between > this unique identifier (and we have 64 bits to ensure uniqueness) and the > orte_process_name_t to the RTE. Similar to the opal_help we can implement: > - a naming function converting an opal_identifier_t into a const string > - a comparaison function comparing two opal_identifier_t for identical jobid, > and so on > - maybe an accessor for the jobid/vpid. > > These function will default to some trivial local information, but will be > changed by the RTE ASAP, allowing simple and flexible access to the RTE > naming scheme. Moreover, this information will only be used sporadically > (during connection setup and error reporting), so using functions will not > have a significant overhead. Btw, this is the approach I implemented into the > OPAL BTL branch, and seems to work pretty well so far. > > George. > > > > On Apr 30, 2014, at 22:01 , George Bosilca <[email protected]> wrote: > >> On Apr 30, 2014, at 20:04 , Jeff Squyres (jsquyres) <[email protected]> >> wrote: >> >>> All we need in usnic is the ompi_process_name_t of the local process >>> leader. However that happens is fine with me. :-) >> >> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t enough >> to dig for the info of the peer into the opal_db? >> >> George. >> >> >> >>> >>> >>> On Apr 30, 2014, at 6:46 PM, Ralph Castain <[email protected]> wrote: >>> >>>> George makes a fair point - I was unaware they were hashing down to the >>>> opal_identifier_t. Only real requirement is that there be some way to >>>> reduce to 64-bits when accessing the common data. >>>> >>>> I think the usnic BTL may have an issue with that approach, so maybe some >>>> way of "unhashing" will be required? >>>> >>>> >>>> On Apr 30, 2014, at 3:42 PM, Jeff Squyres (jsquyres) <[email protected]> >>>> wrote: >>>> >>>>> On Apr 30, 2014, at 6:35 PM, George Bosilca <[email protected]> wrote: >>>>> >>>>>> Puzzling. We survived so far without such a requirement. >>>>> >>>>> Ralph tells me that this is a requirement. So I figured we should check >>>>> for it. >>>>> >>>>>> In the BTLs where we needed a 64 bits entity corresponding to the >>>>>> ompi_process_name_t we took advantage of the ompi_rte_hash_name >>>>>> function. This function is supposed to convert from an >>>>>> ompi_process_name_t to a uint64_t (in fact an opal_identifier_t) which >>>>>> can be then used to handle hash tables. >>>>> >>>>> ...I don't really care, either way. I'll do whatever you guys tell me to >>>>> do here. :-) >>>>> >>>>> I put that assert there because Ralph told me it was a requirement, and I >>>>> now have code in the usnic BTL that is doing a memcpy from a union >>>>> uint64_t member to an ompi_process_name_t, and it assumes that the sizes >>>>> are guaranteed to be the same. >>>>> >>>>> If we want to do it some other way, that's fine with me, too. >>>>> >>>>> >>>>>> George. >>>>>> >>>>>> >>>>>> On Apr 30, 2014, at 18:12 , [email protected] wrote: >>>>>> >>>>>>> Author: jsquyres (Jeff Squyres) >>>>>>> Date: 2014-04-30 18:12:54 EDT (Wed, 30 Apr 2014) >>>>>>> New Revision: 31577 >>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/31577 >>>>>>> >>>>>>> Log: >>>>>>> rte_base_frame.c: add sanity check to ensure proper sizes >>>>>>> >>>>>>> There's a requirement in several places (e.g., opal dstore) that >>>>>>> sizeof(ompi_process_name_t) -- which comes from the compile-time >>>>>>> selected ompi/mca/rte component -- is equal to sizeof(uint64_t). If >>>>>>> it's not, Bad Things will happen. >>>>>>> >>>>>>> So put an assert here to catch that case. >>>>>>> >>>>>>> Text files modified: >>>>>>> trunk/ompi/mca/rte/base/rte_base_frame.c | 10 +++++++++- >>>>>>> >>>>>>> 1 files changed, 9 insertions(+), 1 deletions(-) >>>>>>> >>>>>>> Modified: trunk/ompi/mca/rte/base/rte_base_frame.c >>>>>>> ============================================================================== >>>>>>> --- trunk/ompi/mca/rte/base/rte_base_frame.c Wed Apr 30 18:10:30 >>>>>>> 2014 (r31576) >>>>>>> +++ trunk/ompi/mca/rte/base/rte_base_frame.c 2014-04-30 18:12:54 EDT >>>>>>> (Wed, 30 Apr 2014) (r31577) >>>>>>> @@ -1,6 +1,7 @@ >>>>>>> /* >>>>>>> * Copyright (c) 2012-2013 Los Alamos National Security, LLC. >>>>>>> * All rights reserved. >>>>>>> + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. >>>>>>> * $COPYRIGHT$ >>>>>>> * >>>>>>> * Additional copyrights may follow >>>>>>> @@ -12,6 +13,7 @@ >>>>>>> #include "ompi_config.h" >>>>>>> #include "ompi/constants.h" >>>>>>> >>>>>>> +#include "opal_stdint.h" >>>>>>> #include "opal/util/output.h" >>>>>>> #include "opal/mca/mca.h" >>>>>>> #include "opal/mca/base/base.h" >>>>>>> @@ -36,7 +38,13 @@ >>>>>>> static int ompi_rte_base_open(mca_base_open_flag_t flags) >>>>>>> { >>>>>>> /* Open up all available components */ >>>>>>> - return >>>>>>> mca_base_framework_components_open(&ompi_rte_base_framework, flags); >>>>>>> + int ret = >>>>>>> mca_base_framework_components_open(&ompi_rte_base_framework, flags); >>>>>>> + >>>>>>> + /* Sanity check. Many things will break if this is not true >>>>>>> + (e.g., opal dstore needs this to be true). */ >>>>>>> + assert(sizeof(ompi_process_name_t) == sizeof(uint64_t)); >>>>>>> + >>>>>>> + return ret; >>>>>>> } >>>>>>> >>>>>>> MCA_BASE_FRAMEWORK_DECLARE(ompi, rte, "OMPI Run-Time Environment >>>>>>> Interface", NULL, >>>>>>> _______________________________________________ >>>>>>> svn mailing list >>>>>>> [email protected] >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn >>>>>> >>>>>> _______________________________________________ >>>>>> devel mailing list >>>>>> [email protected] >>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>>> Link to this post: >>>>>> http://www.open-mpi.org/community/lists/devel/2014/04/14663.php >>>>> >>>>> >>>>> -- >>>>> Jeff Squyres >>>>> [email protected] >>>>> For corporate legal information go to: >>>>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>>>> >>>>> _______________________________________________ >>>>> devel mailing list >>>>> [email protected] >>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>> Link to this post: >>>>> http://www.open-mpi.org/community/lists/devel/2014/04/14664.php >>>> >>>> _______________________________________________ >>>> devel mailing list >>>> [email protected] >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>> Link to this post: >>>> http://www.open-mpi.org/community/lists/devel/2014/04/14667.php >>> >>> >>> -- >>> Jeff Squyres >>> [email protected] >>> For corporate legal information go to: >>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>> _______________________________________________ >>> devel mailing list >>> [email protected] >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> Link to this post: >>> http://www.open-mpi.org/community/lists/devel/2014/04/14668.php >> > > _______________________________________________ > devel mailing list > [email protected] > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/04/14670.php
