Ralph and George,
attached are two patches:
- heterogeneous.v1.patch : a cleanup of the previous patch
- heterogeneous.v2.patch : a new patch based on Ralph's suggestion. I made
the minimal changes needed to move jobid and vpid into the OPAL layer.
Cheers,
Gilles
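For context, the hton64/ntoh64 helpers that heterogeneous.v2.patch pulls in from opal/types.h amount to a byte-order-conditional full 64-bit swap. A minimal standalone sketch (names prefixed my_ to avoid clashing with the real OPAL symbols; the actual opal/types.h implementation may differ):

```c
#include <stdint.h>
#include <arpa/inet.h>

#ifdef WORDS_BIGENDIAN
/* big endian hosts are already in network byte order */
#define my_hton64(v) (v)
#define my_ntoh64(v) (v)
#else
/* little endian: byte-swap each 32-bit half and exchange the halves,
 * which together reverse all 8 bytes */
static inline uint64_t my_hton64(uint64_t v)
{
    return ((uint64_t)htonl((uint32_t)v) << 32)
         | (uint64_t)htonl((uint32_t)(v >> 32));
}
/* a full byte reversal is its own inverse */
#define my_ntoh64(v) my_hton64(v)
#endif
```

Note that the little endian formula must be compiled away on big endian hosts (as the #if does here): applied on a big endian machine it would wrongly exchange the two halves.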
On 2014/08/07 11:27, Ralph Castain wrote:
> Are we maybe approaching this from the wrong direction? I ask because we had
> to do some gyrations in the pmix framework to work around the difference in
> naming schemes between OPAL and the rest of the code base, and now we have
> more gyrations here.
>
> Given that the MPI and RTE layers both rely on the structured form of the
> name, what about if we just mimic that down in OPAL? I think we could perhaps
> do this in a way that still allows someone to overlay it with a 64-bit
> unstructured identifier if they want, but that would put the extra work on
> their side. In other words, we make it easy to work with the other parts of
> our own code base, acknowledging that those wanting to do something else may
> have to do some extra work.
>
> I ask because every resource manager out there assigns each process a jobid
> and vpid in some form of integer format. So we have to absorb that
> information in {jobid, vpid} format regardless of what we may want to do
> internally. What we now have to do is immediately convert that into the
> unstructured form for OPAL (where we take it in via PMI), then convert it
> back to structured form when passing it up to ORTE so it can be handed to
> OMPI, and then convert it back to unstructured form every time either OMPI or
> ORTE accesses the OPAL layer.
>
> Seems awfully convoluted and error prone. Simplifying things for ourselves
> might make more sense.
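Ralph's idea of mirroring the structured form down in OPAL, while still letting someone overlay a 64-bit unstructured identifier, could be sketched roughly like this (hypothetical type and field names, not the committed interface):

```c
#include <stdint.h>

/* Hypothetical structured OPAL-level process name. The default view
 * matches what ORTE and OMPI already use (jobid + vpid); anyone who
 * wants a flat 64-bit identifier can overlay the whole union, at the
 * cost of doing their own packing. */
typedef union {
    struct {
        uint32_t jobid;   /* job identifier assigned by the resource manager */
        uint32_t vpid;    /* rank of the process within the job */
    } name;               /* structured view, the easy path for ORTE/OMPI */
    uint64_t id;          /* unstructured 64-bit overlay */
} sketch_process_name_t;
```

With this layout the {jobid, vpid} delivered by PMI could be stored once and used as-is by ORTE and OMPI, instead of round-tripping through an unstructured form.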
>
>
> On Aug 6, 2014, at 1:21 PM, George Bosilca <[email protected]> wrote:
>
>> Gilles,
>>
>> This looks right. It is really unfortunate that we have to change the
>> definition of orte_process_name_t for big endian architectures, but I don't
>> think there is a way around it.
>>
>> Regarding your patch I have two comments:
>> 1. There is a flagrant lack of comments ... especially on the ORTE side.
>> 2. At the OPAL level we are really implementing a htonll, and I really think
>> we should stick to the POSIX prototype (i.e. returning the changed value
>> instead of doing things in place).
>>
>> George.
>>
>>
>>
>> On Wed, Aug 6, 2014 at 7:02 AM, Gilles Gouaillardet
>> <[email protected]> wrote:
>> Ralph and George,
>>
>> here is attached a patch that fixes the heterogeneous support without the
>> abstraction violation.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/08/06 9:40, Gilles Gouaillardet wrote:
>>> hummm
>>>
>>> I intentionally did not swap the two 32-bit halves (!)
>>>
>>> From the top level, what we have is:
>>>
>>> typedef union {
>>>     uint64_t opal;
>>>     struct {
>>>         uint32_t jobid;
>>>         uint32_t vpid;
>>>     } orte;
>>> } meta_process_name_t;
>>>
>>> OPAL is agnostic about jobid and vpid.
>>> jobid and vpid are set in ORTE/MPI, and OPAL is used only
>>> to transport the 64 bits.
>>> /* opal_process_name_t and orte_process_name_t are often cast into each
>>> other */
>>> At the ORTE/MPI level, jobid and vpid are set individually
>>> /* e.g. we do *not* do something like opal = jobid | (vpid<<32) */
>>> and this is why everything works fine on homogeneous clusters regardless
>>> of endianness.
>>>
>>> Now, in a heterogeneous cluster, things get a bit trickier ...
>>>
>>> I was initially unhappy with my commit and I think I found out why:
>>> it is an abstraction violation!
>>> The two 32-bit halves are not swapped by OPAL because this is what
>>> ORTE/OMPI expects.
>>>
>>> Now I'd like to suggest the following lightweight approach:
>>>
>>> at the OPAL level, use #if protected htonll/ntohll
>>> (e.g. swap the two 32-bit halves)
>>>
>>> and do the trick at the ORTE level:
>>>
>>> simply replace
>>>
>>> struct orte_process_name_t {
>>>     orte_jobid_t jobid;
>>>     orte_vpid_t vpid;
>>> };
>>>
>>> with
>>>
>>> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
>>> struct orte_process_name_t {
>>>     orte_vpid_t vpid;
>>>     orte_jobid_t jobid;
>>> };
>>> #else
>>> struct orte_process_name_t {
>>>     orte_jobid_t jobid;
>>>     orte_vpid_t vpid;
>>> };
>>> #endif
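The effect of swapping the fields can be checked in isolation: on a little endian host, putting vpid first means that a full 64-bit byte swap of the name produces jobid-first big endian bytes on the wire, which is exactly the natural {jobid, vpid} layout of a big endian peer. A hypothetical standalone check (all names invented here; the expected byte order assumes a little endian host):

```c
#include <stdint.h>
#include <string.h>

/* little endian layout with the fields swapped, as proposed above */
struct le_name { uint32_t vpid; uint32_t jobid; };

/* full 64-bit byte swap, standing in for hton64 on a little endian host */
static uint64_t bswap64(uint64_t v)
{
    v = ((v & 0x00FF00FF00FF00FFULL) << 8)  | ((v >> 8)  & 0x00FF00FF00FF00FFULL);
    v = ((v & 0x0000FFFF0000FFFFULL) << 16) | ((v >> 16) & 0x0000FFFF0000FFFFULL);
    return (v << 32) | (v >> 32);
}

/* compute the 8 wire bytes produced for a given {jobid, vpid} */
static void wire_bytes(uint32_t jobid, uint32_t vpid, unsigned char out[8])
{
    struct le_name n = { .vpid = vpid, .jobid = jobid };
    uint64_t w;
    memcpy(&w, &n, sizeof(w));   /* the flat opal_process_name_t view */
    w = bswap64(w);              /* what OPAL_PROCESS_NAME_HTON would do */
    memcpy(out, &w, sizeof(w));
}
```

On a little endian host, wire_bytes(0x11223344, 0x55667788, b) yields the bytes 11 22 33 44 55 66 77 88: jobid first, big endian, matching a big endian receiver's unswapped struct.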
>>>
>>>
>>> so we keep OPAL agnostic about how the uint64_t is really used at the
>>> upper level.
>>> Another option is to make OPAL aware of jobid and vpid, but that is a bit
>>> more heavyweight IMHO.
>>>
>>> I'll try this today and make sure it works.
>>>
>>> Any thoughts ?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain <[email protected]> wrote:
>>>
>>>> Ah yes, so it is - sorry I missed that last test :-/
>>>>
>>>> On Aug 5, 2014, at 10:50 AM, George Bosilca <[email protected]> wrote:
>>>>
>>>> The code committed by Gilles is correctly protected for big endian (
>>>> https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely
>>>> pointing out that I think he should also swap the two 32-bit halves in his
>>>> implementation.
>>>>
>>>> George.
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 1:32 PM, Ralph Castain <[email protected]> wrote:
>>>>
>>>>> On Aug 5, 2014, at 10:23 AM, George Bosilca <[email protected]> wrote:
>>>>>
>>>>> On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain <[email protected]> wrote:
>>>>>
>>>>>> Hmmm...wouldn't that then require that you know (a) the other side is
>>>>>> little endian, and (b) that you are on a big endian? Otherwise, you
>>>>>> wind up with the same issue in reverse, yes?
>>>>>>
>>>>> This is similar to the 32 bits ntohl that we are using in other parts of
>>>>> the project. Any little endian participant will do the conversion, while
>>>>> every big endian participant will use an empty macro instead.
>>>>>
>>>>>
>>>>>> In the ORTE methods, we explicitly set the fields (e.g., jobid =
>>>>>> ntohl(remote-jobid)) to get around this problem. I missed that he did
>>>>>> it by location instead of named fields - perhaps we should do that
>>>>>> instead?
>>>>>>
>>>>> As soon as we impose the ORTE naming scheme at the OPAL level (aka. the
>>>>> notion of jobid and vpid) this approach will become possible.
>>>>>
>>>>>
>>>>> Not proposing that at all so long as the other method will work without
>>>>> knowing the other side's endianness. Sounds like your approach should work
>>>>> fine as long as Gilles adds a #if so big endian defines the macro away
>>>>>
>>>>>
>>>>> George.
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 5, 2014, at 10:06 AM, George Bosilca <[email protected]> wrote:
>>>>>>
>>>>>> Technically speaking, converting a 64-bit value to its big endian
>>>>>> representation requires swapping the two 32-bit parts. So the correct
>>>>>> approach would have been:
>>>>>>
>>>>>> uint64_t htonll(uint64_t n)
>>>>>> {
>>>>>>     return ((uint64_t)ntohl((uint32_t)n) << 32) | (uint64_t)ntohl((uint32_t)(n >> 32));
>>>>>> }
>>>>>>
>>>>>> George.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 5:52 AM, Ralph Castain <[email protected]> wrote:
>>>>>>
>>>>>>> FWIW: that's exactly how we do it in ORTE
>>>>>>>
>>>>>>> On Aug 4, 2014, at 10:25 PM, Gilles Gouaillardet <
>>>>>>> [email protected]
>>>>>>>> wrote:
>>>>>>> George,
>>>>>>>
>>>>>>> I confirm there was a problem when running on a heterogeneous cluster;
>>>>>>> this is now fixed in r32425.
>>>>>>>
>>>>>>> I am not convinced I chose the most elegant way to achieve the desired
>>>>>>> result ...
>>>>>>> Could you please double check this commit ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On 2014/08/02 0:14, George Bosilca wrote:
>>>>>>>
>>>>>>> Gilles,
>>>>>>>
>>>>>>> The design of the BTL move was to let the opal_process_name_t be
>>>>>>> agnostic to what is stored inside, and all accesses should be done
>>>>>>> through the provided accessors. Thus, big endian or little endian
>>>>>>> doesn't make a difference, as long as everything goes through the
>>>>>>> accessors.
>>>>>>>
>>>>>>> I'm skeptical about the support of heterogeneous environments in the
>>>>>>> current code, so I didn't pay much attention to handling the case in
>>>>>>> the TCP BTL. But in case we do care it is enough to make the 2 macros
>>>>>>> point to something meaningful instead of being empty (bswap_64 or
>>>>>>> something).
>>>>>>>
>>>>>>> George.
>>>>>>>
>>>>>>> On Aug 1, 2014, at 06:52 , Gilles Gouaillardet
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> George and Ralph,
>>>>>>>
>>>>>>> I am very confused about whether there is an issue or not.
>>>>>>>
>>>>>>>
>>>>>>> Anyway, today Paul and I ran basic tests on big endian machines and did
>>>>>>> not face any issue related to big endianness.
>>>>>>>
>>>>>>> So I did my homework, dug into the code, and basically,
>>>>>>> opal_process_name_t is used as an orte_process_name_t.
>>>>>>> For example, in ompi_proc_init :
>>>>>>>
>>>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->jobid =
>>>>>>> OMPI_PROC_MY_NAME->jobid;
>>>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->vpid = i;
>>>>>>>
>>>>>>> and with
>>>>>>>
>>>>>>> #define OMPI_CAST_ORTE_NAME(a) ((orte_process_name_t*)(a))
>>>>>>>
>>>>>>> So as long as an opal_process_name_t is used as an orte_process_name_t,
>>>>>>> there is no problem,
>>>>>>> regardless of the endianness of the homogeneous cluster we are running on.
>>>>>>>
>>>>>>> For the sake of readability (and for being pedantic too ;-) ) in r32357,
>>>>>>> &proc_temp->super.proc_name
>>>>>>> could be replaced with
>>>>>>> OMPI_CAST_ORTE_NAME(&proc_temp->super.proc_name)
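The "used as an orte_process_name_t" pattern can be illustrated with a stripped-down model (hypothetical mirror types invented here; the real OMPI_CAST_ORTE_NAME works the same way on the real types): as long as every access goes through the cast, the 64-bit value is never interpreted as a number, so endianness does not matter on a homogeneous cluster.

```c
#include <stdint.h>

/* stand-ins for opal_process_name_t (a bare uint64_t) and
 * orte_process_name_t (a jobid/vpid pair) */
typedef uint64_t model_opal_name_t;
struct model_orte_name { uint32_t jobid; uint32_t vpid; };

/* mirrors the OMPI_CAST_ORTE_NAME macro quoted above */
#define MODEL_CAST_ORTE_NAME(a) ((struct model_orte_name *)(a))

/* write through the cast, as ompi_proc_init does */
static void model_set(model_opal_name_t *n, uint32_t jobid, uint32_t vpid)
{
    MODEL_CAST_ORTE_NAME(n)->jobid = jobid;
    MODEL_CAST_ORTE_NAME(n)->vpid  = vpid;
}
```

Reading the fields back through the same cast always returns what was stored, on any endianness; only serializing the raw uint64_t across hosts of different endianness exposes the byte order.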
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> That being said, in btl/tcp, I noticed:
>>>>>>>
>>>>>>> in mca_btl_tcp_component_recv_handler :
>>>>>>>
>>>>>>> opal_process_name_t guid;
>>>>>>> [...]
>>>>>>> /* recv the process identifier */
>>>>>>> retval = recv(sd, (char *)&guid, sizeof(guid), 0);
>>>>>>> if (retval != sizeof(guid)) {
>>>>>>>     CLOSE_THE_SOCKET(sd);
>>>>>>>     return;
>>>>>>> }
>>>>>>> OPAL_PROCESS_NAME_NTOH(guid);
>>>>>>>
>>>>>>> and in mca_btl_tcp_endpoint_send_connect_ack :
>>>>>>>
>>>>>>> /* send process identifier to remote endpoint */
>>>>>>> opal_process_name_t guid = btl_proc->proc_opal->proc_name;
>>>>>>>
>>>>>>> OPAL_PROCESS_NAME_HTON(guid);
>>>>>>> if (mca_btl_tcp_endpoint_send_blocking(btl_endpoint, &guid, sizeof(guid)) !=
>>>>>>>
>>>>>>> and with
>>>>>>>
>>>>>>> #define OPAL_PROCESS_NAME_NTOH(guid)
>>>>>>> #define OPAL_PROCESS_NAME_HTON(guid)
>>>>>>>
>>>>>>>
>>>>>>> I have not had time to test yet, but for now I can only suspect:
>>>>>>> - there will be an issue with the TCP BTL on a heterogeneous cluster
>>>>>>> - in that case, the fix is to provide a different version of the
>>>>>>> OPAL_PROCESS_NAME_xTOy macros
>>>>>>> on little endian arches when heterogeneous mode is supported.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> does that make sense ?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
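The fix Gilles suspects is needed here is essentially what lands in the patch at the end of this thread: give OPAL_PROCESS_NAME_HTON/NTOH a real body on little endian hosts. A simplified sketch (macro and type names shortened here to avoid clashing with OPAL, and the heterogeneous-support condition omitted; the committed version delegates to hton64/ntoh64):

```c
#include <stdint.h>
#include <arpa/inet.h>

typedef uint64_t sk_process_name_t;   /* stands in for opal_process_name_t */

#ifdef WORDS_BIGENDIAN
/* big endian hosts already store the name in network byte order */
#define SK_NAME_HTON(guid)
#define SK_NAME_NTOH(guid)
#else
/* little endian: reverse all 8 bytes in place */
static inline uint64_t sk_swap64(uint64_t v)
{
    return ((uint64_t)htonl((uint32_t)v) << 32)
         | (uint64_t)htonl((uint32_t)(v >> 32));
}
#define SK_NAME_HTON(guid) (guid) = sk_swap64(guid)
#define SK_NAME_NTOH(guid) (guid) = sk_swap64(guid)
#endif
```

Either way the pair is a no-op round trip: a little endian sender swaps, a big endian sender does nothing, and the wire format is big endian in both cases.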
>>>>>>>
>>>>>>>
>>>>>>> On 2014/07/31 1:29, George Bosilca wrote:
>>>>>>>
>>>>>>> The underlying structure changed, so a little bit of fiddling is normal.
>>>>>>> Instead of using a field in the ompi_proc_t you are now using a field
>>>>>>> down
>>>>>>> in opal_proc_t, a field that simply cannot have the same type as before
>>>>>>> (orte_process_name_t).
>>>>>>>
>>>>>>> George.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> George - my point was that we regularly tested using the method in that
>>>>>>> routine, and now we have to do something a little different. So it is an
>>>>>>> "issue" in that we have to make changes across the code base to ensure
>>>>>>> we
>>>>>>> do things the "new" way, that's all
>>>>>>>
>>>>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <[email protected]> wrote:
>>>>>>>
>>>>>>> No, this is not going to be an issue if the opal_identifier_t is used
>>>>>>> correctly (aka only via the exposed accessors).
>>>>>>>
>>>>>>> George.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Yeah, my fix won't work for big endian machines - this is going to be an
>>>>>>> issue across the code base now, so we'll have to troll and fix it. I was
>>>>>>> doing the minimal change required to fix the trunk in the meantime.
>>>>>>>
>>>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <[email protected]> wrote:
>>>>>>>
>>>>>>> Yes. opal_process_name_t has basically no meaning by itself; it is a
>>>>>>> 64-bit storage location used by the upper layer to save some local key
>>>>>>> that can later be used to extract information. Calling the OPAL-level
>>>>>>> compare function might be a better fit there.
>>>>>>>
>>>>>>> George.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> was it really that simple ?
>>>>>>>
>>>>>>> proc_temp->super.proc_name has type opal_process_name_t :
>>>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>>>> typedef uint64_t opal_identifier_t;
>>>>>>>
>>>>>>> *but*
>>>>>>>
>>>>>>> item_ptr->peer has type orte_process_name_t :
>>>>>>> struct orte_process_name_t {
>>>>>>> orte_jobid_t jobid;
>>>>>>> orte_vpid_t vpid;
>>>>>>> };
>>>>>>>
>>>>>>> bottom line, is r32357 still valid on a big endian arch ?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I just fixed this one - all that was required was an ampersand as the
>>>>>>> name was being passed into the function instead of a pointer to the name
>>>>>>>
>>>>>>> r32357
>>>>>>>
>>>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET
>>>>>>> <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Rolf,
>>>>>>>
>>>>>>> r32353 can be seen as a suspect...
>>>>>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>>>>>> even more (e.g. we hit the bug 100% of the time after the fix).
>>>>>>>
>>>>>>> Does the attached patch to #4815 fix the problem ?
>>>>>>>
>>>>>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>>>>>> it and drop a note to #4815
>>>>>>> ( I am afk until tomorrow)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> Rolf vandeVaart <[email protected]> wrote:
>>>>>>>
>>>>>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>>>>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>>>>>> ibm/pt2pt/send test running on a single node.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> (gdb) where
>>>>>>>
>>>>>>> #0 0x00007f6c0d1321d0 in ?? ()
>>>>>>>
>>>>>>> #1 <signal handler called>
>>>>>>>
>>>>>>> #2 0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at
>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>
>>>>>>> #3 0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200,
>>>>>>> peer_list=0x7f6c0c0a6748,
>>>>>>> back_files=0x7f6bf3ffd6c8,
>>>>>>>
>>>>>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>>>>>> "sm_payload_mem_", map_all=false) at
>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>>>>
>>>>>>> #4 0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>>> reg_data=0xba28c0)
>>>>>>>
>>>>>>> at
>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>>>
>>>>>>> #5 0x00007f6c0cced386 in mca_coll_ml_register_bcols
>>>>>>> (ml_module=0xba5c40) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>>>
>>>>>>> #6 0x00007f6c0cced68f in ml_module_memory_initialization
>>>>>>> (ml_module=0xba5c40) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>>>
>>>>>>> #7 0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>>>
>>>>>>> #8 0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>>> priority=0x7fffe7991b58) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>>>
>>>>>>> #9 0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>>>>
>>>>>>> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>>>>
>>>>>>> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>>>>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>>>>>
>>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>>>>
>>>>>>> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>>>>>> comm=0x6037a0) at
>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>>>>
>>>>>>> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>>>>
>>>>>>> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>>>>>> requested=0, provided=0x7fffe79922e8) at
>>>>>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>>>>>
>>>>>>> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>>>>>> argv=0x7fffe7992340) at pinit.c:84
>>>>>>>
>>>>>>> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at
>>>>>>> send.c:32
>>>>>>>
>>>>>>> (gdb) up
>>>>>>>
>>>>>>> #1 <signal handler called>
>>>>>>>
>>>>>>> (gdb) up
>>>>>>>
>>>>>>> #2 0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at
>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>
>>>>>>> 522 if (name1->jobid < name2->jobid) {
>>>>>>>
>>>>>>> (gdb) print name1
>>>>>>>
>>>>>>> $1 = (const orte_process_name_t *) 0x192350001
>>>>>>>
>>>>>>> (gdb) print *name1
>>>>>>>
>>>>>>> Cannot access memory at address 0x192350001
>>>>>>>
>>>>>>> (gdb) print name2
>>>>>>>
>>>>>>> $2 = (const orte_process_name_t *) 0xbaf76c
>>>>>>>
>>>>>>> (gdb) print *name2
>>>>>>>
>>>>>>> $3 = {jobid = 2452946945, vpid = 1}
>>>>>>>
>>>>>>> (gdb)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: devel [mailto:[email protected]] On Behalf Of Gilles Gouaillardet
>>>>>>> Sent: Wednesday, July 30, 2014 2:16 AM
>>>>>>> To: Open MPI Developers
>>>>>>> Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>>>>>>
>>>>>>> George,
>>>>>>>
>>>>>>> #4815 is indirectly related to the move:
>>>>>>> in bcol/basesmuma, we used to compare two ompi_process_name_t, and now
>>>>>>> we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>>>>>> (which causes a glorious SIGSEGV).
>>>>>>>
>>>>>>> I proposed a temporary patch which is both broken and inelegant; could
>>>>>>> you please advise a correct solution?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Gilles
>>>>>>> On 2014/07/27 7:37, George Bosilca wrote:
>>>>>>>
>>>>>>> If you have any issue with the move, I'll be happy to help and/or
>>>>>>> support you on your last move toward a completely generic BTL. To
>>>>>>> facilitate your work I exposed a minimalistic set of OMPI information
>>>>>>> at the OPAL level. Take a look at opal/util/proc.h for more info, but
>>>>>>> please try not to expose more.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> [email protected]
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15348.php
Index: opal/util/proc.h
===================================================================
--- opal/util/proc.h (revision 32440)
+++ opal/util/proc.h (working copy)
@@ -21,7 +21,7 @@
#include "opal/dss/dss.h"
#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
-#include <arpa/inet.h>
+#include "opal/types.h"
#endif
/**
@@ -35,22 +35,11 @@
typedef opal_identifier_t opal_process_name_t;
#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
-#define OPAL_PROCESS_NAME_NTOH(guid) opal_process_name_ntoh_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_ntoh_intr(opal_process_name_t *name)
-{
- uint32_t * w = (uint32_t *)name;
- w[0] = ntohl(w[0]);
- w[1] = ntohl(w[1]);
-}
-#define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_hton_intr(opal_process_name_t *name)
-{
- uint32_t * w = (uint32_t *)name;
- w[0] = htonl(w[0]);
- w[1] = htonl(w[1]);
-}
+#define OPAL_PROCESS_NAME_NTOH(guid) \
+ guid = ntoh64(guid)
+
+#define OPAL_PROCESS_NAME_HTON(guid) \
+ guid = hton64(guid)
#else
#define OPAL_PROCESS_NAME_NTOH(guid)
#define OPAL_PROCESS_NAME_HTON(guid)
Index: orte/include/orte/types.h
===================================================================
--- orte/include/orte/types.h (revision 32440)
+++ orte/include/orte/types.h (working copy)
@@ -10,6 +10,8 @@
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -83,18 +85,18 @@
#define ORTE_VPID_MAX UINT32_MAX-2
#define ORTE_VPID_MIN 0
-#define ORTE_PROCESS_NAME_HTON(n) \
-do { \
- n.jobid = htonl(n.jobid); \
- n.vpid = htonl(n.vpid); \
-} while (0)
+#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
+#define ORTE_PROCESS_NAME_HTON(n) \
+ OPAL_PROCESS_NAME_HTON(*(opal_process_name_t *)&(n))
-#define ORTE_PROCESS_NAME_NTOH(n) \
-do { \
- n.jobid = ntohl(n.jobid); \
- n.vpid = ntohl(n.vpid); \
-} while (0)
+#define ORTE_PROCESS_NAME_NTOH(n) \
+ OPAL_PROCESS_NAME_NTOH(*(opal_process_name_t *)&(n))
+#else
+#define ORTE_PROCESS_NAME_HTON(n)
+#define ORTE_PROCESS_NAME_NTOH(n)
+#endif
+
#define ORTE_NAME_ARGS(n) \
(unsigned long) ((NULL == n) ? (unsigned long)ORTE_JOBID_INVALID : (unsigned long)(n)->jobid), \
(unsigned long) ((NULL == n) ? (unsigned long)ORTE_VPID_INVALID : (unsigned long)(n)->vpid) \
@@ -115,11 +117,23 @@
/*
* define the process name structure
+ * the OPAL layer sees an orte_process_name_t as an opal_process_name_t aka uint64_t
+ * if heterogeneous is supported, when converting this uint64_t to
+ * an endian neutral format, vpid and jobid will be swapped.
+ * consequently, the orte_process_name_t struct must have different definitions
+ * (swap jobid and vpid) on little and big endian arch.
*/
+#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
struct orte_process_name_t {
+ orte_vpid_t vpid; /**< Process id - equivalent to rank */
orte_jobid_t jobid; /**< Job number */
+};
+#else
+struct orte_process_name_t {
+ orte_jobid_t jobid; /**< Job number */
orte_vpid_t vpid; /**< Process id - equivalent to rank */
};
+#endif
typedef struct orte_process_name_t orte_process_name_t;
Index: oshmem/mca/scoll/mpi/scoll_mpi_module.c
===================================================================
--- oshmem/mca/scoll/mpi/scoll_mpi_module.c (revision 32440)
+++ oshmem/mca/scoll/mpi/scoll_mpi_module.c (working copy)
@@ -1,11 +1,13 @@
/**
- Copyright (c) 2011 Mellanox Technologies. All rights reserved.
- Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
- $COPYRIGHT$
-
- Additional copyrights may follow
-
- $HEADER$
+ * Copyright (c) 2011 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
+ * $COPYRIGHT$
+ *
+ * Additional copyrights may follow
+ *
+ * $HEADER$
*/
#include "ompi_config.h"
@@ -125,7 +127,7 @@
ompi_proc_t* ompi_proc;
for( int j = 0; j < ompi_group_size(parent_group); j++ ) {
ompi_proc = ompi_group_peer_lookup(parent_group, j);
- if( ompi_proc->super.proc_name == osh_group->proc_array[i]->super.proc_name) {
+ if( ompi_proc->super.proc_name.id == osh_group->proc_array[i]->super.proc_name.id) {
ranks[i] = j;
break;
}
Index: opal/mca/btl/tcp/btl_tcp_proc.c
===================================================================
--- opal/mca/btl/tcp/btl_tcp_proc.c (revision 32440)
+++ opal/mca/btl/tcp/btl_tcp_proc.c (working copy)
@@ -12,6 +12,8 @@
* All rights reserved.
* Copyright (c) 2008-2010 Oracle and/or its affiliates. All rights reserved
* Copyright (c) 2013 Intel, Inc. All rights reserved
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -77,7 +79,7 @@
/* remove from list of all proc instances */
OPAL_THREAD_LOCK(&mca_btl_tcp_component.tcp_lock);
opal_hash_table_remove_value_uint64(&mca_btl_tcp_component.tcp_procs,
- tcp_proc->proc_opal->proc_name);
+ tcp_proc->proc_opal->proc_name.id);
OPAL_THREAD_UNLOCK(&mca_btl_tcp_component.tcp_lock);
/* release resources */
@@ -97,7 +99,7 @@
mca_btl_tcp_proc_t* mca_btl_tcp_proc_create(const opal_proc_t* proc)
{
- uint64_t hash = proc->proc_name;
+ uint64_t hash = proc->proc_name.id;
mca_btl_tcp_proc_t* btl_proc;
size_t size;
int rc;
@@ -719,7 +721,7 @@
mca_btl_tcp_proc_t* proc = NULL;
OPAL_THREAD_LOCK(&mca_btl_tcp_component.tcp_lock);
opal_hash_table_get_value_uint64(&mca_btl_tcp_component.tcp_procs,
- *name, (void**)&proc);
+ name->id, (void**)&proc);
OPAL_THREAD_UNLOCK(&mca_btl_tcp_component.tcp_lock);
return proc;
}
Index: opal/mca/btl/openib/btl_openib.c
===================================================================
--- opal/mca/btl/openib/btl_openib.c (revision 32440)
+++ opal/mca/btl/openib/btl_openib.c (working copy)
@@ -1064,7 +1064,7 @@
rc = mca_btl_openib_ib_address_add_new(
ib_proc->proc_ports[j].pm_port_info.lid,
ib_proc->proc_ports[j].pm_port_info.subnet_id,
- opal_process_name_jobid(proc->proc_name), endpoint);
+ proc->proc_name, endpoint);
if (OPAL_SUCCESS != rc ) {
OPAL_THREAD_UNLOCK(&ib_proc->proc_lock);
return OPAL_ERROR;
Index: opal/util/proc.c
===================================================================
--- opal/util/proc.c (revision 32440)
+++ opal/util/proc.c (working copy)
@@ -3,6 +3,8 @@
* of Tennessee Research Foundation. All rights
* reserved.
* Copyright (c) 2013 Inria. All rights reserved.
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -29,7 +31,7 @@
static opal_proc_t opal_local_proc = {
{ .opal_list_next = NULL,
.opal_list_prev = NULL},
- 0x1122334455667788,
+ { .id = 0x1122334455667788},
0,
0,
NULL,
@@ -42,13 +44,13 @@
proc->proc_arch = opal_local_arch;
proc->proc_convertor = NULL;
proc->proc_flags = 0;
- proc->proc_name = 0;
+ proc->proc_name.id = 0;
}
static void opal_proc_destruct(opal_proc_t* proc)
{
proc->proc_flags = 0;
- proc->proc_name = 0;
+ proc->proc_name.id = 0;
proc->proc_hostname = NULL;
proc->proc_convertor = NULL;
}
@@ -60,8 +62,8 @@
opal_compare_opal_procs(const opal_process_name_t proc1,
const opal_process_name_t proc2)
{
- if( proc1 == proc2 ) return 0;
- if( proc1 < proc2 ) return -1;
+ if( proc1.id == proc2.id ) return 0;
+ if( proc1.id < proc2.id ) return -1;
return 1;
}
Index: opal/util/proc.h
===================================================================
--- opal/util/proc.h (revision 32440)
+++ opal/util/proc.h (working copy)
@@ -32,25 +32,30 @@
* is to be copied from one structure to another, otherwise it should
* only be used via the accessors defined below.
*/
-typedef opal_identifier_t opal_process_name_t;
+typedef uint32_t opal_jobid_t;
+typedef uint32_t opal_vpid_t;
+typedef struct {
+ opal_jobid_t jobid;
+ opal_vpid_t vpid;
+} opal_proc_name_t;
+typedef union {
+ opal_proc_name_t name;
+ opal_identifier_t id;
+} opal_process_name_t;
+
#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
-#define OPAL_PROCESS_NAME_NTOH(guid) opal_process_name_ntoh_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_ntoh_intr(opal_process_name_t *name)
-{
- uint32_t * w = (uint32_t *)name;
- w[0] = ntohl(w[0]);
- w[1] = ntohl(w[1]);
-}
-#define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_hton_intr(opal_process_name_t *name)
-{
- uint32_t * w = (uint32_t *)name;
- w[0] = htonl(w[0]);
- w[1] = htonl(w[1]);
-}
+#define OPAL_PROCESS_NAME_NTOH(n) \
+do { \
+ n.name.jobid = ntohl(n.name.jobid); \
+ n.name.vpid = ntohl(n.name.vpid); \
+} while (0)
+
+#define OPAL_PROCESS_NAME_HTON(n) \
+do { \
+ n.name.jobid = htonl(n.name.jobid); \
+ n.name.vpid = htonl(n.name.vpid); \
+} while (0)
#else
#define OPAL_PROCESS_NAME_NTOH(guid)
#define OPAL_PROCESS_NAME_HTON(guid)
Index: ompi/mca/dpm/orte/dpm_orte.c
===================================================================
--- ompi/mca/dpm/orte/dpm_orte.c (revision 32440)
+++ ompi/mca/dpm/orte/dpm_orte.c (working copy)
@@ -16,6 +16,8 @@
* Copyright (c) 2011-2013 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2013-2014 Intel, Inc. All rights reserved
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -1767,7 +1769,7 @@
}
static void paccept_recv(int status,
- struct orte_process_name_t* peer,
+ orte_process_name_t* peer,
struct opal_buffer_t* buffer,
orte_rml_tag_t tag,
void* cbdata)
Index: orte/mca/rml/rml.h
===================================================================
--- orte/mca/rml/rml.h (revision 32440)
+++ orte/mca/rml/rml.h (working copy)
@@ -11,6 +11,8 @@
* All rights reserved.
* Copyright (c) 2011-2013 Los Alamos National Security, LLC. All rights
* reserved.
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -52,7 +54,6 @@
struct opal_buffer_t;
-struct orte_process_name_t;
struct orte_rml_module_t;
typedef struct {
opal_object_t super;
@@ -146,7 +147,7 @@
* @param[in] cbdata User data passed to send_nb()
*/
typedef void (*orte_rml_callback_fn_t)(int status,
- struct orte_process_name_t* peer,
+ orte_process_name_t* peer,
struct iovec* msg,
int count,
orte_rml_tag_t tag,
@@ -171,7 +172,7 @@
* @param[in] cbdata User data passed to send_buffer_nb() or recv_buffer_nb()
*/
typedef void (*orte_rml_buffer_callback_fn_t)(int status,
- struct orte_process_name_t* peer,
+ orte_process_name_t* peer,
struct opal_buffer_t* buffer,
orte_rml_tag_t tag,
void* cbdata);
@@ -315,7 +316,7 @@
* receiving process is not available
* @retval ORTE_ERROR An unspecified error occurred
*/
-typedef int (*orte_rml_module_send_nb_fn_t)(struct orte_process_name_t* peer,
+typedef int (*orte_rml_module_send_nb_fn_t)(orte_process_name_t* peer,
struct iovec* msg,
int count,
orte_rml_tag_t tag,
@@ -345,7 +346,7 @@
* receiving process is not available
* @retval ORTE_ERROR An unspecified error occurred
*/
-typedef int (*orte_rml_module_send_buffer_nb_fn_t)(struct orte_process_name_t* peer,
+typedef int (*orte_rml_module_send_buffer_nb_fn_t)(orte_process_name_t* peer,
struct opal_buffer_t*
buffer,
orte_rml_tag_t tag,
orte_rml_buffer_callback_fn_t cbfunc,
@@ -360,7 +361,7 @@
* @param[in] cbfunc Callback function on message comlpetion
* @param[in] cbdata User data to provide during completion callback
*/
-typedef void (*orte_rml_module_recv_nb_fn_t)(struct orte_process_name_t* peer,
+typedef void (*orte_rml_module_recv_nb_fn_t)(orte_process_name_t* peer,
orte_rml_tag_t tag,
bool persistent,
orte_rml_callback_fn_t cbfunc,
@@ -376,7 +377,7 @@
* @param[in] cbfunc Callback function on message comlpetion
* @param[in] cbdata User data to provide during completion callback
*/
-typedef void (*orte_rml_module_recv_buffer_nb_fn_t)(struct orte_process_name_t* peer,
+typedef void (*orte_rml_module_recv_buffer_nb_fn_t)(orte_process_name_t* peer,
orte_rml_tag_t tag,
bool persistent,
orte_rml_buffer_callback_fn_t cbfunc,
@@ -427,7 +428,7 @@
* to/from a specified process. Used when a process aborts
* and is to be restarted
*/
-typedef void (*orte_rml_module_purge_fn_t)(struct orte_process_name_t *peer);
+typedef void (*orte_rml_module_purge_fn_t)(orte_process_name_t *peer);
/* ******************************************************************** */
Index: orte/mca/rml/base/base.h
===================================================================
--- orte/mca/rml/base/base.h (revision 32440)
+++ orte/mca/rml/base/base.h (working copy)
@@ -12,6 +12,8 @@
* All rights reserved.
* Copyright (c) 2007-2014 Los Alamos National Security, LLC. All rights
* reserved.
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -245,23 +247,23 @@
ORTE_DECLSPEC void orte_rml_base_process_error(int fd, short flags, void *cbdata);
/* null functions */
-int orte_rml_base_null_send_nb(struct orte_process_name_t* peer,
+int orte_rml_base_null_send_nb(orte_process_name_t* peer,
struct iovec* msg,
int count,
orte_rml_tag_t tag,
orte_rml_callback_fn_t cbfunc,
void* cbdata);
-int orte_rml_base_null_send_buffer_nb(struct orte_process_name_t* peer,
+int orte_rml_base_null_send_buffer_nb(orte_process_name_t* peer,
struct opal_buffer_t* buffer,
orte_rml_tag_t tag,
orte_rml_buffer_callback_fn_t cbfunc,
void* cbdata);
-void orte_rml_base_null_recv_nb(struct orte_process_name_t* peer,
+void orte_rml_base_null_recv_nb(orte_process_name_t* peer,
orte_rml_tag_t tag,
bool persistent,
orte_rml_callback_fn_t cbfunc,
void* cbdata);
-void orte_rml_base_null_recv_buffer_nb(struct orte_process_name_t* peer,
+void orte_rml_base_null_recv_buffer_nb(orte_process_name_t* peer,
orte_rml_tag_t tag,
bool persistent,
orte_rml_buffer_callback_fn_t cbfunc,
Index: orte/mca/routed/routed.h
===================================================================
--- orte/mca/routed/routed.h (revision 32440)
+++ orte/mca/routed/routed.h (working copy)
@@ -51,7 +51,6 @@
struct opal_buffer_t;
-struct orte_process_name_t;
struct orte_rml_module_t;
Index: orte/include/orte/types.h
===================================================================
--- orte/include/orte/types.h (revision 32440)
+++ orte/include/orte/types.h (working copy)
@@ -10,6 +10,8 @@
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -27,6 +29,7 @@
#include <sys/types.h>
#endif
#include "opal/dss/dss_types.h"
+#include "opal/util/proc.h"
/**
* Supported datatypes for messaging and storage operations.
@@ -74,11 +77,11 @@
* the other, and it will cause problems in the communication subsystems
*/
-typedef uint32_t orte_jobid_t;
+typedef opal_jobid_t orte_jobid_t;
#define ORTE_JOBID_T OPAL_UINT32
#define ORTE_JOBID_MAX UINT32_MAX-2
#define ORTE_JOBID_MIN 0
-typedef uint32_t orte_vpid_t;
+typedef opal_vpid_t orte_vpid_t;
#define ORTE_VPID_T OPAL_UINT32
#define ORTE_VPID_MAX UINT32_MAX-2
#define ORTE_VPID_MIN 0
@@ -116,11 +119,7 @@
/*
* define the process name structure
*/
-struct orte_process_name_t {
- orte_jobid_t jobid; /**< Job number */
- orte_vpid_t vpid; /**< Process id - equivalent to rank */
-};
-typedef struct orte_process_name_t orte_process_name_t;
+typedef opal_proc_name_t orte_process_name_t;
/**