Ralph and George,

I just opened PR 249: https://github.com/open-mpi/ompi/pull/249
It fixes heterogeneous support on master by moving the jobid and vpid
from the ORTE layer down to the OPAL layer.

This required adding the OPAL_NAME dss type in order to correctly convert
an opal_process_name_t on a heterogeneous cluster.
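
For illustration only, here is a minimal sketch of why a name needs its own
pack/unpack type on a heterogeneous cluster. It is not the code in the PR.
It assumes the name is a pair of 32-bit ids (jobid, vpid); the names
example_name_t, example_name_pack and example_name_unpack are hypothetical
and are not part of OPAL. Each field is written in a fixed (big-endian) byte
order, so hosts of either endianness read back the same values:

```c
#include <stdint.h>

/* Hypothetical stand-in for a process name: assumed to be two 32-bit ids. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_name_t;

#define EXAMPLE_NAME_WIRE_SIZE 8

/* Store a 32-bit value in big-endian order, independent of host endianness. */
static void put_u32_be(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian value, independent of host endianness. */
static uint32_t get_u32_be(const uint8_t *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/* Pack field by field: the two ids are converted separately. */
void example_name_pack(uint8_t *buf, const example_name_t *name)
{
    put_u32_be(buf, name->jobid);
    put_u32_be(buf + 4, name->vpid);
}

/* Unpack field by field, mirroring example_name_pack. */
void example_name_unpack(example_name_t *name, const uint8_t *buf)
{
    name->jobid = get_u32_be(buf);
    name->vpid = get_u32_be(buf + 4);
}
```

Converting the pair as a single 64-bit integer would swap the positions of
jobid and vpid between a little-endian and a big-endian host, which is why
the sketch treats the two fields separately.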

Could you please have a look at it when you get a chance?

Cheers,

Gilles

On 2014/10/16 12:26, Gilles Gouaillardet wrote:
> OK, revert done :
>
> commit b5aea782ce1111c116af095a7e7a7310e9e2a018
> Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org>
> Date:   Thu Oct 16 12:24:38 2014 +0900
>
>     Revert "Fix heterogeneous support"
>    
>     Per the discussion at
>     http://www.open-mpi.org/community/lists/devel/2014/10/16050.php
>    
>     This reverts commit c9c5d4011bf6ea1ade1a5bd9b6a77f02157dc774.
>
>
> Cheers,
>
> Gilles
>
> On 2014/10/16 12:13, Ralph Castain wrote:
>> On Oct 15, 2014, at 8:08 PM, Gilles Gouaillardet 
>> <gilles.gouaillar...@iferc.org> wrote:
>>
>>> Ralph,
>>>
>>> let me quickly reply about this one :
>>>
>>> On 2014/10/16 12:00, Ralph Castain wrote:
>>>> I also don't understand some of the changes in this commit. For example, 
>>>> why did you replace the OPAL_MODEX_SEND_STRING macro with essentially a 
>>>> hard-coded replica of that macro?
>>> OPAL_MODEX_SEND_STRING puts a key of type OPAL_BYTE_OBJECT
>>>
>>> in ompi_proc_complete_init:
>>>     OPAL_MODEX_RECV_VALUE(ret, OPAL_DSTORE_ARCH,
>>>                           (opal_proc_t*)&proc->super,
>>>                           (void**)&ui32ptr, OPAL_UINT32);
>>>
>>> a key of type OPAL_UINT32 is expected, but a key of type
>>> OPAL_BYTE_OBJECT was sent
>>>
>>> i chose to "fix" the sender (i.e. send a key of type OPAL_UINT32)
>>>
>>> should i have "fixed" the receiver instead ?
>> Hmmm...probably the receiver, but let me take a look at it. The two should 
>> have mirrored each other, which is why I couldn't understand the change. The 
>> problem may be that the recv should be recv_string, but I need to look at 
>> the two macros and see why the mirrors weren't used.
>>
>>>> Would you mind reverting this until we can better understand what is going 
>>>> on, and decide on a path forward?
>>> no problem
>>> based on my previous comment, shall i also revert the change in
>>> ompi/proc/proc.c ?
>> I'd revert the commit as a whole. Let's look at the hetero issue in its 
>> entirety and figure out how we want to handle it.
>>
>> Thanks
>> Ralph
>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/10/16049.php
