Ralph and George,

I just made PR 249: https://github.com/open-mpi/ompi/pull/249

This fixes heterogeneous support on master by moving the jobid and vpid from the ORTE layer down to the OPAL layer.
This required adding the OPAL_NAME dss type in order to correctly convert an opal_process_name_t on a heterogeneous cluster.

Could you please have a look at it when you get a chance?

Cheers,

Gilles

On 2014/10/16 12:26, Gilles Gouaillardet wrote:
> OK, revert done :
>
> commit b5aea782ce1111c116af095a7e7a7310e9e2a018
> Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org>
> Date: Thu Oct 16 12:24:38 2014 +0900
>
> Revert "Fix heterogeneous support"
>
> Per the discussion at
> http://www.open-mpi.org/community/lists/devel/2014/10/16050.php
>
> This reverts commit c9c5d4011bf6ea1ade1a5bd9b6a77f02157dc774.
>
>
> Cheers,
>
> Gilles
>
> On 2014/10/16 12:13, Ralph Castain wrote:
>> On Oct 15, 2014, at 8:08 PM, Gilles Gouaillardet
>> <gilles.gouaillar...@iferc.org> wrote:
>>
>>> Ralph,
>>>
>>> let me quickly reply about this one :
>>>
>>> On 2014/10/16 12:00, Ralph Castain wrote:
>>>> I also don't understand some of the changes in this commit. For example,
>>>> why did you replace the OPAL_MODEX_SEND_STRING macro with essentially a
>>>> hard-coded replica of that macro?
>>> OPAL_MODEX_SEND_STRING put a key of type OPAL_BYTE_OBJECT
>>>
>>> in ompi_proc_complete_init:
>>> OPAL_MODEX_RECV_VALUE(ret, OPAL_DSTORE_ARCH,
>>>                       (opal_proc_t*)&proc->super,
>>>                       (void**)&ui32ptr, OPAL_UINT32);
>>>
>>> a key of type OPAL_UINT32 is expected, and an key of type
>>> OPAL_BYTE_OBJECT was sent
>>>
>>> i chose to "fix" the sender (e.g. send a key of type OPAL_UINT32)
>>>
>>> should i have "fixed" the receiver instead ?
>> Hmmm...probably the receiver, but let me take a look at it. The two should
>> have mirrored each other, which is why I couldn't understand the change. The
>> problem may be that the recv should be recv_string, but I need to look at
>> the two macros and see why the mirrors weren't used.
>>
>>>> Would you mind reverting this until we can better understand what is going
>>>> on, and decide on a path forward?
>>> no problem
>>> based on my previous comment, shall i also revert the change in
>>> ompi/proc/proc.c as well ?
>> I'd revert the commit as a whole. Let's look at the hetero issue in its
>> entirety and figure out how we want to handle it.
>>
>> Thanks
>> Ralph
>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/10/16049.php
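
To illustrate why a dedicated type helps on a heterogeneous cluster, here is a minimal sketch of per-field byte-order conversion for a (jobid, vpid) pair. It is not the code from the PR and does not use the real dss API: demo_name_t, demo_name_pack and demo_name_unpack are hypothetical names, the two fields are assumed to be 32-bit unsigned integers, and network byte order is assumed as the wire format (htonl/ntohl from the POSIX <arpa/inet.h>).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */

/* Hypothetical stand-in for a process name; NOT the real opal_process_name_t. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} demo_name_t;

/* Bytes needed on the wire for one name: two 32-bit fields. */
#define DEMO_NAME_WIRE_SIZE (2 * sizeof(uint32_t))

/* Pack a name into buf, converting each field to network byte order.
 * Converting per field is the point: copying the struct as an opaque
 * byte blob would leave each field in the sender's native byte order. */
static void demo_name_pack(uint8_t *buf, const demo_name_t *name)
{
    uint32_t tmp;

    assert(buf != NULL && name != NULL);
    tmp = htonl(name->jobid);
    memcpy(buf, &tmp, sizeof(tmp));
    tmp = htonl(name->vpid);
    memcpy(buf + sizeof(tmp), &tmp, sizeof(tmp));
}

/* Unpack a name from buf, converting each field back to host byte order. */
static void demo_name_unpack(demo_name_t *name, const uint8_t *buf)
{
    uint32_t tmp;

    assert(buf != NULL && name != NULL);
    memcpy(&tmp, buf, sizeof(tmp));
    name->jobid = ntohl(tmp);
    memcpy(&tmp, buf + sizeof(tmp), sizeof(tmp));
    name->vpid = ntohl(tmp);
}
```

With this scheme the bytes on the wire are the same whatever the endianness of the sender, so a receiver with a different byte order still recovers the original jobid and vpid.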