Re: [OMPI devel] difference between my_node_rank and my_local_rank in orte proc_info_t
The s1 method is correct - the other two are wrong

> On Sep 26, 2014, at 9:24 AM, Pritchard Jr., Howard wrote:
>
> Hi Folks,
>
> I’m trying to figure out something about the kind of info the PMIs are supposed
> to be feeding back up into orte/ompi, partly because native launch
> doesn’t seem to work too well in trunk.
>
> One of the things I’m puzzling about is the purpose of the my_node_rank
> field in orte_proc_info_t.
>
> I’m particularly puzzled, because with the new pmix s2 and cray components,
> we are returning the NODE RANK (i.e. which NODE in a virtual sense, the
> proc is on) via a pmix_X_parse_pmap function. See find_my_node in both
> the pmix_cray_parse_pmap and pmix_s2_parse_pmap functions. This
> value is subsequently returned when the pmix component is queried with the
> PMIX_NODE_RANK attribute.
>
> But, with s1, we seem to just be assigning the same value to my_node_rank
> as to my_local_rank, based on the contents of the array returned from
> PMI_Get_clique_ranks.
>
> It appears that in the 1.7/1.8 branch, the behavior is to do the s1 route,
> irrespective of whether pmi2 from slurm or the pmi.fuzzy from cray is used.
>
> Thanks for any help,
>
> Howard
>
> -
> Howard Pritchard
> HPC-5
> Los Alamos National Laboratory
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15923.php
[OMPI devel] difference between my_node_rank and my_local_rank in orte proc_info_t
Hi Folks,

I'm trying to figure out something about the kind of info the PMIs are supposed to be feeding back up into orte/ompi, partly because native launch doesn't seem to work too well in trunk.

One of the things I'm puzzling about is the purpose of the my_node_rank field in orte_proc_info_t.

I'm particularly puzzled, because with the new pmix s2 and cray components, we are returning the NODE RANK (i.e. which NODE in a virtual sense, the proc is on) via a pmix_X_parse_pmap function. See find_my_node in both the pmix_cray_parse_pmap and pmix_s2_parse_pmap functions. This value is subsequently returned when the pmix component is queried with the PMIX_NODE_RANK attribute.

But, with s1, we seem to just be assigning the same value to my_node_rank as to my_local_rank, based on the contents of the array returned from PMI_Get_clique_ranks.

It appears that in the 1.7/1.8 branch, the behavior is to do the s1 route, irrespective of whether pmi2 from slurm or the pmi.fuzzy from cray is used.

Thanks for any help,

Howard

-
Howard Pritchard
HPC-5
Los Alamos National Laboratory
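To make the s1 behaviour in the message above concrete, here is a minimal sketch in C. It assumes that the local rank is simply the position of the proc's own rank in the clique array; the message only says the value is "based on the contents of the array returned from PMI_Get_clique_ranks", so that indexing rule is an assumption. The names local_rank_from_clique, s1_style_ranks and struct rank_info are hypothetical and are not ORTE or PMI symbols; a plain array stands in for the PMI call so the example needs no PMI library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields discussed in the thread. */
struct rank_info {
    int my_local_rank;
    int my_node_rank;
};

/* Hypothetical helper: find the position of my_rank in the array of ranks
 * that share this node (the kind of array PMI_Get_clique_ranks fills in).
 * Returns -1 if my_rank is not present. */
int local_rank_from_clique(const int *clique, size_t clique_size, int my_rank)
{
    assert(clique != NULL || clique_size == 0);
    for (size_t i = 0; i < clique_size; i++) {
        if (clique[i] == my_rank) {
            return (int)i;
        }
    }
    return -1;
}

/* s1-style assignment as described in the message: the same value is
 * stored in both my_local_rank and my_node_rank. */
struct rank_info s1_style_ranks(const int *clique, size_t clique_size, int my_rank)
{
    int pos = local_rank_from_clique(clique, clique_size, my_rank);
    struct rank_info info = { pos, pos };
    return info;
}
```

With a clique array of {4, 5, 6, 7} and my_rank 6, this sketch yields 2 for both fields. It does not model the s2/cray approach, where the value returned for PMIX_NODE_RANK identifies which node the proc is on.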
Re: [OMPI devel] race condition in oob/tcp
Thanks!

On Fri, Sep 26, 2014 at 12:56 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> Ralph,
>
> i just commited r32799 in order to fix this issue.
> i cmr'ed (#4923) and set the target for 1.8.4
>
> Cheers,
>
> Gilles
Re: [OMPI devel] 1.8.3rc2 available
just FYI: configure && make && make install && make test succeeded on my SPARC64/Linux/GCC (both enable-debug=yes and no).

Takahiro Kawashima,
MPI development team,
Fujitsu

> Usual place:
>
> http://www.open-mpi.org/software/ompi/v1.8/
>
> Please beat it up as we want to release on Fri, barring discovery of a blocker
> Ralph
Re: [OMPI devel] race condition in oob/tcp
Ralph,

i just committed r32799 in order to fix this issue.
i cmr'ed (#4923) and set the target for 1.8.4

Cheers,

Gilles

On 2014/09/23 22:55, Ralph Castain wrote:
> Thanks! I won't have time to work on it this week, but appreciate your
> effort. Also, thanks for clarifying the race condition vis 1.8 - I agree it
> is not a blocker for that release.
>
> Ralph
>
> On Sep 22, 2014, at 4:49 PM, Gilles Gouaillardet wrote:
>
>> Ralph,
>>
>> here is the patch i am using so far.
>> i will resume working on this from Wednesday (there is still at least one
>> remaining race condition) unless you have the time to take care of it
>> today.
>>
>> so far, the race condition has only been observed in real life with the
>> grpcomm/rcd module, and this is not the default in v1.8, so imho this is not
>> a blocker for v1.8.3
>>
>> Cheers,
>>
>> Gilles
>>
>> On Tue, Sep 23, 2014 at 7:46 AM, Ralph Castain wrote:
>> Gilles - please let me know if/when you think you'll do this. I'm debating
>> about adding it to 1.8.3, but don't want to delay that release too long.
>> Alternatively, I can take care of it if you don't have time (I'm asking if
>> you can do it solely because you have the reproducer).
>>
>> On Sep 21, 2014, at 6:54 AM, Ralph Castain wrote:
>>
>>> Sounds fine with me - please go ahead, and thanks
>>>
>>> On Sep 20, 2014, at 10:26 PM, Gilles Gouaillardet wrote:
>>>
>>>> Thanks for the pointer George !
>>>>
>>>> On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca wrote:
>>>>> Or copy the handshake protocol design of the TCP BTL...
>>>>
>>>> the main difference between oob/tcp and btl/tcp is the way we resolve the
>>>> situation in which two processes send their first message to each other at
>>>> the same time.
>>>>
>>>> in oob/tcp, all (i.e. one or two) sockets are closed and the higher vpid is
>>>> directed to retry establishing a connection.
>>>>
>>>> in btl/tcp, the useless socket is closed (i.e. the one that was connect-ed
>>>> on the lower vpid and the one that was accept-ed on the higher vpid).
>>>>
>>>> my first impression is that oob/tcp is unnecessarily complex and it should
>>>> use the simpler and more efficient protocol of btl/tcp.
>>>> that being said, this conclusion could be too naive and, for some good
>>>> reasons i am not aware of, the btl/tcp handshake protocol might not be a
>>>> good fit for oob/tcp.
>>>>
>>>> any thoughts ?
>>>>
>>>> i will revamp oob/tcp in order to use the same btl/tcp handshake protocol
>>>> from tomorrow unless indicated otherwise
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
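The two tie-breaking rules in the quoted Sep 20 message can be sketched as small decision functions in C. This is only an illustration of the rules as worded in that message, not the actual oob/tcp or btl/tcp code; the names sock_origin, btl_style_keep_socket and oob_style_should_retry are hypothetical, and vpids are assumed to be distinct unsigned integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* How a socket to the peer came into existence on this process. */
enum sock_origin { SOCK_CONNECTED, SOCK_ACCEPTED };

/* btl/tcp-style rule as described in the message: the socket that was
 * connect-ed on the lower vpid and the one that was accept-ed on the
 * higher vpid are closed. Returns true if this process keeps the socket. */
bool btl_style_keep_socket(uint32_t my_vpid, uint32_t peer_vpid,
                           enum sock_origin origin)
{
    assert(my_vpid != peer_vpid);
    if (my_vpid > peer_vpid) {
        /* Higher vpid: keep the socket it connect-ed, drop the accept-ed one. */
        return origin == SOCK_CONNECTED;
    }
    /* Lower vpid: keep the socket it accept-ed, drop the connect-ed one. */
    return origin == SOCK_ACCEPTED;
}

/* oob/tcp-style rule as described in the message: all sockets are closed
 * and only the higher vpid retries establishing a connection. */
bool oob_style_should_retry(uint32_t my_vpid, uint32_t peer_vpid)
{
    assert(my_vpid != peer_vpid);
    return my_vpid > peer_vpid;
}
```

Under the btl/tcp-style rule both sides end up agreeing on the same surviving connection (the one initiated by the higher vpid) without a second connection attempt, which is the simplification the message argues for.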