Thanks! I won't have time to work on it this week, but I appreciate your effort. Also, thanks for clarifying the race condition with respect to 1.8 - I agree it is not a blocker for that release.
Ralph

On Sep 22, 2014, at 4:49 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Ralph,
>
> here is the patch i am using so far.
> i will resume working on this from Wednesday (there is at least one remaining
> race condition yet) unless you have the time to take care of it today.
>
> so far, the race condition has only been observed in real life with the
> grpcomm/rcd module, and this is not the default in v1.8, so imho this is not
> a blocker for v1.8.3
>
> Cheers,
>
> Gilles
>
> On Tue, Sep 23, 2014 at 7:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Gilles - please let me know if/when you think you'll do this. I'm debating
> about adding it to 1.8.3, but don't want to delay that release too long.
> Alternatively, I can take care of it if you don't have time (I'm asking if
> you can do it solely because you have the reproducer).
>
>
> On Sep 21, 2014, at 6:54 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Sounds fine with me - please go ahead, and thanks
>>
>> On Sep 20, 2014, at 10:26 PM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>>
>>> Thanks for the pointer George !
>>>
>>> On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>> Or copy the handshake protocol design of the TCP BTL...
>>>
>>>
>>> the main difference between oob/tcp and btl/tcp is the way we resolve the
>>> situation in which two processes send their first message to each other at
>>> the same time.
>>>
>>> in oob/tcp, all (e.g. one or two) sockets are closed and the higher vpid is
>>> directed to retry establishing a connection.
>>>
>>> in btl/tcp, the useless socket is closed (e.g. the one that was connect-ed
>>> on the lower vpid and the one that was accept-ed on the higher vpid).
>>>
>>>
>>> my first impression is that oob/tcp is unnecessarily complex and it should
>>> use the simpler and more efficient protocol of btl/tcp.
>>> that being said, this conclusion could be too naive and for some good
>>> reasons i ignore, the btl/tcp handshake protocol might not be a good fit
>>> for oob/tcp.
>>>
>>> any thoughts ?
>>>
>>> i will revamp oob/tcp in order to use the same btl/tcp handshake protocol
>>> from tomorrow unless indicated otherwise
>>>
>>> Cheers,
>>>
>>> Gilles
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/09/15885.php
>>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15895.php
>
> <oobtcp2.patch>_______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15897.php
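
For readers following the thread, the two collision-resolution rules Gilles describes (two processes connecting to each other at the same time) can be sketched roughly as below. This is only an illustration of the rules as stated in his message, not Open MPI code: the names Sock, btl_style_resolve and oob_style_resolve are hypothetical, and each process is assumed to hold at most one connect()-ed and one accept()-ed socket to its peer.

```python
from enum import Enum


class Sock(Enum):
    """Hypothetical labels for the two sockets a process may hold to a peer."""
    CONNECTED = "connected"  # socket this process opened with connect()
    ACCEPTED = "accepted"    # socket this process obtained from accept()


def btl_style_resolve(my_vpid, peer_vpid):
    """btl/tcp-style rule as described in the thread.

    The lower vpid closes its connect()-ed socket and the higher vpid closes
    its accept()-ed socket, so the connection initiated by the higher vpid
    survives. Returns (socket_to_keep, socket_to_close) for this process.
    """
    if my_vpid == peer_vpid:
        raise ValueError("a process does not collide with itself")
    if my_vpid < peer_vpid:
        return Sock.ACCEPTED, Sock.CONNECTED
    return Sock.CONNECTED, Sock.ACCEPTED


def oob_style_resolve(my_vpid, peer_vpid):
    """oob/tcp-style rule as described in the thread.

    All sockets are closed and only the higher vpid retries the connection.
    Returns (sockets_to_close, should_retry) for this process.
    """
    if my_vpid == peer_vpid:
        raise ValueError("a process does not collide with itself")
    return {Sock.CONNECTED, Sock.ACCEPTED}, my_vpid > peer_vpid


if __name__ == "__main__":
    # Processes with vpid 0 and 1 connect to each other simultaneously.
    for me, peer in ((0, 1), (1, 0)):
        print("btl-style", me, btl_style_resolve(me, peer))
        print("oob-style", me, oob_style_resolve(me, peer))
```

Under the btl/tcp-style rule both sides keep the two ends of the same TCP connection (the one the higher vpid initiated), so no retry round trip is needed; that appears to be the simplification Gilles is arguing for.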