-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40660/#review109044
-----------------------------------------------------------

Ship it!


We may also want to link in the recovery path, but the agent <-> executor protocol is such that we don't need to in order to fix the issue.

- Ben Mahler


On Nov. 24, 2015, 6:25 p.m., Anand Mazumdar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40660/
> -----------------------------------------------------------
>
> (Updated Nov. 24, 2015, 6:25 p.m.)
>
>
> Review request for mesos and Vinod Kone.
>
>
> Bugs: MESOS-3851
>     https://issues.apache.org/jira/browse/MESOS-3851
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Previously, we did not `link` against the executor `PID` while
> (re-)registering. As a result, libprocess could create an ephemeral socket
> every time a `send(...)` was invoked. This led to races in which messages
> could arrive at the executor out of order. This change does a `link(...)` on
> the executor PID to ensure ordered message delivery.
>
> ---Not to be included in commit message---
> I am still not comfortable bringing back the reverted commit
> https://reviews.apache.org/r/40107/ . I can see one more race condition even
> with a `link(...)`. Messages can still arrive out of order if the first
> socket fails after the first message has been sent but while it is still in
> flight. A new socket then gets created when we send the second message, and
> that message might arrive earlier than the first one, leading to a race.
> However, this is a behavior that is heavily relied upon elsewhere in our
> code base. Happy to be proven wrong, though, and to be convinced that we can
> bring back the reverted commit after this change.
>
>
> Diffs
> -----
>
>   src/slave/slave.cpp 9055f2a789cb19f3579c15a379ea505dfef0578c
>
> Diff: https://reviews.apache.org/r/40660/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Anand Mazumdar
>
>
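
The ordering argument in the description can be illustrated with a toy model. The sketch below is written in Python and does not use libprocess. `Receiver`, `Sender`, `break_link` and `run` are hypothetical names invented for this illustration, and the assumption that the receiver may read the newest connection first is just one adversarial schedule, not a statement about how libprocess actually schedules reads. The model only captures the idea that messages on one connection stay in FIFO order, while messages spread across separate connections have no ordering guarantee.

```python
from collections import deque


class Receiver:
    """Toy executor side: holds one FIFO queue per accepted connection."""

    def __init__(self):
        self.connections = {}  # connection id -> deque of messages

    def accept(self, conn_id):
        self.connections[conn_id] = deque()

    def drain(self, service_order):
        """Read each connection to exhaustion, in the given order."""
        delivered = []
        for conn_id in service_order:
            queue = self.connections[conn_id]
            while queue:
                delivered.append(queue.popleft())
        return delivered


class Sender:
    """Toy agent side. `linked` stands in for having called link(...)."""

    def __init__(self, receiver, linked):
        self.receiver = receiver
        self.linked = linked
        self.persistent = None  # id of the persistent connection, if any
        self.next_id = 0

    def _open(self):
        conn_id = self.next_id
        self.next_id += 1
        self.receiver.accept(conn_id)
        return conn_id

    def send(self, message):
        if self.linked:
            # Reuse the persistent connection; open it on first use
            # or after it has failed.
            if self.persistent is None:
                self.persistent = self._open()
            conn_id = self.persistent
        else:
            # No link: every send goes over a fresh, ephemeral connection.
            conn_id = self._open()
        self.receiver.connections[conn_id].append(message)

    def break_link(self):
        """Simulate the persistent socket failing with data still in flight."""
        self.persistent = None


def run(linked, break_after_first=False):
    receiver = Receiver()
    sender = Sender(receiver, linked)
    sender.send("first")
    if break_after_first:
        sender.break_link()
    sender.send("second")
    # Assumed adversarial schedule: the newest connection is read first.
    newest_first = sorted(receiver.connections, reverse=True)
    return receiver.drain(newest_first)
```

Under this model, `run(linked=False)` and `run(linked=True, break_after_first=True)` both deliver `second` before `first`, while `run(linked=True)` preserves the send order. That matches the two cases raised in the description: ephemeral sockets without a link, and a linked socket that fails while the first message is still in flight.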