-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40660/#review109044
-----------------------------------------------------------

Ship it!


We may also want to link in the recovery path, but the agent <-> executor 
protocol is such that we don't need to in order to fix the issue.

- Ben Mahler


On Nov. 24, 2015, 6:25 p.m., Anand Mazumdar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40660/
> -----------------------------------------------------------
> 
> (Updated Nov. 24, 2015, 6:25 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-3851
>     https://issues.apache.org/jira/browse/MESOS-3851
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we did not `link` against the executor `PID` while 
> (re-)registering. As a result, libprocess could create an ephemeral socket 
> every time a `send(...)` was invoked, which led to races where messages 
> could arrive at the executor out of order. This change does a `link(...)` on 
> the executor PID to ensure ordered message delivery.
> 
> ---Not to be included in commit message---
> I am still not comfortable bringing back the reverted commit 
> https://reviews.apache.org/r/40107/ . I can see one more race condition even 
> with a `link(...)`. Messages can still arrive out of order if the first 
> socket fails after the first message has been sent but while it is still in 
> flight. Sending the second message then creates a new socket, and that 
> message might arrive earlier than the first one, leading to a race. But this 
> is a behavior that is heavily relied upon elsewhere in our code base. Happy 
> to be proven wrong, though, and to be convinced that we can bring back the 
> reverted commit now after this change.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 9055f2a789cb19f3579c15a379ea505dfef0578c 
> 
> Diff: https://reviews.apache.org/r/40660/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>
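
To illustrate the ordering argument in the quoted description, here is a 
minimal sketch of a toy model. It is not libprocess code: `Connection`, 
`sendEphemeral`, `sendLinked` and `receiveNewestFirst` are hypothetical names 
made up for this example, and the receiver's "newest connection first" policy 
is an assumption chosen to show one possible interleaving.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical model: a connection is a FIFO queue of messages.
// Ordering is only guaranteed *within* a single connection.
using Connection = std::deque<std::string>;

// Without a link (assumed behavior): every send opens its own
// ephemeral connection carrying exactly one message.
std::vector<Connection> sendEphemeral(const std::vector<std::string>& messages)
{
  std::vector<Connection> connections;
  for (const std::string& message : messages) {
    connections.push_back(Connection(1, message));
  }
  return connections;
}

// With a link (assumed behavior): all sends reuse one persistent
// connection, so the messages share a single FIFO queue.
std::vector<Connection> sendLinked(const std::vector<std::string>& messages)
{
  std::vector<Connection> connections;
  connections.push_back(Connection(messages.begin(), messages.end()));
  return connections;
}

// A receiver that happens to service the most recently opened
// connection first. Each connection is still drained in FIFO order.
std::vector<std::string> receiveNewestFirst(std::vector<Connection> connections)
{
  std::vector<std::string> delivered;
  for (auto it = connections.rbegin(); it != connections.rend(); ++it) {
    while (!it->empty()) {
      delivered.push_back(it->front());
      it->pop_front();
    }
  }
  return delivered;
}
```

In this model, passing two messages through `sendLinked` preserves their 
order regardless of how the receiver schedules connections, while 
`sendEphemeral` lets the second message overtake the first. The sketch does 
not model the socket-failure case raised in the note above, where a new 
socket is created after the linked one breaks.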
