-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59584/#review176167
-----------------------------------------------------------




src/slave/flags.cpp
Lines 355 (patched)
<https://reviews.apache.org/r/59584/#comment249521>

    s/sent to the executor/sent to the executor during recovery/



src/slave/flags.cpp
Lines 356 (patched)
<https://reviews.apache.org/r/59584/#comment249522>

    s/MESOS-5322/MESOS-5332/



src/slave/flags.cpp
Lines 365-366 (patched)
<https://reviews.apache.org/r/59584/#comment249523>

    Maybe something like:
    
    these "old" executors will reply on their half-open connection and receive
    a RST; without any retries, they will fail to reconnect and be killed by the
    agent once the executor re-registration timeout elapses.
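
To make the wording above concrete, here is a toy model of why a single reconnect message is not enough for an "old" executor. It is only a sketch: `OldExecutor`, `onReconnect` and `reregistersAfter` are hypothetical names, not anything in the Mesos or libprocess source, and the link handling is reduced to one boolean.

```cpp
#include <cassert>

// Hypothetical model of a PID-based executor built against libraries < 1.1.2.
// It does not re-link on reconnect, so its first reply uses the stale socket.
struct OldExecutor {
  bool linkHalfOpen = true;   // Socket left over from before the agent restart.
  bool reregistered = false;  // True once a reply has reached the new agent.

  // Invoked once per reconnect message received from the agent.
  void onReconnect() {
    if (reregistered) {
      return;  // Already reconnected; further retries are no-ops.
    }
    if (linkHalfOpen) {
      // The reply goes out on the stale socket and is answered with a RST.
      // The reply is lost, but the broken socket is discarded.
      linkHalfOpen = false;
      return;
    }
    // No stale socket remains, so the reply opens a fresh connection.
    reregistered = true;
  }
};

// Returns whether the executor has re-registered after the agent sends
// `attempts` reconnect messages (the first message plus any retries).
bool reregistersAfter(int attempts) {
  assert(attempts >= 0);
  OldExecutor executor;
  for (int i = 0; i < attempts; ++i) {
    executor.onReconnect();
  }
  return executor.reregistered;
}
```

In this model `reregistersAfter(1)` is false and `reregistersAfter(2)` is true, which mirrors the description: the first message only clears the half-open connection, and the executor establishes a working link when it processes the second one.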



src/slave/slave.cpp
Lines 5964-5965 (patched)
<https://reviews.apache.org/r/59584/#comment249525>

    Ditto, as above.



src/slave/slave.cpp
Lines 5967 (patched)
<https://reviews.apache.org/r/59584/#comment249526>

    s/an optional/optional/



src/slave/slave.cpp
Lines 5972-5973 (patched)
<https://reviews.apache.org/r/59584/#comment249527>

    Is this TODO necessary, since this entire block only executes when
    `executor->pid.isSome() && executor->pid.get()`?



src/slave/slave.cpp
Lines 5975-5979 (patched)
<https://reviews.apache.org/r/59584/#comment249530>

    Why const ref for the IDs but not for the retry interval?


- Greg Mann


On May 26, 2017, 12:56 a.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59584/
> -----------------------------------------------------------
> 
> (Updated May 26, 2017, 12:56 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-5332, MESOS-7057 and MESOS-7569
>     https://issues.apache.org/jira/browse/MESOS-5332
>     https://issues.apache.org/jira/browse/MESOS-7057
>     https://issues.apache.org/jira/browse/MESOS-7569
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> PID-based v0 executors using Mesos libraries >= 1.1.2 always re-link
> with the agent upon receiving the reconnect message. This avoids the
> executor replying on a half-open TCP connection to the old agent
> (possible if netfilter is dropping packets, see: MESOS-7057).
> However, PID-based executors using Mesos libraries < 1.1.2 do not
> re-link and are therefore prone to replying on a half-open connection
> after the agent restarts. If we only send a single reconnect message,
> these "old" executors will reply on their half-open connection,
> receive a RST, and think the agent just died. To ensure these "old"
> executors can reconnect in the presence of netfilter dropping packets,
> we introduced optional retries of the reconnect message. This results
> in "old" executors correctly establishing a link when processing the
> second reconnect message.
> 
> Generally, users should not enable this flag unless they are affected
> by this issue.
> 
> 
> Diffs
> -----
> 
>   src/slave/flags.hpp b66995630f89dfb95a6d0cf66efc5d7590e90cbc 
>   src/slave/flags.cpp 0c8276e425a6a7d22ee68edc6cc25b331635ec44 
>   src/slave/slave.cpp 15e4d68714556ca30a766acd3b9729367df680c3 
> 
> 
> Diff: https://reviews.apache.org/r/59584/diff/1/
> 
> 
> Testing
> -------
> 
> Added tests in follow-up reviews.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>
