-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59584/
-----------------------------------------------------------

Review request for mesos, Anand Mazumdar, Greg Mann, and Vinod Kone.


Bugs: MESOS-5332 and MESOS-7057
    https://issues.apache.org/jira/browse/MESOS-5332
    https://issues.apache.org/jira/browse/MESOS-7057


Repository: mesos


Description
-------

PID-based v0 executors using Mesos libraries >= 1.1.2 always re-link
with the agent upon receiving the reconnect message. This avoids the
executor replying on a half-open TCP connection to the old agent
(possible if netfilter is dropping packets, see: MESOS-7057).

However, PID-based executors using Mesos libraries < 1.1.2 do not
re-link and are therefore prone to replying on a half-open connection
after the agent restarts. If we only send a single reconnect message,
these "old" executors will reply on their half-open connection,
receive a RST, and think the agent just died.

To ensure these "old" executors can reconnect in the presence of
netfilter dropping packets, we introduced optional retries of the
reconnect message. This results in "old" executors correctly
establishing a link when processing the second reconnect message.

Generally, users should not enable this flag unless they are affected
by this issue.


Diffs
-----

  src/slave/flags.hpp b66995630f89dfb95a6d0cf66efc5d7590e90cbc
  src/slave/flags.cpp 0c8276e425a6a7d22ee68edc6cc25b331635ec44
  src/slave/slave.cpp 15e4d68714556ca30a766acd3b9729367df680c3


Diff: https://reviews.apache.org/r/59584/diff/1/


Testing
-------

Added tests in follow up reviews.


Thanks,

Benjamin Mahler
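

Illustration
-------

To make the reasoning in the description concrete, here is a toy model
of why a second reconnect message helps. It is only a sketch under
simplifying assumptions: the `Link`, `Executor`, and
`sendsUntilReconnected` names are hypothetical and do not appear in the
Mesos code base, and the model ignores timing, libprocess internals,
and whatever the executor does after it believes the agent died.

```cpp
// Toy model only: these types are hypothetical and are not Mesos code.
enum class Link { HalfOpen, Broken, Established };

struct Executor {
  bool relinksOnReconnect;     // Assumed true for libraries >= 1.1.2.
  Link link = Link::HalfOpen;  // State right after the agent restarts.
  bool reconnected = false;

  explicit Executor(bool relinks) : relinksOnReconnect(relinks) {}

  // Process one reconnect message from the restarted agent.
  void onReconnectMessage() {
    // A "new" executor always re-links. An "old" one only gets a fresh
    // connection once its stale link has already been torn down.
    if (relinksOnReconnect || link == Link::Broken) {
      link = Link::Established;
    }

    if (link == Link::HalfOpen) {
      // The reply goes out on the stale socket and is answered with a
      // RST, so the link breaks and this attempt is lost.
      link = Link::Broken;
      return;
    }

    reconnected = true;
  }
};

// Returns how many reconnect messages were sent before the executor
// reconnected, or -1 if it never did within `maxSends` messages.
int sendsUntilReconnected(bool relinks, int maxSends) {
  Executor executor(relinks);
  for (int sent = 1; sent <= maxSends; ++sent) {
    executor.onReconnectMessage();
    if (executor.reconnected) {
      return sent;
    }
  }
  return -1;
}
```

In this model, `sendsUntilReconnected(false, 1)` fails for an "old"
executor, while `sendsUntilReconnected(false, 2)` succeeds on the
second message, and an executor that re-links succeeds on the first.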