----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65449/#review197293 -----------------------------------------------------------
Fix it, then Ship it! src/slave/slave.cpp Line 1909 (original), 1916 (patched) <https://reviews.apache.org/r/65449/#comment277417> Can you add a comment here on why you are not sending `ExitedExecutorMessage`? Is it because the expectation is that agent will (eventually) re-register and reconcile the state in this scenario? src/slave/slave.cpp Lines 2678 (patched) <https://reviews.apache.org/r/65449/#comment277419> s/task,/task;/ src/slave/slave.cpp Lines 2692 (patched) <https://reviews.apache.org/r/65449/#comment277420> s/received/it was received/ src/slave/slave.cpp Lines 2738 (patched) <https://reviews.apache.org/r/65449/#comment277422> There is another step for us to trigger this right? ``` (4) Master now sends another `runTaskMessage` for the same executor id with `launch_executor = true`. ``` src/slave/slave.cpp Lines 2759 (patched) <https://reviews.apache.org/r/65449/#comment277424> s/launch the executor/launch the executor but the master doesn't know about it yet because the `ExitedExecutorMessage` is still in flight./ src/slave/slave.cpp Lines 2785 (patched) <https://reviews.apache.org/r/65449/#comment277426> Should we be sending an `ExitedExecutorMessage` in this case to be safe? I guess not strictly require since the expectation is that one is already in flight and if it gets dropped we will hopefully reconcile. Worth adding a comment though for posterity. src/slave/slave.cpp Lines 2789 (patched) <https://reviews.apache.org/r/65449/#comment277427> Ideally we should have tests for each of these scenarios here. Do we? src/slave/slave.cpp Lines 3402 (patched) <https://reviews.apache.org/r/65449/#comment277428> "Consider doing a `CHECK` here since this shouldn't be possible." - Vinod Kone On Feb. 5, 2018, 3 a.m., Meng Zhu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65449/ > ----------------------------------------------------------- > > (Updated Feb. 5, 2018, 3 a.m.) > > > Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone. > > > Bugs: MESOS-1720 > https://issues.apache.org/jira/browse/MESOS-1720 > > > Repository: mesos > > > Description > ------- > > Master relies on `ExitedExecutorMessage` from the agent to recycle > executor entry. However, this message won't be sent if the executor > never actually launched (due to transient error), leaving executor > info on the master lingering and resource claimed. > See MESOS-1720. > > This patch fixes this issue by sending the `ExitedExecutorMessage` > from the agent if the executor is never launched. > > > Diffs > ----- > > src/slave/slave.hpp 30151c4886e12e9183a971b86b854e28a8ca1b39 > src/slave/slave.cpp f98f37321872d090176b7cc50873fc3c627773f5 > src/tests/mock_slave.hpp 942ead57fc67bdd2a268c67575952349838dc280 > src/tests/mock_slave.cpp 597d7abef20dd5f89b16e4616233f02760b9d037 > src/tests/slave_tests.cpp b54688add9949b9c4ac2ce3a42503a411e6da09f > > > Diff: https://reviews.apache.org/r/65449/diff/3/ > > > Testing > ------- > > make check > Dedicated test in #65448 > > > Thanks, > > Meng Zhu > >