-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53202/#review153939
-----------------------------------------------------------
src/master/master.cpp (lines 6036 - 6037)
<https://reviews.apache.org/r/53202/#comment223438>

Assuming frameworks are not partition-aware based on the agent version doesn't feel right.

Ultimately it doesn't seem to make a difference in terms of the messages Mesos sends: if the framework is not connected, no update is sent in this method. Later, when it reconnects and reconciles, the master checks its capability and decides: `(5) Task is unknown, slave is unreachable: TASK_UNREACHABLE`, or `TASK_LOST` if the framework is not partition-aware.

Would it make sense to set the state to `TASK_UNREACHABLE` in this case? Looks like the only differences it makes are:

- metrics: regardless of framework capabilities, the agent is indeed unreachable. `TASK_UNREACHABLE` is more in line with the (1.1) master's logic, and the metrics don't reflect 100% of what the master sends out anyway.
- documentation: we set the state because it makes sense to the master, not by guessing the framework's capabilities.

Also worth mentioning: this doesn't violate the API semantics, because partition-awareness is checked at reconciliation time.


- Jiang Yan Xu


On Oct. 26, 2016, 12:51 p.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53202/
> -----------------------------------------------------------
> 
> (Updated Oct. 26, 2016, 12:51 p.m.)
> 
> 
> Review request for mesos, Vinod Kone and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6483
>     https://issues.apache.org/jira/browse/MESOS-6483
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We don't guarantee compatibility with pre-1.0 agents. However, since it
> is easy to avoid a CHECK failure in the master when an old agent
> re-registers, it seems worth doing so.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 23ddb995b4ad0fcdb589974308a2e81ececdad31 
> 
> Diff: https://reviews.apache.org/r/53202/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Disabled the code that fills in `frameworks.recovered`; verified that
> `PartitionTest.DisconnectedFramework` dies with a `CHECK` failure if this RR
> is not applied but passes with this RR applied.
> 
> 
> Thanks,
> 
> Neil Conway
>
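A minimal C++ sketch of the split suggested in the comment above: the master records `TASK_UNREACHABLE` unconditionally, and the partition-awareness check is deferred to reconciliation (case (5)). Everything here is hypothetical and simplified: `TaskState`, `Framework`, `recordedStateOnUnreachableAgent()`, `reconciledState()` and `toString()` are illustrative stand-ins, not the actual types or functions in `src/master/master.cpp` or the Mesos protobufs.

```cpp
#include <string>

// Hypothetical stand-ins for the two task states discussed in the review;
// these are NOT the real Mesos protobuf enum values.
enum class TaskState { TASK_UNREACHABLE, TASK_LOST };

// Hypothetical framework record: only the PARTITION_AWARE capability matters.
struct Framework {
  bool partitionAware;
};

// State the master would record for a task on an unreachable agent under the
// suggestion above: it does not depend on the framework's capabilities, so
// the metrics reflect that the agent is unreachable.
TaskState recordedStateOnUnreachableAgent() {
  return TaskState::TASK_UNREACHABLE;
}

// State reported when the framework reconciles a task that is unknown and
// whose agent is unreachable (case (5)); the capability check happens here.
TaskState reconciledState(const Framework& framework) {
  return framework.partitionAware ? TaskState::TASK_UNREACHABLE
                                  : TaskState::TASK_LOST;
}

// Human-readable name, e.g. for logging.
std::string toString(TaskState state) {
  switch (state) {
    case TaskState::TASK_UNREACHABLE:
      return "TASK_UNREACHABLE";
    case TaskState::TASK_LOST:
      return "TASK_LOST";
  }
  return "UNKNOWN";
}
```

For example, `reconciledState(Framework{false})` yields `TASK_LOST` while `recordedStateOnUnreachableAgent()` still yields `TASK_UNREACHABLE`: the stored state never depends on guessing the framework's capabilities.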