> On Aug. 10, 2017, 9 p.m., Jiang Yan Xu wrote:
> > Some of the comments below were made before I started to feel that we are 
> > probably doing too many conversions to justify storing these tasks in 
> > TASK_UNREACHABLE. Perhaps we can just store them in 
> > `Framework.unreachableTasks` but in TASK_LOST state?
> > 
> > It's possible that we can add another map `BoundedHashMap<TaskID, 
> > process::Owned<Task>> unreachableNonPartitionAwareTasks;` for these tasks 
> > but it's clunky in the sense that you have to clarify that 
> > `unreachableTasks` is only for partition aware tasks but in fact all of 
> > these tasks belong to the same framework which is either partition aware or 
> > not, however with the possibility of changing capability... so it's 
> > probably easier to describe things if we just put all of them in 
> > `unreachableTasks` and simply say that "if the framework is not 
> > partition-aware, the tasks stored in `unreachableTasks` may be in 
> > `TASK_LOST`".
> > 
> > If we do that, then some of the comments below don't apply any more but I 
> > am keeping them just for posterity (some styling issues etc).

Discussed in person, +1 for keeping the non-partition-aware but unreachable 
tasks in `Framework.unreachableTasks` in state TASK_LOST.
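
As a rough illustration of that agreement, here is a sketch only. `Task`, 
`Framework` and `addUnreachableTask` below are simplified, hypothetical 
stand-ins, not the actual definitions in src/master/master.hpp, and the 
bounded map is approximated with a list plus a hash map:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins; not the actual Mesos types.
enum class TaskState { TASK_RUNNING, TASK_LOST, TASK_UNREACHABLE };

struct Task {
  std::string id;
  TaskState state;
};

struct Framework {
  bool partitionAware;   // Assumed flag for the PARTITION_AWARE capability.
  std::size_t capacity;  // Upper bound on remembered unreachable tasks.

  // Task IDs in insertion order (oldest first), used for eviction.
  std::list<std::string> order;

  // One map for all unreachable tasks of this framework. If the framework
  // is not partition-aware, the tasks stored here are in TASK_LOST.
  std::unordered_map<std::string, Task> unreachableTasks;

  // Records a task whose agent became unreachable, evicting the oldest
  // entry once the bound is reached.
  void addUnreachableTask(Task task) {
    assert(capacity > 0);

    task.state = partitionAware ? TaskState::TASK_UNREACHABLE
                                : TaskState::TASK_LOST;

    if (unreachableTasks.count(task.id) == 0) {
      if (unreachableTasks.size() == capacity) {
        unreachableTasks.erase(order.front());
        order.pop_front();
      }
      order.push_back(task.id);
    }

    unreachableTasks[task.id] = task;
  }
};
```

With this shape there is no second map to explain: a reader of 
`unreachableTasks` only needs to know that the stored state depends on 
whether the framework is partition-aware.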


- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61473/#review182605
-----------------------------------------------------------


On Aug. 10, 2017, 4:07 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61473/
> -----------------------------------------------------------
> 
> (Updated Aug. 10, 2017, 4:07 p.m.)
> 
> 
> Review request for mesos, Vinod Kone and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks of non-partition-aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessage to the agent to kill
> the tasks of non-partition-aware frameworks, including frameworks
> that are still registered. This was problematic because offers from
> this agent could still go to the same framework, which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent re-registration, which could happen much later.
> This commit fixes the problem by not shutting down the
> non-partition-aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99 
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 
> 
> 
> Diff: https://reviews.apache.org/r/61473/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>
