[ https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neil Conway updated MESOS-5344:
-------------------------------
    Description:
This epic covers two related tasks:

1. Clarifying the semantics of TASK_LOST and allowing frameworks to learn when a task is *truly* lost (i.e., not running), versus the current LOST semantics of "may or may not be running".
2. Allowing frameworks to control how partitioned tasks are handled.

  was:
The TASK_LOST task status describes two different situations:

(a) the task was not launched because of an error (e.g., insufficient available resources), or
(b) the master lost contact with a running task (e.g., due to a network partition); the master will kill the task when it can (e.g., when the network partition heals), but in the meantime the task may still be running.

This has two problems:

1. Using the same task status for two fairly different situations is confusing.
2. In the partitioned-but-still-running case, frameworks have no easy way to determine when a task has truly terminated.

To address these problems, we propose introducing a new task status, TASK_GONE, which would be used whenever a task can be guaranteed not to be running.

> Revise TaskStatus semantics
> ---------------------------
>
>                 Key: MESOS-5344
>                 URL: https://issues.apache.org/jira/browse/MESOS-5344
>             Project: Mesos
>          Issue Type: Epic
>          Components: master
>            Reporter: Neil Conway
>              Labels: mesosphere
>
> This epic covers two related tasks:
> 1. Clarifying the semantics of TASK_LOST and allowing frameworks to learn when
> a task is *truly* lost (i.e., not running), versus the current LOST semantics
> of "may or may not be running".
> 2. Allowing frameworks to control how partitioned tasks are handled.