Benjamin Mahler created MESOS-1407:
--------------------------------------
Summary: Provide state reconciliation for frameworks.
Key: MESOS-1407
URL: https://issues.apache.org/jira/browse/MESOS-1407
Project: Mesos
Issue Type: Epic
Reporter: Benjamin Mahler
State inconsistencies can arise between the framework scheduler's view of tasks
and the view of tasks within Mesos.
Frameworks such as Aurora have had to compensate for these inconsistencies by
running a specialized executor on the slave that reconciles what happened on
the slave against what the scheduler believes is the current state of tasks.
This ticket tracks ways to allow frameworks to detect state inconsistencies in
both of the following cases:
(1) There are tasks known to the framework, but unknown to Mesos. This can
arise when the framework's intent was not carried out, or when a terminal event
is not delivered to the framework.
(2) There are tasks known to Mesos but unknown to the framework. This can arise
when the framework suffered information loss, _assuming the framework always
persists its intent prior to taking an action_.
We have recently added a reconciliation message that allows frameworks to deal
with (1), but there is nothing for (2) yet. Case (2) could be handled using an
"implicit" form of the same reconciliation message, or we could consider
providing a way for frameworks to receive a full list of the tasks, which
would allow them to reconcile both (1) and (2).
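
To illustrate the "full list" option: if a framework could obtain the complete
set of task IDs that Mesos knows about, both cases reduce to set differences.
The following is only a sketch under that assumption. The function and type
names are hypothetical and are not part of any Mesos API, and how the list of
tasks would actually be delivered to the framework is left open.

```python
from typing import Iterable, NamedTuple, Set


class Inconsistencies(NamedTuple):
    """Hypothetical result type; not part of any Mesos API."""
    unknown_to_mesos: Set[str]      # case (1): framework knows, Mesos does not
    unknown_to_framework: Set[str]  # case (2): Mesos knows, framework does not


def diff_task_views(framework_tasks: Iterable[str],
                    mesos_tasks: Iterable[str]) -> Inconsistencies:
    """Compare the framework's task IDs against a full list from Mesos.

    Assumes both inputs are complete snapshots of task IDs taken at
    roughly the same time; tasks launched or finished in between may
    show up as spurious differences.
    """
    framework_ids = set(framework_tasks)
    mesos_ids = set(mesos_tasks)
    return Inconsistencies(
        unknown_to_mesos=framework_ids - mesos_ids,
        unknown_to_framework=mesos_ids - framework_ids,
    )
```

A framework would presumably treat tasks in unknown_to_mesos as lost, and
either adopt or kill tasks in unknown_to_framework, depending on its policy.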