[ https://issues.apache.org/jira/browse/MESOS-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991310#comment-15991310 ]
Anand Mazumdar commented on MESOS-7426:
---------------------------------------

Link to Design Doc: https://docs.google.com/document/d/1XvP0acT8xadSev8UG2BXtsPlEh0Rb7R3WV3s-TnTeqg

> Support for agent lifecycle management.
> ---------------------------------------
>
>                 Key: MESOS-7426
>                 URL: https://issues.apache.org/jira/browse/MESOS-7426
>             Project: Mesos
>          Issue Type: Epic
>          Components: agent
>            Reporter: Anand Mazumdar
>            Assignee: Anand Mazumdar
>              Labels: agent-lifecycle, mesosphere
>
> This epic coordinates the work for introducing agent lifecycle management in
> Mesos, allowing a framework to be notified of agent node failures. The
> existing {{Event::Failure}} is not enough for a framework to know that a
> given agent node is never coming back.
> The primary motivations for introducing such a feature are:
> - Currently, when an agent running a task fails, an operator has to
> intervene manually and remove the node via a configuration API exposed by
> the framework, e.g., dcos cassandra node replace for the cassandra
> framework. This has to be done once for every stateful framework running on
> the cluster.
> - When an agent is marked as unhealthy, the removal rate is bounded if the
> `--agent_removal_rate_limit` option is set. This is especially problematic
> for operators relying on EC2 autoscaling groups or on workload bursting to
> another cloud.
> - When the fault domain associated with an agent changes (e.g., it is moved
> from an unallocated rack to an allocated rack), there is no feedback
> mechanism for the framework.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)