[jira] [Updated] (MESOS-7426) Support for agent lifecycle management.
[ https://issues.apache.org/jira/browse/MESOS-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-7426:
--------------------------
    Target Version/s: 1.6.0  (was: 1.5.0)

> Support for agent lifecycle management.
> ---------------------------------------
>
>                 Key: MESOS-7426
>                 URL: https://issues.apache.org/jira/browse/MESOS-7426
>             Project: Mesos
>          Issue Type: Epic
>          Components: agent
>            Reporter: Anand Mazumdar
>            Assignee: Anand Mazumdar
>              Labels: agent-lifecycle, mesosphere
>
> This epic coordinates the work for introducing agent lifecycle management in
> Mesos, allowing a framework to be notified in case of agent node failures. The
> existing {{Event::Failure}} is not enough for frameworks to know that the
> given agent node isn't ever coming back.
> The primary motivations for introducing such a feature would be:
> - Currently, when an agent running a task fails, a manual operator
> intervention is needed to remove the node via a configuration API exposed by
> the framework, e.g., `dcos cassandra node replace` for the Cassandra
> framework. This needs to be done once for every stateful framework running
> on the cluster.
> - When an agent is marked as unhealthy, the removal rate is bounded if the
> `--agent_removal_rate_limit` option is set. This is specifically problematic
> for operators relying on EC2 autoscaling groups or for workload bursting to
> another cloud.
> - When the fault domain associated with an agent changes (e.g., it is moved
> from an unallocated rack to an allocated rack), there is no feedback
> mechanism for the framework.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (MESOS-7426) Support for agent lifecycle management.
[ https://issues.apache.org/jira/browse/MESOS-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-7426:
----------------------------------
    Target Version/s: 1.5.0

> Support for agent lifecycle management.
> ---------------------------------------
>
>                 Key: MESOS-7426
>                 URL: https://issues.apache.org/jira/browse/MESOS-7426