[jira] [Updated] (MESOS-7426) Support for agent lifecycle management.

2017-12-22 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7426:
--
Target Version/s: 1.6.0  (was: 1.5.0)

> Support for agent lifecycle management.
> ---
>
> Key: MESOS-7426
> URL: https://issues.apache.org/jira/browse/MESOS-7426
> Project: Mesos
> Issue Type: Epic
> Components: agent
> Reporter: Anand Mazumdar
> Assignee: Anand Mazumdar
> Labels: agent-lifecycle, mesosphere
>
> This epic co-ordinates the work for introducing agent lifecycle management in 
> Mesos allowing a framework to be notified in case of agent node failures. The 
> existing {{Event::Failure}} is not enough for frameworks to know that the 
> given agent node isn't ever coming back.
> The primary motivations for introducing such a feature would be:
> - Currently, when an agent running a task fails, an operator must intervene 
> manually to remove the node via a configuration API exposed by the 
> framework, e.g., `dcos cassandra node replace` for the Cassandra framework. 
> This has to be done once for every stateful framework running on the cluster.
> - When an agent is marked as unhealthy, the removal rate is bounded if the 
> `--agent_removal_rate_limit` option is set. This is particularly problematic 
> for operators relying on EC2 autoscaling groups or for workload bursting to 
> another cloud.
> - When the fault domain associated with an agent changes (e.g., it is moved 
> from an unallocated rack to an allocated rack), there is no feedback 
> mechanism for the framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7426) Support for agent lifecycle management.

2017-09-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7426:
--
Target Version/s: 1.5.0

> Support for agent lifecycle management.
> ---
>
> Key: MESOS-7426
> URL: https://issues.apache.org/jira/browse/MESOS-7426
> Project: Mesos
> Issue Type: Epic
> Components: agent
> Reporter: Anand Mazumdar
> Assignee: Anand Mazumdar
> Labels: agent-lifecycle, mesosphere
>
> This epic co-ordinates the work for introducing agent lifecycle management in 
> Mesos allowing a framework to be notified in case of agent node failures. The 
> existing {{Event::Failure}} is not enough for frameworks to know that the 
> given agent node isn't ever coming back.
> The primary motivations for introducing such a feature would be:
> - Currently, when an agent running a task fails, an operator must intervene 
> manually to remove the node via a configuration API exposed by the 
> framework, e.g., `dcos cassandra node replace` for the Cassandra framework. 
> This has to be done once for every stateful framework running on the cluster.
> - When an agent is marked as unhealthy, the removal rate is bounded if the 
> `--agent_removal_rate_limit` option is set. This is particularly problematic 
> for operators relying on EC2 autoscaling groups or for workload bursting to 
> another cloud.
> - When the fault domain associated with an agent changes (e.g., it is moved 
> from an unallocated rack to an allocated rack), there is no feedback 
> mechanism for the framework.


