[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888650#comment-15888650
 ] 

Neil Conway commented on MESOS-6223:
------------------------------------

When we implement this, we should benchmark the performance impact on agent 
recovery, particularly when there is frequent task churn on the agent. For 
example, take an agent with 10k-100k completed tasks and only a few (< 20) 
running tasks: when that agent reboots, we should measure how long it takes 
to complete recovery. This is the situation that motivated the introduction 
of the "boot id" shortcut in the first place. (cc [~megha.sharma] [~xujyan])

> Allow agents to re-register post a host reboot
> ----------------------------------------------
>
>                 Key: MESOS-6223
>                 URL: https://issues.apache.org/jira/browse/MESOS-6223
>             Project: Mesos
>          Issue Type: Improvement
>          Components: agent
>            Reporter: Megha Sharma
>            Assignee: Megha Sharma
>
> Agent doesn't recover its state after a host reboot; it registers with the 
> master and gets a new SlaveID. With partition awareness, agents are now 
> allowed to re-register after they have been marked Unreachable. The executors 
> are terminated anyway when the agent reboots, so there is no harm in letting 
> the agent keep its SlaveID, re-register with the master and reconcile the 
> lost executors. This is a prerequisite for supporting persistent/restartable 
> tasks in Mesos (MESOS-3545).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
