Re: MESOS-6233 Allow agents to re-register post a host reboot

James Peach Tue, 29 Nov 2016 09:12:49 -0800

> On Nov 28, 2016, at 6:09 PM, Yan Xu <[email protected]> wrote:
> 
> So one thing that was brought up during offline conversations was that if the 
> host reboot is associated with hardware change (e.g., a new memory stick):
> 
>       • Currently: the agent would skip the recovery (and the chance of 
> running into incompatible agent info) and register as a new agent.
>       • With the change: the agent could run into incompatible agent info due 
> to resource change and flap indefinitely until the operator intervenes.
> 
> To mitigate this and maintain the current behavior, we can have the agent 
> run `rm -f <work_dir>/meta/slaves/latest` automatically upon recovery 
> failure but only after the host has rebooted. This way the agent can restart 
> as a new agent without operator intervention. 
> 
> Any thoughts?
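
[Editor's illustration] The mitigation proposed above can be sketched roughly as follows. This is a minimal Python sketch, not Mesos code: the function names `latest_symlink` and `on_recovery_failure` are hypothetical, how the agent detects a host reboot is left out and passed in as the `host_rebooted` flag, and `latest` is assumed to be a symlink or regular file. Only the path `<work_dir>/meta/slaves/latest` comes from the message itself.

```python
import os


def latest_symlink(work_dir):
    """Path named in the proposal: <work_dir>/meta/slaves/latest."""
    return os.path.join(work_dir, "meta", "slaves", "latest")


def on_recovery_failure(work_dir, host_rebooted):
    """Hypothetical hook called after agent recovery has failed.

    Returns True if the agent may restart as a new agent, and False if
    the operator still has to intervene.
    """
    if not host_rebooted:
        # No reboot: leave the checkpointed state untouched.
        return False
    try:
        # Same effect as `rm -f`: a missing file is not an error.
        os.remove(latest_symlink(work_dir))
    except FileNotFoundError:
        pass
    return True
```

Gating the removal on `host_rebooted` is what keeps the non-reboot case unchanged: a recovery failure without a reboot still leaves the checkpoint in place for the operator to inspect.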


I still think you need a mechanism for the master/agent to tell you whether it 
will honor the restart policy. Without this, you have to lock the framework to 
a Mesos version.

An empty RestartPolicy is also problematic since it precludes using 
RestartPolicy in pods. If you later want to restart a task inside a pod but not 
across agent restarts, you would have no way to express that.

J
