> On Nov 28, 2016, at 6:09 PM, Yan Xu <[email protected]> wrote: > > So one thing that was brought up during offline conversations was that if the > host reboot is associated with hardware change (e.g., a new memory stick): > > • Currently: the agent would skip the recovery (and the chance of > running into incompatible agent info) and register as a new agent. > • With the change: the agent could run into incompatible agent info due > to resource change and flap indefinitely until the operator intervenes. > > To mitigate this and maintain the current behavior, we can have the agent > remove `rm -f <work_dir>/meta/slaves/latest` automatically upon recovery > failure but only after the host has rebooted. This way the agent can restart > as a new agent without operator intervention. > > Any thoughts?
I still think you need a mechanism for the master/agent to tell you whether it will honor the restart policy. Without this, you have to lock the framework to a Mesos version. An empty RestartPolicy is also problematic since it precludes using RestartPolicy in pods. If you later want to restart a task inside a pod but not across agent restarts you would have no way to express that. J
