On Wed, Jul 13, 2016 at 2:44 PM, Evers Benno <[email protected]> wrote:
> imagine the following situation: I am a framework with failover timeout
> of 1 hour, and 59 minutes and 55 seconds after shutting down I want to
> register with the master again.
>
> If my registration attempt arrives at the master within the time limit
> everything will be fine and I even get back the old tasks for
> reconciliation, but if it arrives slightly later the framework id is
> permanently blocked by mesos, and I am not able to register. Instead, I
> will receive an error()-callback with the message "Framework has been
> removed".

Right: if you set a failover_timeout of 1 hour, your framework is
expected to reregister within one hour. If it does not, all of its
tasks will be killed and you need to start over with a new
FrameworkID. Can you clarify which aspect of this behavior is
problematic for you?

Note that a failover_timeout of 1 hour is probably a little low.

> Is there any way to reliably connect to the master while also
> reconciling old tasks if possible?

Sorry, not sure what you mean by this.

Neil
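
To make the timing in the quoted scenario concrete, here is a minimal sketch of the arithmetic in plain Python. It does not use the Mesos API. The function name and the shutdown timestamp are hypothetical, and the master applies the timeout on its own clock, so this is only a local estimate of the remaining window.

```python
from datetime import datetime, timedelta


def remaining_failover_window(stopped_at, failover_timeout, now):
    """Return the time left before the failover timeout expires.

    A negative result means the window has already passed (by this
    local estimate; the master makes the actual decision).
    """
    return (stopped_at + failover_timeout) - now


# Hypothetical shutdown time; the 1 hour timeout is from the scenario above.
stopped_at = datetime(2016, 7, 13, 12, 0, 0)
failover_timeout = timedelta(hours=1)

# Reregistration attempted 59 minutes and 55 seconds after shutting down.
now = stopped_at + timedelta(minutes=59, seconds=55)

remaining = remaining_failover_window(stopped_at, failover_timeout, now)
print(remaining.total_seconds())  # 5.0
```

With only a 5 second margin, any delay in the registration message reaching the master can push the attempt past the deadline, which is the case where a new FrameworkID is needed.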
