Re: Registering and framework failover

Neil Conway Wed, 13 Jul 2016 05:50:54 -0700

On Wed, Jul 13, 2016 at 2:44 PM, Evers Benno <[email protected]> wrote:
> imagine the following situation: I am a framework with failover timeout
> of 1 hour, and 59 minutes and 55 seconds after shutting down I want to
> register with the master again.
>
> If my registration attempt arrives at the master within the time limit
> everything will be fine and I even get back the old tasks for
> reconciliation, but if it arrives slightly later the framework id is
> permanently blocked by mesos, and I am not able to register. Instead, I
> will receive an error()-callback with the message "Framework has been
> removed".


Right: if you set a failover_timeout of 1 hour, your framework is
expected to reregister within one hour. If it does not, all of its
tasks will be killed and you need to start over with a new
FrameworkID. Can you clarify which aspect of this behavior is
problematic for you?
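
To make the timing rule above concrete, here is a minimal sketch in plain Python. It does not call any Mesos API; the function name and outcome strings are hypothetical, and treating an attempt that arrives exactly at the limit as still in time is an assumption, not something confirmed by the master's behavior.

```python
from datetime import datetime, timedelta


def reregistration_outcome(disconnected_at, arrives_at, failover_timeout):
    """Model the rule described above (illustrative only, not Mesos code).

    Assumption: an attempt arriving exactly at the limit still counts
    as being in time.
    """
    elapsed = arrives_at - disconnected_at
    if elapsed <= failover_timeout:
        # Within the window: the framework keeps its FrameworkID and tasks.
        return "reregistered"
    # Past the window: tasks are killed and a new FrameworkID is needed.
    return "framework removed"


timeout = timedelta(hours=1)
down = datetime(2016, 7, 13, 12, 0, 0)

# Attempt arrives 59 min 55 s after the framework went away.
print(reregistration_outcome(down, down + timedelta(minutes=59, seconds=55), timeout))
# Attempt arrives one second after the timeout has expired.
print(reregistration_outcome(down, down + timedelta(hours=1, seconds=1), timeout))
```

The sketch only shows that the outcome depends on when the attempt arrives at the master, not on when the framework sends it, which is why an attempt made close to the limit can go either way.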

Note that a failover_timeout of 1 hour is probably a little low.

> Is there any way to reliably connect to the master while also
> reconciling old tasks if possible?

Sorry, not sure what you mean by this.

Neil
