Let me try to clarify: the problem is that I don't get to decide manually whether the framework should take a new id or re-use the old one; the decision has to be made programmatically, by an algorithm.
AFAIK it's not possible to get the time when the framework disconnected from Mesos, so it's not possible to know how much time is left until the failover timeout runs out. Therefore, if I want to attempt task reconciliation, I just have to try registering with my old framework id and see what happens. However, in the case where the failover timeout has already passed, I now need to programmatically detect this error and try again with an empty framework id. My question is: is it possible to do this?

(Also, we actually use a failover timeout of 1 week, but that doesn't really change the problem; I mistakenly assumed that an example with smaller values would be more intuitive.)

On 13.07.2016 14:50, Neil Conway wrote:
> On Wed, Jul 13, 2016 at 2:44 PM, Evers Benno <ben...@yandex-team.ru> wrote:
>> imagine the following situation: I am a framework with failover timeout
>> of 1 hour, and 59 minutes and 55 seconds after shutting down I want to
>> register with the master again.
>>
>> If my registration attempt arrives at the master within the time limit
>> everything will be fine and I even get back the old tasks for
>> reconciliation, but if it arrives slightly later the framework id is
>> permanently blocked by mesos, and I am not able to register. Instead, I
>> will receive an error()-callback with the message "Framework has been
>> removed".
>
> Right: if you set a failover_timeout of 1 hour, your framework is
> expected to reregister within one hour. If it does not, all of its
> tasks will be killed and you need to start over with a new
> FrameworkID. Can you clarify which aspect of this behavior is
> problematic for you?
>
> Note that a failover_timeout of 1 hour is probably a little low.
>
>> Is there any way to reliably connect to the master while also
>> reconciling old tasks if possible?
>
> Sorry, not sure what you mean by this.
>
> Neil
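
P.S. To make the control flow I have in mind concrete, here is a minimal Python sketch. It is not Mesos API code: `register`, `RegistrationError` and `make_fake_master` are hypothetical stand-ins I made up for illustration, and the sketch simply assumes that the failure can be recognised by the "Framework has been removed" text delivered to the error() callback. Whether the real driver allows this kind of retry is exactly what I am asking.

```python
from typing import Callable, Optional, Set, Tuple

# Message text quoted earlier in this thread; matching on it is an assumption.
REMOVED_MESSAGE = "Framework has been removed"


class RegistrationError(Exception):
    """Hypothetical stand-in for whatever the error() callback reports."""


def register_with_fallback(
    register: Callable[[str], str], old_id: Optional[str]
) -> Tuple[str, bool]:
    """Try the old framework id first; fall back to an empty id if it was removed.

    Returns (framework_id, reused), where `reused` says whether the old id was
    accepted, i.e. whether task reconciliation is worth attempting.
    """
    if old_id:
        try:
            return register(old_id), True
        except RegistrationError as exc:
            # Only swallow the "failover timeout expired" case; re-raise the rest.
            if REMOVED_MESSAGE not in str(exc):
                raise
    # No old id, or the old id is permanently blocked: ask for a fresh one.
    return register(""), False


def make_fake_master(expired_ids: Set[str]) -> Callable[[str], str]:
    """Hypothetical in-memory master used only to exercise the sketch."""
    counter = [0]

    def register(framework_id: str) -> str:
        if framework_id in expired_ids:
            raise RegistrationError(REMOVED_MESSAGE)
        if framework_id:
            return framework_id  # old id still within its failover timeout
        counter[0] += 1
        return "new-framework-%d" % counter[0]

    return register


if __name__ == "__main__":
    master = make_fake_master({"old-1"})
    print(register_with_fallback(master, "old-1"))
    print(register_with_fallback(master, "old-2"))
```

The caller would only start reconciliation when the second element of the result is true; in the other case the old tasks are gone and the framework starts from scratch with the new id.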