Hi all,

Imagine the following situation: I am a framework with a failover timeout
of 1 hour, and 59 minutes and 55 seconds after shutting down I want to
register with the master again.

If my registration attempt arrives at the master within the time limit,
everything is fine and I even get back the old tasks for
reconciliation. If it arrives slightly later, however, the framework id
is permanently blocked by Mesos and I am not able to register. Instead,
I receive an error() callback with the message "Framework has been
removed".

Is there any way to reliably connect to the master while also
reconciling old tasks if possible?

I looked at how other frameworks solve this, but it seems that Kafka
doesn't handle this case at all
(https://dcosjira.atlassian.net/browse/KAFKA-4), and Marathon scans the
error message for the string "Framework has been removed" and changes
the framework id in this case.

If the latter is the intended solution, are these strings considered
part of the Mesos API? Is it guaranteed that they will not change after
the 1.0 release?
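
To make the string-matching workaround concrete, here is a minimal sketch
of how I understand it. This is not Marathon's actual code: the function
name and the convention of returning None to mean "drop the stored id and
register as a new framework" are my own assumptions, and the sketch does
not call any Mesos API.

```python
from typing import Optional

# Message text as quoted above. Whether this string is stable across
# releases is exactly the open question.
FRAMEWORK_REMOVED = "Framework has been removed"


def next_framework_id(current_id: Optional[str], error_message: str) -> Optional[str]:
    """Pick the framework id for the next registration attempt (hypothetical helper).

    Returns None if the error says the framework was removed, meaning the
    stored id should be discarded and a fresh one obtained. Otherwise the
    current id is kept so that old tasks can still be reconciled.
    """
    if FRAMEWORK_REMOVED in error_message:
        return None
    return current_id
```

The obvious weakness is that the decision depends entirely on the wording
of the error message, which is why I am asking about its stability.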

Best regards,
Benno
