Hi all,

We intend to change the behavior of the suppressOffers() method of MesosSchedulerDriver with regard to transparent re-registration.

Currently, when the driver becomes disconnected from a master, it re-registers on its own with an empty set of suppressed roles. This un-suppresses all the suppressed roles of the framework.

The plan is to change this behavior so that the suppression state is preserved across this re-registration. The required set of suppressed roles will be stored in the driver, which will now re-register with this set (instead of an empty one) and update the stored set whenever it performs a call that modifies the suppression state of the roles in the allocator. Currently, the driver has two methods which perform such calls: suppressOffers() and reviveOffers().

Please feel free to raise any concerns or objections, especially if you are aware of any V0 frameworks which (probably implicitly) depend on un-suppression of the roles when this re-registration occurs.

Note that:
- Frameworks which do not call suppressOffers() are, obviously, unaffected by this change.
- Frameworks that reliably prevent transparent re-registration (for example, by calling driver.abort() immediately from the disconnected() callback) should also not be affected.
- Storing the suppressed roles list for re-registration and clearing it in reviveOffers() do not change anything for the existing frameworks. It is setting this list in suppressOffers() which might be a cause for concern.
- I'm using the word "un-suppression" because re-registering with roles removed from the suppressed roles list is NOT equivalent to performing a REVIVE call for these roles (unlike REVIVE, it does not clear offerFilters in the allocator).

=====

A bit of background on why this change is needed.

To properly support V0 frameworks with a large number of roles, it is necessary for the driver not to change the suppression state of the roles on its own.
Therefore, because the driver performs transparent re-registration, we will need to store the required suppression state in the driver and make it re-register using this state.

We could possibly avoid the proposed change to suppressOffers() by adding a new interface to the driver for changing the suppression state, leaving suppressOffers() as it is, and marking it as deprecated. However, this would leave the behavior of suppressOffers() deeply inconsistent with everything else. Compare the following two sequences of events.

First one:
- The framework creates and starts a driver with roles "role1", "role2"... "role500"; the driver registers.
- The framework calls a new method driver.suppressOffersForRoles({"role1", ..., "role500"}); the driver performs a SUPPRESS call for these roles and stores them in its suppressed roles set. (Alternative with the same result: the framework calls driver.updateFramework(FrameworkInfo, suppressedRoles={"role1", ..., "role500"}); the driver performs an UPDATE_FRAMEWORK call with those parameters and stores the new suppressed roles set.)
- The driver, for some reason, disconnects and re-registers with the same master, providing the stored suppressed roles set.
- All the roles are still suppressed.

Second one:
- The framework creates and starts a driver with roles "role1", "role2"... "role500"; the driver registers.
- The framework calls driver.suppressOffers(); the driver performs a SUPPRESS call for all roles, but doesn't modify the required suppression state.
- The driver, for some reason, disconnects and re-registers with the same master, providing the stored suppressed roles set, which is empty.
- Now none of the roles are suppressed, and the allocator generates offers for 500 roles, which will likely be declined by the framework.

This is one of the examples which make us strongly consider altering the interaction between suppressOffers() and the transparent re-registration when we add storing the suppression state to the driver.

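To make the difference concrete, the suppressOffers() sequence can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: ToyMaster, ToyDriver and the preserve_suppression flag are hypothetical names invented for this sketch and are not part of the Mesos API, and the model ignores offer filters entirely (so it does not capture the difference between un-suppression and REVIVE mentioned in the notes).

```python
# Toy model only: these classes are hypothetical and are NOT the Mesos API.


class ToyMaster:
    """Tracks which roles of a single framework are currently suppressed."""

    def __init__(self):
        self.suppressed = set()

    def register(self, roles, suppressed_roles):
        # (Re-)registration replaces the suppressed set with the provided one.
        self.suppressed = set(suppressed_roles) & set(roles)

    def suppress(self, roles):
        self.suppressed |= set(roles)

    def revive(self, roles):
        self.suppressed -= set(roles)


class ToyDriver:
    """Driver that re-registers with whatever suppressed set it has stored."""

    def __init__(self, master, roles, preserve_suppression):
        self.master = master
        self.roles = set(roles)
        # False models the current behavior, True models the proposed one.
        self.preserve_suppression = preserve_suppression
        self.stored_suppressed = set()
        master.register(self.roles, self.stored_suppressed)

    def suppress_offers(self):
        self.master.suppress(self.roles)
        if self.preserve_suppression:
            # Proposed: remember the suppression for later re-registration.
            self.stored_suppressed = set(self.roles)

    def revive_offers(self):
        self.master.revive(self.roles)
        self.stored_suppressed = set()

    def transparent_reregister(self):
        # Stands in for a disconnect followed by automatic re-registration.
        self.master.register(self.roles, self.stored_suppressed)


def suppressed_after_reregistration(preserve_suppression):
    roles = [f"role{i}" for i in range(1, 501)]
    master = ToyMaster()
    driver = ToyDriver(master, roles, preserve_suppression)
    driver.suppress_offers()
    driver.transparent_reregister()
    return len(master.suppressed)


print(suppressed_after_reregistration(False))  # current behavior: 0
print(suppressed_after_reregistration(True))   # proposed behavior: 500
```

With the flag off, the stored set stays empty, so re-registration un-suppresses all 500 roles; with the flag on, the stored set survives the re-registration and all 500 roles remain suppressed.
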
Regards,
Andrei Sekretenko