Hi all,

We intend to change the behavior of the suppressOffers() method of
MesosSchedulerDriver with regard to transparent re-registration.

Currently, when the driver becomes disconnected from a master, it
re-registers on its own with an empty set of suppressed roles. This
un-suppresses all of the framework's suppressed roles.

The plan is to change this behavior so that the suppression state is
preserved across this re-registration.

The required set of suppressed roles will be stored in the driver. The
driver will re-register with this set (instead of an empty one) and will
update the stored set whenever it makes a call that modifies the
suppression state of the roles in the allocator.
Currently, the driver has two methods that make such calls:
suppressOffers() and reviveOffers().
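
As a rough illustration of the bookkeeping described above, here is a
minimal toy model in Python. It is not the driver's code: the class and
method names are hypothetical, and it only tracks the stored set, assuming
that suppressOffers() suppresses all of the framework's roles and that
reviveOffers() clears the stored set.

```python
class DriverSuppressionState:
    """Toy model (hypothetical) of the suppression state kept in the driver."""

    def __init__(self, roles):
        self.roles = set(roles)
        # The set that would be sent with every transparent re-registration.
        self.suppressed_roles = set()

    def suppress_offers(self):
        # Models suppressOffers(): all roles of the framework are suppressed,
        # and under the proposal the stored set is updated accordingly.
        self.suppressed_roles = set(self.roles)

    def revive_offers(self):
        # Models reviveOffers(): the stored set is cleared.
        self.suppressed_roles.clear()

    def reregistration_suppressed_roles(self):
        # Proposed behavior: re-register with the stored set
        # instead of an empty one.
        return set(self.suppressed_roles)
```

With this model, a re-registration after suppress_offers() carries all of
the framework's roles, and a re-registration after revive_offers() carries
an empty set.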

Please feel free to raise any concerns or objections - especially if you
are aware of any V0 frameworks which (probably implicitly) depend on
un-suppression of the roles when this re-registration occurs.



Note that:
 - Frameworks which do not call suppressOffers() are, obviously, unaffected
by this change.

 - Frameworks that reliably prevent transparent re-registration (for
example, by calling driver.abort() immediately from the disconnected()
callback) should also not be affected.

 - Storing the suppressed roles list for re-registration and clearing it in
reviveOffers() does not change anything for existing frameworks. Only
setting this list in suppressOffers() might be a cause for concern.

 - I'm using the word "un-suppression" because re-registering with roles
removed from the suppressed roles list is NOT equivalent to performing a
REVIVE call for these roles (unlike REVIVE, it does not clear offerFilters
in the allocator).
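
To make the last point concrete, here is a deliberately simplified toy
model in Python. ToyAllocator and its fields are hypothetical and do not
mirror the real allocator; the only thing it illustrates is that REVIVE
clears offer filters while re-registration with a smaller suppressed roles
list does not.

```python
class ToyAllocator:
    """Drastically simplified stand-in for the allocator (hypothetical)."""

    def __init__(self):
        self.suppressed = set()
        # role -> set of agent ids currently filtered for that role
        self.offer_filters = {}

    def revive(self, roles):
        # REVIVE: un-suppress the roles AND clear their offer filters.
        for role in roles:
            self.suppressed.discard(role)
            self.offer_filters.pop(role, None)

    def reregister(self, suppressed_roles):
        # Re-registration: only the suppressed set is replaced;
        # offer filters are left untouched.
        self.suppressed = set(suppressed_roles)


allocator = ToyAllocator()
allocator.suppressed = {"role1"}
allocator.offer_filters = {"role1": {"agent1"}}

allocator.reregister(set())      # "un-suppression": filter for role1 remains
filters_after_reregister = dict(allocator.offer_filters)

allocator.revive({"role1"})      # REVIVE: filter for role1 is cleared
filters_after_revive = dict(allocator.offer_filters)
```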

=====
A bit of background on why this change is needed.

To properly support V0 frameworks with a large number of roles, it is
necessary for the driver not to change the suppression state of the roles
on its own.
Therefore, because the driver performs transparent re-registration, we
will need to store the required suppression state in the driver and make
it re-register using this state.

We could possibly avoid the proposed change of suppressOffers() by adding
a new interface to the driver for changing the suppression state, leaving
suppressOffers() as it is, and marking it as deprecated.

However, this would leave the behavior of suppressOffers() deeply
inconsistent with everything else.
Compare the following two sequences of events.
First one:
 - The framework creates and starts a driver with roles "role1", "role2"...
"role500"; the driver registers.
 - The framework calls a new method driver.suppressOffersForRoles({"role1",
..., "role500"}); the driver performs a SUPPRESS call for these roles and
stores them in its suppressed roles set.
   (Alternative with the same result: the framework calls
driver.updateFramework(FrameworkInfo, suppressedRoles={"role1", ...,
"role500"}); the driver performs an UPDATE_FRAMEWORK call with those
parameters and stores the new suppressed roles set.)
 - The driver, for some reason, disconnects and re-registers with the
same master, providing the stored suppressed roles set.
 - All the roles are still suppressed.
Second one:
 - The framework creates and starts a driver with roles "role1", "role2"...
"role500"; the driver registers.
 - The framework calls driver.suppressOffers(); the driver performs a
SUPPRESS call for all roles, but does not modify the stored suppression
state.
 - The driver, for some reason, disconnects and re-registers with the
same master, providing the stored suppressed roles set, which is empty.
 - Now none of the roles are suppressed, and the allocator generates offers
for 500 roles, which will likely be declined by the framework.
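
The difference between the two sequences can be sketched with a small
Python simulation. This is only a toy model: the function name and its
parameter are made up for illustration, and it assumes that the
re-registration replaces the suppression state in the allocator with the
set stored in the driver.

```python
def suppressed_after_reregistration(roles, suppress_updates_stored_set):
    """Return the roles still suppressed after a transparent re-registration.

    suppress_updates_stored_set (hypothetical switch):
      True  -> the suppressing call also updates the driver's stored set
               (first sequence)
      False -> the suppressing call leaves the stored set empty
               (second sequence)
    """
    # The suppressing call reaches the allocator in both sequences.
    allocator_suppressed = set(roles)

    # The driver's stored set differs between the two sequences.
    stored = set(roles) if suppress_updates_stored_set else set()

    # Disconnect and re-register: the allocator takes over the stored set.
    allocator_suppressed = set(stored)
    return allocator_suppressed


roles = ["role%d" % i for i in range(1, 501)]
first = suppressed_after_reregistration(roles, True)    # all 500 suppressed
second = suppressed_after_reregistration(roles, False)  # none suppressed
```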

This is one of the examples that make us strongly consider altering the
interaction between suppressOffers() and the transparent re-registration
when we add suppression state storage to the driver.

Regards,
Andrei Sekretenko
