Andrei Sekretenko created MESOS-10023:
-----------------------------------------

             Summary: Allocator method dispatches can be reordered (relative to 
scheduler API calls which triggered them).
                 Key: MESOS-10023
                 URL: https://issues.apache.org/jira/browse/MESOS-10023
             Project: Mesos
          Issue Type: Bug
            Reporter: Andrei Sekretenko


Observed an example of such reordering on a testing cluster with a V1 framework.
Framework side:
 - framework issues ACCEPT for a slave with no operations and a 365+ day 
filter 
 - framework issues REVIVE call for all roles (which should clear all filters)
 - framework waits for an offer for that slave and never receives it

Master side:
 - master receives ACCEPT, processes the first part and starts authorization
 - master receives REVIVE and dispatches reviveOffers() to the allocator
 - master receives a response from authorizer (for ACCEPT) and dispatches 
recoverResources() with a 365-day filter to the allocator
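
The sequence above can be reproduced with a small toy model. This is only an illustrative sketch in Python, not Mesos code: the Allocator and Master classes, the "agent-1" id, and the use of an asyncio.Event to stand in for the authorizer response are all assumptions made for the example. Only the method names reviveOffers() and recoverResources() and the order of events are taken from the description above.

```python
import asyncio


class Allocator:
    """Toy allocator (hypothetical): applies calls in the order they are dispatched."""

    def __init__(self):
        self.filters = {}  # agent id -> filter duration in days
        self.log = []      # names of calls, in arrival order

    def recover_resources(self, agent, filter_days):
        # Stands in for recoverResources(): installs an offer filter.
        self.filters[agent] = filter_days
        self.log.append("recoverResources")

    def revive_offers(self):
        # Stands in for reviveOffers(): clears all filters.
        self.filters.clear()
        self.log.append("reviveOffers")


class Master:
    """Toy master (hypothetical): ACCEPT waits for authorization, REVIVE does not."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.authorized = asyncio.Event()  # set when the "authorizer" responds

    async def accept(self, agent, filter_days):
        # First part of ACCEPT is done; now wait for the asynchronous authorizer.
        await self.authorized.wait()
        self.allocator.recover_resources(agent, filter_days)

    async def revive(self):
        # REVIVE needs no authorization in this model, so it dispatches at once.
        self.allocator.revive_offers()


async def scenario():
    allocator = Allocator()
    master = Master(allocator)

    # The framework sends ACCEPT first ...
    accept = asyncio.create_task(master.accept("agent-1", 365))
    await asyncio.sleep(0)  # let ACCEPT run until it waits on authorization

    # ... and REVIVE second, while ACCEPT is still being authorized.
    await master.revive()

    # The authorizer responds only now, so the filter lands after the revive.
    master.authorized.set()
    await accept
    return allocator


allocator = asyncio.run(scenario())
print(allocator.log)      # ['reviveOffers', 'recoverResources']
print(allocator.filters)  # {'agent-1': 365}
```

The framework sent ACCEPT before REVIVE, but the allocator sees them in the opposite order, so the 365-day filter survives the revive and no offer for that agent is sent.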

*We need to give frameworks a way to avoid this kind of reordering.*

Things to consider:
 - v1 frameworks are not required to use a single connection for API requests; 
even if they were, there is still the reconnection case, during which the 
framework's and the master's views of the connection state might differ. This 
means that we cannot completely avoid this problem by sequencing the 
processing of requests from the same connection.

 - Currently, all calls that directly influence the allocator (except for 
UPDATE_FRAMEWORK) return '202 Accepted' at an early stage of processing. 
Unconditionally changing this might break compatibility with some existing 
frameworks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
