Andrei Sekretenko created MESOS-9763:
----------------------------------------

             Summary: Race between two re-subscriptions against an empty master.
                 Key: MESOS-9763
                 URL: https://issues.apache.org/jira/browse/MESOS-9763
             Project: Mesos
          Issue Type: Bug
          Components: master, scheduler api
            Reporter: Andrei Sekretenko


Currently, subscription (and re-subscription) is not atomic.
It consists of three steps performed by two actors:
 - Validating the supplied FrameworkInfo against the master state (which 
possibly includes an existing FrameworkInfo)
 - Authorizing the (re-)subscribing framework
 - Applying the update

A partitioned or buggy (or both) framework can trigger a race by sending two
SUBSCRIBE calls with differing FrameworkInfos after a master failover.

One of the possible sequences of events:
1. FrameworkInfo A is validated by the master (which has no data about this
framework).
2. A conflicting FrameworkInfo B is validated by the master (which still stores
no data about this framework, because scheduler A has not even been authorized
yet).
3. Scheduler A is authorized.
4. Scheduler B is authorized.
5. FrameworkInfo A is applied.
6. The master attempts to apply FrameworkInfo B, which is no longer valid after
the previous step.

One simple example of an affected case is an attempt to re-subscribe with two
different principals: currently, scheduler B's principal will be silently
ignored at step 6.

At the moment of writing, I'm not sure whether there are other problems caused
by this race.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
