Andrei Sekretenko created MESOS-9763:
----------------------------------------
Summary: Race between two re-subscriptions against an empty master.
Key: MESOS-9763
URL: https://issues.apache.org/jira/browse/MESOS-9763
Project: Mesos
Issue Type: Bug
Components: master, scheduler api
Reporter: Andrei Sekretenko
Currently, subscription (and re-subscription) is not atomic.
It consists of three steps performed by two actors:
- Validating the supplied FrameworkInfo against the master state (which
possibly includes an existing FrameworkInfo)
- Authorizing the (re-)subscribing framework
- Applying the update
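The three steps above can be sketched with a toy model. This is not Mesos code: `ToyMaster`, its methods, and the two-field `FrameworkInfo` are hypothetical stand-ins, and the only validation rule modelled is an assumed "the principal may not change" check. It is meant only to show that the outcome of the validation step depends on the master state at the moment validation runs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FrameworkInfo:
    # Toy stand-in for the real FrameworkInfo; only two fields are modelled.
    framework_id: str
    principal: str


class ToyMaster:
    """Hypothetical, heavily simplified model of the subscribe path."""

    def __init__(self) -> None:
        self.frameworks: Dict[str, FrameworkInfo] = {}

    def validate(self, info: FrameworkInfo) -> Optional[str]:
        # Step 1: compare against whatever the master knows right now.
        known = self.frameworks.get(info.framework_id)
        if known is not None and known.principal != info.principal:
            return "principal change is not allowed"
        return None  # None means "valid"

    def authorize(self, info: FrameworkInfo) -> bool:
        # Step 2: stands in for the authorization step; this toy version
        # approves every request.
        return True

    def apply(self, info: FrameworkInfo) -> None:
        # Step 3: store the FrameworkInfo. Nothing here repeats step 1.
        self.frameworks.setdefault(info.framework_id, info)


master = ToyMaster()
info_a = FrameworkInfo("fw-1", "principal-a")
info_b = FrameworkInfo("fw-1", "principal-b")

valid_on_empty_master = master.validate(info_b) is None
master.apply(info_a)
valid_after_a_applied = master.validate(info_b) is None
```

In this sketch the same request for `info_b` passes validation against the empty master and fails once `info_a` has been applied, which is why a verdict obtained in the first step can be stale by the time the third step runs.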
A partitioned or buggy (or both) framework can trigger a race by sending two
SUBSCRIBE calls with differing FrameworkInfos after a master failover.
One of the possible sequences of events:
1. FrameworkInfo A is validated by master (which has no data about this
framework)
2. Conflicting FrameworkInfo B is validated by master (which still stores no
data about this framework, as Scheduler A is not even authorized yet)
3. Scheduler A is authorized
4. Scheduler B is authorized
5. FrameworkInfo A is applied
6. Master attempts to apply FrameworkInfo B, which is no longer valid after the
previous step.
One simple affected case is an attempt to re-subscribe with two different
principals: currently, scheduler B's principal is silently ignored at
step 6.
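The interleaving described above can be reproduced deterministically in a small simulation. Everything in it is an assumption made for illustration rather than actual master code: the `subscribe` generator, the `run_interleaved` driver, the `frameworks` dictionary standing in for master state, and the rule that an already-known framework keeps its stored principal.

```python
from typing import Dict, Generator, Iterable, List

frameworks: Dict[str, str] = {}  # framework id -> stored principal (toy state)
log: List[str] = []


def subscribe(framework_id: str, principal: str) -> Generator[None, None, None]:
    # Step 1: validate against the current (possibly empty) master state.
    stored = frameworks.get(framework_id)
    if stored is not None and stored != principal:
        log.append(f"{principal}: rejected at validation")
        return
    log.append(f"{principal}: validated")
    yield  # other calls may run while authorization is pending

    # Step 2: authorization (always granted in this sketch).
    log.append(f"{principal}: authorized")
    yield

    # Step 3: apply without re-validating. An already-known framework keeps
    # its stored principal, so a conflicting one is dropped silently.
    if framework_id in frameworks:
        log.append(f"{principal}: applied, principal ignored")
    else:
        frameworks[framework_id] = principal
        log.append(f"{principal}: applied")


def run_interleaved(calls: Iterable[Generator[None, None, None]]) -> None:
    # Advance every pending call by one step per round (round-robin).
    pending = list(calls)
    while pending:
        for call in list(pending):
            try:
                next(call)
            except StopIteration:
                pending.remove(call)


run_interleaved([subscribe("fw-1", "A"), subscribe("fw-1", "B")])
```

After the run, `log` lists the six events in the same order as the numbered sequence, and `frameworks` holds only principal `A`. Running the two calls one after the other instead of interleaving them would reject `B` at validation in this model.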
At the time of writing, I'm not sure whether there are other problems caused by
this race.