[ https://issues.apache.org/jira/browse/MESOS-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Zhu reassigned MESOS-10085:
--------------------------------
Assignee: Dong Zhu
> Operator API events are silently dropped on transient authorization failures.
> -----------------------------------------------------------------------------
>
> Key: MESOS-10085
> URL: https://issues.apache.org/jira/browse/MESOS-10085
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Sekretenko
> Assignee: Dong Zhu
> Priority: Major
>
> One of the purposes of the operator V1 API events is to allow subscribers to
> maintain an up-to-date view of the master's state: in response to a SUBSCRIBE
> call, the subscriber first receives an initial view of the master's state and
> then receives updates to that view in the form of `Event`s.
> Parts of the state, and updates to them, that the subscriber's principal is
> not authorized to see are filtered out by the `objectApprover::approve()`
> method.
> In case of an authorization failure, `approve()` returns an Error.
> Currently, the event filtering code handles `false` (i.e. not authorized) and
> an Error in the same way: the event is dropped.
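To make the conflation concrete, here is a minimal, self-contained C++ sketch. It does not use the real Mesos or stout types. `ApprovalResult` is a hypothetical stand-in loosely modelled on a `Try<bool>`-style result, and `approvedCollapsingErrors()` and `filterEvents()` are illustrative names only. The sketch shows how collapsing an error into `false` can let a TASK_UPDATED event through after the corresponding TASK_ADDED was dropped.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an approver result: either a bool decision
// or an error message (loosely modelled on a Try<bool>).
struct ApprovalResult {
  std::optional<bool> decision;  // empty when authorization itself failed
  std::string error;             // meaningful only when `decision` is empty

  static ApprovalResult allowed() { return {true, ""}; }
  static ApprovalResult denied() { return {false, ""}; }
  static ApprovalResult failed(std::string message) {
    return {std::nullopt, std::move(message)};
  }

  bool isError() const { return !decision.has_value(); }
};

// Illustrates the behaviour described above: an error is collapsed into
// `false`, so "not authorized" and "could not check" look identical.
bool approvedCollapsingErrors(const ApprovalResult& result) {
  if (result.isError()) {
    // At most the error is logged; the event is then treated as denied.
    return false;
  }
  return *result.decision;
}

// Forwards only the events whose collapsed approval is `true`.
std::vector<std::string> filterEvents(
    const std::vector<std::string>& events,
    const std::vector<ApprovalResult>& approvals) {
  std::vector<std::string> forwarded;
  for (std::size_t i = 0; i < events.size(); ++i) {
    if (approvedCollapsingErrors(approvals[i])) {
      forwarded.push_back(events[i]);
    }
  }
  return forwarded;
}
```

With a transient error on the first event and success on the second, the subscriber would see only the update for a task it has never been told about.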
> (See
> https://github.com/apache/mesos/blob/f8a3dd334934094ec44e07fa350f958d218bc78f/src/common/http.hpp#L414
> and, for example,
> https://github.com/apache/mesos/blob/f8a3dd334934094ec44e07fa350f958d218bc78f/src/master/master.cpp#L12257
> )
> In the presence of transient authorization failures, this can lead to
> inconsistencies in the event stream. The simplest example is receiving a
> TASK_UPDATED event without ever having received TASK_ADDED for the task in
> question. Such inconsistencies may leave the subscriber unable to maintain a
> correct view of the master's state.
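A subscriber can at least detect this particular gap. The sketch below assumes a hypothetical, heavily simplified event model. `EventType`, `Event` and `TaskView` are illustrative names, not the Mesos protobuf types. The view reports an inconsistency when a TASK_UPDATED arrives for a task it never saw added.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified event model: just the two event types needed
// to illustrate the gap, each carrying a task ID.
enum class EventType { kTaskAdded, kTaskUpdated };

struct Event {
  EventType type;
  std::string taskId;
};

// Subscriber-side view of known tasks. `apply()` returns false when an
// event cannot be applied consistently, e.g. TASK_UPDATED for a task that
// was never announced via TASK_ADDED.
class TaskView {
 public:
  bool apply(const Event& event) {
    switch (event.type) {
      case EventType::kTaskAdded:
        tasks_.insert(event.taskId);
        return true;
      case EventType::kTaskUpdated:
        // An update for an unknown task means the view has diverged.
        return tasks_.count(event.taskId) > 0;
    }
    return false;
  }

  bool knows(const std::string& taskId) const {
    return tasks_.count(taskId) > 0;
  }

 private:
  std::set<std::string> tasks_;
};
```

Detection only helps partially. A dropped TASK_UPDATED, for instance, leaves no such trace, which is why a server-side fix is preferable.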
> One option for fixing this issue is to disconnect the subscriber on an
> authorization failure, so that it receives the full view of the master's
> state when it subscribes again.
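The disconnect option could look roughly like the following. This is a hedged sketch with hypothetical names (`Approval`, `Subscriber`, `deliver()`), not the master's actual event-streaming code. A denial filters the event out as before. An authorization error closes the stream, so the subscriber has to SUBSCRIBE again and receive a fresh, complete snapshot.

```cpp
#include <string>
#include <vector>

// Hypothetical three-way outcome of authorizing one event for a subscriber.
enum class Approval { kAllowed, kDenied, kError };

// Hypothetical subscriber handle: records forwarded events and whether the
// connection is still open.
struct Subscriber {
  std::vector<std::string> received;
  bool connected = true;
};

// Denials keep today's filtering semantics; an authorization *error*
// disconnects the subscriber instead of silently dropping the event.
void deliver(Subscriber& subscriber,
             const std::string& event,
             Approval approval) {
  if (!subscriber.connected) {
    return;  // Nothing is sent after the stream has been closed.
  }
  switch (approval) {
    case Approval::kAllowed:
      subscriber.received.push_back(event);
      break;
    case Approval::kDenied:
      break;  // Not authorized: filter the event out.
    case Approval::kError:
      subscriber.connected = false;  // Transient failure: force resubscribe.
      break;
  }
}
```

Only the error case is treated specially, so genuine denials behave as they do now. The cost is one extra resubscription, with a full snapshot, per transient failure.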
> Note that this issue also existed before synchronous authorization was
> introduced (in Mesos 1.9 and earlier), but the transient errors then occurred
> in the `Authorizer::getObjectApprover()` method, which was called per event
> (as opposed to once per subscriber after synchronous authorization was
> introduced).
> A similar issue is present in the processing of Operator API calls,
> including the SUBSCRIBE call: objects are silently dropped on transient
> authorization failures (see MESOS-10099).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)