Kostas Kloudas created FLINK-7771:
-------------------------------------

             Summary: Make the operator state queryable
                 Key: FLINK-7771
                 URL: https://issues.apache.org/jira/browse/FLINK-7771
             Project: Flink
          Issue Type: Improvement
          Components: Queryable State
    Affects Versions: 1.4.0
            Reporter: Kostas Kloudas
            Assignee: Kostas Kloudas
             Fix For: 1.4.0


There seem to be some requests for making the operator (non-keyed) state 
queryable. This means that the user would specify the *uuid* of the operator and 
the *taskId*, and would then be able to access the state that corresponds to 
that operator and that specific task.

This issue will serve to document the discussion on the topic, so that 
everybody can participate.

Personally, I think that such a feature should wait until some aspects of state 
handling have stabilized (_e.g._ replication and checkpoint management). My main 
concerns have to do with the semantics and guarantees that such a feature could 
offer *for now*.

First, operator state is essentially a list state that can be reshuffled 
arbitrarily upon restoring or rescaling. This means that, at a given execution 
attempt, task1 may hold elements _A,B,C_, while after restoring (even without 
rescaling) it may hold _D,B,E_, without this implying that anything happened to 
elements _A_ and _C_. They were simply assigned to another task. This makes it 
hard to reason about the results that you get at any point in time, as it 
provides *no locality/consistency guarantees between executions*.
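To make the reshuffling concrete, here is a minimal sketch in plain Java. It does not use any Flink API: the class, the {{redistribute}} helper and the seed parameter are hypothetical, and the "shuffle, then deal out round-robin" step is only an assumption standing in for whatever arbitrary reassignment happens on restore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class OperatorStateReshuffle {

    // Hypothetical helper (not a Flink API): pools all list-state elements,
    // shuffles them, and deals them out to `parallelism` tasks round-robin.
    // The seed stands in for "a different execution attempt".
    static List<List<String>> redistribute(List<String> allElements, int parallelism, long seed) {
        List<String> pool = new ArrayList<>(allElements);
        Collections.shuffle(pool, new Random(seed));

        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new ArrayList<>());
        }
        for (int i = 0; i < pool.size(); i++) {
            tasks.get(i % parallelism).add(pool.get(i));
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> state = Arrays.asList("A", "B", "C", "D", "E", "F");
        // Same elements, same parallelism, but each attempt may assign
        // different elements to task1 and task2.
        System.out.println("attempt 1: " + redistribute(state, 2, 1L));
        System.out.println("attempt 2: " + redistribute(state, 2, 2L));
    }
}
```

The union of all elements is the same in every attempt, but what a single task holds is not stable, which is exactly why a per-task query result is hard to interpret.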

 The above, in combination with the fact that (for now) it is not possible to 
query the state at a specific point in time (_e.g._ the last checkpointed 
state), means that there is no easy way to get a consistent view of the state 
of an operator. So in the example above, when querying _(operatorA, task1)_ and 
_(operatorA, task2)_, the user can get states belonging to different "points in 
time" which can result in duplicates, lost values and all the problems 
encountered in distributed systems when there are no consistency guarantees.
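The duplicate/lost-value problem can be sketched in the same way. Again this is plain Java with hypothetical names ({{InconsistentQueryView}}, {{queryAcrossAttempts}}), not a Flink API. It assumes that the query to task1 is answered before a restore and the query to task2 after it, and that task2 happens to receive _A,C,F_ after the restore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InconsistentQueryView {

    // Hypothetical helper: combines the answer of task1 taken from one
    // assignment with the answer of task2 taken from another assignment,
    // i.e. two per-task queries that hit different "points in time".
    static List<String> queryAcrossAttempts(Map<String, List<String>> seenByTask1Query,
                                            Map<String, List<String>> seenByTask2Query) {
        List<String> view = new ArrayList<>(seenByTask1Query.get("task1"));
        view.addAll(seenByTask2Query.get("task2"));
        return view;
    }

    public static void main(String[] args) {
        // Assignment before the restore (task1 holds A,B,C as in the example above).
        Map<String, List<String>> before = new HashMap<>();
        before.put("task1", Arrays.asList("A", "B", "C"));
        before.put("task2", Arrays.asList("D", "E", "F"));

        // Assumed assignment after the restore (task1 now holds D,B,E).
        Map<String, List<String>> after = new HashMap<>();
        after.put("task1", Arrays.asList("D", "B", "E"));
        after.put("task2", Arrays.asList("A", "C", "F"));

        // task1 answered before the restore, task2 after it.
        System.out.println(queryAcrossAttempts(before, after));
    }
}
```

In this sketch the combined view contains _A_ and _C_ twice and misses _D_ and _E_ entirely, although nothing was added to or removed from the operator state.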

The above illustrates some of the consistency problems that such a feature 
could face now.
 
I am also linking [~till.rohrmann] and [~skonto], as the latter also mentioned 
that this feature could be helpful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
