Sergio Esteves created FLINK-6054:
-------------------------------------

             Summary: Add new state backend that dynamically stores data in 
memory and external storage
                 Key: FLINK-6054
                 URL: https://issues.apache.org/jira/browse/FLINK-6054
             Project: Flink
          Issue Type: New Feature
          Components: State Backends, Checkpointing
            Reporter: Sergio Esteves
            Priority: Minor


This feature would be useful for memory-intensive applications that need to 
maintain state for long periods of time; e.g., event-time streaming application 
with long-lived windows that tolerate large amounts of lateness.

This feature would allow to scale the state and, in the example above, tolerate 
a very large (possibly unbounded) amount of lateness, which can be useful in a 
set of scenarios, like the one of Photon in the Google Advertising System 
(white paper: "Photon: Fault-tolerant and Scalable Joining of Continuous Data 
Streams").

In a nutshell, the idea would be to have a quota for the maximum memory that a 
state cell (different keys and namespaces) can occupy. When that quota gets 
fully occupied, new state data would be written out to disk. Then, when state 
needs to be retrieved, data is read entirely from memory - persisted data is 
loaded into memory in the background at the same time that data pertaining to 
the quota is being fetched (this reduces I/O overhead).

Different policies, defining when to offload/load data from/to memory, can be 
implemented to govern the overall memory utilization. We already have a 
preliminary implementation with promising results in terms of memory savings 
(in the context of streaming applications with windows that tolerate lateness).

More details are to be given soon through a design document.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to