Sergio Esteves created FLINK-6054:
-------------------------------------
Summary: Add new state backend that dynamically stores data in
memory and external storage
Key: FLINK-6054
URL: https://issues.apache.org/jira/browse/FLINK-6054
Project: Flink
Issue Type: New Feature
Components: State Backends, Checkpointing
Reporter: Sergio Esteves
Priority: Minor
This feature would be useful for memory-intensive applications that need to
maintain state for long periods of time; e.g., event-time streaming application
with long-lived windows that tolerate large amounts of lateness.
This feature would allow to scale the state and, in the example above, tolerate
a very large (possibly unbounded) amount of lateness, which can be useful in a
set of scenarios, like the one of Photon in the Google Advertising System
(white paper: "Photon: Fault-tolerant and Scalable Joining of Continuous Data
Streams").
In a nutshell, the idea would be to have a quota for the maximum memory that a
state cell (different keys and namespaces) can occupy. When that quota gets
fully occupied, new state data would be written out to disk. Then, when state
needs to be retrieved, data is read entirely from memory - persisted data is
loaded into memory in the background at the same time that data pertaining to
the quota is being fetched (this reduces I/O overhead).
Different policies, defining when to offload/load data from/to memory, can be
implemented to govern the overall memory utilization. We already have a
preliminary implementation with promising results in terms of memory savings
(in the context of streaming applications with windows that tolerate lateness).
More details are to be given soon through a design document.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)