Sergio Esteves created FLINK-6054: ------------------------------------- Summary: Add new state backend that dynamically stores data in memory and external storage Key: FLINK-6054 URL: https://issues.apache.org/jira/browse/FLINK-6054 Project: Flink Issue Type: New Feature Components: State Backends, Checkpointing Reporter: Sergio Esteves Priority: Minor
This feature would be useful for memory-intensive applications that need to maintain state for long periods of time; e.g., event-time streaming application with long-lived windows that tolerate large amounts of lateness. This feature would allow to scale the state and, in the example above, tolerate a very large (possibly unbounded) amount of lateness, which can be useful in a set of scenarios, like the one of Photon in the Google Advertising System (white paper: "Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams"). In a nutshell, the idea would be to have a quota for the maximum memory that a state cell (different keys and namespaces) can occupy. When that quota gets fully occupied, new state data would be written out to disk. Then, when state needs to be retrieved, data is read entirely from memory - persisted data is loaded into memory in the background at the same time that data pertaining to the quota is being fetched (this reduces I/O overhead). Different policies, defining when to offload/load data from/to memory, can be implemented to govern the overall memory utilization. We already have a preliminary implementation with promising results in terms of memory savings (in the context of streaming applications with windows that tolerate lateness). More details are to be given soon through a design document. -- This message was sent by Atlassian JIRA (v6.3.15#6346)