[ https://issues.apache.org/jira/browse/FLINK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Qin updated FLINK-4266:
----------------------------
Summary: Remote Storage Statebackend (was: Cassandra StateBackend)
> Remote Storage Statebackend
> ---------------------------
>
> Key: FLINK-4266
> URL: https://issues.apache.org/jira/browse/FLINK-4266
> Project: Flink
> Issue Type: New Feature
> Components: State Backends, Checkpointing
> Affects Versions: 1.0.3, 1.2.0
> Reporter: Chen Qin
> Priority: Minor
>
> The current FileSystem state backend limits the total state size to the
> available disk space. For long-running tasks that hold window contents for a
> long period of time, state needs to be split out to durable remote storage
> and replicated across data centers.
> We are looking into an implementation that uses the checkpoint timestamp as a
> version and issues range queries to retrieve the current state. We also want
> to keep "hot states" from hitting the remote DB on every update between
> adjacent checkpoints by tracking update logs, merging them, and issuing batch
> updates only at checkpoint time. Lastly, we are looking for an eviction policy
> that can identify "hot keys" in k/v state and lazily load "cold keys" from
> Cassandra.
> For now, we don't have a good story for data retirement. We might have to
> keep state forever, until someone manually runs a command to clean up each
> job's state data. Some of these features might be related to the incremental
> checkpointing feature, and we hope to align with that effort as well.
> Comments are welcome. I will try to put together a draft design doc after
> gathering some feedback.
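The three ideas in the description (checkpoint-timestamp versioning with range reads, merging updates between checkpoints into one batch, and keeping hot keys local while lazily loading cold ones) can be illustrated with a small in-memory sketch. This is a hypothetical model under stated assumptions, not Flink or Cassandra code: `VersionedRemoteStore` stands in for a remote table keyed by (key, checkpoint timestamp), and `BufferedKeyedState` is an illustrative name, not a Flink API. The range read is a `TreeMap.floorEntry` lookup, and LRU eviction uses `LinkedHashMap` in access order as one possible eviction policy.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: class names are illustrative, not Flink APIs.
public class RemoteStateSketch {

    /** Stand-in for a remote store (e.g. a Cassandra table): key -> (checkpointTs -> value). */
    static class VersionedRemoteStore {
        private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();
        int writes = 0; // counts remote writes, to show the effect of batching

        void put(String key, long checkpointTs, String value) {
            rows.computeIfAbsent(key, k -> new TreeMap<>()).put(checkpointTs, value);
            writes++;
        }

        /** Range read: newest version written at or before asOfTs, or null if none. */
        String getAsOf(String key, long asOfTs) {
            TreeMap<Long, String> versions = rows.get(key);
            if (versions == null) return null;
            Map.Entry<Long, String> e = versions.floorEntry(asOfTs);
            return e == null ? null : e.getValue();
        }
    }

    /** Merges updates between checkpoints and keeps hot keys in a bounded LRU cache. */
    static class BufferedKeyedState {
        private final VersionedRemoteStore remote;
        // Updates since the last checkpoint; a later update to the same key replaces the earlier one.
        private final Map<String, String> pending = new HashMap<>();
        private final LinkedHashMap<String, String> hotCache;
        private long lastCheckpointTs = 0L;

        BufferedKeyedState(VersionedRemoteStore remote, int cacheCapacity) {
            this.remote = remote;
            // accessOrder = true makes iteration order least-recently-used first.
            this.hotCache = new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > cacheCapacity; // evict the coldest key
                }
            };
        }

        void update(String key, String value) {
            pending.put(key, value); // no remote write here
            hotCache.put(key, value);
        }

        String get(String key) {
            String v = pending.get(key);
            if (v != null) return v;
            v = hotCache.get(key);
            if (v != null) return v;
            // Cold key: lazily load the version as of the last completed checkpoint.
            v = remote.getAsOf(key, lastCheckpointTs);
            if (v != null) hotCache.put(key, v);
            return v;
        }

        /** Flushes merged updates as one batch, versioned by the checkpoint timestamp. */
        void checkpoint(long checkpointTs) {
            for (Map.Entry<String, String> e : pending.entrySet()) {
                remote.put(e.getKey(), checkpointTs, e.getValue());
            }
            pending.clear();
            lastCheckpointTs = checkpointTs;
        }
    }

    public static void main(String[] args) {
        VersionedRemoteStore store = new VersionedRemoteStore();
        BufferedKeyedState state = new BufferedKeyedState(store, 1);

        state.update("a", "1");
        state.update("a", "2"); // merged: only "2" reaches the store
        state.update("b", "x"); // evicts "a" from the cache (capacity 1)
        state.checkpoint(100L); // 2 remote writes
        state.update("a", "3"); // evicts "b" from the cache
        state.checkpoint(200L); // 1 remote write

        System.out.println(store.writes);             // 3
        System.out.println(store.getAsOf("a", 150L)); // 2
        System.out.println(state.get("b"));           // x (lazily loaded)
    }
}
```

Keeping the pending buffer separate from the cache means evicting a key never drops an update that has not yet been checkpointed. Old versions accumulate in the store, which is exactly the data-retirement gap noted in the description.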
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)