[ 
https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711090#comment-14711090
 ] 

Stephan Ewen commented on FLINK-1730:
-------------------------------------

There is a lot of support in Flink's runtime to actually store an intermediate 
data set persistent, starting in memory, going to disk if needed. It has not 
been connected to the APIs yet, because the committers that worked prioritized 
some other aspects (high availability) in the meantime.

This feature should really work on the network stack level (caching buffers is 
most efficient) and the execution graph level (can recompute lost results upon 
failures, properly represent dependencies in the DAG).

Concerning other caching levels: I am not sure any other level is really 
needed. Any disk level caching should start in memory as long as memory is 
available. Any memory-level caching that silently drops the cache and requires 
lineage-based recomputation is pretty unpredictable, so 
memory-destaging-to-disk is the most sensible version.

> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>
>                 Key: FLINK-1730
>                 URL: https://issues.apache.org/jira/browse/FLINK-1730
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Stephan Ewen
>            Priority: Minor
>
> I think this is an operation that will be needed more prominently. Defining a 
> point where one long logical program is broken into different executions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to