[ 
https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711113#comment-14711113
 ] 

Sachin Goel commented on FLINK-1730:
------------------------------------

Yes. Going through the entire pact task logic, I observed all of that. I was 
almost surprised how well it could support this functionality. 
One idea I have is to implement two specific gates: an input gate that reads 
directly from the memory manager, and an output gate whose output is written 
to memory rather than transferred over the network.
This way, the pact task can just create one of these two gates and add it to 
the existing gates, depending on whether the results are available in the 
cache or not. After that, it's just a matter of initializing the {{NoOpDriver}}. 
Further, although I'm not sure about it, the memory manager itself can spill 
data to disk if needed, right? If so, there is no need to implement a separate 
in-memory-plus-disk store. It's already there.
The relevant storage on the memory manager will have locks keyed by task name 
and index, so that the cache is not cleared out until the accessing tasks 
have finished reading it. And we could perhaps follow an LRU scheme for 
clearing out the cache storage.
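
To make the locking and eviction idea a bit more concrete, here is a rough 
sketch of what I have in mind. Everything in it is hypothetical: the class 
{{PinnedLruCache}}, its methods and the string reader ids are made up for 
illustration and are not existing Flink or memory manager APIs. A cached 
result is pinned while any (task name, subtask index) pair is reading it, and 
only unpinned results are evicted, least recently used first.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; none of these names exist in Flink. */
class PinnedLruCache<V> {

    /** A cached result plus the readers that currently hold it. */
    private static final class Entry<V> {
        final V value;
        final Set<String> readers = new HashSet<>();

        Entry(V value) {
            this.value = value;
        }
    }

    private final int capacity;

    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Entry<V>> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    PinnedLruCache(int capacity) {
        this.capacity = capacity;
    }

    private static String readerId(String taskName, int subtaskIndex) {
        return taskName + "#" + subtaskIndex;
    }

    /** Stores a result, replacing any existing entry with the same id. */
    synchronized void put(String resultId, V value) {
        entries.put(resultId, new Entry<>(value));
        evictIfNeeded();
    }

    /** Pins the result for the given reader; returns null on a cache miss. */
    synchronized V acquire(String resultId, String taskName, int subtaskIndex) {
        Entry<V> entry = entries.get(resultId); // also marks it recently used
        if (entry == null) {
            return null;
        }
        entry.readers.add(readerId(taskName, subtaskIndex));
        return entry.value;
    }

    /** Unpins the result once the reader has finished with it. */
    synchronized void release(String resultId, String taskName, int subtaskIndex) {
        Entry<V> entry = entries.get(resultId);
        if (entry != null) {
            entry.readers.remove(readerId(taskName, subtaskIndex));
            evictIfNeeded();
        }
    }

    synchronized boolean contains(String resultId) {
        return entries.containsKey(resultId);
    }

    // Drops unpinned entries in LRU order until the cache fits its capacity.
    // Pinned entries are skipped, so the cache may stay over capacity, and a
    // freshly stored entry is dropped if every older entry is pinned.
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, Entry<V>>> it = entries.entrySet().iterator();
        while (entries.size() > capacity && it.hasNext()) {
            if (it.next().getValue().readers.isEmpty()) {
                it.remove();
            }
        }
    }
}
```

A reading task would call {{acquire}} before consuming the cached result and 
{{release}} when done. In the real thing the eviction step would presumably 
hand the memory segments back to the memory manager instead of just dropping 
a map entry.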

> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>
>                 Key: FLINK-1730
>                 URL: https://issues.apache.org/jira/browse/FLINK-1730
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Stephan Ewen
>            Priority: Minor
>
> I think this is an operation that will be needed more prominently. Defining a 
> point where one long logical program is broken into different executions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
