[jira] [Commented] (FLINK-12070) Make blocking result partitions consumable multiple times

Yingjie Cao (JIRA) Tue, 28 May 2019 04:20:07 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849615#comment-16849615
 ]


Yingjie Cao commented on FLINK-12070:
-------------------------------------

I am now testing the latest implementation and have found some problems. When 
the data volume is large and cannot fit into memory, memory can run out and no 
new physical pages can be mapped; the TM then appears blocked and stops 
responding (I tried to check the stack using jstack but failed). Tools like 
free and top are also affected (they did not respond in time), and the CPU 
usage of the TM and the kernel swap process increased suddenly. Meanwhile, the 
disk I/O speed is also low; it seems the kernel flush is affected as well. This 
continues for several or even tens of minutes, and in the end the job either 
succeeds or hits a heartbeat timeout (I use a large heartbeat timeout and akka 
timeout). The old implementation (spilled subpartition) does not have this 
problem. Although the old implementation can also leverage the page cache to 
accelerate writes, the page cache is not required: if no more memory is left, 
data can be written to disk directly.
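
To make the two write paths concrete, here is a minimal Java sketch. It is not 
the Flink implementation; the class and method names (PartitionWriteSketch, 
writeMapped, writeChannel) are hypothetical and only illustrate the difference 
between writing through a memory-mapped region and writing through a plain 
FileChannel. It does not reproduce the memory pressure described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; names are not taken from the Flink code base.
final class PartitionWriteSketch {

    // Memory-mapped path: the data is copied into pages mapped into the
    // process address space, and the kernel writes dirty pages back later.
    static void writeMapped(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer region =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            region.put(data);
            region.force(); // ask the OS to flush the mapped pages to the device
        }
    }

    // Plain channel path: the data is handed to the kernel through write
    // calls, with no mapping in the process address space.
    static void writeChannel(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf); // a single call may write only part of the buffer
            }
            ch.force(false); // flush file content, not metadata
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of zeros as stand-in payload
        Path mapped = Files.createTempFile("mapped", ".bin");
        Path direct = Files.createTempFile("channel", ".bin");
        writeMapped(mapped, data);
        writeChannel(direct, data);
        System.out.println("mapped file size:  " + Files.size(mapped));
        System.out.println("channel file size: " + Files.size(direct));
    }
}
```

Both methods produce the same file content; they differ only in how the bytes 
reach the kernel, which is the point of comparison in the comment above.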

 

Later, I will post more test results under this JIRA.

 

Concerning the bug I mentioned earlier: I meant that the file was deleted 
("close" was the wrong word) when sending the EOF event. That is a bug in the 
earlier mmappartition branch; there is no problem with the master branch.

> Make blocking result partitions consumable multiple times
> ---------------------------------------------------------
>
>                 Key: FLINK-12070
>                 URL: https://issues.apache.org/jira/browse/FLINK-12070
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Till Rohrmann
>            Assignee: Stephan Ewen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>         Attachments: image-2019-04-18-17-38-24-949.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In order to avoid writing produced results multiple times for multiple 
> consumers and in order to speed up batch recoveries, we should make 
> blocking result partitions consumable multiple times. At the moment a 
> blocking result partition will be released once the consumers have processed 
> all data. Instead the result partition should be released once the next 
> blocking result has been produced and all consumers of a blocking result 
> partition have terminated. Moreover, blocking results should not hold on to 
> slot resources like network buffers or memory, as is currently the case with 
> {{SpillableSubpartitions}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
