[ https://issues.apache.org/jira/browse/FLINK-15981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036678#comment-17036678 ]

Zhijiang commented on FLINK-15981:
----------------------------------

Thanks for the confirmation; in general we are on the same page.

> When a task is released, it cannot occupy any pooled resources any more.

We can stick to this rule. Although it would be possible to decouple task and 
partition resources, that would make things more complex from the perspective 
of the JM/RM. So I also think it is better to limit this issue to the network 
component if possible.

> So we would need buffers per TCP channel. That is often fewer than per 
> subpartition (because if multiplexing) but not always (one slot TMs).

I also prefer the per-channel approach. In terms of buffer amounts there is no 
obvious difference between per-thread and per-channel, or at least it is hard 
to say which wins in different scenarios. But per-thread is not practical for 
the current code base, because it relies on the assumption that the previous 
buffer is already released when the thread loops to fetch the next data. I 
remember we discussed this issue in another story. :)  Actually it would be 
much easier to rely on per-channel in practice now; a rough sketch follows.
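To make the per-channel idea concrete, here is a minimal sketch in plain Java. 
All names are hypothetical and this is not the actual FileBufferReader API: 
each channel owns a fixed pool of two 64KB direct buffers, and the reader 
blocks until the consumer recycles one, so direct memory per channel stays 
strictly bounded.

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

/**
 * Hypothetical per-channel buffer pool: each channel owns exactly two 64KB
 * direct buffers which are recycled after the consumer is done with them,
 * instead of allocating a fresh direct buffer for every read.
 */
class PerChannelBufferPool {
    private static final int BUFFER_SIZE = 64 * 1024;

    // Bounded queue: direct memory per channel is capped at 2 x 64KB.
    private final ArrayBlockingQueue<ByteBuffer> pool = new ArrayBlockingQueue<>(2);

    PerChannelBufferPool() {
        pool.add(ByteBuffer.allocateDirect(BUFFER_SIZE));
        pool.add(ByteBuffer.allocateDirect(BUFFER_SIZE));
    }

    /** Blocks until a previously handed-out buffer has been recycled. */
    ByteBuffer take() throws InterruptedException {
        return pool.take();
    }

    /** Called by the consumer once the buffer's data has been sent downstream. */
    void recycle(ByteBuffer buffer) {
        buffer.clear();
        pool.add(buffer);
    }
}
{code}

The blocking take() is exactly what realizes the assumption discussed above: 
the next read cannot start until a previous buffer has been handed back.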

> Control the direct memory in FileChannelBoundedData.FileBufferReader
> --------------------------------------------------------------------
>
>                 Key: FLINK-15981
>                 URL: https://issues.apache.org/jira/browse/FLINK-15981
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.10.0
>            Reporter: Jingsong Lee
>            Priority: Critical
>             Fix For: 1.10.1, 1.11.0
>
>
> Now, the default blocking BoundedData is FileChannelBoundedData. Its reader 
> creates new 64KB direct buffers.
> When the parallelism is greater than 100, users need to configure 
> "taskmanager.memory.task.off-heap.size" to avoid a direct memory OOM. It is 
> hard to configure, and it costs a lot of memory. With a parallelism of 1000, 
> we may need 1GB+ for a task manager.
> This is not conducive to the scenario of few slots and large parallelism: 
> batch jobs could run little by little, but the memory shortage would still 
> cost a lot.
> If we provided N-input operators, things might get worse, since the number of 
> subpartitions that can be requested at the same time would grow, and we would 
> have no idea how much memory is needed.
> Here are my rough thoughts:
>  * Obtain memory from network buffers.
>  * Provide "the maximum number of subpartitions that can be requested at the 
> same time".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
