[ 
https://issues.apache.org/jira/browse/FLINK-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916286#comment-16916286
 ] 

zhijiang commented on FLINK-10995:
----------------------------------

[~ykt836], this Jira is about improving the serialization stack in 
RecordWriter, which currently benefits both PIPELINED and BLOCKING 
partitions. In detail, the intermediate serialization buffer would need to be 
copied only once for all the subpartitions. For blocking subpartitions, 
however, the same referenced target buffer would still be persisted into a 
separate file per subpartition, as you mentioned. In theory it could be 
persisted into a single file shared by all the subpartitions, but that is out 
of scope for this Jira. Furthermore, the same data could be transported only 
once through the network stack when many consumers are in the same 
TaskManager. We might focus on these further improvements in the future, but 
probably not in release-1.10. This Jira itself would be covered in 
release-1.10.

> Copy intermediate serialization results only once for broadcast mode
> --------------------------------------------------------------------
>
>                 Key: FLINK-10995
>                 URL: https://issues.apache.org/jira/browse/FLINK-10995
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>    Affects Versions: 1.8.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The records emitted by an operator are first serialized into an 
> intermediate byte array in {{RecordSerializer}}, and the intermediate 
> results are then copied into target buffers for the different 
> subpartitions. In broadcast mode, the same intermediate results are copied 
> as many times as there are subpartitions, which seriously affects 
> performance in large-scale jobs.
> We can instead copy into a single target buffer that is shared by all the 
> subpartitions to reduce the overhead. To emit a latency marker in broadcast 
> mode, we should first flush the previous shared target buffers, and then 
> request a new buffer for the target subpartition to send the latency marker.
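
To make the idea in the description concrete, here is a minimal Java sketch 
that contrasts copying the serialized bytes once per subpartition with copying 
them once into a shared buffer. It is not Flink's RecordWriter code: the class 
BroadcastCopySketch and both method names are hypothetical, and plain 
java.nio.ByteBuffer stands in for Flink's network buffers.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Flink.
final class BroadcastCopySketch {

    // Per-subpartition copying: the serialized record is copied once
    // for every subpartition, so the cost grows with the fan-out.
    static List<ByteBuffer> copyPerSubpartition(byte[] serialized, int numSubpartitions) {
        List<ByteBuffer> targets = new ArrayList<>();
        for (int i = 0; i < numSubpartitions; i++) {
            ByteBuffer target = ByteBuffer.allocate(serialized.length);
            target.put(serialized); // one copy per subpartition
            target.flip();
            targets.add(target);
        }
        return targets;
    }

    // Shared-buffer copying: the serialized record is copied exactly once,
    // and every subpartition receives a read-only view of the same bytes.
    static List<ByteBuffer> copyOnceAndShare(byte[] serialized, int numSubpartitions) {
        ByteBuffer shared = ByteBuffer.allocate(serialized.length);
        shared.put(serialized); // the only copy
        shared.flip();
        List<ByteBuffer> views = new ArrayList<>();
        for (int i = 0; i < numSubpartitions; i++) {
            // Each view shares the content but keeps its own position and
            // limit, so consumers can read at different speeds.
            views.add(shared.asReadOnlyBuffer());
        }
        return views;
    }
}
```

In this sketch each view has an independent read position, which loosely 
mirrors subpartitions consuming the same data at different rates. The latency 
marker case is not modeled: as the description says, it would require flushing 
the shared buffer and requesting a separate buffer for the one target 
subpartition.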


