[ 
https://issues.apache.org/jira/browse/FLINK-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978284#comment-16978284
 ] 

Yingjie Cao commented on FLINK-14845:
-------------------------------------

[~pnowojski]  You are right, taking both Blocking and Pipelined scenario into 
consideration would make the system more complete. Do you think only 
compressing the whole buffer using the task thread and leaving the the flushing 
sliced buffer uncompressed is acceptable for Pipelined mode?

Currently, the blocking subpartition does not interact with netty thread and 
the task thread directly write Buffers out, If we choose to compress with netty 
thread, we have to consider the interaction between netty thread and task 
thread.

> Introduce data compression to blocking shuffle.
> -----------------------------------------------
>
>                 Key: FLINK-14845
>                 URL: https://issues.apache.org/jira/browse/FLINK-14845
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Assignee: Yingjie Cao
>            Priority: Major
>
> Currently, blocking shuffle writer writes raw output data to disk without 
> compression. For IO bounded scenario, this can be optimized by compressing 
> the output data. It is better to introduce a compression mechanism and offer 
> users a config option to let the user decide whether to compress the shuffle 
> data. Actually, we hava implemented compression in our inner Flink version 
> and  here are some key points:
> 1. Where to compress/decompress?
> Compressing at upstream and decompressing at downstream.
> 2. Which thread do compress/decompress?
> Task threads do compress/decompress.
> 3. Data compression granularity.
> Per buffer.
> 4. How to handle that when data size become even bigger after compression?
> Give up compression in this case and introduce an extra flag to identify if 
> the data was compressed, that is, the output may be a mixture of compressed 
> and uncompressed data.
>  
> We'd like to introduce blocking shuffle data compression to Flink if there 
> are interests.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to