[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096747#comment-13096747
 ] 

Harsh J commented on MAPREDUCE-2910:
------------------------------------

How much overhead do compressed, empty partition files actually incur?

> Allow empty MapOutputFile segments
> ----------------------------------
>
>                 Key: MAPREDUCE-2910
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2910
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task, tasktracker
>    Affects Versions: 0.20.2, 0.23.0
>            Reporter: Binglin Chang
>            Priority: Minor
>             Fix For: 0.23.0
>
>
> As clusters and jobs grow larger, we see many empty partitions in
> MapOutputFile, caused by large reducer counts or partition skew. When map
> output compression is enabled, each empty map output partition takes up more
> space and incurs additional compressor/decompressor initialization overhead.
> This can be optimized by allowing empty MapOutputFile segments, for which the
> rawLength and partLength of the IndexRecord are both 0. Corresponding support
> needs to be added to the IFile reader and writer and to the reduce shuffle
> copier.
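
The idea in the description can be illustrated with a minimal, self-contained
sketch. It is not Hadoop code: the IndexRecord class below is a simplified,
hypothetical stand-in with the three fields the issue mentions, the names
writeSegments and readSegment are made up for illustration, and gzip from
java.util.zip stands in for whatever map output codec is configured. The real
IFile on-disk format is not modelled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class EmptySegmentSketch {

    /** Hypothetical, simplified index entry: where a segment starts and how long it is. */
    static final class IndexRecord {
        final long startOffset; // byte offset of the segment in the output file
        final long rawLength;   // uncompressed length
        final long partLength;  // on-disk (compressed) length

        IndexRecord(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }
    }

    /** Compresses one partition; gzip is only a stand-in for the real codec. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        return buf.toByteArray();
    }

    /**
     * Appends one segment per partition to 'out' and returns the index.
     * With allowEmpty set, an empty partition writes no bytes at all and is
     * recorded with rawLength == partLength == 0.
     */
    static List<IndexRecord> writeSegments(byte[][] partitions,
                                           ByteArrayOutputStream out,
                                           boolean allowEmpty) throws IOException {
        List<IndexRecord> index = new ArrayList<>();
        for (byte[] raw : partitions) {
            long start = out.size();
            if (allowEmpty && raw.length == 0) {
                index.add(new IndexRecord(start, 0, 0)); // no compressor involved
                continue;
            }
            byte[] packed = compress(raw);
            out.write(packed);
            index.add(new IndexRecord(start, raw.length, packed.length));
        }
        return index;
    }

    /** Reads one segment back; a zero-length segment skips decompressor setup. */
    static byte[] readSegment(byte[] file, IndexRecord rec) throws IOException {
        if (rec.partLength == 0) {
            return new byte[0];
        }
        ByteArrayInputStream slice =
            new ByteArrayInputStream(file, (int) rec.startOffset, (int) rec.partLength);
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(slice)) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                raw.write(chunk, 0, n);
            }
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // One partition with data, three empty ones (as with many reducers or skew).
        byte[][] partitions = { "key1value1".getBytes(), new byte[0], new byte[0], new byte[0] };

        ByteArrayOutputStream current = new ByteArrayOutputStream();
        ByteArrayOutputStream proposed = new ByteArrayOutputStream();
        writeSegments(partitions, current, false);
        List<IndexRecord> index = writeSegments(partitions, proposed, true);

        System.out.println("bytes with compressed empty segments: " + current.size());
        System.out.println("bytes with zero-length segments:      " + proposed.size());
        System.out.println("empty segment reads back "
            + readSegment(proposed.toByteArray(), index.get(1)).length + " bytes");
    }
}
```

Running main prints the total output size with and without zero-length
segments for this toy input, which gives a rough feel for the per-partition
cost under gzip; the numbers for a real codec and the real IFile layout would
have to be measured separately.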

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
