[ https://issues.apache.org/jira/browse/MAPREDUCE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096747#comment-13096747 ]
Harsh J commented on MAPREDUCE-2910:
------------------------------------

How much overhead do compressed, empty partition files add?

> Allow empty MapOutputFile segments
> ----------------------------------
>
> Key: MAPREDUCE-2910
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2910
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task, tasktracker
> Affects Versions: 0.20.2, 0.23.0
> Reporter: Binglin Chang
> Priority: Minor
> Fix For: 0.23.0
>
>
> As clusters and jobs grow in scale, we see many empty partitions
> in MapOutputFile due to large reduce counts or partition skew. When map
> output compression is enabled, empty map output partitions get larger and incur
> additional compressor/decompressor initialization overhead.
> This can be optimized by allowing empty MapOutputFile segments, where the
> rawLength and partLength of the IndexRecord both equal 0. Corresponding support
> needs to be added to the IFile reader, writer, and reduce shuffle copier.
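To make the proposal concrete, here is a minimal sketch of the idea in plain Java. It is not the actual Hadoop code: `EmptySegmentSketch`, its `IndexRecord` stand-in, and `writeSegments` are hypothetical names, and `GZIPOutputStream` merely stands in for whatever map output codec is configured. The writer records an all-zero index entry for an empty partition instead of running it through the compressor, so a reader or shuffle copier could test `isEmpty()` and skip the segment without creating a decompressor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

class EmptySegmentSketch {

    /** Simplified, hypothetical stand-in for an index entry (not Hadoop's class). */
    static final class IndexRecord {
        final long startOffset;
        final long rawLength;   // uncompressed length of the segment
        final long partLength;  // on-disk (compressed) length of the segment

        IndexRecord(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }

        /** An empty segment is marked by both lengths being 0. */
        boolean isEmpty() {
            return rawLength == 0 && partLength == 0;
        }
    }

    /** Compresses one partition; gzip is an assumption standing in for the real codec. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        return buf.toByteArray();
    }

    /** Writes all partitions to out, emitting no bytes at all for empty ones. */
    static List<IndexRecord> writeSegments(List<byte[]> partitions,
                                           ByteArrayOutputStream out) throws IOException {
        List<IndexRecord> index = new ArrayList<>();
        for (byte[] raw : partitions) {
            long start = out.size();
            if (raw.length == 0) {
                // Empty partition: no compressor is invoked, nothing is written.
                index.add(new IndexRecord(start, 0, 0));
                continue;
            }
            byte[] compressed = compress(raw);
            out.write(compressed, 0, compressed.length);
            index.add(new IndexRecord(start, raw.length, compressed.length));
        }
        return index;
    }
}
```

Calling `compress(new byte[0])` in this sketch still yields a non-empty byte array, which is the kind of per-partition overhead the question above asks about. The real figure depends on the codec and on the IFile framing, neither of which is modeled here.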