[ https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195726#comment-14195726 ]
Rajesh Balamohan commented on TEZ-1733:
---------------------------------------

lgtm. +1

One very minor comment:
{code}
+      public int compare(FileChunk o1, FileChunk o2) {
+        if (o1.getLength() < o2.getLength()) {
+          return -1;
+        } else if (o1.getLength() > o2.getLength()) {
+          return 1;
+        }
+        return 0;
+      }
+    });
{code}

can be simplified to
{code}
public int compare(FileChunk o1, FileChunk o2) {
  return o1.getLength() - o2.getLength();
}
{code}

> TezMerger should sort FileChunks on decompressed size
> -----------------------------------------------------
>
>                 Key: TEZ-1733
>                 URL: https://issues.apache.org/jira/browse/TEZ-1733
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.2
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Critical
>         Attachments: TEZ-1733.1.patch, TEZ-1733.2.patch
>
>
> MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts by the size of the
> data within rather than by actual file sizes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
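
A caveat on the subtraction form suggested in the comment, offered as a sketch rather than as what the patch finally used: assuming getLength() returns a long (the usual type for byte sizes), the difference is itself a long and cannot be returned from an int compare() without a cast. With a cast, the result can be truncated: a difference of exactly 2^32 bytes becomes 0, so two chunks of different size would compare as equal. Long.compare (Java 7 and later) gives the same one-line brevity without that risk. The FileChunk class below is a hypothetical stand-in that carries only a length; the names FileChunkOrdering, BY_LENGTH, and sortedLengths are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FileChunkOrdering {

    /** Hypothetical stand-in for the real FileChunk; only the length matters here. */
    static final class FileChunk {
        private final long length;

        FileChunk(long length) {
            this.length = length;
        }

        long getLength() {
            return length;
        }
    }

    /** Orders chunks by length using Long.compare, so no subtraction or cast is involved. */
    static final Comparator<FileChunk> BY_LENGTH = new Comparator<FileChunk>() {
        @Override
        public int compare(FileChunk o1, FileChunk o2) {
            return Long.compare(o1.getLength(), o2.getLength());
        }
    };

    /** Builds chunks from the given lengths, sorts them, and returns the lengths in sorted order. */
    static List<Long> sortedLengths(long... lengths) {
        List<FileChunk> chunks = new ArrayList<>();
        for (long len : lengths) {
            chunks.add(new FileChunk(len));
        }
        chunks.sort(BY_LENGTH);

        List<Long> out = new ArrayList<>();
        for (FileChunk chunk : chunks) {
            out.add(chunk.getLength());
        }
        return out;
    }

    public static void main(String[] args) {
        // Lengths above Integer.MAX_VALUE are deliberate: they are the case
        // where an int-cast subtraction could misorder chunks.
        System.out.println(sortedLengths(5_000_000_000L, 10L, 3_000_000_000L));
    }
}
```

On a Java 6 toolchain, where Long.compare is not available, the explicit if/else chain from the patch is an equally safe choice.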