[jira] [Commented] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

Rajesh Balamohan (JIRA) Mon, 03 Nov 2014 21:20:06 -0800

    [ https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195726#comment-14195726 ]


Rajesh Balamohan commented on TEZ-1733:
---------------------------------------

lgtm.  +1

one very minor comment
{code}
+    public int compare(FileChunk o1, FileChunk o2) {
+      if (o1.getLength() < o2.getLength()) {
+        return -1;
+      } else if (o1.getLength() > o2.getLength()) {
+        return 1;
+      }
+      return 0;
+    }
+  });
{code}

can be simplified to 

{code}
public int compare(FileChunk o1, FileChunk o2) {
  // Long.compare avoids the int overflow that subtracting two lengths can cause
  return Long.compare(o1.getLength(), o2.getLength());
}
{code}
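
A caveat worth noting for one-line comparators of this kind: computing the result by subtraction (o1.getLength() - o2.getLength()) is only safe when the difference always fits in an int. If getLength() returns a long (an assumption here, since the patch excerpt does not show the type), the subtraction has to be narrowed to int and can flip sign for large chunks. Long.compare (Java 7+) keeps the one-line form without that risk. A minimal sketch, using a hypothetical Chunk class as a stand-in for FileChunk:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ChunkOrdering {
  // Hypothetical stand-in for FileChunk; only the length matters here.
  static final class Chunk {
    private final long length;

    Chunk(long length) {
      this.length = length;
    }

    long getLength() {
      return length;
    }
  }

  // Overflow-safe ordering by length.
  static final Comparator<Chunk> BY_LENGTH = new Comparator<Chunk>() {
    @Override
    public int compare(Chunk o1, Chunk o2) {
      return Long.compare(o1.getLength(), o2.getLength());
    }
  };

  // Subtraction narrowed to int; shown only to illustrate the failure mode.
  static final Comparator<Chunk> BY_SUBTRACTION = new Comparator<Chunk>() {
    @Override
    public int compare(Chunk o1, Chunk o2) {
      return (int) (o1.getLength() - o2.getLength());
    }
  };

  public static void main(String[] args) {
    Chunk small = new Chunk(1L);
    Chunk large = new Chunk(3_000_000_000L); // larger than Integer.MAX_VALUE

    // Negative result: small sorts before large, as expected.
    System.out.println(BY_LENGTH.compare(small, large));
    // Positive result: the narrowed difference wrapped around.
    System.out.println(BY_SUBTRACTION.compare(small, large));

    List<Chunk> chunks = new ArrayList<Chunk>(Arrays.asList(large, small));
    Collections.sort(chunks, BY_LENGTH);
    for (Chunk c : chunks) {
      System.out.println(c.getLength());
    }
  }
}
```

If the lengths were ints known to be non-negative, plain subtraction would be fine; Long.compare simply works in both cases.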


> TezMerger should sort FileChunks on decompressed size
> -----------------------------------------------------
>
>                 Key: TEZ-1733
>                 URL: https://issues.apache.org/jira/browse/TEZ-1733
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.2
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Critical
>         Attachments: TEZ-1733.1.patch, TEZ-1733.2.patch
>
>
> MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet that sorts by the size of the 
> decompressed data rather than the actual file size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
