[ 
https://issues.apache.org/jira/browse/PIG-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701886#comment-14701886
 ] 

Rohini Palaniswamy commented on PIG-4657:
-----------------------------------------

I just edited the previous comment to give more details. It was due to the 
nature of the comparison in that job, and the volume of data was also huge, 
with a lot of spills. For datatypes like bytearray I don't think there is much 
difference, and I have not seen any slowness reported. 

> [Pig on Tez] Optimize GroupBy and Distinct key comparison
> ---------------------------------------------------------
>
>                 Key: PIG-4657
>                 URL: https://issues.apache.org/jira/browse/PIG-4657
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4657-1.patch
>
>
>    While the bytes comparator cannot be used for joins until TEZ-2715 is 
> available, it can be used for group by and distinct if they have only one 
> Tez input. If there is more than one input due to the union optimization 
> (OrderedGroupedMergedKVInput), the full comparator still has to be used, as 
> OrderedGroupedMergedKVInput uses the comparator to merge the two underlying 
> inputs.
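
The selection rule in the description can be illustrated with a small sketch. 
This is not Pig or Tez code: the serialization format, `choose_sort_key`, and 
`group_keys` are hypothetical names made up for illustration, and the only part 
taken from the issue is the rule itself (bytes comparison for a single input, 
full comparison otherwise). It shows that, for a single sorted input, ordering 
by raw bytes produces the same groups as ordering by the deserialized key, even 
though the order of the groups differs.

```python
import struct
from itertools import groupby


def serialize(key: int) -> bytes:
    # Assumption: keys are big-endian signed 32-bit ints, a stand-in
    # for whatever serialized key format the real job uses.
    return struct.pack(">i", key)


def deserialize(raw: bytes) -> int:
    return struct.unpack(">i", raw)[0]


def choose_sort_key(num_inputs: int):
    # Hypothetical helper mirroring the rule in the description:
    # one input -> compare raw bytes; more than one -> full comparison.
    if num_inputs == 1:
        return lambda raw: raw  # bytes comparison, no deserialization
    return deserialize          # full comparison on the decoded key


def group_keys(records, num_inputs):
    # Sort so equal keys become adjacent, then count each run of equal keys.
    ordered = sorted(records, key=choose_sort_key(num_inputs))
    return [(deserialize(k), len(list(g))) for k, g in groupby(ordered)]


records = [serialize(k) for k in (3, -1, 3, 2, -1, 3)]
bytes_groups = group_keys(records, num_inputs=1)
full_groups = group_keys(records, num_inputs=2)

print(bytes_groups)
print(full_groups)
```

In this sketch the two results contain the same keys and counts. Only the 
ordering of the groups differs, because the byte representation of a negative 
int sorts after the positive ones.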



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
