[ 
https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606142#comment-14606142
 ] 

Jason Dere commented on HIVE-10673:
-----------------------------------

[~mmokhtar] or [~gopalv] can probably give more detail, but they found that 
during a shuffle join a large amount of the CPU/IO was spent sorting. 

While this does not work for MR, for other execution engines (such as Tez), it 
is possible to create a reduce-side join that uses unsorted inputs, in order to 
eliminate the sorting. We use the hash join algorithm to perform the join in 
the reducer, so this requires the small tables in the join to fit in the hash 
table for this to work. Testing with this patch [~mmokhtar] found some decent 
time savings.

> Dynamically partitioned hash join for Tez
> -----------------------------------------
>
>                 Key: HIVE-10673
>                 URL: https://issues.apache.org/jira/browse/HIVE-10673
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning, Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, 
> HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch
>
>
> Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the 
> reducer are unsorted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to