[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606142#comment-14606142 ]
Jason Dere commented on HIVE-10673: ----------------------------------- [~mmokhtar] or [~gopalv] can probably give more detail, but they found that during a shuffle join a large amount of the CPU/IO was spent sorting. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs, in order to eliminate the sorting. We use the hash join algorithm to perform the join in the reducer, so this requires the small tables in the join to fit in the hash table for this to work. Testing with this patch [~mmokhtar] found some decent time savings. > Dynamically partitioned hash join for Tez > ----------------------------------------- > > Key: HIVE-10673 > URL: https://issues.apache.org/jira/browse/HIVE-10673 > Project: Hive > Issue Type: Bug > Components: Query Planning, Query Processor > Reporter: Jason Dere > Assignee: Jason Dere > Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, > HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch > > > Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the > reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)