Xuefu Zhang created HIVE-8202:
---------------------------------
Summary: Support SMB Join for Hive on Spark [Spark Branch]
Key: HIVE-8202
URL: https://issues.apache.org/jira/browse/HIVE-8202
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Xuefu Zhang
SMB joins are used wherever the tables are sorted and bucketed. It's a
map-side join. The join boils down to just merging the already sorted
tables, allowing this operation to be faster than an ordinary map-join.
However, if the tables are partitioned, there could be a slowdown, as each
mapper would need to get a very small chunk of a partition which has a single
key. Thus, in some scenarios it's beneficial to convert SMB join to SMB map
join.
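
To illustrate why merging already sorted buckets is cheap, here is a minimal
sketch in Python. It is not Hive's implementation: the function names
merge_join and smb_join are hypothetical, rows are modelled as (key, value)
tuples, and it assumes both tables use the same number of buckets, the same
bucketing function, and are sorted by the join key within each bucket.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) rows that are already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small, advance left cursor
        elif lk > rk:
            j += 1          # right key too small, advance right cursor
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def smb_join(left_buckets, right_buckets):
    """Join bucket n of the left table with bucket n of the right table.

    Assumption: equal keys always land in the same bucket number on both
    sides, so no shuffle across buckets is needed.
    """
    result = []
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(merge_join(lb, rb))
    return result


# Two tables, two buckets each, sorted by key inside every bucket.
left_buckets = [[(2, "a"), (4, "b")], [(1, "c"), (3, "d"), (3, "e")]]
right_buckets = [[(2, "x"), (2, "y")], [(3, "z"), (5, "w")]]
joined = smb_join(left_buckets, right_buckets)
```

Each bucket pair is read once, front to back, with no hash table and no
re-sorting, which is where the advantage over an ordinary map-join comes from.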
The task is to research and support the conversion from regular SMB join to SMB
map join for the Spark execution engine.