Cheng Hao created SPARK-11512:
---------------------------------

             Summary: Bucket Join
                 Key: SPARK-11512
                 URL: https://issues.apache.org/jira/browse/SPARK-11512
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Cheng Hao


Perform a sort merge join on two datasets on the file system that have already
been partitioned in the same way, with the same number of partitions, and sorted
within each partition, so that we don't need to sort them again when joining on
the sorted/partitioned keys.

This functionality exists in
- Hive (hive.optimize.bucketmapjoin.sortedmerge)
- Pig (USING 'merge')
- MapReduce (CompositeInputFormat)
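A minimal sketch of the idea in plain Python follows. It assumes both inputs were written with the same bucketing function and the same number of buckets, and that each bucket is sorted by the join key. Because of that, bucket i on the left only needs to be merged with bucket i on the right, and no sort happens at join time. `bucketize`, `merge_join_sorted` and `bucket_join` are hypothetical names for illustration, not Spark APIs. `hash(key) % num_buckets` is an assumed stand-in for whatever bucketing function the writer actually used.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (join key, payload)


def bucketize(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Simulate the write side: hash-partition rows into buckets, then sort each bucket by key.

    hash(key) % num_buckets is an assumed bucketing function for illustration only.
    """
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[0]) % num_buckets].append(row)
    return [sorted(b, key=lambda r: r[0]) for b in buckets]


def merge_join_sorted(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Inner merge join of two lists that are already sorted by key (no sorting here)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # ...and pair it with every left row carrying the same key.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


def bucket_join(left_buckets: List[List[Row]],
                right_buckets: List[List[Row]]) -> List[Tuple[int, str, str]]:
    """Join bucket i with bucket i only; valid because both sides share the same bucketing."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must have the same number of buckets")
    result = []
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(merge_join_sorted(lb, rb))
    return result


left = bucketize([(1, "a"), (2, "b"), (2, "c"), (5, "d")], 4)
right = bucketize([(2, "x"), (5, "y"), (5, "z"), (7, "w")], 4)
print(sorted(bucket_join(left, right)))
# → [(2, 'b', 'x'), (2, 'c', 'x'), (5, 'd', 'y'), (5, 'd', 'z')]
```

The design point is that all the sorting cost is paid once, when the data is written, and each join afterwards is a linear merge per bucket pair. If the bucket counts differ, the per-bucket pairing is no longer valid, which is why the sketch rejects that case instead of silently producing wrong results.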




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
