Cheng Hao created SPARK-11512:
---------------------------------

             Summary: Bucket Join
                 Key: SPARK-11512
                 URL: https://issues.apache.org/jira/browse/SPARK-11512
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Cheng Hao

Sort merge join on two datasets on the file system that have already been partitioned the same way, with the same number of partitions, and sorted within each partition: in this case we don't need to sort them again when joining on the sorted/partitioned keys.

This functionality exists in:
- Hive (hive.optimize.bucketmapjoin.sortedmerge)
- Pig (USING 'merge')
- MapReduce (CompositeInputFormat)
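
For illustration only, here is a minimal sketch in plain Python (not Spark code, and not the proposed implementation) of why the sort step can be skipped. It assumes each dataset is a list of buckets, that bucket i on both sides holds the same key range or hash class, and that rows inside every bucket are already sorted by key. The names merge_join_sorted and bucket_join are hypothetical.

```python
from typing import Any, Iterator, List, Tuple

Row = Tuple[int, Any]  # (join key, payload); integer keys are an assumption


def merge_join_sorted(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, Any, Any]]:
    """Inner-join two key-sorted row lists in one pass, without sorting."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    yield (lk, lval, rval)
            i, j = i_end, j_end


def bucket_join(left_buckets: List[List[Row]],
                right_buckets: List[List[Row]]) -> List[Tuple[int, Any, Any]]:
    """Join bucket i of the left side only with bucket i of the right side."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must have the same number of buckets")
    out: List[Tuple[int, Any, Any]] = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(merge_join_sorted(lb, rb))
    return out
```

Because matching keys can only appear in buckets with the same index, no shuffle is needed, and because each bucket is already sorted, the join is a single linear merge per bucket pair.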