[ https://issues.apache.org/jira/browse/SPARK-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990867#comment-14990867 ]
Cheng Hao commented on SPARK-11512: ----------------------------------- Oh, yes, but SPARK-5292 is only about to support the Hive bucket, but in a generic way, we need to add support the bucket for Data Source API. Anyway, I will add a link with that jira issue. > Bucket Join > ----------- > > Key: SPARK-11512 > URL: https://issues.apache.org/jira/browse/SPARK-11512 > Project: Spark > Issue Type: Sub-task > Components: SQL > Reporter: Cheng Hao > > Sort merge join on two datasets on the file system that have already been > partitioned the same with the same number of partitions and sorted within > each partition, and we don't need to sort it again while join with the > sorted/partitioned keys > This functionality exists in > - Hive (hive.optimize.bucketmapjoin.sortedmerge) > - Pig (USING 'merge') > - MapReduce (CompositeInputFormat) -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org