[ https://issues.apache.org/jira/browse/SPARK-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990868#comment-14990868 ]
Cheng Hao commented on SPARK-11512:
-----------------------------------

We need to support "bucket" in the DataSource API.

> Bucket Join
> -----------
>
>                 Key: SPARK-11512
>                 URL: https://issues.apache.org/jira/browse/SPARK-11512
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Hao
>
> When two datasets on the file system have already been partitioned the
> same way, with the same number of partitions, and sorted within each
> partition, a sort merge join on the sorted/partitioned keys does not need
> to sort them again.
> This functionality exists in
> - Hive (hive.optimize.bucketmapjoin.sortedmerge)
> - Pig (USING 'merge')
> - MapReduce (CompositeInputFormat)
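
To illustrate the idea in the description, here is a minimal sketch in plain Python, not Spark code and not the proposed implementation. It assumes each side is a list of buckets, that bucket i on the left holds the same key range as bucket i on the right, and that rows inside a bucket are already sorted by key. The names `Row`, `merge_join_bucket`, and `bucket_join` are hypothetical and exist only for this example.

```python
from typing import Iterator, List, Tuple

# Hypothetical row shape for this sketch: (join key, payload).
Row = Tuple[int, str]


def merge_join_bucket(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two buckets that are already sorted by key, without sorting them."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    yield (lk, lv, rv)
            i, j = i_end, j_end


def bucket_join(left_buckets: List[List[Row]], right_buckets: List[List[Row]]) -> List[Tuple[int, str, str]]:
    """Join bucket i of the left side with bucket i of the right side only."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must have the same number of buckets")
    out: List[Tuple[int, str, str]] = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(merge_join_bucket(lb, rb))
    return out


# Example data, assumed to be bucketed by key % 2 and sorted within each bucket.
left_buckets = [[(2, "a"), (4, "b")], [(1, "c"), (3, "d"), (3, "e")]]
right_buckets = [[(4, "x"), (6, "y")], [(3, "z"), (5, "w")]]
result = bucket_join(left_buckets, right_buckets)
```

Because the two sides agree on bucket count and sort order, each pair of buckets is joined in a single linear pass, with no shuffle across buckets and no sort step.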