[ https://issues.apache.org/jira/browse/PIG-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532035#comment-14532035 ]
Mohit Sabharwal commented on PIG-4422:
--------------------------------------

fyi [~kellyzly], [~praveenr019], [~xuefuz]

The attached patch implements merge join in the Spark engine as a regular join.

There seem to be three flavors of Merge Join (aka Sort Merge Join) in Pig, as described here:
http://pig.apache.org/docs/r0.10.0/perf.html#merge-joins

1) Inner join with at most 2 tables.
2) Outer join (full, left, right) with at most 2 tables, or inner join with 3+ tables.
3) Sparse merge join.

This patch addresses 1) only. Both 2) and 3) require the input loadfuncs to implement certain interfaces, and since the Spark engine does not yet implement the merge join algorithm, it cannot take advantage of these interfaces. As such, this patch disables the tests for 2) and 3) for now.

> Implement visitMergeJoin in SparkCompiler
> -----------------------------------------
>
>                 Key: PIG-4422
>                 URL: https://issues.apache.org/jira/browse/PIG-4422
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4422.patch
>
>
> In PIG-4374_6.patch, SparkCompiler#visitMergeJoin is marked "TODO".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)