[ https://issues.apache.org/jira/browse/PIG-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210078#comment-15210078 ]
Xianda Ke commented on PIG-4810:
--------------------------------

1. SparkCompiler:
We rebuild the right SparkOperator of MergeJoin: we replace its loader with DefaultIndexableLoader and use MergeJoinIndexer to create the index file. Then we connect the indexing job to the actual join job, so that the indexing job runs before the join job.

2. MergeJoinConverter:
Notes:
(1) Set the endOfAllInput flag after the last record on the left side is buffered.
(2) Once the last record on the left side is buffered, we call getNextResult one more time to join that last left record with the right side.

PIG-4810-2.patch is attached.

> Implement Merge join for spark engine
> -------------------------------------
>
>                 Key: PIG-4810
>                 URL: https://issues.apache.org/jira/browse/PIG-4810
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4810.patch
>
>
> In the current code base (a9151ac), we use a regular join to implement merge join in Spark mode.
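
For illustration, the two MergeJoinConverter notes in the comment above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code in PIG-4810-2.patch: the class `MergeJoinSketch`, the `Rec` type, `attachInput` and the string output format are hypothetical, and only the names `endOfAllInput` and `getNextResult` are taken from the comment. It assumes both inputs are already sorted by key, and it scans the right side linearly instead of going through an index file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a merge join operator; not Pig's actual classes.
class MergeJoinSketch {
    /** A (key, value) record. Both inputs are assumed to be sorted by key. */
    static final class Rec {
        final int key;
        final String value;
        Rec(int key, String value) { this.key = key; this.value = value; }
    }

    private final List<Rec> right;                           // sorted right side
    private int rightPos = 0;                                // scan position in the right side
    private final List<Rec> leftBuffer = new ArrayList<>();  // left records sharing one key
    private Rec pending;                                     // left record attached but not yet buffered
    private boolean endOfAllInput = false;

    MergeJoinSketch(List<Rec> right) { this.right = right; }

    void attachInput(Rec left) { pending = left; }

    void setEndOfAllInput(boolean value) { endOfAllInput = value; }

    /** Returns the join output produced by this call; the list may be empty. */
    List<String> getNextResult() {
        List<String> out = new ArrayList<>();
        if (pending != null) {
            // A new left key arrived: join the previously buffered group first.
            if (!leftBuffer.isEmpty() && leftBuffer.get(0).key != pending.key) {
                joinBuffered(out);
            }
            leftBuffer.add(pending);
            pending = null;
        } else if (endOfAllInput && !leftBuffer.isEmpty()) {
            // No new input and the flag is set: flush the last buffered group.
            joinBuffered(out);
        }
        return out;
    }

    private void joinBuffered(List<String> out) {
        int key = leftBuffer.get(0).key;
        // Skip right records whose key is smaller than the buffered left key.
        while (rightPos < right.size() && right.get(rightPos).key < key) {
            rightPos++;
        }
        // Emit one row per (left, right) pair that shares the key.
        while (rightPos < right.size() && right.get(rightPos).key == key) {
            for (Rec l : leftBuffer) {
                out.add(key + ":" + l.value + "," + right.get(rightPos).value);
            }
            rightPos++;
        }
        leftBuffer.clear();
    }

    /** Driver that mirrors notes (1) and (2). */
    static List<String> join(List<Rec> left, List<Rec> right) {
        MergeJoinSketch op = new MergeJoinSketch(right);
        List<String> out = new ArrayList<>();
        for (Rec l : left) {
            op.attachInput(l);
            out.addAll(op.getNextResult());
        }
        op.setEndOfAllInput(true);       // (1) set only after the last left record is buffered
        out.addAll(op.getNextResult());  // (2) one more call joins the last buffered left group
        return out;
    }
}
```

In this sketch, a left group is joined only when a record with a different key arrives, so the final group stays in the buffer after the input loop ends. Without the extra `getNextResult` call after setting the flag, the rows for the last left key would be lost.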