[jira] [Commented] (PIG-4810) Implement Merge join for spark engine

Xianda Ke (JIRA) Thu, 24 Mar 2016 04:04:53 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210078#comment-15210078
 ]


Xianda Ke commented on PIG-4810:
--------------------------------

1. SparkCompiler:
We rebuild the right-side SparkOperator of the merge join: the loader is 
replaced with DefaultIndexableLoader, and MergeJoinIndexer is used to create 
the index file. We then connect the indexing job to the actual join job so 
that the indexing job runs before the join job.
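
As a rough illustration of the ordering described above (not the code in the 
patch), the sketch below models a plan as a map from each job to its 
predecessors. The class JobOrderSketch and the job names "index" and "join" 
are hypothetical and are not part of Pig's API; the sketch also assumes the 
plan has no cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an operator plan; this is not Pig's plan API.
class JobOrderSketch {
    // Maps each job name to the jobs that must finish before it starts.
    private final Map<String, Set<String>> predecessors = new LinkedHashMap<>();

    // Registers a job with no dependencies yet.
    void add(String job) {
        predecessors.computeIfAbsent(job, k -> new LinkedHashSet<>());
    }

    // Records that 'from' must run before 'to'.
    void connect(String from, String to) {
        add(from);
        add(to);
        predecessors.get(to).add(from);
    }

    // Depth-first walk that emits every job after all of its predecessors.
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        Set<String> visited = new LinkedHashSet<>();
        for (String job : predecessors.keySet()) {
            visit(job, visited, order);
        }
        return order;
    }

    private void visit(String job, Set<String> visited, List<String> order) {
        if (!visited.add(job)) {
            return; // already scheduled
        }
        for (String pred : predecessors.get(job)) {
            visit(pred, visited, order);
        }
        order.add(job);
    }

    // The join job is registered first on purpose; the edge still forces
    // the indexing job to be scheduled ahead of it.
    static List<String> demo() {
        JobOrderSketch plan = new JobOrderSketch();
        plan.add("join");
        plan.connect("index", "join");
        return plan.executionOrder();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is only that the edge from the indexing job to the 
join job, rather than the order in which the jobs were created, determines 
which one runs first.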

2. MergeJoinConverter:
Notes:
(1) The endOfAllInput flag is set after the last record on the left side is 
buffered.
(2) Once the last left record is buffered, getNextResult is run one more time 
to join that last left record with the right side.
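
The two notes above can be pictured with a toy driver loop. This is only a 
sketch under simplifying assumptions, not the converter in the patch: keys are 
plain integers, both inputs are already sorted, and the operator joins a 
buffered left record only on the call after it was buffered. The class names 
MergeJoinDriverSketch and LaggingJoinOperator are hypothetical; only the names 
endOfAllInput and getNextResult are taken from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class MergeJoinDriverSketch {

    // Toy operator with a one-record lag: a left key is joined against the
    // right side only on the getNextResult call after it was buffered.
    static class LaggingJoinOperator {
        private final List<Integer> right; // sorted right-side keys
        private int rightPos = 0;          // first right index not yet skipped
        private Integer buffered;          // left key waiting to be joined
        private Integer incoming;          // left key just attached
        boolean endOfAllInput = false;

        LaggingJoinOperator(List<Integer> right) {
            this.right = right;
        }

        void attachInput(Integer leftKey) {
            incoming = leftKey;
        }

        List<Integer> getNextResult() {
            List<Integer> results = new ArrayList<>();
            if (incoming != null) {
                // Join the previously buffered key, then buffer the new one.
                if (buffered != null) {
                    results = join(buffered);
                }
                buffered = incoming;
                incoming = null;
            } else if (endOfAllInput && buffered != null) {
                // No new input will arrive: flush the last buffered key.
                results = join(buffered);
                buffered = null;
            }
            return results;
        }

        // Emits the key once for every matching right-side record.
        private List<Integer> join(Integer key) {
            List<Integer> matches = new ArrayList<>();
            while (rightPos < right.size() && right.get(rightPos) < key) {
                rightPos++;
            }
            for (int i = rightPos; i < right.size() && right.get(i).equals(key); i++) {
                matches.add(key);
            }
            return matches;
        }
    }

    static List<Integer> run(List<Integer> left, List<Integer> right) {
        LaggingJoinOperator op = new LaggingJoinOperator(right);
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> it = left.iterator();
        while (it.hasNext()) {
            op.attachInput(it.next());
            if (!it.hasNext()) {
                op.endOfAllInput = true; // note (1): set once the last left record is buffered
            }
            out.addAll(op.getNextResult());
        }
        if (op.endOfAllInput) {
            out.addAll(op.getNextResult()); // note (2): one extra call for the last left record
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 4), Arrays.asList(2, 2, 3, 4)));
    }
}
```

Without the extra call after the loop, the match for the last left key (4 in 
the example in main) would never be emitted in this toy model.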

PIG-4810-2.patch is attached.  

> Implement Merge join for spark engine
> -------------------------------------
>
>                 Key: PIG-4810
>                 URL: https://issues.apache.org/jira/browse/PIG-4810
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4810.patch
>
>
> In the current code base (a9151ac), we use a regular join to implement merge 
> join in spark mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
