[jira] [Commented] (PIG-4810) Implement Merge join for spark engine

2016-06-17 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335526#comment-15335526
 ] 

liyunzhang_intel commented on PIG-4810:
---

[~kexianda]: LGTM, +1.
[~xuefuz]: Please merge PIG-4810-7.patch into the branch, thanks.

> Implement Merge join for spark engine
> -
>
> Key: PIG-4810
> URL: https://issues.apache.org/jira/browse/PIG-4810
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4810-2.patch, PIG-4810-3.patch, PIG-4810-4.patch, 
> PIG-4810-5.patch, PIG-4810-6.patch, PIG-4810-7.patch, PIG-4810.patch
>
>
> In the current code base (a9151ac), we use a regular join to implement merge 
> join in spark mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4810) Implement Merge join for spark engine

2016-06-13 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328915#comment-15328915
 ] 

liyunzhang_intel commented on PIG-4810:
---

[~kexianda]: OK. Once PIG-4856 is resolved, PIG-4870 can also be fixed soon.

> Implement Merge join for spark engine
> -
>
> Key: PIG-4810
> URL: https://issues.apache.org/jira/browse/PIG-4810
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4810-2.patch, PIG-4810-3.patch, PIG-4810-4.patch, 
> PIG-4810-5.patch, PIG-4810.patch
>
>
> In the current code base (a9151ac), we use a regular join to implement merge 
> join in spark mode.





[jira] [Commented] (PIG-4810) Implement Merge join for spark engine

2016-06-13 Thread Xianda Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328798#comment-15328798
 ] 

Xianda Ke commented on PIG-4810:


Hi [~kellyzly], thanks for your comments. 
1. Using setReplication() makes sense. Thanks.
2. MergeJoin requires sorted data as input; otherwise the MergeJoin 
optimization fails the UT. That is why the ORDER query is added.
3. I will fix the indentation issue.

I will update the patch soon.

> Implement Merge join for spark engine
> -
>
> Key: PIG-4810
> URL: https://issues.apache.org/jira/browse/PIG-4810
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4810-2.patch, PIG-4810-3.patch, PIG-4810-4.patch, 
> PIG-4810-5.patch, PIG-4810.patch
>
>
> In the current code base (a9151ac), we use a regular join to implement merge 
> join in spark mode.





[jira] [Commented] (PIG-4810) Implement Merge join for spark engine

2016-03-24 Thread Xianda Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210078#comment-15210078
 ] 

Xianda Ke commented on PIG-4810:


1. SparkCompiler:
We rebuild the right SparkOperator of the MergeJoin: we replace its loader with 
DefaultIndexableLoader and use MergeJoinIndexer to create the index file. We 
then connect the indexing job to the actual join job and ensure that the 
indexing job runs before the join job.
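
To illustrate why the index has to exist before the join starts, here is a 
minimal sketch in Python. It is not the code from the patch, and it assumes 
(this is not stated above) that the index stores the first key of each sorted 
split of the right input. The names build_index and seek_near are 
hypothetical helpers for this sketch only.

```python
import bisect
from typing import List, Optional, Tuple

Record = Tuple[int, str]          # (join key, payload)
IndexEntry = Tuple[int, int]      # (first key in split, split number)


def build_index(splits: List[List[Record]]) -> List[IndexEntry]:
    """Indexing step: record the first key of every non-empty sorted split."""
    return [(split[0][0], i) for i, split in enumerate(splits) if split]


def seek_near(index: List[IndexEntry], key: int) -> Optional[int]:
    """Join step: pick the split where scanning for `key` should start.

    Records equal to a split's first key may also sit at the end of the
    previous split, so we step back one entry to be safe.
    """
    if not index:
        return None
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, key)
    return index[max(pos - 1, 0)][1]


# The index must be built first; the join side only reads it.
right_splits = [[(1, "a"), (3, "b")], [(3, "c"), (5, "d")], [(7, "e")]]
index = build_index(right_splits)
start_split = seek_near(index, 3)   # key 3 already appears in split 0
```

Stepping back one entry trades a little extra scanning for never skipping a 
matching record that ends the previous split.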

2. MergeJoinConverter:
notes:
(1) Set the endOfAllInput flag after the last record on the left side is 
buffered.
(2) Once the last left record is buffered, we run getNextResult one more 
time to join that last left record with the right side.
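
The two notes can be illustrated with a toy model in Python. This is a 
sketch under assumptions, not the MergeJoinConverter code: the class 
MergeJoinSketch, its get_next_result method and the driver merge_join are 
hypothetical, and both inputs are assumed to be sorted by an integer key.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[int, str]            # (join key, payload)
Joined = Tuple[int, str, str]       # (join key, left payload, right payload)


class MergeJoinSketch:
    """Toy merge join operator that buffers left records sharing one key."""

    def __init__(self, right: Iterable[Record]) -> None:
        self._right = iter(right)
        self._right_head: Optional[Record] = next(self._right, None)
        self._left_buf: List[Record] = []
        self.end_of_all_input = False

    def _flush(self) -> List[Joined]:
        """Join the buffered left records with the matching right records."""
        if not self._left_buf:
            return []
        key = self._left_buf[0][0]
        # Skip right records whose key is smaller than the buffered key.
        while self._right_head is not None and self._right_head[0] < key:
            self._right_head = next(self._right, None)
        matches: List[Record] = []
        while self._right_head is not None and self._right_head[0] == key:
            matches.append(self._right_head)
            self._right_head = next(self._right, None)
        out = [(key, l[1], r[1]) for l in self._left_buf for r in matches]
        self._left_buf = []
        return out

    def get_next_result(self, left_rec: Optional[Record] = None) -> List[Joined]:
        if self.end_of_all_input:
            # Note (2): the extra call joins whatever is still buffered.
            return self._flush()
        if self._left_buf and left_rec[0] != self._left_buf[0][0]:
            out = self._flush()          # key changed: emit the previous group
            self._left_buf = [left_rec]
            return out
        self._left_buf.append(left_rec)  # same key (or first record): buffer
        return []


def merge_join(left: Iterable[Record], right: Iterable[Record]) -> List[Joined]:
    op = MergeJoinSketch(right)
    out: List[Joined] = []
    for rec in left:
        out.extend(op.get_next_result(rec))
    op.end_of_all_input = True           # note (1): set after the last left record
    out.extend(op.get_next_result())     # note (2): one more call
    return out


result = merge_join(
    [(1, "a"), (2, "b"), (2, "c"), (4, "d")],
    [(2, "x"), (3, "y"), (4, "z"), (4, "w")],
)
```

In this sketch a group of left records is only joined when a record with a 
different key arrives, so without the final call the rows for key 4 would 
never be emitted.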

PIG-4810-2.patch is attached.  

> Implement Merge join for spark engine
> -
>
> Key: PIG-4810
> URL: https://issues.apache.org/jira/browse/PIG-4810
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4810.patch
>
>
> In the current code base (a9151ac), we use a regular join to implement merge 
> join in spark mode.


