[ https://issues.apache.org/jira/browse/PIG-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianda Ke updated PIG-4858:
---------------------------
    Attachment: PIG-4858_4.patch

Hi [~kellyzly], please help review this patch when you are free.

1. Based on PIG-5044, rewrote SparkCompiler.getSkewedJoinJob() so that it
broadcasts the sampling index. Some code is duplicated from
SparkCompiler.getSamplingJob(), because SparkCompiler.getSamplingJob() is too
big to reuse directly and needs to be refactored. I will file a new JIRA for this.

2. Merged the fix from PIG-3417.

3. Currently, skewed join does not support outer joins. I will file a new JIRA
for this.
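For reviewers less familiar with the technique in item 1, here is a minimal, self-contained Java sketch of skewed-join partitioning driven by a sampled key histogram (a "sampling index"). It is not the code in SparkCompiler, and every name in it (SkewedJoinSketch, buildSamplingIndex, the 0.5 skew threshold, the round-robin assignment) is hypothetical. The patch broadcasts the sampling index to Spark tasks; here a plain in-memory Map stands in for that broadcast value. The sketch assumes the general shape of Pig's skewed join: hot keys on the sampled (left) input are split across several partitions, and the other input's records for those keys are copied to each of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of skewed-join partitioning; not Pig's SparkCompiler code. */
public class SkewedJoinSketch {

    /** Builds the "sampling index": each skewed key -> partitions reserved for it. */
    static Map<String, int[]> buildSamplingIndex(List<String> sampledKeys,
                                                 int numPartitions, double skewThreshold) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, so output is deterministic
        for (String k : sampledKeys) counts.merge(k, 1, Integer::sum);
        Map<String, int[]> index = new TreeMap<>();
        int cursor = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double share = (double) e.getValue() / sampledKeys.size();
            if (share > skewThreshold) {
                // Reserve partitions in proportion to the key's share of the sample.
                int width = Math.min(numPartitions,
                        Math.max(1, (int) Math.ceil(share * numPartitions)));
                int[] parts = new int[width];
                for (int i = 0; i < width; i++) parts[i] = (cursor + i) % numPartitions;
                cursor = (cursor + width) % numPartitions;
                index.put(e.getKey(), parts);
            }
        }
        return index;
    }

    static int hashPartition(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Split side: records of a skewed key go round-robin to one of its partitions. */
    static int partitionSplitSide(String key, long seq, Map<String, int[]> index, int n) {
        int[] parts = index.get(key);
        return parts == null ? hashPartition(key, n) : parts[(int) (seq % parts.length)];
    }

    /** Other side: records of a skewed key are copied to every partition reserved for it. */
    static int[] partitionOtherSide(String key, Map<String, int[]> index, int n) {
        int[] parts = index.get(key);
        return parts == null ? new int[] {hashPartition(key, n)} : parts;
    }

    /** Inner join of (key, value) records; returns "leftValue,rightValue" rows. */
    static List<String> skewedJoin(List<String[]> left, List<String[]> right,
                                   Map<String, int[]> index, int n) {
        List<List<String[]>> leftParts = new ArrayList<>();
        List<List<String[]>> rightParts = new ArrayList<>();
        for (int p = 0; p < n; p++) {
            leftParts.add(new ArrayList<>());
            rightParts.add(new ArrayList<>());
        }
        long seq = 0;
        for (String[] rec : left) {
            leftParts.get(partitionSplitSide(rec[0], seq++, index, n)).add(rec);
        }
        for (String[] rec : right) {
            for (int p : partitionOtherSide(rec[0], index, n)) {
                rightParts.get(p).add(rec);
            }
        }
        // Each partition is joined independently, as a separate task would do.
        List<String> out = new ArrayList<>();
        for (int p = 0; p < n; p++) {
            for (String[] l : leftParts.get(p)) {
                for (String[] r : rightParts.get(p)) {
                    if (l[0].equals(r[0])) {
                        out.add(l[1] + "," + r[1]);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 4;
        List<String[]> left = new ArrayList<>();
        for (int i = 0; i < 8; i++) left.add(new String[] {"hot", "h" + i});
        left.add(new String[] {"cold", "c0"});
        List<String[]> right = Arrays.asList(new String[] {"hot", "R1"}, new String[] {"cold", "R2"});
        // Sample every other left key: 4 x "hot", 1 x "cold".
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < left.size(); i += 2) sample.add(left.get(i)[0]);
        Map<String, int[]> index = buildSamplingIndex(sample, n, 0.5);
        System.out.println(Arrays.toString(index.get("hot"))); // [0, 1, 2, 3]
        System.out.println(skewedJoin(left, right, index, n).size()); // 9
    }
}
```

Because each split-side record lands in exactly one partition while the other side's matching records are copied to all of that key's partitions, every matching pair meets exactly once, so the result equals a regular inner join. The sketch covers inner joins only, consistent with item 3.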


> Implement Skewed join for spark engine
> --------------------------------------
>
>                 Key: PIG-4858
>                 URL: https://issues.apache.org/jira/browse/PIG-4858
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4858_2.patch, PIG-4858_3.patch, PIG-4858_4.patch, 
> PIG-4858.patch, SkewedJoinInSparkMode.pdf
>
>
> Currently a regular join is used in place of skewed join. We need to implement it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
