[ https://issues.apache.org/jira/browse/PIG-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xianda Ke updated PIG-4858:
---------------------------
    Attachment: PIG-4858_4.patch

Hi [~kellyzly], please help review this patch when you are free.

1. Based on PIG-5044, rewrite SparkCompiler.getSkewedJoinJob() to broadcast the sampling index. Some code is duplicated from SparkCompiler.getSamplingJob(), because SparkCompiler.getSamplingJob() is too big and needs to be refactored. I will file a new JIRA for this.
2. Merge the fix from PIG-3417.
3. Currently, skewed join does not support outer join. I will file a new JIRA for this.

> Implement Skewed join for spark engine
> --------------------------------------
>
>                 Key: PIG-4858
>                 URL: https://issues.apache.org/jira/browse/PIG-4858
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4858_2.patch, PIG-4858_3.patch, PIG-4858_4.patch,
> PIG-4858.patch, SkewedJoinInSparkMode.pdf
>
>
> Now we use regular join to replace skewed join. Need to implement it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
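Item 1 of the patch comment describes a specific approach. The skewed input is sampled. The sample becomes an index of hot keys, and that index is broadcast so every task routes rows the same way. Hot keys are then split across several partitions, while the matching rows of the other input are replicated to each of those partitions.

A minimal plain-Java sketch of that idea follows. It makes several assumptions:
* It uses neither Spark nor Pig.
* The class `SkewedJoinSketch`, its methods, and the `hotFraction` threshold are hypothetical names introduced for illustration.
* The partition-allocation rule is a simplification, not the logic in PIG-4858_4.patch.
* It covers inner join only, consistent with item 3.

```java
import java.util.*;

/** Hypothetical illustration of a sample-driven skewed join; not Pig's SparkCompiler code. */
public class SkewedJoinSketch {

    record Row(String key, String value) {}

    /** Builds a hypothetical "sampling index": hot key -> number of partitions to spread it over. */
    static Map<String, Integer> buildSkewIndex(List<Row> sample, int numPartitions, double hotFraction) {
        Map<String, Integer> counts = new HashMap<>();
        for (Row r : sample) counts.merge(r.key(), 1, Integer::sum);
        Map<String, Integer> index = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double frac = (double) e.getValue() / sample.size();
            if (frac >= hotFraction) {
                // Assumed rule: partitions proportional to the key's share, between 2 and numPartitions.
                int parts = Math.max(2, Math.min(numPartitions, (int) Math.ceil(frac * numPartitions)));
                index.put(e.getKey(), parts);
            }
        }
        return index;
    }

    static int basePartition(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Inner join: left rows of a hot key go round-robin over its partitions; right rows are replicated to all of them. */
    static List<String> skewedJoin(List<Row> left, List<Row> right,
                                   Map<String, Integer> skewIndex, int numPartitions) {
        List<List<Row>> leftParts = new ArrayList<>();
        List<List<Row>> rightParts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            leftParts.add(new ArrayList<>());
            rightParts.add(new ArrayList<>());
        }
        Map<String, Integer> roundRobin = new HashMap<>();
        for (Row r : left) {
            int base = basePartition(r.key(), numPartitions);
            int parts = skewIndex.getOrDefault(r.key(), 1);   // non-hot keys use a single partition
            int slot = roundRobin.merge(r.key(), 1, Integer::sum) - 1;
            leftParts.get((base + slot % parts) % numPartitions).add(r);
        }
        for (Row r : right) {
            int base = basePartition(r.key(), numPartitions);
            int parts = skewIndex.getOrDefault(r.key(), 1);
            for (int j = 0; j < parts; j++) {
                rightParts.get((base + j) % numPartitions).add(r);  // replicate to every split partition
            }
        }
        // Local hash join inside each partition; each (left, right) pair meets exactly once.
        List<String> out = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            Map<String, List<Row>> byKey = new HashMap<>();
            for (Row r : rightParts.get(p)) byKey.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
            for (Row l : leftParts.get(p)) {
                for (Row r : byKey.getOrDefault(l.key(), List.of())) {
                    out.add(l.key() + ":" + l.value() + "," + r.value());
                }
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = new ArrayList<>();
        for (int i = 0; i < 8; i++) left.add(new Row("hot", "l" + i));
        left.add(new Row("a", "la"));
        left.add(new Row("b", "lb"));
        List<Row> right = List.of(new Row("hot", "r1"), new Row("a", "ra"), new Row("c", "rc"));
        Map<String, Integer> index = buildSkewIndex(left, 4, 0.5);
        System.out.println(index);                                      // {hot=4}
        System.out.println(skewedJoin(left, right, index, 4).size());   // 9
    }
}
```

In an actual Spark job, the index map would presumably be shipped to tasks through a broadcast variable rather than passed as a local argument. That is the role item 1 gives to "broadcasting the sampling index".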