[ https://issues.apache.org/jira/browse/PIG-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803582#comment-15803582 ]
liyunzhang_intel edited comment on PIG-4858 at 1/6/17 5:07 AM:
---------------------------------------------------------------

[~nkollar]: I guess you mean the line below, which I marked with "Here apply the patch from PIG-3417"? I have updated the patch in PIG-5044, and you can view the whole code on the review board for that patch.

SparkCompiler#getSamplingJob
{code}
private SparkOperator getSamplingJob(POSort sort, SparkOperator sampleOperator,
        List<PhysicalPlan> transformPlans, int rp, String udfClassName, String[] udfArgs)
        throws PlanException, VisitorException, ExecException {
    addSampleOperatorForSkewedJoin(sampleOperator);
    List<Boolean> flat1 = new ArrayList<Boolean>();
    List<PhysicalPlan> eps1 = new ArrayList<PhysicalPlan>();
    // if transform plans are not specified, project the columns of sorting keys
    if (transformPlans == null) {
        ......
    } else {
        for (int i = 0; i < transformPlans.size(); i++) {
            eps1.add(transformPlans.get(i));
            flat1.add(i == transformPlans.size() - 1 ? true : false); // Here apply the patch from PIG-3417
        }
    }
{code}

> Implement Skewed join for spark engine
> --------------------------------------
>
>                 Key: PIG-4858
>                 URL: https://issues.apache.org/jira/browse/PIG-4858
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4858.patch, PIG-4858_2.patch, PIG-4858_3.patch, SkewedJoinInSparkMode.pdf
>
>
> Currently we use a regular join in place of the skewed join. The skewed join needs to be implemented.