[ https://issues.apache.org/jira/browse/PIG-4644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654139#comment-14654139 ]
Anthony Hsu commented on PIG-4644:
----------------------------------

We're using a custom loader. The (simplified) user script looks something like this:
{code}
a = LOAD 'data' USING CustomLoader();
a = foreach a {
    b = foreach foo generate c.d#'e';
    generate b;
};
b = filter a by foo is not null;
c = filter a by foo is null;
d = UNION b,c;
dump d;
{code}

The physical plan looks like this:
{code}
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-22
|
|---d: Union[bag] - scope-21
    |
    |---b: Filter[bag] - scope-12
    |   |   |
    |   |   Not[boolean] - scope-15
    |   |   |
    |   |   |---POIsNull[boolean] - scope-14
    |   |       |
    |   |       |---Project[bag][0] - scope-13
    |   |
    |   |---a: Filter[bag] - scope-10
    |       |   |
    |       |   Constant(true) - scope-11
    |       |
    |       |---a: Split - scope-9
    |           |
    |           |---a: New For Each(false)[bag] - scope-8
    |               |   |
    |               |   RelationToExpressionProject[bag][*] - scope-1
    |               |   |
    |               |   |---foo: New For Each(false)[bag] - scope-7
    |               |       |   |
    |               |       |   POMapLookUp[chararray] - scope-5
    |               |       |   |
    |               |       |   |---Project[map][0] - scope-4
    |               |       |       |
    |               |       |       |---Project[tuple][4] - scope-3
    |               |       |
    |               |       |---Project[bag][0] - scope-2
    |               |
    |               |---a: Load(data:CustomLoader) - scope-0
    |
    |---c: Filter[bag] - scope-18
        |   |
        |   POIsNull[boolean] - scope-20
        |   |
        |   |---Project[bag][0] - scope-19
        |
        |---a: Filter[bag] - scope-16
            |   |
            |   Constant(true) - scope-17
            |
            |---a: Split - scope-9
                |
                |---a: New For Each(false)[bag] - scope-8
                    |   |
                    |   RelationToExpressionProject[bag][*] - scope-1
                    |   |
                    |   |---foo: New For Each(false)[bag] - scope-7
                    |       |   |
                    |       |   POMapLookUp[chararray] - scope-5
                    |       |   |
                    |       |   |---Project[map][0] - scope-4
                    |       |       |
                    |       |       |---Project[tuple][4] - scope-3
                    |       |
                    |       |---Project[bag][0] - scope-2
                    |
                    |---a: Load(data:CustomLoader) - scope-0
{code}

The MapReduce plan looks like this:
{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-29
Map Plan
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-22
|
|---d: Union[bag] - scope-21
    |
    |---b: Filter[bag] - scope-12
    |   |   |
    |   |   Not[boolean] - scope-15
    |   |   |
    |   |   |---POIsNull[boolean] - scope-14
    |   |       |
    |   |       |---Project[bag][0] - scope-13
    |   |
    |   |---a: New For Each(false)[bag] - scope-44
    |       |   |
    |       |   RelationToExpressionProject[bag][*] - scope-42
    |       |   |
    |       |   |---foo: New For Each(false)[bag] - scope-41
    |       |       |   |
    |       |       |   POMapLookUp[chararray] - scope-38
    |       |       |   |
    |       |       |   |---Project[map][0] - scope-40
    |       |       |       |
    |       |       |       |---Project[tuple][4] - scope-39
    |       |       |
    |       |       |---Project[bag][0] - scope-43
    |       |
    |       |---a: Load(data:CustomLoader) - scope-45
    |
    |---c: Filter[bag] - scope-18
        |   |
        |   POIsNull[boolean] - scope-20
        |   |
        |   |---Project[bag][0] - scope-19
        |
        |---a: New For Each(false)[bag] - scope-36
            |   |
            |   RelationToExpressionProject[bag][*] - scope-34
            |   |
            |   |---foo: New For Each(false)[bag] - scope-33
            |       |   |
            |       |   POMapLookUp[chararray] - scope-30
            |       |   |
            |       |   |---Project[map][0] - scope-32
            |       |       |
            |       |       |---Project[tuple][4] - scope-31
            |       |
            |       |---Project[bag][0] - scope-35
            |
            |---a: Load(data:CustomLoader) - scope-37--------
Global sort: false
----------------
{code}

I haven't been able to reproduce this issue using PigStorage and some sample data. When I try, the physical plan looks the same, but the MR plan ends up with two MR jobs instead of one, and the issue doesn't surface.
> POProject's implementation of clone seems broken
> ------------------------------------------------
>
>                 Key: PIG-4644
>                 URL: https://issues.apache.org/jira/browse/PIG-4644
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ratandeep Ratti
>
> We are receiving the following exception when using Pig:
> {noformat}
> Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject cannot be cast to org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.clone(PORelationToExprProject.java:144)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.clone(PORelationToExprProject.java:50)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:227)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.clone(POForEach.java:639)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.clone(POForEach.java:53)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:227)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:298)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visitMROp(MultiQueryOptimizer.java:219)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:273)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:46)
>     at org.apache.pig.impl.plan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:71)
>     at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visit(MultiQueryOptimizer.java:94)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:629)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:148)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
> {noformat}
> On further investigation, it seems that POProject's clone method is implemented as:
> {noformat}
> @Override
> public POProject clone() throws CloneNotSupportedException {
>     ArrayList<Integer> cols = new ArrayList<Integer>(columns.size());
>     // Can reuse the same Integer objects, as they are immutable
>     for (Integer i : columns) {
>         cols.add(i);
>     }
>     POProject clone = new POProject(new OperatorKey(mKey.scope,
>         NodeIdGenerator.getGenerator().getNextNodeId(mKey.scope)),
>         requestedParallelism, cols);
>     clone.cloneHelper(this);
>     clone.overloaded = overloaded;
>     clone.startCol = startCol;
>     clone.isProjectToEnd = isProjectToEnd;
>     clone.resultType = resultType;
>     return clone;
> }
> {noformat}
> It uses a constructor to create the clone, which breaks the convention that clone() should obtain the new object by calling super.clone().
> In the subclass, PORelationToExprProject implements clone() as:
> {noformat}
> @Override
> public PORelationToExprProject clone() throws CloneNotSupportedException {
>     return (PORelationToExprProject) super.clone();
> }
> {noformat}
> As POProject's implementation shows, super.clone() will never return an object of type PORelationToExprProject.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
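To make the failure mode in the quoted report concrete, here is a minimal, self-contained Java sketch that mimics it outside Pig. `Project` and `RelationToExprProject` are hypothetical stand-ins for `POProject` and `PORelationToExprProject` that model only the column list; `FixedProject` and `FixedRelationToExprProject` show one possible fix under that simplification, not necessarily the change made for PIG-4644.

```java
import java.util.ArrayList;
import java.util.List;

public class CloneDemo {

    // Hypothetical stand-in for POProject, using the constructor-based clone() pattern from the report.
    static class Project implements Cloneable {
        protected List<Integer> columns;

        Project(List<Integer> columns) {
            this.columns = columns;
        }

        @Override
        public Project clone() {
            // The copy is created with 'new Project(...)', so its runtime class is always
            // Project, even when 'this' is a subclass instance.
            return new Project(new ArrayList<>(columns));
        }
    }

    // Hypothetical stand-in for PORelationToExprProject.
    static class RelationToExprProject extends Project {
        RelationToExprProject(List<Integer> columns) {
            super(columns);
        }

        @Override
        public RelationToExprProject clone() {
            // super.clone() returns a plain Project, so this cast fails at runtime.
            return (RelationToExprProject) super.clone();
        }
    }

    // One possible fix: obtain the copy from Object.clone(), which preserves the runtime class.
    static class FixedProject implements Cloneable {
        protected List<Integer> columns;

        FixedProject(List<Integer> columns) {
            this.columns = columns;
        }

        @Override
        public FixedProject clone() {
            try {
                FixedProject copy = (FixedProject) super.clone(); // same runtime class as 'this'
                copy.columns = new ArrayList<>(columns);          // deep-copy mutable state
                return copy;
            } catch (CloneNotSupportedException e) {
                // Cannot happen: this class implements Cloneable.
                throw new AssertionError("Cloneable is implemented", e);
            }
        }
    }

    static class FixedRelationToExprProject extends FixedProject {
        FixedRelationToExprProject(List<Integer> columns) {
            super(columns);
        }

        @Override
        public FixedRelationToExprProject clone() {
            // Safe: FixedProject.clone() returns an instance of the receiver's runtime class.
            return (FixedRelationToExprProject) super.clone();
        }
    }

    /** Returns true if cloning the constructor-based subclass throws ClassCastException. */
    static boolean brokenCloneThrows() {
        try {
            new RelationToExprProject(List.of(0)).clone();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("broken clone throws ClassCastException: " + brokenCloneThrows());
        FixedRelationToExprProject original =
                new FixedRelationToExprProject(new ArrayList<>(List.of(0)));
        FixedRelationToExprProject copy = original.clone();
        System.out.println("fixed clone class: " + copy.getClass().getSimpleName());
    }
}
```

The fix relies on `Object.clone()` creating a new instance of the receiver's runtime class, so a subclass can always cast the result of `super.clone()`. In Pig itself, whatever the constructor-based version sets up besides copying fields (for example the fresh `OperatorKey` it generates) would still have to be applied to the returned copy; this sketch leaves that out. An alternative is for each subclass to override `clone()` with a call to its own constructor instead of casting `super.clone()`.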