[ 
https://issues.apache.org/jira/browse/PIG-4644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654139#comment-14654139
 ] 

Anthony Hsu commented on PIG-4644:
----------------------------------

We're using a custom loader. The (simplified) user script looks something like 
this:
{code}
a = LOAD 'data' USING CustomLoader();

a = foreach a {
  b = foreach foo generate c.d#'e';
  generate b;
};

b = filter a by foo is not null;
c = filter a by foo is null;

d = UNION b,c;
dump d;
{code}

The physical plan looks like this:
{code}
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-22
|
|---d: Union[bag] - scope-21
    |
    |---b: Filter[bag] - scope-12
    |   |   |
    |   |   Not[boolean] - scope-15
    |   |   |
    |   |   |---POIsNull[boolean] - scope-14
    |   |       |
    |   |       |---Project[bag][0] - scope-13
    |   |
    |   |---a: Filter[bag] - scope-10
    |       |   |
    |       |   Constant(true) - scope-11
    |       |
    |       |---a: Split - scope-9
    |           |
    |           |---a: New For Each(false)[bag] - scope-8
    |               |   |
    |               |   RelationToExpressionProject[bag][*] - scope-1
    |               |   |
    |               |   |---foo: New For Each(false)[bag] - scope-7
    |               |       |   |
    |               |       |   POMapLookUp[chararray] - scope-5
    |               |       |   |
    |               |       |   |---Project[map][0] - scope-4
    |               |       |       |
    |               |       |       |---Project[tuple][4] - scope-3
    |               |       |
    |               |       |---Project[bag][0] - scope-2
    |               |
    |               |---a: Load(data:CustomLoader) - scope-0
    |
    |---c: Filter[bag] - scope-18
        |   |
        |   POIsNull[boolean] - scope-20
        |   |
        |   |---Project[bag][0] - scope-19
        |
        |---a: Filter[bag] - scope-16
            |   |
            |   Constant(true) - scope-17
            |
            |---a: Split - scope-9
                |
                |---a: New For Each(false)[bag] - scope-8
                    |   |
                    |   RelationToExpressionProject[bag][*] - scope-1
                    |   |
                    |   |---foo: New For Each(false)[bag] - scope-7
                    |       |   |
                    |       |   POMapLookUp[chararray] - scope-5
                    |       |   |
                    |       |   |---Project[map][0] - scope-4
                    |       |       |
                    |       |       |---Project[tuple][4] - scope-3
                    |       |
                    |       |---Project[bag][0] - scope-2
                    |
                    |---a: Load(data:CustomLoader) - scope-0
{code}
The MapReduce plan looks like this:
{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-29
Map Plan
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-22
|
|---d: Union[bag] - scope-21
    |
    |---b: Filter[bag] - scope-12
    |   |   |
    |   |   Not[boolean] - scope-15
    |   |   |
    |   |   |---POIsNull[boolean] - scope-14
    |   |       |
    |   |       |---Project[bag][0] - scope-13
    |   |
    |   |---a: New For Each(false)[bag] - scope-44
    |       |   |
    |       |   RelationToExpressionProject[bag][*] - scope-42
    |       |   |
    |       |   |---foo: New For Each(false)[bag] - scope-41
    |       |       |   |
    |       |       |   POMapLookUp[chararray] - scope-38
    |       |       |   |
    |       |       |   |---Project[map][0] - scope-40
    |       |       |       |
    |       |       |       |---Project[tuple][4] - scope-39
    |       |       |
    |       |       |---Project[bag][0] - scope-43
    |       |
    |       |---a: Load(data:CustomLoader) - scope-45
    |
    |---c: Filter[bag] - scope-18
        |   |
        |   POIsNull[boolean] - scope-20
        |   |
        |   |---Project[bag][0] - scope-19
        |
        |---a: New For Each(false)[bag] - scope-36
            |   |
            |   RelationToExpressionProject[bag][*] - scope-34
            |   |
            |   |---foo: New For Each(false)[bag] - scope-33
            |       |   |
            |       |   POMapLookUp[chararray] - scope-30
            |       |   |
            |       |   |---Project[map][0] - scope-32
            |       |       |
            |       |       |---Project[tuple][4] - scope-31
            |       |
            |       |---Project[bag][0] - scope-35
            |
            |---a: Load(data:CustomLoader) - scope-37--------
Global sort: false
----------------
{code}

I haven't been able to reproduce this issue using PigStorage and some sample 
data. When I try, though the physical plan looks the same, the MR plan ends up 
having two MR jobs instead of one and the issue doesn't surface.

> POProject's implementation of clone seems broken
> ------------------------------------------------
>
>                 Key: PIG-4644
>                 URL: https://issues.apache.org/jira/browse/PIG-4644
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ratandeep Ratti
>
> We are receiving the following exception when using Pig
> {noformat}
> Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject cannot be cast to org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.clone(PORelationToExprProject.java:144)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.clone(PORelationToExprProject.java:50)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:227)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.clone(POForEach.java:639)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.clone(POForEach.java:53)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:227)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:298)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visitMROp(MultiQueryOptimizer.java:219)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:273)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:46)
>         at org.apache.pig.impl.plan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:71)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visit(MultiQueryOptimizer.java:94)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:629)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:148)
>         at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
> {noformat}
> On further investigation, it seems that POProject's clone method is 
> implemented as:
> {noformat}
>     @Override
>     public POProject clone() throws CloneNotSupportedException {
>         ArrayList<Integer> cols = new ArrayList<Integer>(columns.size());
>         // Can reuse the same Integer objects, as they are immutable
>         for (Integer i : columns) {
>             cols.add(i);
>         }
>         POProject clone = new POProject(new OperatorKey(mKey.scope,
>             NodeIdGenerator.getGenerator().getNextNodeId(mKey.scope)),
>             requestedParallelism, cols);
>         clone.cloneHelper(this);
>         clone.overloaded = overloaded;
>         clone.startCol = startCol;
>         clone.isProjectToEnd = isProjectToEnd;
>         clone.resultType = resultType;
>         return clone;
>     }
> {noformat}
> It uses a constructor to clone POProject, which breaks the weak rule of object 
> cloning (that x.clone().getClass() == x.getClass()).
> The subclass, PORelationToExprProject, implements clone() as:
> {noformat}
>     @Override
>     public PORelationToExprProject clone() throws CloneNotSupportedException {
>         return (PORelationToExprProject) super.clone();
>     }
> {noformat}
> As seen from POProject's implementation of clone(), the object returned by 
> super.clone() will never be of type PORelationToExprProject, so the cast fails.
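The failure described above can be reproduced outside Pig with a minimal sketch. All class names below (BrokenProject, BrokenRelationProject, FixedProject, FixedRelationProject, CloneDemo) are hypothetical stand-ins, not Pig classes. The "fixed" variant shows one possible approach, starting from Object.clone() so the runtime class is preserved; it is not necessarily the fix adopted for this issue. It also omits details the real POProject handles, such as assigning a fresh OperatorKey to the copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for POProject: clone() builds the copy with a
// constructor, so the result is always exactly a BrokenProject, whatever
// the runtime class of 'this' is.
class BrokenProject implements Cloneable {
    protected List<Integer> columns;

    BrokenProject(List<Integer> columns) {
        this.columns = new ArrayList<>(columns);
    }

    @Override
    public BrokenProject clone() throws CloneNotSupportedException {
        return new BrokenProject(columns);
    }
}

// Hypothetical stand-in for PORelationToExprProject.
class BrokenRelationProject extends BrokenProject {
    BrokenRelationProject(List<Integer> columns) {
        super(columns);
    }

    @Override
    public BrokenRelationProject clone() throws CloneNotSupportedException {
        // super.clone() returns a plain BrokenProject, so this cast throws
        // ClassCastException at runtime.
        return (BrokenRelationProject) super.clone();
    }
}

// One possible fix (an assumption, not Pig's actual patch): obtain the copy
// from Object.clone(), which preserves the runtime class, then copy any
// mutable state so the clone does not share it with the original.
class FixedProject implements Cloneable {
    protected List<Integer> columns;

    FixedProject(List<Integer> columns) {
        this.columns = new ArrayList<>(columns);
    }

    @Override
    public FixedProject clone() throws CloneNotSupportedException {
        FixedProject copy = (FixedProject) super.clone();
        copy.columns = new ArrayList<>(columns);
        return copy;
    }
}

class FixedRelationProject extends FixedProject {
    FixedRelationProject(List<Integer> columns) {
        super(columns);
    }

    @Override
    public FixedRelationProject clone() throws CloneNotSupportedException {
        // Safe: the object produced by Object.clone() has this runtime class.
        return (FixedRelationProject) super.clone();
    }
}

class CloneDemo {
    // Returns true if cloning the broken subclass throws ClassCastException.
    static boolean brokenCloneFails() {
        try {
            new BrokenRelationProject(Arrays.asList(0)).clone();
            return false;
        } catch (ClassCastException e) {
            return true;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the runtime class of a clone made through the fixed hierarchy.
    static Class<?> fixedCloneClass() {
        try {
            return new FixedRelationProject(Arrays.asList(0)).clone().getClass();
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(brokenCloneFails());                   // prints "true"
        System.out.println(fixedCloneClass().getSimpleName());    // prints "FixedRelationProject"
    }
}
```

An alternative that keeps the constructor-based copy would be for each subclass to construct its own type in clone() rather than casting the result of super.clone(); which option suits Pig better depends on how cloneHelper and the operator keys are expected to behave.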



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
