[ 
https://issues.apache.org/jira/browse/PIG-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953426#comment-15953426
 ] 

Adam Szita commented on PIG-5207:
---------------------------------

[~rohini]: in our case the order of the edges in mToEdges (from one node to 
others) matters because the other nodes are POProject operators, each with a 
specific column set in it. That order maps to the argument order the user 
specifies when calling the COR function, e.g. 
{{COR(var0col, var1col, var2col)}}. The implementation holds the order in what 
is basically an {{ArrayList<PhysicalOperator>}}.

You can test this with the following use case which is similar to what we have 
in this E2E test:
{code}
    PhysicalPlan plan = new PhysicalPlan();
    //Creating ops
    PhysicalOperator proj0 = new POProject(new OperatorKey("scope",0),1,1);
    PhysicalOperator proj1 = new POProject(new OperatorKey("scope",1),1,0);

    PhysicalOperator proj2 = new POProject(new OperatorKey("scope",2),1,1);
    PhysicalOperator proj3 = new POProject(new OperatorKey("scope",3),1,1);

    PhysicalOperator proj4 = new POProject(new OperatorKey("scope",4),1,1);
    PhysicalOperator proj5 = new POProject(new OperatorKey("scope",5),1,2);

    POUserFunc udfOp = new POUserFunc(new OperatorKey("scope",6), 1,
            Lists.newArrayList(proj1,proj3,proj5),
            new FuncSpec(COR.class.getCanonicalName()));

    //Adding and connecting ops
    plan.add(proj0);
    plan.add(proj1);
    plan.connect(proj0, proj1);

    plan.add(proj2);
    plan.add(proj3);
    plan.connect(proj2, proj3);

    plan.add(proj4);
    plan.add(proj5);
    plan.connect(proj4, proj5);

    plan.add(udfOp);
    plan.connect(proj1, udfOp);
    plan.connect(proj3, udfOp);
    plan.connect(proj5, udfOp);

    PhysicalPlan clonedPlan = plan.clone();

    //mToEdges is protected, so read it via reflection
    Field f = OperatorPlan.class.getDeclaredField("mToEdges");
    f.setAccessible(true);
    MultiMap originalToEdgesMap = (MultiMap)f.get(plan);
    MultiMap clonedToEdgesMap = (MultiMap)f.get(clonedPlan);

    System.out.println("Original column order");
    for (Object op : originalToEdgesMap.keySet()){
      if (op instanceof POUserFunc) {
        for (Object entry : (List)(originalToEdgesMap.get(op))){
          System.out.println(((POProject)entry).getColumn());
        }
      }
    }
    System.out.println("Cloned column order");
    for (Object op : clonedToEdgesMap.keySet()){
      if (op instanceof POUserFunc) {
        for (Object entry : (List)(clonedToEdgesMap.get(op))){
          System.out.println(((POProject)entry).getColumn());
        }
      }
    }
{code}

This gives me:
{code}
Original column order
0
1
2
Cloned column order
2
0
1
{code}
The plan constructed in the example is:
{code}
POUserFunc(org.apache.pig.builtin.COR)[tuple] - scope-6
|
|---Project[tuple][0] - scope-1
|   |
|   |---Project[tuple][1] - scope-0
|
|---Project[tuple][1] - scope-3
|   |
|   |---Project[tuple][1] - scope-2
|
|---Project[tuple][2] - scope-5
    |
    |---Project[tuple][1] - scope-4
{code}
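For illustration only, here is a minimal, self-contained sketch of one way such a reordering can arise. It does not use Pig classes: {{EdgeOrderSketch}}, {{copyVisiting}} and {{copyPreservingOrder}} are hypothetical names, and the visiting order passed to {{copyVisiting}} is chosen by hand to mimic an arbitrary (e.g. hash-based) key iteration. It is not a claim about how {{PhysicalPlan.clone()}} is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a plan's edge maps; not Pig's OperatorPlan or MultiMap.
class EdgeOrderSketch {
    // to-node -> predecessors, in the order the edges were connected
    final Map<String, List<String>> toEdges = new LinkedHashMap<>();
    // from-node -> successors, in the order the edges were connected
    final Map<String, List<String>> fromEdges = new LinkedHashMap<>();

    void connect(String from, String to) {
        fromEdges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        toEdges.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    // Copies the plan by replaying edges in the given from-node order.
    // If that order differs from the original connect order, the
    // predecessor lists of the copy are ordered differently.
    EdgeOrderSketch copyVisiting(List<String> fromNodeOrder) {
        EdgeOrderSketch copy = new EdgeOrderSketch();
        for (String from : fromNodeOrder) {
            for (String to : fromEdges.getOrDefault(from, new ArrayList<>())) {
                copy.connect(from, to);
            }
        }
        return copy;
    }

    // Copies the plan by replaying edges per to-node, in the stored
    // predecessor order, so every predecessor list keeps its order.
    // (Successor lists are not guaranteed to keep theirs in general.)
    EdgeOrderSketch copyPreservingOrder() {
        EdgeOrderSketch copy = new EdgeOrderSketch();
        for (Map.Entry<String, List<String>> e : toEdges.entrySet()) {
            for (String from : e.getValue()) {
                copy.connect(from, e.getKey());
            }
        }
        return copy;
    }

    // Same shape as the UDF inputs in the example: three projections feed one UDF.
    static EdgeOrderSketch sample() {
        EdgeOrderSketch plan = new EdgeOrderSketch();
        plan.connect("proj1", "udf");
        plan.connect("proj3", "udf");
        plan.connect("proj5", "udf");
        return plan;
    }

    public static void main(String[] args) {
        EdgeOrderSketch plan = sample();
        // Hand-picked visiting order standing in for an arbitrary key iteration.
        List<String> arbitrary = Arrays.asList("proj5", "proj1", "proj3");
        System.out.println("original:   " + plan.toEdges.get("udf"));
        System.out.println("naive copy: " + plan.copyVisiting(arbitrary).toEdges.get("udf"));
        System.out.println("kept order: " + plan.copyPreservingOrder().toEdges.get("udf"));
    }
}
```

With the hand-picked visiting order above, the naive copy lists the UDF inputs as proj5, proj1, proj3, while the order-preserving copy keeps proj1, proj3, proj5.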

> BugFix e2e tests fail on spark
> ------------------------------
>
>                 Key: PIG-5207
>                 URL: https://issues.apache.org/jira/browse/PIG-5207
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5207.0.patch
>
>
> Observed ClassCastException in BugFix 1 and 2 test cases. The exception is 
> thrown from a UDF: COR.Final



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
