[ 
https://issues.apache.org/jira/browse/PIG-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996669#comment-12996669
 ] 

Daniel Dai commented on PIG-1858:
---------------------------------

Actually, they are the same query. I checked the old logical plan, and the 
nested sort is not even in the plan. The generated plan is completely wrong: we 
feed bag B directly to MyAnotherUDF, without the sort or the projection.

The first question is whether MyAnotherUDF is meant to take Pvs.pvs as a bag or 
as a tuple.

If it takes a bag, moving MyAnotherUDF into the generate statement will work. 
The query then means: sort B first to get a sorted bag, then feed that bag to 
MyAnotherUDF.
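
For the bag case, the rewrite might look like the sketch below. It reuses the 
aliases A, B and C from the reported script and assumes that MyAnotherUDF 
accepts a bag and returns a tuple with the fields count and sum (as the 
original projection suggests). The alias E is hypothetical, and the script has 
not been run against the attached UDF.

{code}
-- Sketch only: assumes MyAnotherUDF takes a bag and returns a tuple (count, sum).
D = foreach C {
        Pvs = order B by pvs;
        -- The UDF call now sits in generate, so it receives the sorted bag.
        generate flatten(org.vivek.MyAnotherUDF(Pvs.pvs)) as (count, sum);
        };
-- E is a hypothetical alias that keeps only the sum field.
E = foreach D generate sum;
store E into 'out_D';
{code}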

If it takes a tuple, meaning MyAnotherUDF takes each individual tuple of B, 
then it is similar to a nested foreach. We do not currently support that 
(unfortunately, the old logical plan does not complain and gives a wrong 
result). In a nested plan, we can only transform tuples coming from the input 
bag using sort, filter, limit, distinct, or simple projection.
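
As an illustration of what a nested plan can do, the sketch below chains three 
of the supported operations before generate. It reuses C from the reported 
script; the aliases Sorted, Top and Vals and the limit of 10 are hypothetical, 
and COUNT is the builtin aggregate, used here only so that generate has 
something to consume.

{code}
-- Sketch only: Sorted, Top and Vals are hypothetical alias names.
D = foreach C {
        Sorted = order B by pvs;   -- sort
        Top = limit Sorted 10;     -- limit
        Vals = Top.pvs;            -- simple projection
        generate group, COUNT(Vals);
        };
{code}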

In sum, whether MyAnotherUDF takes a tuple or a bag, the old logical plan 
generates a wrong plan and the new logical plan fails on the frontend. If it 
takes a bag, the right syntax is to move MyAnotherUDF into generate. If it 
takes a tuple, it is not currently supported.

To fix it for now, we can provide a meaningful error message. In the future, we 
can support nested foreach to address this use case.

> NullPointerException while compiling the new logical plan
> ---------------------------------------------------------
>
>                 Key: PIG-1858
>                 URL: https://issues.apache.org/jira/browse/PIG-1858
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>             Fix For: 0.9.0
>
>         Attachments: MyAnotherUDF.java
>
>
> Below is my script:
> {code}
> register myanotherudf.jar;
> A = load 'myinput' using PigStorage() as ( 
> date:chararray,bcookie:chararray,count:int,avg:double,pvs:int);
> B = foreach A generate (int)(avg / 100.0) * 100   as avg, pvs;
> C = group B by ( avg );
> D = foreach C {
>         Pvs = order B by pvs;
>         Const = org.vivek.MyAnotherUDF(Pvs.pvs).(count,sum);
>         generate Const.sum as sum;
>         };
> store D into 'out_D';
> {code}
> The script fails during compilation of the plan. The use of the UDF inside 
> the foreach is causing the problem. The UDF implements Algebraic, and its 
> output schema is also defined.
> Below is the exception that I get:
> ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2042: Error in new 
> logical plan. Try -Dpig.usenewlogicalplan=false.
>         at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:309)
>         at org.apache.pig.PigServer.compilePp(PigServer.java:1364)
>         at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1206)
>         at org.apache.pig.PigServer.execute(PigServer.java:1200)
>         at org.apache.pig.PigServer.access$100(PigServer.java:128)
>         at org.apache.pig.PigServer$Graph.execute(PigServer.java:1527)
>         at org.apache.pig.PigServer.executeBatchEx(PigServer.java:372)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:339)
>         at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
>         at org.apache.pig.Main.run(Main.java:500)
>         at org.apache.pig.Main.main(Main.java:107)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>         at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>         at 
> org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:229)
>         at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>         at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:94)
>         at 
> org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71)
>         at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>         at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:261)
>         ... 13 more
>  
> When I turn off the new logical plan, the script executes successfully. The 
> issue is observed in both 0.8 and 0.9.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
