[ https://issues.apache.org/jira/browse/PIG-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996669#comment-12996669 ]
Daniel Dai commented on PIG-1858:
---------------------------------

Actually, they are the same query. I checked the old logical plan, and the nested sort is not even in it. The generated plan is completely wrong: we feed bag B directly to MyAnotherUDF, without the sort or the projection.

The first question is whether MyAnotherUDF is meant to take Pvs.pvs as a bag or as a tuple.

If it takes a bag, moving MyAnotherUDF into the generate statement will work. The query then means: sort B first to get a sorted bag, then feed that bag to MyAnotherUDF.

If it takes a tuple, meaning MyAnotherUDF takes each individual tuple of B, then it is similar to a nested foreach, which we do not currently support. (Unfortunately, the old logical plan does not complain and gives a wrong result.) In a nested plan, we can only transform tuples coming from the input bag using sort/filter/limit/distinct/simple projection.

In sum, whether MyAnotherUDF takes a tuple or a bag, the old plan generates a wrong plan and the new plan fails on the frontend. If it takes a bag, the right syntax is to move MyAnotherUDF into generate. If it takes a tuple, the query is not currently supported. For now, the fix is to provide a meaningful error message. In the future, we can support nested foreach to address this use case.
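
For the bag case described above, the rewrite might look roughly like the following. This is an untested sketch: it assumes MyAnotherUDF accepts a bag and returns a tuple with the fields count and sum (as the .(count,sum) projection in the reported script suggests), and the alias E is introduced here only for illustration.

{code}
D = foreach C {
    Pvs = order B by pvs;
    -- call the UDF inside generate so that it receives the sorted bag
    -- (assumption: the UDF returns a tuple, which flatten turns into two fields)
    generate flatten(org.vivek.MyAnotherUDF(Pvs.pvs)) as (count, sum);
};
-- E is a hypothetical alias; it keeps only the sum field, as the original script did
E = foreach D generate sum;
store E into 'out_D';
{code}

Projecting sum in a separate foreach keeps the nested block limited to the sort, in line with the restriction on nested plans noted above.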
> NullPointerException while compiling the new logical plan
> ---------------------------------------------------------
>
>                 Key: PIG-1858
>                 URL: https://issues.apache.org/jira/browse/PIG-1858
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>             Fix For: 0.9.0
>
>         Attachments: MyAnotherUDF.java
>
>
> The below is my script :
> {code}
> register myanotherudf.jar;
> A = load 'myinput' using PigStorage() as ( date:chararray,bcookie:chararray,count:int,avg:double,pvs:int);
> B = foreach A generate (int)(avg / 100.0) * 100 as avg, pvs;
> C = group B by ( avg );
> D = foreach C {
>         Pvs = order B by pvs;
>         Const = org.vivek.MyAnotherUDF(Pvs.pvs).(count,sum);
>         generate Const.sum as sum;
>         };
> store D into 'out_D';
> {code}
> The script is failing during compilation of the plan. The usage of the udf inside the foreach is causing the problem. The udf implements algebraic and the output schema is also defined.
> The below is the exception that I get :
> ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:309)
>         at org.apache.pig.PigServer.compilePp(PigServer.java:1364)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1206)
>         at org.apache.pig.PigServer.execute(PigServer.java:1200)
>         at org.apache.pig.PigServer.access$100(PigServer.java:128)
>         at org.apache.pig.PigServer$Graph.execute(PigServer.java:1527)
>         at org.apache.pig.PigServer.executeBatchEx(PigServer.java:372)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:339)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
>         at org.apache.pig.Main.run(Main.java:500)
>         at org.apache.pig.Main.main(Main.java:107)
> Caused by: java.lang.NullPointerException
>         at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>         at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>         at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:229)
>         at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>         at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:94)
>         at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71)
>         at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:261)
>         ... 13 more
>
> When I turn off the new logical plan, the script executes successfully.
> The issue is observed in both 0.8 and 0.9

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira