Hello, I have tried below mentioned script when new optimizer is enabled. I am wondering, is this possible to have filter condition on any column of the "group" (a tuple) column generated by Group By clause with multiple key.
The following script does a GroupBy multiple columns: A = load '<any file>' USING PigStorage(',') as (a1:int,a2:int,a3:int); G1 = GROUP A by (a1,a2); D = Filter G1 by group.$0 > 1; explain D; The above fails with the following error when the new optimizer is enabled ( *it fails with the old framework too but only when it gets to the execution stage*): Caused by: java.lang.NullPointerException at org.apache.pig.experimental. logical.LogicalPlanMigrationVistor$LogicalExpPlanMigrationVistor.visit(LogicalPlanMigrationVistor.java:424) at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:404) at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:58) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) at org.apache.pig.experimental.logical.LogicalPlanMigrationVistor.translateExpressionPlan(LogicalPlanMigrationVistor.java:155) at org.apache.pig.experimental.logical.LogicalPlanMigrationVistor.visit(LogicalPlanMigrationVistor.java:295) at org.apache.pig.impl.logicalLayer.LOFilter.visit(LOFilter.java:116) at org.apache.pig.impl.logicalLayer.LOFilter.visit(LOFilter.java:41) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:237) I really appreciate if someone put some light on this. Thanks, Swati