[ https://issues.apache.org/jira/browse/PIG-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031767#comment-13031767 ]
Thomas Kappler commented on PIG-1683:
-------------------------------------

I found a strange problem that looks like a special case of this issue. Apologies if it isn't.

I wanted to use REGEX_EXTRACT in a nested generate block where I clean up some strings. Pig accepts or rejects the block depending on whether the condition is written as "is null" or "is not null". The simplest example I could come up with that shows the problem is this:

{noformat}
a = load '1.txt' using PigStorage(',') as (a0:chararray, a1:chararray);
b = foreach a {
    b0 = TRIM(a0);
    b1 = REGEX_EXTRACT(b0, '^\\((.+)\\)$', 1);
    generate ((b1 is null) ? b0 : b1) as cleaned_name; -- FAILS
    -- generate ((b1 is not null) ? b1 : b0) as cleaned_name; -- SUCCEEDS
    -- generate ((b1 is null) ? b0 : b1); -- FAILS
}
store b into 'out';
{noformat}

1.txt is

{noformat}
foo1,bar1
(foo2),bar2
{noformat}

The "b1 is null" variant fails with the original error message of this issue: "Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject multiple outputs. This operator does not support multiple outputs."

The inverted, logically equivalent "b1 is not null" variant succeeds.

If I replace the REGEX_EXTRACT call with a simple expression like "b1 = a0", it works. But the way I read the Pig Latin reference, a REGEX_EXTRACT call should be allowed at this point, since it is not a relational operator?
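For reference, here is a minimal sketch of the result the nested block is meant to produce for the two rows of 1.txt. It is a plain Python emulation, not Pig: the names cleaned_name and PAREN_WRAPPED are made up for illustration, str.strip() stands in for TRIM, and the sketch assumes REGEX_EXTRACT yields null when the pattern does not match.

```python
import re

# Python raw-string form of the Pig pattern '^\\((.+)\\)$':
# the whole string wrapped in one pair of parentheses.
PAREN_WRAPPED = re.compile(r"^\((.+)\)$")


def cleaned_name(a0):
    """Emulate the nested block: trim, extract group 1, fall back on null."""
    b0 = a0.strip()  # stands in for TRIM(a0)
    match = PAREN_WRAPPED.match(b0)
    # Assumption: REGEX_EXTRACT gives null (None here) when nothing matches.
    b1 = match.group(1) if match else None
    # Same logic as: (b1 is null) ? b0 : b1
    return b0 if b1 is None else b1


# The two rows of 1.txt, split on ',' as PigStorage(',') would do.
rows = ["foo1,bar1", "(foo2),bar2"]
result = [cleaned_name(line.split(",")[0]) for line in rows]
print(result)  # ['foo1', 'foo2']
```

Both orderings of the condition describe this same mapping, which is why the "is null" and "is not null" variants would be expected to behave identically.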
> New logical plan: Nested foreach plan fail if one inner alias is refered more than once
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-1683
>                 URL: https://issues.apache.org/jira/browse/PIG-1683
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1683-1.patch
>
>
> The following script fail:
> {code}
> a = load '1.txt' as (a0, a1, a2);
> b = load '2.txt' as (b0, b1);
> c = join a by a0, b by b0;
> d = foreach c {
>     d0 = a::a0;
>     d1 = a::a1;
>     generate ((d0 is not null)? d0 : d1);
> }
> explain d;
> {code}
> Stack:
> ERROR 2015: Invalid physical operators in the physical plan
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias d
>         at org.apache.pig.PigServer.explain(PigServer.java:957)
>         at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:353)
>         at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:285)
>         at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:248)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.Explain(PigScriptParser.java:605)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:327)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
>         at org.apache.pig.Main.run(Main.java:498)
>         at org.apache.pig.Main.main(Main.java:107)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:308)
>         at org.apache.pig.PigServer.compilePp(PigServer.java:1350)
>         at org.apache.pig.PigServer.explain(PigServer.java:926)
>         ... 10 more
> Caused by: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException: ERROR 2015: Invalid physical operators in the physical plan
>         at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:474)
>         at org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:82)
>         at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
>         at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:519)
>         at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71)
>         at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:295)
>         ... 12 more
> Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject multiple outputs. This operator does not support multiple outputs.
>         at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:180)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:133)
>         at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:470)
>         ... 19 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira