[jira] [Created] (PIG-2062) Script silently ended
Script silently ended - Key: PIG-2062 URL: https://issues.apache.org/jira/browse/PIG-2062 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Daniel Dai Fix For: 0.9.0 The following script ends silently without executing anything. {code} a = load '1.txt' as (a0, a1); b = load '2.txt' as (b0, b1); all = join a by a0, b by b0; store all into ''; {code} If the alias "all" is renamed, the script runs. We need to throw an exception saying that "all" is a keyword. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
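A minimal sketch of the requested check, written in Java since that is Pig's implementation language. `AliasValidator` is a hypothetical class, not part of Pig, and its keyword set is an illustrative subset rather than Pig's real reserved-word list. The only point is that an alias matching a keyword (compared case-insensitively, as Pig Latin keywords are) should fail loudly instead of letting the script end silently.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-check (not part of Pig): reject aliases that collide with keywords. */
final class AliasValidator {
    // Illustrative subset only; Pig's real reserved words are defined by its grammar.
    private static final Set<String> KEYWORDS = Set.of(
            "all", "load", "store", "into", "join", "by", "foreach", "generate", "group", "filter");

    private AliasValidator() {}

    /** Pig Latin keywords are case-insensitive, so compare in lower case. */
    static boolean isReserved(String alias) {
        return KEYWORDS.contains(alias.toLowerCase(Locale.ROOT));
    }

    /** Throws instead of letting a script with a keyword alias end silently. */
    static void checkAlias(String alias) {
        if (isReserved(alias)) {
            throw new IllegalArgumentException(
                    "Invalid alias '" + alias + "': \"" + alias.toLowerCase(Locale.ROOT)
                            + "\" is a Pig Latin keyword");
        }
    }
}
```

For the script above, `AliasValidator.checkAlias("all")` would throw, while a renamed alias such as `joined` passes.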
[jira] [Commented] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032152#comment-13032152 ] Thejas M Nair commented on PIG-2056: +1 > Jython error messages should show script name > - > > Key: PIG-2056 > URL: https://issues.apache.org/jira/browse/PIG-2056 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Fix For: 0.9.0 > > Attachments: PIG-2056.patch > > > Instead of messages like > {code} > Traceback (most recent call last): > File "", line 12, in > {code} > It should display the script file name: > {code} > Traceback (most recent call last): > File "test.py", line 12, in > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2059) PIG doesn't validate incomplete query in batch mode even if -c option is given
[ https://issues.apache.org/jira/browse/PIG-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032087#comment-13032087 ] Daniel Dai commented on PIG-2059: - It might be better to change the method names: validateQuery->parseAndValidate, compile->validate. The other parts look good. > PIG doesn't validate incomplete query in batch mode even if -c option is given > -- > > Key: PIG-2059 > URL: https://issues.apache.org/jira/browse/PIG-2059 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Fix For: 0.9.0 > > Attachments: PIG-2059.patch > > > Given the following script file, Pig doesn't report any error, even if the -c option is given: > A = load 'x' as (u, v); > B = foreach A generate $3; > It's questionable whether to validate the query in batch mode, as it doesn't contain any store/dump statement. However, if the -c option is given, validation should nevertheless be performed.
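A rough sketch of the two checks discussed here, using a hypothetical `PositionalRefCheck` helper that is not Pig's API. With -c, validation runs even when the script has no STORE or DUMP, and a positional reference like $3 is rejected because the schema (u, v) only exposes $0 and $1. The issue leaves open whether plain batch mode should validate; the sketch simply assumes it does not.

```java
import java.util.List;

/** Hypothetical helper (not Pig's API) for the -c validation discussed in PIG-2059. */
final class PositionalRefCheck {
    private PositionalRefCheck() {}

    /** Assumption: validate when -c is given, or when the script has a STORE or DUMP. */
    static boolean shouldValidate(boolean checkOnly, boolean hasStoreOrDump) {
        return checkOnly || hasStoreOrDump;
    }

    /**
     * Returns an error message if $index does not exist in the schema, or null if it does.
     * Positional references are zero-based, so a schema (u, v) exposes only $0 and $1.
     */
    static String validate(List<String> schemaFields, int index) {
        if (index < 0 || index >= schemaFields.size()) {
            return "Column $" + index + " is out of range for schema " + schemaFields
                    + " with " + schemaFields.size() + " column(s)";
        }
        return null;
    }
}
```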
[jira] [Assigned] (PIG-2044) Patten match bug in org.apache.pig.newplan.optimizer.Rule
[ https://issues.apache.org/jira/browse/PIG-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-2044: --- Assignee: Koji Noguchi (was: Daniel Dai) > Patten match bug in org.apache.pig.newplan.optimizer.Rule > - > > Key: PIG-2044 > URL: https://issues.apache.org/jira/browse/PIG-2044 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Daniel Dai >Assignee: Koji Noguchi > Fix For: 0.10 > > > Koji found a bug in org.apache.pig.newplan.optimizer.Rule: the "break" in line 179 seems to be wrong. This multiple-branch matching is not used in Pig, but it could be a problem in the future.
[jira] [Created] (PIG-2061) NewPlan match() is sensitive to ordering
NewPlan match() is sensitive to ordering Key: PIG-2061 URL: https://issues.apache.org/jira/browse/PIG-2061 Project: Pig Issue Type: Bug Reporter: Koji Noguchi Priority: Minor No current Rule is affected by this, but consider this test in TestNewPlanRule.java:
{noformat}
155 public void testMultiNode() throws Exception {
...
175 pattern.connect(op1, op3);
176 pattern.connect(op2, op3);
...
178 Rule r = new SillyRule("basic", pattern);
179 List l = r.match(plan);
180 assertEquals(1, l.size());
{noformat}
The test fails when we swap lines 175 and 176, even though the two patterns are structurally equivalent.
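To illustrate the ordering sensitivity, a simplified Java sketch that compares a node's inputs by operator type only. `OrderInsensitiveMatch` is hypothetical and much simpler than NewPlan's matcher: comparing inputs as ordered lists rejects the swapped `connect()` calls, while comparing them as multisets accepts both orders. A real fix for general patterns would still need to try alternative assignments of pattern nodes to plan nodes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: compare the inputs of a pattern node and a plan node. */
final class OrderInsensitiveMatch {
    private OrderInsensitiveMatch() {}

    /** Counts how often each operator type appears among a node's predecessors. */
    private static Map<String, Integer> counts(List<String> predecessorTypes) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : predecessorTypes) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }

    /** Ordered comparison: depends on the order in which connect() was called. */
    static boolean matchesOrdered(List<String> pattern, List<String> plan) {
        return pattern.equals(plan);
    }

    /** Structural comparison: [A, B] and [B, A] are treated as equivalent. */
    static boolean matchesUnordered(List<String> pattern, List<String> plan) {
        return counts(pattern).equals(counts(plan));
    }
}
```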
[jira] [Updated] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Graham updated PIG-1825: - Attachment: PIG-1825_3.patch Here's patch #3 with {{testStoreToHBase_2_no_WAL}} removed. I agree we should remove it if HBase doesn't even deal with it in unit test mode. I think using the {{-noWAL}} option makes the most sense, since it's very clear what it does. I've added comments in the Javadocs to make sure the risks are clear. If someone uses an obscurely named flag (i.e., -noWAL) without understanding what it does by reading either the Pig Javadocs or the HBase documentation, then they're really flying blind. > ability to turn off the write ahead log for pig's HBaseStorage > -- > > Key: PIG-1825 > URL: https://issues.apache.org/jira/browse/PIG-1825 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Corbin Hoenes >Assignee: Bill Graham >Priority: Minor > Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch, > PIG-1825_2.patch, PIG-1825_3.patch > > > Added an option to allow a caller of HBaseStorage to turn off the WriteAheadLog feature while doing bulk loads into HBase. > From the performance tuning wiki page: > http://wiki.apache.org/hadoop/PerformanceTuning > "To speed up the inserts in a non critical job (like an import job), you can use Put.writeToWAL(false) to bypass writing to the write ahead log." > We've tested this on HBase 0.20.6 and it helps dramatically. > The -noWAL option is passed in just like other options for HBaseStorage: > STORE myalias INTO 'MyTable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1 mycolumnfamily:field2','-noWAL'); > This would be my first patch, so please let me know about any steps I need to take.
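A minimal sketch of how the '-noWAL' flag from the STORE example might be read, assuming the second HBaseStorage argument is a whitespace-separated option string. `StorageOptions` is hypothetical and is not the code in the attached patches. The resulting `writeToWAL` value would then be applied to each Put, as the quoted wiki advice describes. No HBase classes are used, so the block compiles on its own.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical parser for the option string passed as HBaseStorage's second argument. */
final class StorageOptions {
    final boolean writeToWAL;

    private StorageOptions(boolean writeToWAL) {
        this.writeToWAL = writeToWAL;
    }

    static StorageOptions parse(String optString) {
        Set<String> flags = new HashSet<>();
        if (optString != null && !optString.isBlank()) {
            flags.addAll(Arrays.asList(optString.trim().split("\\s+")));
        }
        // The WAL stays on by default; only an explicit -noWAL turns it off,
        // trading durability on region-server failure for faster bulk inserts.
        return new StorageOptions(!flags.contains("-noWAL"));
    }
}
```

Keeping the write-ahead log on unless the caller opts out matches the point above that the risks should be explicit.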
[jira] [Commented] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031911#comment-13031911 ] Richard Ding commented on PIG-2056: --- Result of test-patch: {code} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. {code}
[jira] [Commented] (PIG-2059) PIG doesn't validate incomplete query in batch mode even if -c option is given
[ https://issues.apache.org/jira/browse/PIG-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031905#comment-13031905 ] Xuefu Zhang commented on PIG-2059: -- Test-patch run: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[jira] [Updated] (PIG-2059) PIG doesn't validate incomplete query in batch mode even if -c option is given
[ https://issues.apache.org/jira/browse/PIG-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-2059: - Attachment: PIG-2059.patch
[jira] [Commented] (PIG-1683) New logical plan: Nested foreach plan fail if one inner alias is refered more than once
[ https://issues.apache.org/jira/browse/PIG-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031873#comment-13031873 ] Daniel Dai commented on PIG-1683: - Tried it with 0.8.1: even with the old logical plan (-Dpig.usenewlogicalplan=false), the issue is the same. However, 0.9 fixes this issue. It seems to be a problem in the old parser, and the new 0.9 parser fixes it. > New logical plan: Nested foreach plan fail if one inner alias is refered more > than once > --- > > Key: PIG-1683 > URL: https://issues.apache.org/jira/browse/PIG-1683 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.8.0 > > Attachments: PIG-1683-1.patch > > > The following script fail: > {code} > a = load '1.txt' as (a0, a1, a2); > b = load '2.txt' as (b0, b1); > c = join a by a0, b by b0; > d = foreach c { > d0 = a::a0; > d1 = a::a1; > generate ((d0 is not null)? d0 : d1); > } > explain d; > {code} > Stack: > ERROR 2015: Invalid physical operators in the physical plan > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to > explain alias d > at org.apache.pig.PigServer.explain(PigServer.java:957) > at > org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:353) > at > org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:285) > at > org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:248) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.Explain(PigScriptParser.java:605) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:327) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90) > at org.apache.pig.Main.run(Main.java:498) > at org.apache.pig.Main.main(Main.java:107) > Caused by:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2042: > Error in new logical plan. Try -Dpig.usenewlogicalplan=false. > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:308) > at org.apache.pig.PigServer.compilePp(PigServer.java:1350) > at org.apache.pig.PigServer.explain(PigServer.java:926) > ... 10 more > Caused by: > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException: > ERROR 2015: Invalid physical operators in the physical plan > at > org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:474) > at > org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:82) > at > org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70) > at > org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:519) > at > org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:295) > ... 12 more > Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give > operator of type > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject > multiple outputs. This operator does not support multiple outputs. > at > org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:180) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:133) > at > org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:470) > ... 19 more -- This message is automatically generated by JIRA. 
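A toy model of the failure above, not Pig's code. The stack trace says a POProject may not be given multiple outputs, and in `(d0 is not null) ? d0 : d1` the projection of d0 is consumed twice, which matches the issue title. `ToyPlan` is hypothetical, and giving each consumer its own copy of the projection is a workaround assumed here for illustration; it is not necessarily what the 0.9 parser does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy expression plan loosely modelled on the error message above (not Pig's code). */
final class ToyPlan {
    static final class Node {
        final String name;
        final boolean supportsMultipleOutputs;
        final List<Node> outputs = new ArrayList<>();

        Node(String name, boolean supportsMultipleOutputs) {
            this.name = name;
            this.supportsMultipleOutputs = supportsMultipleOutputs;
        }

        /** Hypothetical copy, so each consumer can get its own projection. */
        Node copy() {
            return new Node(name, supportsMultipleOutputs);
        }
    }

    private ToyPlan() {}

    /** Mirrors the failing check: a single-output operator may not gain a second successor. */
    static void connect(Node from, Node to) {
        if (!from.supportsMultipleOutputs && !from.outputs.isEmpty()) {
            throw new IllegalStateException("Attempt to give operator " + from.name
                    + " multiple outputs. This operator does not support multiple outputs.");
        }
        from.outputs.add(to);
    }
}
```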
[jira] [Resolved] (PIG-2058) Macro missing returns clause doesn't give a good error message
[ https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2058. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and the 0.9 branch. > Macro missing returns clause doesn't give a good error message > -- > > Key: PIG-2058 > URL: https://issues.apache.org/jira/browse/PIG-2058 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Xuefu Zhang >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2058.patch > > > For the following query: > define test( out1,out2 ){ >A = load 'x' as (u:int, v:int); >$B = filter A by u < 3 and v < 20; > } > Pig gives the following error message: Syntax error,unexpected symbol at or near '{' > Previously, it gave: mismatched input '{' expecting RETURNS > The previous message is more meaningful.
[jira] [Commented] (PIG-2058) Macro missing returns clause doesn't give a good error message
[ https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031833#comment-13031833 ] Xuefu Zhang commented on PIG-2058: -- +1
[jira] [Updated] (PIG-2014) SAMPLE shouldn't be pushed up
[ https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-2014: --- Release Note: A new annotation, @Nondeterministic, is introduced to allow UDF authors to mark their UDFs as such. A non-deterministic UDF is one that can produce different results when invoked on the same input; examples are getCurrentTime() or RANDOM. Certain Pig optimizations depend on UDFs being deterministic, so it is very important for correctness that non-deterministic UDFs be annotated as such. Status: Patch Available (was: Open) > SAMPLE shouldn't be pushed up > - > > Key: PIG-2014 > URL: https://issues.apache.org/jira/browse/PIG-2014 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0, 0.10 >Reporter: Jacob Perkins >Assignee: Dmitriy V. Ryaboy > Fix For: 0.9.0 > > Attachments: PIG-2014.2.patch, PIG-2014.patch > > > Consider the following code: > {code:none} > tfidf_all = LOAD '$TFIDF' AS (doc_id:chararray, token:chararray, weight:double); > grouped = GROUP tfidf_all BY doc_id; > vectors = FOREACH grouped GENERATE group AS doc_id, tfidf_all.(token, weight) AS vector; > DUMP vectors; > {code} > This, of course, runs just fine. In a real example, tfidf_all contains 1,428,280 records. The number of reduce output records should be exactly the number of documents, which turns out to be 18,863 in this case. All well and good. > The strangeness comes when you add a SAMPLE command: > {code:none} > sampled = SAMPLE vectors 0.0012; > DUMP sampled; > {code} > Running this results in 1,513 reduce output records. The number of reduce output records should be much closer to 22 or 23 (i.e., 0.0012 * 18863). > Evidently, Pig rewrites SAMPLE into a filter, and then pushes that filter in front of the group. It shouldn't push that filter, since the UDF is non-deterministic.
> Quick fix: If you add "-t PushUpFilter" to your command line when invoking pig, this won't happen.
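A back-of-the-envelope check of the numbers above, assuming SAMPLE behaves as an independent per-record coin flip with probability 0.0012 and, for simplicity, that all 18,863 groups have the same size (1,428,280 / 18,863, about 75.7 records each). Sampling after GROUP gives 0.0012 * 18,863, about 22.6 groups. If the filter is pushed above GROUP, a group survives whenever any one of its records is kept, which gives roughly 1,640 groups under the uniform-size assumption. That is the same order as the observed 1,513; because 1 - (1 - p)^n is concave in n, uneven group sizes pull the true expectation below the uniform estimate. `SampleMath` is a hypothetical helper.

```java
/** Hypothetical helper: expected output counts for the SAMPLE scenario (illustrative only). */
final class SampleMath {
    private SampleMath() {}

    /** SAMPLE applied after GROUP: each group is kept independently with probability p. */
    static double expectedAfterGroup(long groups, double p) {
        return groups * p;
    }

    /**
     * Filter pushed above GROUP: a group still appears if at least one of its input
     * records survives. Simplifying assumption: every group has the same size.
     */
    static double expectedIfPushedUp(long records, long groups, double p) {
        double avgGroupSize = (double) records / groups;
        return groups * (1.0 - Math.pow(1.0 - p, avgGroupSize));
    }
}
```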
[jira] [Updated] (PIG-2014) SAMPLE shouldn't be pushed up
[ https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-2014: --- Attachment: PIG-2014.2.patch This addresses PushUpFilter and FilterAboveForeach, and fixes the SAMPLE issue. I didn't tackle PushDownForeachFlatten -- there's a lot going on there and I'm not sure I understand it all. We should open a separate ticket for making sure that optimization does not break on nondeterministic operations.
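A sketch of the kind of guard described here, assuming an optimizer rule can see which UDF classes a filter calls. Pig ships its own @Nondeterministic annotation; the one below is declared locally only so the block compiles. `PushUpGuard`, `UpperCase` and `RandomValue` are hypothetical names. The annotation needs runtime retention, otherwise `isAnnotationPresent` cannot see it through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

/** Local stand-in for the @Nondeterministic annotation from the release note. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Nondeterministic {}

/** Hypothetical deterministic UDF. */
class UpperCase {}

/** Hypothetical non-deterministic UDF, like RANDOM. */
@Nondeterministic
class RandomValue {}

final class PushUpGuard {
    private PushUpGuard() {}

    /** A filter may move above another operator only if none of its UDFs is non-deterministic. */
    static boolean canPushUp(List<Class<?>> udfsInFilter) {
        for (Class<?> udf : udfsInFilter) {
            if (udf.isAnnotationPresent(Nondeterministic.class)) {
                return false;
            }
        }
        return true;
    }
}
```

The same check would apply to each rule that moves filters, which is why PushDownForeachFlatten still needs separate attention.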
[jira] [Updated] (PIG-2014) SAMPLE shouldn't be pushed up
[ https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-2014: --- Status: Open (was: Patch Available)
DOAP for Pig
Hi, I noticed that Pig is not among the Apache projects at http://projects.apache.org/indexes/alpha.html#P Thought you'd like to know... Craig Craig L Russell Secretary, Apache Software Foundation Chair, OpenJPA PMC c...@apache.org http://db.apache.org/jdo
[jira] [Commented] (PIG-1683) New logical plan: Nested foreach plan fail if one inner alias is refered more than once
[ https://issues.apache.org/jira/browse/PIG-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031769#comment-13031769 ] Thomas Kappler commented on PIG-1683: - Sorry, I forgot: this is with Pig 0.8.1 and the included Hadoop.
[jira] [Commented] (PIG-1683) New logical plan: Nested foreach plan fail if one inner alias is refered more than once
[ https://issues.apache.org/jira/browse/PIG-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031767#comment-13031767 ] Thomas Kappler commented on PIG-1683: - I found a strange problem that looks like a special case of this issue. Apologies if it isn't. I wanted to use REGEX_EXTRACT in a nested generate block where I clean up some strings. Pig accepts or rejects the block depending on the order of the "is null" condition. The simplest example I could come up with that shows the problem is this:
{noformat}
a = load '1.txt' using PigStorage(',') as (a0:chararray, a1:chararray);
b = foreach a {
  b0 = TRIM(a0);
  b1 = REGEX_EXTRACT(b0, '^\\((.+)\\)$', 1);
  generate ((b1 is null) ? b0 : b1) as cleaned_name; -- FAILS
  -- generate ((b1 is not null) ? b1 : b0) as cleaned_name; -- SUCCEEDS
  -- generate ((b1 is null) ? b0 : b1); -- FAILS
}
store b into 'out';
{noformat}
1.txt is
{noformat}
foo1,bar1
(foo2),bar2
{noformat}
The "b1 is null" variant fails with the original error message of this issue: "Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject multiple outputs. This operator does not support multiple outputs." The inverted, logically equivalent "b1 is not null" variant succeeds. If I replace the REGEX_EXTRACT call with a simple expression like "b1 = a0", it works. But the way I read the Pig Latin reference, REGEX_EXTRACT should be allowed at this point, since it's not a relational operator?
[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up
[ https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031730#comment-13031730 ] Dmitriy V. Ryaboy commented on PIG-2014: Daniel, So this is interesting. I took my fix out, left the test in, and the test still passed -- because, as you correctly pointed out, TestNewPlanFilterAboveForeach only invokes a few of the rules. If I add PushUpFilter to MyPlanOptimizer within that test, my new test starts failing if the fix is not present, and passes if the fix is present. So the PushUpFilter is definitely at least part of what's causing the movement of Filter in this case. So I need to fix up PushDownForEachFlatten and FilterAboveForeach, *and* I need to fix my test :).
[jira] [Created] (PIG-2060) Fix errors in pig grammars reported by ANTLRWorks
Fix errors in pig grammars reported by ANTLRWorks - Key: PIG-2060 URL: https://issues.apache.org/jira/browse/PIG-2060 Project: Pig Issue Type: Bug Reporter: Gianmarco De Francisci Morales Assignee: Gianmarco De Francisci Morales Priority: Minor ANTLRWorks highlights various errors in Pig's grammar files, in particular on the tokens MATCHES, ANY and EVAL. The first should be removed, since STR_OP_MATCHES already exists; the second is an imaginary token that should be defined in the appropriate section. I am not sure about the third: I have been told it comes from the old parser, but it is not used anywhere. Is that correct? Is it reserved for future use? Does it have anything to do with FUNC_EVAL?