[jira] [Updated] (PIG-2058) Macro missing returns clause doesn't give a good error message
[ https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-2058:
------------------------------
    Attachment: PIG-2058.patch

Thanks Xuefu. Attaching a patch with the fix.

> Macro missing returns clause doesn't give a good error message
> --------------------------------------------------------------
>
> Key: PIG-2058
> URL: https://issues.apache.org/jira/browse/PIG-2058
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2058.patch
>
>
> For the following query:
> define test( out1,out2 ){
>    A = load 'x' as (u:int, v:int);
>    $B = filter A by u < 3 and v < 20;
> }
> Pig gives the following error message: Syntax error,unexpected symbol at or near '{'
> Previously, it gives: mismatched input '{' expecting RETURNS
> The previous message is more meaningful.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2035. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. > Macro expansion doesn't handle multiple expansions of same macro inside > another macro > - > > Key: PIG-2035 > URL: https://issues.apache.org/jira/browse/PIG-2035 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2035_1.patch > > > Here is the use case: > {code} > define test ( in, out, x ) returns c { > a = load '$in' as (name, age, gpa); > b = group a by gpa; > $c = foreach b generate group, COUNT(a.$x); > store $c into '$out'; > }; > define test2( in, out ) returns x { > $x = test( '$in', '$out', 'name' ); > $x = test( '$in', '$out.1', 'age' ); > $x = test( '$in', '$out.2', 'gpa' ); > }; > x = test2('studenttab10k', 'myoutput'); > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2059) PIG doesn't validate incomplete query in batch mode even if -c option is given
PIG doesn't validate incomplete query in batch mode even if -c option is given
------------------------------------------------------------------------------

Key: PIG-2059
URL: https://issues.apache.org/jira/browse/PIG-2059
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Fix For: 0.9.0

Given a file with the following content, Pig doesn't report any error, even if the -c option is given:

A = load 'x' as (u, v);
B = foreach A generate $3;

It's questionable whether the query should be validated in batch mode, as it doesn't contain any store/dump statement. However, if the -c option is given, validation should nevertheless be performed.
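The kind of check that `-c` would be expected to perform here can be illustrated outside Pig. The following is a minimal Python sketch, not Pig's implementation: the function name `check_positional_refs` and the error wording are hypothetical. It flags any `$n` reference that falls outside a declared schema, as `$3` does against `(u, v)`.

```python
import re


def check_positional_refs(schema, expression):
    """Return an error string for every $n in `expression` that is out of range for `schema`."""
    errors = []
    for match in re.finditer(r"\$(\d+)", expression):
        index = int(match.group(1))
        # Positions are zero-based, so the last valid one is len(schema) - 1.
        if index >= len(schema):
            errors.append(
                f"${index} is out of range for a schema of size {len(schema)}"
            )
    return errors


if __name__ == "__main__":
    # Schema and projection taken from the report: A has two columns, B projects $3.
    print(check_positional_refs(["u", "v"], "$3"))
```

The point of the sketch is that this check needs only the schema and the expression, so it does not depend on a store/dump statement being present.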
[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031483#comment-13031483 ] Daniel Dai commented on PIG-2035: - +1 > Macro expansion doesn't handle multiple expansions of same macro inside > another macro > - > > Key: PIG-2035 > URL: https://issues.apache.org/jira/browse/PIG-2035 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2035_1.patch > > > Here is the use case: > {code} > define test ( in, out, x ) returns c { > a = load '$in' as (name, age, gpa); > b = group a by gpa; > $c = foreach b generate group, COUNT(a.$x); > store $c into '$out'; > }; > define test2( in, out ) returns x { > $x = test( '$in', '$out', 'name' ); > $x = test( '$in', '$out.1', 'age' ); > $x = test( '$in', '$out.2', 'gpa' ); > }; > x = test2('studenttab10k', 'myoutput'); > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2058) Macro missing returns clause doesn't give a good error message
[ https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-2058:
-----------------------------
    Assignee: Richard Ding (was: Xuefu Zhang)

The problem was introduced by RETURNS VOID support. Changing the grammar as follows will solve the problem.

macro_return_clause
    : RETURNS ( ( alias ( COMMA alias )* ) | VOID )
    -> ^( RETURN_VAL alias* )

> Macro missing returns clause doesn't give a good error message
> --------------------------------------------------------------
>
> Key: PIG-2058
> URL: https://issues.apache.org/jira/browse/PIG-2058
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Richard Ding
> Fix For: 0.9.0
>
>
> For the following query:
> define test( out1,out2 ){
>    A = load 'x' as (u:int, v:int);
>    $B = filter A by u < 3 and v < 20;
> }
> Pig gives the following error message: Syntax error,unexpected symbol at or near '{'
> Previously, it gave: mismatched input '{' expecting RETURNS
> The previous message is more meaningful.
[jira] [Updated] (PIG-2044) Patten match bug in org.apache.pig.newplan.optimizer.Rule
[ https://issues.apache.org/jira/browse/PIG-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2044:
----------------------------
    Fix Version/s: (was: 0.9.0)
                   0.10

Unlinking from 0.9. It is a potential bug, but we are not currently using this capability.

> Patten match bug in org.apache.pig.newplan.optimizer.Rule
> ---------------------------------------------------------
>
> Key: PIG-2044
> URL: https://issues.apache.org/jira/browse/PIG-2044
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.10
>
>
> Koji found that we have a bug in org.apache.pig.newplan.optimizer.Rule. The "break" in line 179 seems to be wrong. This multiple-branch matching is not used in Pig, but could be a problem in the future.
[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up
[ https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031447#comment-13031447 ]

Daniel Dai commented on PIG-2014:
---------------------------------

I think it is because in TestNewPlanFilterAboveForeach we only invoke some of the rules. If you do an explain, you will see that the filter is still pushed up.

> SAMPLE shouldn't be pushed up
> -----------------------------
>
> Key: PIG-2014
> URL: https://issues.apache.org/jira/browse/PIG-2014
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0, 0.10
> Reporter: Jacob Perkins
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
>
> Attachments: PIG-2014.patch
>
>
> Consider the following code:
> {code:none}
> tfidf_all = LOAD '$TFIDF' AS (doc_id:chararray, token:chararray, weight:double);
> grouped = GROUP tfidf_all BY doc_id;
> vectors = FOREACH grouped GENERATE group AS doc_id, tfidf_all.(token, weight) AS vector;
> DUMP vectors;
> {code}
> This, of course, runs just fine. In a real example, tfidf_all contains 1,428,280 records. The number of reduce output records should be exactly the number of documents, which turns out to be 18,863 in this case. All well and good.
> The strangeness comes when you add a SAMPLE command:
> {code:none}
> sampled = SAMPLE vectors 0.0012;
> DUMP sampled;
> {code}
> Running this results in 1,513 reduce output records. The number of reduce output records should be much closer to 22 or 23 (e.g. 0.0012*18863).
> Evidently, Pig rewrites SAMPLE into a filter, and then pushes that filter in front of the group. It shouldn't push that filter, since the UDF is non-deterministic.
> Quick fix: If you add "-t PushUpFilter" to your command line when invoking pig, this won't happen.
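The arithmetic behind the report can be checked independently of Pig. With the figures quoted above, sampling after the group should keep about 0.0012 * 18,863 ≈ 22.6 groups, whereas sampling the 1,428,280 input records first keeps about 1,714 records, which are then grouped into up to that many documents. The Python sketch below simulates both orderings on synthetic data. It is an illustration only: `count_outputs` and its parameters are hypothetical names, and the synthetic sizes are assumptions chosen to keep the run short.

```python
import random


def count_outputs(num_docs, records_per_doc, fraction, seed=0):
    """Compare SAMPLE applied after GROUP with SAMPLE pushed in front of GROUP.

    Returns a pair:
      (groups kept when sampling after the group,
       groups produced when sampling the records before the group).
    """
    rng = random.Random(seed)

    # Intended plan: group first, then keep each group with probability `fraction`.
    sampled_after_group = sum(1 for _ in range(num_docs) if rng.random() < fraction)

    # Pushed-up plan: keep each input record with probability `fraction`,
    # then group the survivors. Every document with a surviving record yields a group.
    surviving_docs = set()
    for doc_id in range(num_docs):
        for _ in range(records_per_doc):
            if rng.random() < fraction:
                surviving_docs.add(doc_id)

    return sampled_after_group, len(surviving_docs)


if __name__ == "__main__":
    # Expected sizes computed from the figures in the report.
    print(round(0.0012 * 18863, 1))   # groups expected when sampling after GROUP
    print(round(0.0012 * 1428280))    # records expected when sampling before GROUP
    print(count_outputs(num_docs=2000, records_per_doc=50, fraction=0.01))
```

In the simulation the pushed-up ordering produces far more output groups than the intended one, which is the same kind of discrepancy as the 1,513 versus roughly 22 records described in the issue.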
[jira] [Updated] (PIG-2037) Valid query fails to validate
[ https://issues.apache.org/jira/browse/PIG-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2037:
----------------------------
    Fix Version/s: (was: 0.9.0)

Xuefu found this test case because it is a valid test case in the old logical plan test suite, TestLogicalPlanBuilder. However, the script fails in 0.6, 0.7, and 0.8. The test case succeeds only because we don't invoke the validator in tests.

Some general thoughts about alias conflicts: we should defer the duplicate-alias check as long as there is no ambiguity. For example:
{code}
B = foreach A generate name, UDF(name) as name;
store B into '111';
{code}
This should be valid, since there is no ambiguity. The following script should fail:
{code}
B = foreach A generate name, UDF(name) as name;
C = foreach B generate name;
{code}
because "name" in C is ambiguous.

Unlinking this from 0.9, since it is not a regression.

> Valid query fails to validate
> -----------------------------
>
> Key: PIG-2037
> URL: https://issues.apache.org/jira/browse/PIG-2037
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Daniel Dai
>
> The following test case seems valid, but it fails to validate in 0.9.
> a = load 'st10k' as (name, age, gpa);
> b = group a by name;
> c = foreach b generate flatten(a);
> d = filter c by name != 'fred';
> e = group d by name;
> f = foreach e generate flatten(d);
> g = foreach f generate name, d::a::name, a::name;
> store g into 'output';
> ERROR 1108: Duplicate schema alias: d::a::name
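The deferred check proposed in the comment above can be sketched in a few lines of Python. This is an illustration of the idea only, not Pig's validator: `resolve` and `AmbiguousAliasError` are hypothetical names, and a schema is modelled as a plain list of aliases. Duplicate aliases are tolerated in the schema itself; an error is raised only when a lookup by name matches more than one column.

```python
class AmbiguousAliasError(Exception):
    """Raised when a name lookup matches more than one column."""


def resolve(schema, name):
    """Return the position of `name` in `schema`; fail only if the reference is ambiguous."""
    positions = [i for i, alias in enumerate(schema) if alias == name]
    if not positions:
        raise KeyError(name)
    if len(positions) > 1:
        raise AmbiguousAliasError(f"alias '{name}' matches columns {positions}")
    return positions[0]


if __name__ == "__main__":
    # Schema of B after: B = foreach A generate name, UDF(name) as name;
    b_schema = ["name", "name"]

    # "store B" never looks a column up by name, so nothing is checked.
    # "C = foreach B generate name" does, and the lookup is rejected.
    try:
        resolve(b_schema, "name")
    except AmbiguousAliasError as err:
        print("rejected:", err)
```

Under this design the first script in the comment passes because no by-name lookup ever happens, and the second fails at the point where `name` is referenced.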
[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up
[ https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031439#comment-13031439 ] Dmitriy V. Ryaboy commented on PIG-2014: I'll add to the other rules -- but for the record, I looked at the plan and saw the sample not being pushed up after my patch :-). > SAMPLE shouldn't be pushed up > - > > Key: PIG-2014 > URL: https://issues.apache.org/jira/browse/PIG-2014 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0, 0.10 >Reporter: Jacob Perkins >Assignee: Dmitriy V. Ryaboy > Fix For: 0.9.0 > > Attachments: PIG-2014.patch > > > Consider the following code: > {code:none} > tfidf_all = LOAD '$TFIDF' AS (doc_id:chararray, token:chararray, > weight:double); > grouped = GROUP tfidf_all BY doc_id; > vectors = FOREACH grouped GENERATE group AS doc_id, tfidf_all.(token, > weight) AS vector; > DUMP vectors; > {code} > This, of course, runs just fine. In a real example, tfidf_all contains > 1,428,280 records. The reduce output records should be exactly the number of > documents, which turn out to be 18,863 in this case. All well and good. > The strangeness comes when you add a SAMPLE command: > {code:none} > sampled = SAMPLE vectors 0.0012; > DUMP sampled; > {code} > Running this results in 1,513 reduce output records. The reduce output > records be much much closer to 22 or 23 records (eg. 0.0012*18863). > Evidently, Pig rewrites SAMPLE into filter, and then pushes that filter in > front of the group. It shouldn't push that filter > since the UDF is non-deterministic. > Quick fix: If you add "-t PushUpFilter" to your command line when invoking > pig this won't happen. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up
[ https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031422#comment-13031422 ] Daniel Dai commented on PIG-2014: - I think this is more a bug fix, should go into 0.9. However, the script will trigger rule FilterAboveForeach not PushUpFilter. So the patch does not fix the problem. The fix should go to all these rules: PushUpFilter, PushDownForEachFlatten, FilterAboveForeach. > SAMPLE shouldn't be pushed up > - > > Key: PIG-2014 > URL: https://issues.apache.org/jira/browse/PIG-2014 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0, 0.10 >Reporter: Jacob Perkins >Assignee: Dmitriy V. Ryaboy > Fix For: 0.9.0 > > Attachments: PIG-2014.patch > > > Consider the following code: > {code:none} > tfidf_all = LOAD '$TFIDF' AS (doc_id:chararray, token:chararray, > weight:double); > grouped = GROUP tfidf_all BY doc_id; > vectors = FOREACH grouped GENERATE group AS doc_id, tfidf_all.(token, > weight) AS vector; > DUMP vectors; > {code} > This, of course, runs just fine. In a real example, tfidf_all contains > 1,428,280 records. The reduce output records should be exactly the number of > documents, which turn out to be 18,863 in this case. All well and good. > The strangeness comes when you add a SAMPLE command: > {code:none} > sampled = SAMPLE vectors 0.0012; > DUMP sampled; > {code} > Running this results in 1,513 reduce output records. The reduce output > records be much much closer to 22 or 23 records (eg. 0.0012*18863). > Evidently, Pig rewrites SAMPLE into filter, and then pushes that filter in > front of the group. It shouldn't push that filter > since the UDF is non-deterministic. > Quick fix: If you add "-t PushUpFilter" to your command line when invoking > pig this won't happen. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2058) Macro missing returns clause doesn't give a good error message
Macro missing returns clause doesn't give a good error message -- Key: PIG-2058 URL: https://issues.apache.org/jira/browse/PIG-2058 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.9.0 For the following query: define test( out1,out2 ){ A = load 'x' as (u:int, v:int); $B = filter A by u < 3 and v < 20; } Pig gives the following error message: Syntax error,unexpected symbol at or near '{' Previously, it gives: mismatched input '{' expecting RETURNS The previous message is more meaningful. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2020) Valid query fails to validate
[ https://issues.apache.org/jira/browse/PIG-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031382#comment-13031382 ]

Xuefu Zhang commented on PIG-2020:
----------------------------------

I reported it, so I should explain. $0 should be cast to a bag because it is used as the input to a nested filter. $1 and $2 refer to the columns in the bag. This usage seems reasonable: the user doesn't have to cast $0 to a bag explicitly, just as would be the case if the type were long or int. If the load statement becomes A = load 'x' as (b:{}), Pig doesn't complain any more. The point is that Pig should detect the need to insert a cast in this case.

> Valid query fails to validate
> -----------------------------
>
> Key: PIG-2020
> URL: https://issues.apache.org/jira/browse/PIG-2020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
>
> The following query seems valid:
> A = load 'x';
> B = foreach A { T = filter $0 by $1 > $2; generate T; };
> Store B into 'y';
> However, the query fails due to validation error in 0.8:
> 2011-04-28 09:08:06,846 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1026: Attempt to fetch field 1 from schema of size 1
> Similar error is given in 0.9.
[jira] [Commented] (PIG-2020) Valid query fails to validate
[ https://issues.apache.org/jira/browse/PIG-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031381#comment-13031381 ] Olga Natkovich commented on PIG-2020: - I am not even sure why this is a valid script. Can somebody explain? > Valid query fails to validate > - > > Key: PIG-2020 > URL: https://issues.apache.org/jira/browse/PIG-2020 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0, 0.9.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > The following query seems valid: > A = load 'x'; > B = foreach A { T = filter $0 by $1 > $2; generate T; }; > Store B into 'y'; > However, the query fails due to validation error in 0.8: > 2011-04-28 09:08:06,846 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1026: Attempt to fetch field 1 from schema of size 1 > Similar error is given in 0.9. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2021) Parser error while referring a map nested foreach
[ https://issues.apache.org/jira/browse/PIG-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-2021. - Resolution: Fixed Vivek, please, re-open if the issue still happens with the latest pig 0.9 code. > Parser error while referring a map nested foreach > - > > Key: PIG-2021 > URL: https://issues.apache.org/jira/browse/PIG-2021 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Vivek Padmanabhan >Assignee: Xuefu Zhang > Fix For: 0.9.0 > > > The below script is throwing parser errors > {code} > register string.jar; > A = load 'test1' using MapLoader() as ( s, m, l ); > B = foreach A generate *, string.URLPARSE((chararray) s#'url') as parsedurl; > C = foreach B { > urlpath = (chararray) parsedurl#'path'; > lc_urlpath = string.TOLOWERCASE((chararray) urlpath); > generate *; > }; > {code} > Error message; > | Failed to generate logical plan. > |Nested exception: org.apache.pig.impl.logicalLayer.FrontendException: ERROR > 2225: Projection with nothing to reference! > PIG-2002 reports a similar issue, but when i tried with the patch of PIG-2002 > i was getting the below exception; > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: 11, column 33> mismatched input '(' expecting SEMI_COLON -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2020) Valid query fails to validate
[ https://issues.apache.org/jira/browse/PIG-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-2020: - Fix Version/s: (was: 0.9.0) Since this is there even in 0.8, there is no rush to fix it in 0.9. Thus, I suggest we push this forward. > Valid query fails to validate > - > > Key: PIG-2020 > URL: https://issues.apache.org/jira/browse/PIG-2020 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0, 0.9.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > The following query seems valid: > A = load 'x'; > B = foreach A { T = filter $0 by $1 > $2; generate T; }; > Store B into 'y'; > However, the query fails due to validation error in 0.8: > 2011-04-28 09:08:06,846 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1026: Attempt to fetch field 1 from schema of size 1 > Similar error is given in 0.9. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2056: -- Attachment: PIG-2056.patch > Jython error messages should show script name > - > > Key: PIG-2056 > URL: https://issues.apache.org/jira/browse/PIG-2056 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Fix For: 0.9.0 > > Attachments: PIG-2056.patch > > > Instead of messages like > {code} > Traceback (most recent call last): > File "", line 12, in > {code} > It should display the script file name: > {code} > Traceback (most recent call last): > File "test.py", line 12, in > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2030) Merged join/cogroup does not automatically ship loader
[ https://issues.apache.org/jira/browse/PIG-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-2030. - Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to both trunk and 0.9 branch. > Merged join/cogroup does not automatically ship loader > -- > > Key: PIG-2030 > URL: https://issues.apache.org/jira/browse/PIG-2030 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.9.0 > > Attachments: PIG-2030-1.patch, PIG-2030-2.patch > > > The following script fail due to TableLoader class not found (If the jar is > in classpath): > {code} > a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k' using > org.apache.hadoop.zebra.pig.TableLoader('', 'sorted'); > b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k' using > org.apache.hadoop.zebra.pig.TableLoader('', 'sorted'); > g = cogroup a by $0, b by $0 using 'merge'; > store g into '/user/pig/out/jianyong.1304374720/ZebraMapCogrp_1.out'; > {code} > If we use register, the error goes away. However, Pig always ship jars > containing LoadFunc automatically. It should be the same for merged > cogroup/join. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2021) Parser error while referring a map nested foreach
[ https://issues.apache.org/jira/browse/PIG-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031368#comment-13031368 ] Xuefu Zhang commented on PIG-2021: -- Hi Vivek, This morning we found that there was a little disparity between 0.9 and trunk regarding this fix. Yes, you would still have this problem in 0.9, but with the latest checkin, the problem should have been addressed. Let me know if you found that this is not the case. --Xuefu > Parser error while referring a map nested foreach > - > > Key: PIG-2021 > URL: https://issues.apache.org/jira/browse/PIG-2021 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Vivek Padmanabhan >Assignee: Xuefu Zhang > Fix For: 0.9.0 > > > The below script is throwing parser errors > {code} > register string.jar; > A = load 'test1' using MapLoader() as ( s, m, l ); > B = foreach A generate *, string.URLPARSE((chararray) s#'url') as parsedurl; > C = foreach B { > urlpath = (chararray) parsedurl#'path'; > lc_urlpath = string.TOLOWERCASE((chararray) urlpath); > generate *; > }; > {code} > Error message; > | Failed to generate logical plan. > |Nested exception: org.apache.pig.impl.logicalLayer.FrontendException: ERROR > 2225: Projection with nothing to reference! > PIG-2002 reports a similar issue, but when i tried with the patch of PIG-2002 > i was getting the below exception; > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: 11, column 33> mismatched input '(' expecting SEMI_COLON -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2030) Merged join/cogroup does not automatically ship loader
[ https://issues.apache.org/jira/browse/PIG-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031358#comment-13031358 ]

Ashutosh Chauhan commented on PIG-2030:
---------------------------------------

+1.
For documentation purposes: after this change, users need not register their jars (for loadfuncs/udfs) if they are already in the classpath. But if their supplied loadfuncs/udfs depend on some other dependency, they need to register that dependency. Alternatively, they can bundle all their dependencies in one jar and put it in the classpath.

> Merged join/cogroup does not automatically ship loader
> ------------------------------------------------------
>
> Key: PIG-2030
> URL: https://issues.apache.org/jira/browse/PIG-2030
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-2030-1.patch, PIG-2030-2.patch
>
>
> The following script fails due to TableLoader class not found (if the jar is in the classpath):
> {code}
> a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
> b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
> g = cogroup a by $0, b by $0 using 'merge';
> store g into '/user/pig/out/jianyong.1304374720/ZebraMapCogrp_1.out';
> {code}
> If we use register, the error goes away. However, Pig always ships jars containing LoadFunc automatically. It should be the same for merged cogroup/join.
[jira] [Resolved] (PIG-2039) IndexOutOfBounException for a case
[ https://issues.apache.org/jira/browse/PIG-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang resolved PIG-2039.
------------------------------
    Resolution: Fixed

Patch PIG-2039.patch is committed into both trunk and 0.9.0.

> IndexOutOfBounException for a case
> ----------------------------------
>
> Key: PIG-2039
> URL: https://issues.apache.org/jira/browse/PIG-2039
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2039.patch
>
>
> The following query gives an exception:
> a = load '1.txt' as (a0:int, a1:int, a2:int);
> b = group a by a0;
> c = foreach b { c1 = limit a 10; c2 = distinct c1.a1; c3 = distinct c1.a2; generate c2, c3;};
> store c into 'output';
> 2011-05-04 12:36:01,720 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Index: 0, Size: 0
> Stack trace:
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:279)
>     at org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:88)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
>     at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
>     at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:73)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1664)
>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1615)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1586)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:580)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:930)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:176)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>     at org.apache.pig.Main.run(Main.java:488)
>     at org.apache.pig.Main.main(Main.java:109)
[jira] [Commented] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach
[ https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031348#comment-13031348 ] Xuefu Zhang commented on PIG-2007: -- PIG-2007-2.patch is committed to 0.9.0. > Parsing error when map key referred directly from udf in nested foreach > > > Key: PIG-2007 > URL: https://issues.apache.org/jira/browse/PIG-2007 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Anitha Raju >Assignee: Xuefu Zhang > Fix For: 0.9.0 > > Attachments: PIG-2007-2.patch, PIG-2007.patch > > > The below script when executed with version 0.9 fails with parsing error. > {code} > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. > mismatched input '{' expecting GENERATE > {code} > Script1 > {code} > register myudf.jar; > A = load 'test.txt' using PigStorage() as (a:int,b:chararray); > B1 = foreach A { > C = test.TOMAP('key1',$1)#'key1'; > generate C as C; > } > {code} > The above happens when, in a nested foreach i refer to a map key directly > from a udf result > The same would work if one executes without the nested foreach. > {code} > register myudf.jar; > A = load 'test.txt' using PigStorage() as (a:int,b:chararray); > B1 = foreach A generate test.TOMAP('key1',$1)#'key1'; > dump B1; > {code} > Script1 works well with 0.8. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2057) udf having project-star/project-range as argument has single tuple in argument schema
udf having project-star/project-range as argument has single tuple in argument schema
-------------------------------------------------------------------------------------

Key: PIG-2057
URL: https://issues.apache.org/jira/browse/PIG-2057
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0, 0.7.0, 0.9.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair

When a udf has a project-star (*) or project-range-to-end (e.g. $3 ..) as the argument, then to find the appropriate matching udf class, the Pig type checker (TypeCheckingRelVisitor) creates a UDF input schema that has a single tuple as the argument. But at runtime, the udf will actually get an input that has the expanded list of columns, not a tuple containing a single tuple as indicated by the schema used in typechecking.

The patch in PIG-1938 has a fix for the case where the input schema is present, as it expands the project-star or project-range in that case. Project-range is expanded even if the input schema is not present, provided it is not a project-to-end, as the number of columns is known in such cases.
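The three cases described above (bounded range, project-to-end with a schema, project-to-end without a schema) can be made concrete with a small Python sketch. It is not Pig code: `expand_projection` and its parameters are hypothetical, and column positions are zero-based as in `$0`, `$1`, and so on.

```python
def expand_projection(start, end=None, schema=None):
    """Expand a project-range into explicit column positions.

    start, end: zero-based column positions; end=None models a
                project-to-end such as `$3 ..` (start=0 models project-star).
    schema:     list of column names, or None when the input schema is unknown.
    Returns a list of positions, or None when expansion is not possible.
    """
    if end is not None:
        # Bounded range: the column count is known without any schema.
        return list(range(start, end + 1))
    if schema is not None:
        # Project-to-end or project-star: the schema supplies the last column.
        return list(range(start, len(schema)))
    # Project-to-end with no schema: the column count is unknown until runtime.
    return None


if __name__ == "__main__":
    print(expand_projection(1, 3))                                   # $1 .. $3
    print(expand_projection(3, schema=["a", "b", "c", "d", "e"]))    # $3 ..
    print(expand_projection(3))                                      # $3 .. without schema
```

The last case, where the function returns `None`, corresponds to the situation this issue describes: nothing can be expanded at plan time, so the type checker and the runtime can disagree about the shape of the udf input.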
[jira] [Commented] (PIG-1938) support project-range as udf argument
[ https://issues.apache.org/jira/browse/PIG-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031340#comment-13031340 ]

Thejas M Nair commented on PIG-1938:
------------------------------------

This patch also expands project-star and project-range expressions within a udf, in the query plan generation phase. The expanded argument schema for the udf gets used in the typechecker.

Earlier, there was an inconsistency in the behavior when project-star was used in the udf: the typechecker would see the udf as having a single argument of type tuple, but at runtime the udf would get multiple arguments. This inconsistency has not been resolved for the case when the schema of the input relation is null. I will open another jira to address that.

> support project-range as udf argument
> -------------------------------------
>
> Key: PIG-1938
> URL: https://issues.apache.org/jira/browse/PIG-1938
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.9.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1938.1.patch
>
>
> With changes in PIG-1693, project-range ('..') is supported in all use cases where '*' (project-star) is supported, except as udf argument.
> To be consistent with usage of project-star, project-range should be supported as udf argument as well.
[jira] [Commented] (PIG-2055) inconsistentcy behavior in parser generated during build
[ https://issues.apache.org/jira/browse/PIG-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031332#comment-13031332 ]

Thejas M Nair commented on PIG-2055:
------------------------------------

bq. I have seen that before. Once I do "ant clean", the message goes away.

I have seen it even after doing 'ant clean', so it does not seem to be caused by an unclean build, but by some non-deterministic code generation in antlr.

> inconsistentcy behavior in parser generated during build
> ---------------------------------------------------------
>
> Key: PIG-2055
> URL: https://issues.apache.org/jira/browse/PIG-2055
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Thejas M Nair
>
> On certain builds, I see that pig fails to support this syntax:
> {code}
> grunt> l = load 'x' using PigStorage(':');
> 2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input '(' expecting SEMI_COLON
> Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log
> {code}
> I seem to be the only one who has seen this behavior, and I have seen it on occasion when I build on mac. It could be a problem with antlr and apple jvm interaction.
[jira] [Commented] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach
[ https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031322#comment-13031322 ]

Thejas M Nair commented on PIG-2007:
------------------------------------

+1 for PIG-2007-2.patch

> Parsing error when map key referred directly from udf in nested foreach
> ------------------------------------------------------------------------
>
> Key: PIG-2007
> URL: https://issues.apache.org/jira/browse/PIG-2007
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Anitha Raju
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
> Attachments: PIG-2007-2.patch, PIG-2007.patch
>
> The script below fails with a parsing error when executed with version 0.9.
> {code}
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. mismatched input '{' expecting GENERATE
> {code}
> Script1
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A {
>     C = test.TOMAP('key1',$1)#'key1';
>     generate C as C;
> }
> {code}
> The above happens when, in a nested foreach, I refer to a map key directly from a udf result.
> The same works if one executes without the nested foreach.
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
> dump B1;
> {code}
> Script1 works well with 0.8.
[jira] [Resolved] (PIG-2052) Ship guava.jar to backend
[ https://issues.apache.org/jira/browse/PIG-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2052.
-----------------------------

Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Ship guava.jar to backend
> -------------------------
>
> Key: PIG-2052
> URL: https://issues.apache.org/jira/browse/PIG-2052
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Daniel Dai
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
> Attachments: PIG-2052-1.patch
>
> We need to ship guava.jar to backend. GenericInvoker is using it.
[jira] [Commented] (PIG-2055) inconsistentcy behavior in parser generated during build
[ https://issues.apache.org/jira/browse/PIG-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031321#comment-13031321 ]

Daniel Dai commented on PIG-2055:
---------------------------------

I have seen that before. Once I do "ant clean", the message goes away.

> inconsistentcy behavior in parser generated during build
> ---------------------------------------------------------
>
> Key: PIG-2055
> URL: https://issues.apache.org/jira/browse/PIG-2055
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Thejas M Nair
>
> On certain builds, I see that pig fails to support this syntax:
> {code}
> grunt> l = load 'x' using PigStorage(':');
> 2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input '(' expecting SEMI_COLON
> Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log
> {code}
> I seem to be the only one who has seen this behavior, and I have seen it on occasion when I build on mac. It could be a problem with antlr and apple jvm interaction.
[jira] [Commented] (PIG-1983) Clarify requiredFieldList in LoadPushDown.pushProjection is read only
[ https://issues.apache.org/jira/browse/PIG-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031319#comment-13031319 ]

Daniel Dai commented on PIG-1983:
---------------------------------

Done javadoc change.

> Clarify requiredFieldList in LoadPushDown.pushProjection is read only
> ----------------------------------------------------------------------
>
> Key: PIG-1983
> URL: https://issues.apache.org/jira/browse/PIG-1983
> Project: Pig
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Corinne Chandel
> Priority: Minor
> Fix For: 0.9.0
>
> In the Pig UDF manual, under LoadPushDown.pushProjection(), add a clarification that requiredFieldList is read only and cannot be changed by the LoadFunc.
[jira] [Updated] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach
[ https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-2007:
-----------------------------

Attachment: PIG-2007-2.patch

Patch PIG-2007-2.patch is what was actually committed to the trunk. This needs to go into 0.9 as well.

> Parsing error when map key referred directly from udf in nested foreach
> ------------------------------------------------------------------------
>
> Key: PIG-2007
> URL: https://issues.apache.org/jira/browse/PIG-2007
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Anitha Raju
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
> Attachments: PIG-2007-2.patch, PIG-2007.patch
>
> The script below fails with a parsing error when executed with version 0.9.
> {code}
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. mismatched input '{' expecting GENERATE
> {code}
> Script1
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A {
>     C = test.TOMAP('key1',$1)#'key1';
>     generate C as C;
> }
> {code}
> The above happens when, in a nested foreach, I refer to a map key directly from a udf result.
> The same works if one executes without the nested foreach.
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
> dump B1;
> {code}
> Script1 works well with 0.8.
[jira] [Created] (PIG-2056) Jython error messages should show script name
Jython error messages should show script name
---------------------------------------------

Key: PIG-2056
URL: https://issues.apache.org/jira/browse/PIG-2056
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
Fix For: 0.9.0

Instead of messages like
{code}
Traceback (most recent call last):
  File "", line 12, in
{code}
it should display the script file name:
{code}
Traceback (most recent call last):
  File "test.py", line 12, in
{code}
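As background for PIG-2056, the file name shown in a Python traceback comes from the name given when the source is compiled. The sketch below shows this with standard Python's built-in `compile`; it is not Pig's Jython embedding code, and the helper name `run_script` is made up for the example. How the actual fix passes the script name to Jython is not stated in this thread.

```python
import traceback


def run_script(source: str, filename: str) -> str:
    """Run source and return the formatted traceback, or "" on success.

    Hypothetical helper for illustration. The filename passed to
    compile() is the name that appears in the traceback's "File" line.
    """
    try:
        code = compile(source, filename, "exec")
        exec(code, {"__name__": "__main__"})
    except Exception:
        return traceback.format_exc()
    return ""


# A two-line script whose second line fails.
source = "x = 1\nraise ValueError('boom')\n"

# Compiled under a placeholder name, the traceback cannot name the script.
print(run_script(source, "<string>"))

# Compiled under the real script name, the traceback points at test.py.
print(run_script(source, "test.py"))
```

The second call reports the failing line as belonging to `test.py`, which is the behavior the issue asks for.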
[jira] [Commented] (PIG-2039) IndexOutOfBounException for a case
[ https://issues.apache.org/jira/browse/PIG-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031307#comment-13031307 ]

Daniel Dai commented on PIG-2039:
---------------------------------

+1

> IndexOutOfBounException for a case
> ----------------------------------
>
> Key: PIG-2039
> URL: https://issues.apache.org/jira/browse/PIG-2039
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
> Attachments: PIG-2039.patch
>
> The following query gives an exception:
> a = load '1.txt' as (a0:int, a1:int, a2:int);
> b = group a by a0;
> c = foreach b { c1 = limit a 10; c2 = distinct c1.a1; c3 = distinct c1.a2; generate c2, c3;};
> store c into 'output';
> 2011-05-04 12:36:01,720 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Index: 0, Size: 0
> Stack trace:
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:279)
>     at org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:88)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
>     at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
>     at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:73)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1664)
>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1615)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1586)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:580)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:930)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:176)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>     at org.apache.pig.Main.run(Main.java:488)
>     at org.apache.pig.Main.main(Main.java:109)
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031274#comment-13031274 ]

Alan Gates commented on PIG-1824:
---------------------------------

I'll start running the tests and such. I also want to add some end to end tests.

> Support import modules in Jython UDF
> ------------------------------------
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Richard Ding
> Assignee: Woody Anderson
> Fix For: 0.10
> Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch, 1824d.patch
>
> Currently, Jython UDF script doesn't support Jython import statement as in the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the backend? Or should we add a ship clause to let user explicitly specify the module to ship?
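For anyone who wants to try the `resplit` UDF from PIG-1824 outside Pig, a rough sketch follows. The `outputSchema` function defined here is only a stand-in written for this example (in a real Jython UDF script the decorator is made available by Pig); it is assumed simply to record the schema string on the function.

```python
import re


def outputSchema(schema):
    """Stand-in for the decorator Pig provides to Jython UDFs.

    Assumption for this sketch: it only attaches the schema string to
    the decorated function and leaves the function otherwise unchanged.
    """
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


@outputSchema("word:chararray")
def resplit(content, regex, index):
    # Split content on the regular expression and return the piece at
    # position index.
    return re.compile(regex).split(content)[index]
```

For example, `resplit("a,b;c", "[,;]", 1)` splits on commas and semicolons and returns the second piece. The only dependency is the standard `re` module, which is exactly the import the issue reports as unsupported in the backend.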
[jira] [Updated] (PIG-1994) e2e test harness deployment implementation for existing cluster
[ https://issues.apache.org/jira/browse/PIG-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1994:
---------------------------

Resolution: Fixed
Status: Resolved (was: Patch Available)

Patch checked in.

> e2e test harness deployment implementation for existing cluster
> ----------------------------------------------------------------
>
> Key: PIG-1994
> URL: https://issues.apache.org/jira/browse/PIG-1994
> Project: Pig
> Issue Type: Sub-task
> Components: tools
> Reporter: Alan Gates
> Assignee: Alan Gates
> Fix For: 0.10
> Attachments: PIG-1994.patch
[jira] [Created] (PIG-2055) inconsistentcy behavior in parser generated during build
inconsistentcy behavior in parser generated during build
---------------------------------------------------------

Key: PIG-2055
URL: https://issues.apache.org/jira/browse/PIG-2055
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Thejas M Nair

On certain builds, I see that pig fails to support this syntax:
{code}
grunt> l = load 'x' using PigStorage(':');
2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input '(' expecting SEMI_COLON
Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log
{code}
I seem to be the only one who has seen this behavior, and I have seen it on occasion when I build on mac. It could be a problem with antlr and apple jvm interaction.
[jira] [Updated] (PIG-1938) support project-range as udf argument
[ https://issues.apache.org/jira/browse/PIG-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1938:
------------------------------

Attachment: PIG-1938.1.patch

PIG-1938.1.patch passes unit tests. test-patch showed an additional javac warning, but that was from the code generated by antlr.

> support project-range as udf argument
> -------------------------------------
>
> Key: PIG-1938
> URL: https://issues.apache.org/jira/browse/PIG-1938
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.9.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.9.0
> Attachments: PIG-1938.1.patch
>
> With changes in PIG-1693, project-range ('..') is supported in all use cases where '*' (project-star) is supported, except as udf argument.
> To be consistent with usage of project-star, project-range should be supported as udf argument as well.
[jira] [Commented] (PIG-2021) Parser error while referring a map nested foreach
[ https://issues.apache.org/jira/browse/PIG-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031114#comment-13031114 ]

Vivek Padmanabhan commented on PIG-2021:
----------------------------------------

Hi Xuefu,
Extremely sorry about that. I was just trying to remove the dependencies. Please check whether the below is a valid case:
{code}
register mymapudf.jar;
A = load 'temp' as ( s, m, l );
B = foreach A generate *, org.vivek.udfs.mToMapUDF((chararray) s) as mapout;
C = foreach B {
    urlpath = (chararray) mapout#'k1';
    lc_urlpath = org.vivek.udfs.LOWER((chararray) urlpath);
    generate urlpath,lc_urlpath;
};
{code}
Source for org.vivek.udfs.mToMapUDF
{code}
package org.vivek.udfs;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class mToMapUDF extends EvalFunc {
    public Map exec(Tuple arg0) throws IOException {
        Map myMapTResult = new HashMap();
        myMapTResult.put("k1", "SomeString");
        myMapTResult.put("k3", "SomeOtherString");
        return myMapTResult;
    }

    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema("mapout", DataType.MAP));
    }
}
{code}
Source for org.vivek.udfs.LOWER
{code}
package org.vivek.udfs;

import java.io.IOException;
import java.util.List;
import java.util.ArrayList;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.FuncSpec;

public class LOWER extends EvalFunc {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toLowerCase();
        } catch (Exception e) {
            return null;
        }
    }

    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), DataType.CHARARRAY));
    }

    public List getArgToFuncMapping() throws FrontendException {
        List funcList = new ArrayList();
        funcList.add(new FuncSpec(this.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
        return funcList;
    }
}
{code}

> Parser error while referring a map nested foreach
> --------------------------------------------------
>
> Key: PIG-2021
> URL: https://issues.apache.org/jira/browse/PIG-2021
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Vivek Padmanabhan
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> The below script is throwing parser errors
> {code}
> register string.jar;
> A = load 'test1' using MapLoader() as ( s, m, l );
> B = foreach A generate *, string.URLPARSE((chararray) s#'url') as parsedurl;
> C = foreach B {
>     urlpath = (chararray) parsedurl#'path';
>     lc_urlpath = string.TOLOWERCASE((chararray) urlpath);
>     generate *;
> };
> {code}
> Error message:
> | Failed to generate logical plan.
> | Nested exception: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
> PIG-2002 reports a similar issue, but when I tried with the patch of PIG-2002 I was getting the below exception:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 11, column 33> mismatched input '(' expecting SEMI_COLON
[jira] [Commented] (PIG-1983) Clarify requiredFieldList in LoadPushDown.pushProjection is read only
[ https://issues.apache.org/jira/browse/PIG-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031074#comment-13031074 ]

Daniel Dai commented on PIG-1983:
---------------------------------

Corinne's change is not in the javadoc, it's in the UDF manual. Yeah, I need to change the javadoc as well.

> Clarify requiredFieldList in LoadPushDown.pushProjection is read only
> ----------------------------------------------------------------------
>
> Key: PIG-1983
> URL: https://issues.apache.org/jira/browse/PIG-1983
> Project: Pig
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Corinne Chandel
> Priority: Minor
> Fix For: 0.9.0
>
> In the Pig UDF manual, under LoadPushDown.pushProjection(), add a clarification that requiredFieldList is read only and cannot be changed by the LoadFunc.
[jira] [Commented] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031068#comment-13031068 ]

Dmitriy V. Ryaboy commented on PIG-1825:
----------------------------------------

Cool.
At this point I don't think we need testStoreToHBase_2_no_WAL()? HBase itself doesn't actually test noWAL directly. I'm ok with not testing the full path, just testing that we are using the HBase api correctly.
I do almost want to make it "-noSafety" just to be clear about what one is doing when invoking this "optimization".

> ability to turn off the write ahead log for pig's HBaseStorage
> ---------------------------------------------------------------
>
> Key: PIG-1825
> URL: https://issues.apache.org/jira/browse/PIG-1825
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0
> Reporter: Corbin Hoenes
> Priority: Minor
> Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch, PIG-1825_2.patch
>
> Added an option to allow a caller of HBaseStorage to turn off the WriteAheadLog feature while doing bulk loads into hbase.
> From the performance tuning wiki page:
> http://wiki.apache.org/hadoop/PerformanceTuning
> "To speed up the inserts in a non critical job (like an import job), you can use Put.writeToWAL(false) to bypass writing to the write ahead log."
> We've tested this on HBase 0.20.6 and it helps dramatically.
> The -noWAL option is passed in just like other options for hbase storage:
> STORE myalias INTO 'MyTable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1 mycolumnfamily:field2','-noWAL');
> This would be my first patch so please educate me with any steps I need to do.
[jira] [Assigned] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy reassigned PIG-1825:
-------------------------------------

Assignee: Bill Graham

> ability to turn off the write ahead log for pig's HBaseStorage
> ---------------------------------------------------------------
>
> Key: PIG-1825
> URL: https://issues.apache.org/jira/browse/PIG-1825
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0
> Reporter: Corbin Hoenes
> Assignee: Bill Graham
> Priority: Minor
> Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch, PIG-1825_2.patch
>
> Added an option to allow a caller of HBaseStorage to turn off the WriteAheadLog feature while doing bulk loads into hbase.
> From the performance tuning wiki page:
> http://wiki.apache.org/hadoop/PerformanceTuning
> "To speed up the inserts in a non critical job (like an import job), you can use Put.writeToWAL(false) to bypass writing to the write ahead log."
> We've tested this on HBase 0.20.6 and it helps dramatically.
> The -noWAL option is passed in just like other options for hbase storage:
> STORE myalias INTO 'MyTable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1 mycolumnfamily:field2','-noWAL');
> This would be my first patch so please educate me with any steps I need to do.
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031065#comment-13031065 ]

Dmitriy V. Ryaboy commented on PIG-1946:
----------------------------------------

This is for column families: http://hbase.apache.org/xref/org/apache/hadoop/hbase/HColumnDescriptor.html#278 (no slashes, colons, or ISOControl chars, and no starting with "."). I believe columns are similar.

> HBaseStorage constructor syntax is error prone
> -----------------------------------------------
>
> Key: PIG-1946
> URL: https://issues.apache.org/jira/browse/PIG-1946
> Project: Pig
> Issue Type: Improvement
> Reporter: Bill Graham
> Assignee: Bill Graham
> Fix For: 0.10
> Attachments: PIG-1946_1.patch
>
> Using {{HBaseStorage}} like so seems like a reasonable thing to do, but it will yield unexpected results:
> {code}
> STORE result INTO 'hbase://foo' USING
>     org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>     'info:first_name, info:last_name');
> {code}
> The problem is that a column named {{info:first_name,}} will be created, with the trailing comma included. I've had numerous developers get tripped up on this issue since everywhere else in Pig variables are separated by commas, so I propose we fix it.
> I propose we trim leading/trailing commas from column names, but I'm open to other ideas.
> Also should we accept column names that are comma-delimited without spaces?
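One possible reading of the PIG-1946 proposal, sketched outside of Pig: treat any run of commas and whitespace as a single delimiter, which covers trailing commas and comma-delimited lists without spaces. The function name `parse_columns` is hypothetical, and the sketch assumes column names never contain a comma themselves; it is not the logic of the attached patch.

```python
import re
from typing import List


def parse_columns(spec: str) -> List[str]:
    """Split an HBaseStorage-style column list on whitespace and/or commas.

    Hypothetical helper for illustration; assumes that no column name
    legitimately contains a comma.
    """
    # Any run of commas and whitespace counts as one delimiter. Empty
    # strings produced by leading or trailing delimiters are dropped.
    return [col for col in re.split(r"[\s,]+", spec) if col]


# Whitespace-only splitting keeps the trailing comma in the first name,
# which is the behavior the issue describes.
naive = "info:first_name, info:last_name".split()

# The comma-aware split yields clean column names.
fixed = parse_columns("info:first_name, info:last_name")
```

Here `naive` contains `info:first_name,` with the comma attached, while `fixed` contains the two names without it. Whether commas should be rejected or accepted inside a column name is a separate question that the comment above touches on.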
[jira] [Closed] (PIG-1778) Some dependencies not packaged with Pig 0.8 release
[ https://issues.apache.org/jira/browse/PIG-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy closed PIG-1778.
---------------------------------

> Some dependencies not packaged with Pig 0.8 release
> ----------------------------------------------------
>
> Key: PIG-1778
> URL: https://issues.apache.org/jira/browse/PIG-1778
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Dmitriy V. Ryaboy
>
> Some of the libraries required for new Pig features are not included in the built tarball of 0.8 release:
> guava, required for HBaseStorage
> jython, required for Jython UDFs
> We should discuss how to properly package these dependencies.
[jira] [Resolved] (PIG-1778) Some dependencies not packaged with Pig 0.8 release
[ https://issues.apache.org/jira/browse/PIG-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy resolved PIG-1778.
-----------------------------------

Resolution: Duplicate

Resolving as duplicate.

> Some dependencies not packaged with Pig 0.8 release
> ----------------------------------------------------
>
> Key: PIG-1778
> URL: https://issues.apache.org/jira/browse/PIG-1778
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Dmitriy V. Ryaboy
>
> Some of the libraries required for new Pig features are not included in the built tarball of 0.8 release:
> guava, required for HBaseStorage
> jython, required for Jython UDFs
> We should discuss how to properly package these dependencies.