[jira] [Updated] (PIG-2058) Macro missing returns clause doesn't give a good error message

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2058:
--

Attachment: PIG-2058.patch

Thanks Xuefu. Attaching a patch with the fix.

> Macro missing returns clause doesn't give a good error message
> --
>
> Key: PIG-2058
> URL: https://issues.apache.org/jira/browse/PIG-2058
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2058.patch
>
>
> For the following query:
> define test( out1,out2 ){
>    A  = load 'x' as (u:int, v:int);
>    $B  = filter A by u < 3 and v <  20;
> }
> Pig gives the following error message: Syntax error,unexpected symbol at or 
> near '{'
> Previously, it gave: mismatched input '{' expecting RETURNS
> The previous message is more meaningful.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2035.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Macro expansion doesn't handle multiple expansions of same macro inside 
> another macro
> -
>
> Key: PIG-2035
> URL: https://issues.apache.org/jira/browse/PIG-2035
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Richard Ding
> Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2035_1.patch
>
>
> Here is the use case:
> {code}
> define test ( in, out, x ) returns c {
>     a = load '$in' as (name, age, gpa);
>     b = group a by gpa;
>     $c = foreach b generate group, COUNT(a.$x);
>     store $c into '$out';
> };
> define test2( in, out ) returns x {
>     $x = test( '$in', '$out', 'name' );
>     $x = test( '$in', '$out.1', 'age' );
>     $x = test( '$in', '$out.2', 'gpa' );
> };
> x = test2('studenttab10k', 'myoutput');
> {code}
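
One way to picture the use case above: every expansion of `test` defines the local aliases `a` and `b`, so three expansions inside `test2` would collide unless each expansion gets its own names. The Python sketch below only illustrates that idea and is not Pig's implementation; the `expand` helper, the `MACRO_BODY` list, and the `macro_<name>_<alias>_<n>` naming scheme are assumptions made for this example.

```python
import itertools
import re

# Hypothetical body of the inner macro: two local aliases, a and b.
MACRO_BODY = [
    "a = load '$in' as (name, age, gpa);",
    "b = group a by gpa;",
]
LOCAL_ALIASES = ["a", "b"]

# Shared counter so that every expansion receives a distinct suffix.
_counter = itertools.count()


def expand(body, local_aliases, macro_name):
    """Return a copy of body with every local alias renamed uniquely."""
    n = next(_counter)
    renamed = []
    for line in body:
        for alias in local_aliases:
            # \b keeps 'a' from matching inside words such as 'load' or 'age'.
            pattern = rf"\b{re.escape(alias)}\b"
            line = re.sub(pattern, f"macro_{macro_name}_{alias}_{n}", line)
        renamed.append(line)
    return renamed
```

Calling `expand(MACRO_BODY, LOCAL_ALIASES, "test")` three times yields three statement lists whose defined aliases do not overlap, which is the property a nested expansion needs.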



[jira] [Created] (PIG-2059) PIG doesn't validate incomplete query in batch mode even if -c option is given

2011-05-10 Thread Xuefu Zhang (JIRA)
PIG doesn't validate incomplete query in batch mode even if -c option is given
--

Key: PIG-2059
URL: https://issues.apache.org/jira/browse/PIG-2059
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Fix For: 0.9.0


Given the following script in a file, Pig doesn't report any error, even if the 
-c option is given:

A = load 'x' as (u, v);
B = foreach A generate $3;

It's questionable whether to validate the query in batch mode, as it doesn't 
contain any store/dump statement. However, if the -c option is given, validation 
should nevertheless be performed.
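
The script above is invalid because `$3` refers to a fourth column while the declared schema `(u, v)` has only two. A minimal Python sketch of the kind of check a syntax-check run would be expected to perform is shown here; `check_positions` is a hypothetical helper written for this note and does not correspond to any Pig class.

```python
import re


def check_positions(schema, expression):
    """Return error strings for positional references beyond the schema."""
    errors = []
    for match in re.finditer(r"\$(\d+)", expression):
        index = int(match.group(1))
        # Positions are zero-based, so the valid range is 0 .. len(schema) - 1.
        if index >= len(schema):
            errors.append(
                f"${index} is out of range for a schema of size {len(schema)}"
            )
    return errors
```

For the report's example, `check_positions(["u", "v"], "$3")` returns one error, while `check_positions(["u", "v"], "$1")` returns an empty list.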



[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031483#comment-13031483
 ] 

Daniel Dai commented on PIG-2035:
-

+1




[jira] [Updated] (PIG-2058) Macro missing returns clause doesn't give a good error message

2011-05-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-2058:
-

Assignee: Richard Ding  (was: Xuefu Zhang)

The problem was introduced by RETURN VOID support. Changing the grammar as 
follows will solve the problem.

macro_return_clause : RETURNS ( ( alias ( COMMA alias )* ) | VOID )
   -> ^( RETURN_VAL alias* )
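
To illustrate why a mandatory RETURNS token produces the more helpful message, here is a hand-written Python sketch of the same rule. It is not the ANTLR-generated parser; `parse_macro_return_clause` and the pre-split token list are assumptions for this example, and only the "mismatched input ... expecting RETURNS" wording is taken from the report.

```python
def parse_macro_return_clause(tokens, pos):
    """Parse RETURNS (alias (',' alias)* | VOID) starting at tokens[pos].

    Returns (aliases, next_pos); VOID yields an empty alias list.
    """
    if pos >= len(tokens) or tokens[pos].upper() != "RETURNS":
        found = tokens[pos] if pos < len(tokens) else "<EOF>"
        # RETURNS is required, so the parser can name exactly what it wanted.
        raise SyntaxError(f"mismatched input '{found}' expecting RETURNS")
    pos += 1
    if pos < len(tokens) and tokens[pos].upper() == "VOID":
        return [], pos + 1
    aliases = []
    while True:
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError("expecting alias after RETURNS")
        aliases.append(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            continue
        return aliases, pos
```

With the token list `["{"]`, which is what follows the parameter list in the failing macro, the function raises `SyntaxError` with the message quoted in the issue.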





[jira] [Updated] (PIG-2044) Patten match bug in org.apache.pig.newplan.optimizer.Rule

2011-05-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2044:


Fix Version/s: (was: 0.9.0)
   0.10

Unlinking from 0.9. It is a potential bug, but currently we are not using this 
capability.

> Patten match bug in org.apache.pig.newplan.optimizer.Rule
> -
>
> Key: PIG-2044
> URL: https://issues.apache.org/jira/browse/PIG-2044
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.10
>
>
> Koji found that we have a bug in org.apache.pig.newplan.optimizer.Rule. The 
> "break" on line 179 seems to be wrong. This multiple-branch matching is not 
> used in Pig, but could be a problem in the future.



[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031447#comment-13031447
 ] 

Daniel Dai commented on PIG-2014:
-

I think it is because in TestNewPlanFilterAboveForeach, we only invoke some of 
the rules. If you do an explain, you will see the filter is still pushed up.

> SAMPLE shouldn't be pushed up
> -
>
> Key: PIG-2014
> URL: https://issues.apache.org/jira/browse/PIG-2014
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0, 0.10
> Reporter: Jacob Perkins
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
>
> Attachments: PIG-2014.patch
>
>
> Consider the following code:
> {code:none}
> tfidf_all = LOAD '$TFIDF' AS (doc_id:chararray, token:chararray, 
> weight:double);
> grouped   = GROUP tfidf_all BY doc_id;
> vectors   = FOREACH grouped GENERATE group AS doc_id, tfidf_all.(token, 
> weight) AS vector;
> DUMP vectors;
> {code}
> This, of course, runs just fine. In a real example, tfidf_all contains 
> 1,428,280 records. The reduce output records should be exactly the number of 
> documents, which turns out to be 18,863 in this case. All well and good.
> The strangeness comes when you add a SAMPLE command:
> {code:none}
> sampled = SAMPLE vectors 0.0012;
> DUMP sampled;
> {code}
> Running this results in 1,513 reduce output records. The reduce output 
> records should be much closer to 22 or 23 records (e.g. 0.0012*18863).
> Evidently, Pig rewrites SAMPLE into a filter, and then pushes that filter in 
> front of the group. It shouldn't push that filter since the UDF is 
> non-deterministic.
> Quick fix: If you add "-t PushUpFilter" to your command line when invoking 
> pig this won't happen.
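
The arithmetic in the report can be checked directly, and a small simulation shows why the two plans disagree. The Python sketch below uses the figures quoted above; the `simulate` function and its toy parameters are assumptions for illustration and say nothing about Pig's internals. Sampling the 1,428,280 raw records first keeps roughly 1,714 of them, which is consistent with the observed 1,513 groups if some documents are hit more than once.

```python
import random

# Figures quoted in the report.
records = 1_428_280
documents = 18_863
fraction = 0.0012

expected_after_group = fraction * documents   # about 22.6 output records
expected_before_group = fraction * records    # about 1,714 sampled input records


def simulate(num_docs, tokens_per_doc, p, seed=0):
    """Count output groups when sampling after vs. before the GROUP."""
    rng = random.Random(seed)
    # Intended plan: group first, then keep each group with probability p.
    after = sum(1 for _ in range(num_docs) if rng.random() < p)
    # Pushed-up plan: sample raw records, then group whatever survives.
    surviving_docs = set()
    for doc in range(num_docs):
        for _ in range(tokens_per_doc):
            if rng.random() < p:
                surviving_docs.add(doc)
    return after, len(surviving_docs)
```

Running `simulate(200, 50, 0.05)` returns a pair in which the second count (filter pushed in front of the group) is far larger than the first, mirroring the gap between 1,513 and the expected 22 or 23.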



[jira] [Updated] (PIG-2037) Valid query fails to validate

2011-05-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2037:


Fix Version/s: (was: 0.9.0)

Xuefu found this test case because it is a valid test case in the old logical 
plan test suite, TestLogicalPlanBuilder. However, the script fails in 0.6, 0.7, 
and 0.8. The test case succeeds there only because we don't invoke the validator 
in tests. 

Some general thoughts about alias conflicts: we should defer the dup-alias check 
as long as there is no ambiguity. For example:

{code}
B = foreach A generate name, UDF(name) as name;
store B into '111';
{code}

This should be valid since there is no ambiguity. The following script should 
fail:

{code}
B = foreach A generate name, UDF(name) as name;
C = foreach B generate name;
{code}

Here "name" in C is ambiguous, so the reference cannot be resolved. 

Unlinking this from 0.9 since it is not a regression.
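
The deferred check described in this comment can be sketched in a few lines of Python: duplicate aliases are allowed to exist in a schema, and an error is raised only when a later statement tries to resolve a name that matches more than one column. The `resolve` function and `AmbiguousAliasError` are hypothetical names for this illustration, not Pig classes.

```python
class AmbiguousAliasError(Exception):
    """Raised when a name matches more than one column."""


def resolve(schema, name):
    """Return the index of name in schema, failing only on real ambiguity."""
    matches = [i for i, alias in enumerate(schema) if alias == name]
    if not matches:
        raise KeyError(name)
    if len(matches) > 1:
        raise AmbiguousAliasError(f"{name} matches columns {matches}")
    return matches[0]


# Schema of B after: B = foreach A generate name, UDF(name) as name;
# Building it is fine; nothing has asked for "name" yet.
b_schema = ["name", "name"]
```

Storing B never calls `resolve`, so the first script passes. The second script's `C = foreach B generate name;` corresponds to `resolve(b_schema, "name")`, which raises `AmbiguousAliasError`.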

> Valid query fails to validate
> -
>
> Key: PIG-2037
> URL: https://issues.apache.org/jira/browse/PIG-2037
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Daniel Dai
>
> The following test case seems valid, but it fails to validate in 0.9.
> a = load 'st10k' as (name, age, gpa);
> b = group a by name;
> c = foreach b generate flatten(a);
> d = filter c by name != 'fred';
> e = group d by name;
> f = foreach e generate flatten(d);
> g = foreach f generate name, d::a::name, a::name;
> store g into 'output';
> ERROR 1108:  Duplicate schema alias: d::a::name



[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-10 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031439#comment-13031439
 ] 

Dmitriy V. Ryaboy commented on PIG-2014:


I'll add to the other rules -- but for the record, I looked at the plan and saw 
the sample not being pushed up after my patch :-).




[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031422#comment-13031422
 ] 

Daniel Dai commented on PIG-2014:
-

I think this is more of a bug fix, so it should go into 0.9. 

However, the script triggers the rule FilterAboveForeach, not PushUpFilter, so 
the patch does not fix the problem. The fix should go into all of these rules: 
PushUpFilter, PushDownForEachFlatten, FilterAboveForeach.




[jira] [Created] (PIG-2058) Macro missing returns clause doesn't give a good error message

2011-05-10 Thread Xuefu Zhang (JIRA)
Macro missing returns clause doesn't give a good error message
--

Key: PIG-2058
URL: https://issues.apache.org/jira/browse/PIG-2058
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Fix For: 0.9.0


For the following query:

define test( out1,out2 ){
   A  = load 'x' as (u:int, v:int);
   $B  = filter A by u < 3 and v <  20;
}

Pig gives the following error message: Syntax error,unexpected symbol at or 
near '{'

Previously, it gave: mismatched input '{' expecting RETURNS

The previous message is more meaningful.



[jira] [Commented] (PIG-2020) Valid query fails to validate

2011-05-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031382#comment-13031382
 ] 

Xuefu Zhang commented on PIG-2020:
--

I reported it, and thus need to explain.

$0 should be cast to a bag because it's used as the input to a nested filter. 
$1 and $2 refer to the columns in the bag. This usage seems reasonable: the 
user shouldn't have to explicitly cast $0 to a bag, just as no explicit cast is 
needed when the required type is long or int.

If the load statement becomes: A = load 'x' as (b:{}), Pig doesn't complain any 
more. The point is that Pig should detect the need to insert a cast in this case.


> Valid query fails to validate
> -
>
> Key: PIG-2020
> URL: https://issues.apache.org/jira/browse/PIG-2020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
>
> The following query seems valid:
> A = load 'x';
> B = foreach A { T = filter $0 by $1 > $2; generate T; };
> Store B into 'y';
> However, the query fails due to a validation error in 0.8:
> 2011-04-28 09:08:06,846 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1026: Attempt to fetch field 1 from schema of size 1
> A similar error is given in 0.9.



[jira] [Commented] (PIG-2020) Valid query fails to validate

2011-05-10 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031381#comment-13031381
 ] 

Olga Natkovich commented on PIG-2020:
-

I am not even sure why this is a valid script. Can somebody explain?




[jira] [Resolved] (PIG-2021) Parser error while referring a map nested foreach

2011-05-10 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-2021.
-

Resolution: Fixed

Vivek, please re-open if the issue still happens with the latest Pig 0.9 code.

> Parser error while referring a map nested foreach
> -
>
> Key: PIG-2021
> URL: https://issues.apache.org/jira/browse/PIG-2021
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Vivek Padmanabhan
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The script below throws parser errors:
> {code}
> register string.jar;
> A = load 'test1'  using MapLoader() as ( s, m, l );   
> B = foreach A generate *, string.URLPARSE((chararray) s#'url') as parsedurl;
> C = foreach B {
>   urlpath = (chararray) parsedurl#'path';
>   lc_urlpath = string.TOLOWERCASE((chararray) urlpath);
>   generate *;
> };
> {code}
> Error message:
> | Failed to generate logical plan.
> |Nested exception: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 
> 2225: Projection with nothing to reference!
> PIG-2002 reports a similar issue, but when I tried with the patch from PIG-2002 
> I got the exception below:
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:  11, column 33>  mismatched input '(' expecting SEMI_COLON



[jira] [Updated] (PIG-2020) Valid query fails to validate

2011-05-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-2020:
-

Fix Version/s: (was: 0.9.0)

Since this problem exists even in 0.8, there is no rush to fix it in 0.9. Thus, 
I suggest we defer this to a later release.




[jira] [Updated] (PIG-2056) Jython error messages should show script name

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2056:
--

Attachment: PIG-2056.patch

> Jython error messages should show script name
> -
>
> Key: PIG-2056
> URL: https://issues.apache.org/jira/browse/PIG-2056
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Richard Ding
> Assignee: Richard Ding
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-2056.patch
>
>
> Instead of messages like
> {code}
> Traceback (most recent call last):
>   File "", line 12, in 
> {code}
> It should display the script file name:
> {code}
> Traceback (most recent call last):
>   File "test.py", line 12, in 
> {code}
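
For context on where the file name in a Python traceback comes from: the built-in `compile()` takes a `filename` argument, and that string is what later appears in the `File "..."` line. The sketch below demonstrates this in plain Python; `run_script` is a hypothetical helper, and whether the attached patch takes this route inside Pig's Jython integration is not stated in the issue.

```python
import traceback


def run_script(source, filename):
    """Execute source and return the traceback text, or '' on success."""
    # The filename given here is the one reported in any traceback.
    code = compile(source, filename, "exec")
    try:
        exec(code, {})
    except Exception:
        return traceback.format_exc()
    return ""
```

Passing `"test.py"` as `filename` makes a failing script report `File "test.py", line N`, whereas a placeholder name would show up verbatim in the same position.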



[jira] [Resolved] (PIG-2030) Merged join/cogroup does not automatically ship loader

2011-05-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-2030.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to both trunk and 0.9 branch.

> Merged join/cogroup does not automatically ship loader
> --
>
> Key: PIG-2030
> URL: https://issues.apache.org/jira/browse/PIG-2030
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-2030-1.patch, PIG-2030-2.patch
>
>
> The following script fails because the TableLoader class is not found (even 
> if the jar is in the classpath):
> {code}
> a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k' using 
> org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
> b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k' using 
> org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
> g = cogroup a by $0, b by $0 using 'merge';
> store g into '/user/pig/out/jianyong.1304374720/ZebraMapCogrp_1.out';
> {code}
> If we use register, the error goes away. However, Pig always ships jars 
> containing a LoadFunc automatically. It should do the same for merged 
> cogroup/join.



[jira] [Commented] (PIG-2021) Parser error while referring a map nested foreach

2011-05-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031368#comment-13031368
 ] 

Xuefu Zhang commented on PIG-2021:
--

Hi Vivek,

This morning we found that there was a small disparity between 0.9 and trunk 
regarding this fix. Yes, you would still have had this problem in 0.9, but with 
the latest checkin the problem should be addressed. Let me know if you find 
that this is not the case.

--Xuefu




[jira] [Commented] (PIG-2030) Merged join/cogroup does not automatically ship loader

2011-05-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031358#comment-13031358
 ] 

Ashutosh Chauhan commented on PIG-2030:
---

+1.

For documentation purposes: after this change, users need not register their 
jars (for loadfuncs/udfs) if they are already in the classpath. But if the 
supplied loadfuncs/udfs depend on other jars, users need to register those 
dependencies. Alternatively, they can bundle all their dependencies in one jar 
and put it in the classpath.




[jira] [Resolved] (PIG-2039) IndexOutOfBounException for a case

2011-05-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-2039.
--

Resolution: Fixed

Patch PIG-2039.patch has been committed to both trunk and 0.9.0.

> IndexOutOfBounException for a case
> --
>
> Key: PIG-2039
> URL: https://issues.apache.org/jira/browse/PIG-2039
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2039.patch
>
>
> The following query gives an exception:
> a = load '1.txt' as (a0:int, a1:int, a2:int);
> b = group a by a0;
> c = foreach b { c1 = limit a 10; c2 =  distinct c1.a1; c3 = distinct c1.a2; 
> generate c2, c3;};
> store c into 'output';
> 2011-05-04 12:36:01,720 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2999: Unexpected internal error. Index: 0, Size: 0
> Stack trace:
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:279)
>     at org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:88)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
>     at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
>     at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:73)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1664)
>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1615)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1586)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:580)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:930)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:176)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>     at org.apache.pig.Main.run(Main.java:488)
>     at org.apache.pig.Main.main(Main.java:109)



[jira] [Commented] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-05-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031348#comment-13031348
 ] 

Xuefu Zhang commented on PIG-2007:
--

PIG-2007-2.patch is committed to 0.9.0.

> Parsing error when map key referred directly from udf in nested foreach 
> 
>
> Key: PIG-2007
> URL: https://issues.apache.org/jira/browse/PIG-2007
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Anitha Raju
> Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2007-2.patch, PIG-2007.patch
>
>
> The script below fails with a parsing error when executed with version 0.9.
> {code}
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
>  mismatched input '{' expecting GENERATE
> {code}
> Script1
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A {
> C = test.TOMAP('key1',$1)#'key1';
> generate C as C;
> }
> {code}
> The above happens when, in a nested foreach, I refer to a map key directly 
> from a udf result.
> The same works if one executes it without the nested foreach.
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
> dump B1;
> {code}
> Script1 works well with 0.8.



[jira] [Created] (PIG-2057) udf having project-star/project-range as argument has single tuple in argument schema

2011-05-10 Thread Thejas M Nair (JIRA)
udf having project-star/project-range as argument has single tuple in argument 
schema
-

Key: PIG-2057
URL: https://issues.apache.org/jira/browse/PIG-2057
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0, 0.7.0, 0.9.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair


When a udf has a project-star (*) or project-range-to-end (e.g. $3 ..) as its 
argument, the pig type checker (TypeCheckingRelVisitor) creates a UDF input 
schema that has a single tuple as the argument in order to find the matching 
udf class. But at runtime, the udf actually gets an input that has the 
expanded list of columns, not a tuple containing a single tuple as indicated 
by the schema used in typechecking.

The patch in PIG-1938 has a fix for the case where the input schema is present, 
as it expands the project-star or project-range in that case. Project-range is 
expanded even if the input schema is not present, provided it is not a 
project-to-end, as the number of columns in such cases is known.
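
The expansion rule described above can be sketched as a toy model. This is 
plain Python, not Pig code: expand_project_range and its parameters are 
hypothetical names used only for illustration, and column positions stand in 
for the real schema handling.

```python
# Toy model of when a project-range can be expanded; not a Pig API.

def expand_project_range(start, end, schema_width):
    """Return the column positions covered by a range, or None if unknown.

    start        -- first column position, e.g. 3 for "$3 .."
    end          -- last column position, or None for a project-to-end range
    schema_width -- number of input columns, or None if the schema is null
    """
    if end is not None:
        # Bounded range: the column count is known without a schema.
        return list(range(start, end + 1))
    if schema_width is not None:
        # Project-to-end with a schema: expand up to the last column.
        return list(range(start, schema_width))
    # Project-to-end with a null schema: cannot be expanded up front.
    return None


print(expand_project_range(1, 3, None))     # [1, 2, 3]
print(expand_project_range(3, None, 6))     # [3, 4, 5]
print(expand_project_range(3, None, None))  # None
```

The last call corresponds to the case this issue is about: the range cannot be 
expanded before runtime, so the schema seen during typechecking and the 
arguments seen by the udf can differ.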





[jira] [Commented] (PIG-1938) support project-range as udf argument

2011-05-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031340#comment-13031340
 ] 

Thejas M Nair commented on PIG-1938:


This patch also expands project-star and project-range expressions within a 
udf during the query plan generation phase. The expanded argument schema for 
the udf is then used by the typechecker. Earlier, the behavior was inconsistent 
when project-star was used in a udf: the typechecker would see the udf as 
having a single argument of type tuple, but at runtime the udf would get 
multiple arguments.
This inconsistency has not been resolved for the case where the schema of the 
input relation is null. I will open another jira to address that.


> support project-range as udf argument
> -
>
> Key: PIG-1938
> URL: https://issues.apache.org/jira/browse/PIG-1938
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1938.1.patch
>
>
> With changes in PIG-1693, project-range ('..') is supported in all use cases 
> where '*' (project-star) is supported, except as udf argument. 
> To be consistent with usage of project-star, project-range should be 
> supported as udf argument as well.



[jira] [Commented] (PIG-2055) inconsistentcy behavior in parser generated during build

2011-05-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031332#comment-13031332
 ] 

Thejas M Nair commented on PIG-2055:


bq. I have seen that before. Once I run "ant clean", the message goes away.
I have seen it even after running 'ant clean', so it does not seem to be caused 
by an unclean build, but by some nondeterministic code generation in antlr. 


> inconsistentcy behavior in parser generated during build 
> -
>
> Key: PIG-2055
> URL: https://issues.apache.org/jira/browse/PIG-2055
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>
> On certain builds, i see that pig fails to support this syntax -
> {code}
> grunt> l = load 'x' using PigStorage(':');   
> 2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1200:   mismatched input '(' expecting SEMI_COLON
> Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log
> {code}
> I seem to be the only one who has seen this behavior, and I have seen it on 
> occasion when I build on a Mac. It could be a problem with the interaction 
> between antlr and the Apple JVM. 



[jira] [Commented] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-05-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031322#comment-13031322
 ] 

Thejas M Nair commented on PIG-2007:


+1 for PIG-2007-2.patch

> Parsing error when map key referred directly from udf in nested foreach 
> 
>
> Key: PIG-2007
> URL: https://issues.apache.org/jira/browse/PIG-2007
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Anitha Raju
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2007-2.patch, PIG-2007.patch
>
>
> The below script when executed with version 0.9 fails with parsing error.
> {code}
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
>  mismatched input '{' expecting GENERATE
> {code}
> Script1
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A {
> C = test.TOMAP('key1',$1)#'key1';
> generate C as C;
> }
> {code}
> The above happens when, in a nested foreach, I refer to a map key directly 
> from a udf result.
> The same script works when executed without the nested foreach.
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
> dump B1;
> {code}
> Script1 works well with 0.8.



[jira] [Resolved] (PIG-2052) Ship guava.jar to backend

2011-05-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-2052.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Ship guava.jar to backend
> -
>
> Key: PIG-2052
> URL: https://issues.apache.org/jira/browse/PIG-2052
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
>
> Attachments: PIG-2052-1.patch
>
>
> We need to ship guava.jar to backend. GenericInvoker is using it.



[jira] [Commented] (PIG-2055) inconsistentcy behavior in parser generated during build

2011-05-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031321#comment-13031321
 ] 

Daniel Dai commented on PIG-2055:
-

I have seen that before. Once I run "ant clean", the message goes away.

> inconsistentcy behavior in parser generated during build 
> -
>
> Key: PIG-2055
> URL: https://issues.apache.org/jira/browse/PIG-2055
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>
> On certain builds, i see that pig fails to support this syntax -
> {code}
> grunt> l = load 'x' using PigStorage(':');   
> 2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1200:   mismatched input '(' expecting SEMI_COLON
> Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log
> {code}
> I seem to be the only one who has seen this behavior, and I have seen it on 
> occasion when I build on a Mac. It could be a problem with the interaction 
> between antlr and the Apple JVM. 



[jira] [Commented] (PIG-1983) Clarify requiredFieldList in LoadPushDown.pushProjection is read only

2011-05-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031319#comment-13031319
 ] 

Daniel Dai commented on PIG-1983:
-

Done javadoc change.

> Clarify requiredFieldList in LoadPushDown.pushProjection is read only
> -
>
> Key: PIG-1983
> URL: https://issues.apache.org/jira/browse/PIG-1983
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Corinne Chandel
>Priority: Minor
> Fix For: 0.9.0
>
>
> In the Pig UDF manual, under LoadPushDown.pushProjection(), add a clarification 
> that requiredFieldList is read only and cannot be changed by the LoadFunc.



[jira] [Updated] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-05-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-2007:
-

Attachment: PIG-2007-2.patch

Patch PIG-2007-2.patch is what was actually committed to the trunk. This needs 
to be committed to the 0.9 branch as well.

> Parsing error when map key referred directly from udf in nested foreach 
> 
>
> Key: PIG-2007
> URL: https://issues.apache.org/jira/browse/PIG-2007
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Anitha Raju
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2007-2.patch, PIG-2007.patch
>
>
> The below script when executed with version 0.9 fails with parsing error.
> {code}
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
>  mismatched input '{' expecting GENERATE
> {code}
> Script1
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A {
> C = test.TOMAP('key1',$1)#'key1';
> generate C as C;
> }
> {code}
> The above happens when, in a nested foreach, I refer to a map key directly 
> from a udf result.
> The same script works when executed without the nested foreach.
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
> dump B1;
> {code}
> Script1 works well with 0.8.



[jira] [Created] (PIG-2056) Jython error messages should show script name

2011-05-10 Thread Richard Ding (JIRA)
Jython error messages should show script name
-

 Key: PIG-2056
 URL: https://issues.apache.org/jira/browse/PIG-2056
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0


Instead of messages like

{code}
Traceback (most recent call last):
  File "", line 12, in 
{code}

It should display the script file name:

{code}
Traceback (most recent call last):
  File "test.py", line 12, in 
{code}
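
For illustration only: in Python, the file name shown in a traceback is the 
name handed to compile() when the source is compiled. The sketch below is 
standalone Python, not Pig's Jython integration; the "<string>" placeholder and 
the run() helper are assumptions made for the example, and how Pig passes the 
script name to Jython is not shown here.

```python
import traceback

# Two-line script whose second line fails, so the traceback reports line 2.
source = "x = 1\nraise ValueError('boom')\n"


def run(source, filename):
    """Compile and run source under the given name; return the traceback text."""
    try:
        exec(compile(source, filename, "exec"), {})
    except ValueError:
        return traceback.format_exc()
    return ""


anonymous = run(source, "<string>")  # placeholder name, script name is lost
named = run(source, "test.py")       # script name appears in the traceback

print(named)
```

The text in `named` contains `File "test.py", line 2`, which is the kind of 
message requested above.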





[jira] [Commented] (PIG-2039) IndexOutOfBounException for a case

2011-05-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031307#comment-13031307
 ] 

Daniel Dai commented on PIG-2039:
-

+1

> IndexOutOfBounException for a case
> --
>
> Key: PIG-2039
> URL: https://issues.apache.org/jira/browse/PIG-2039
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2039.patch
>
>
> The following query gives an exception:
> a = load '1.txt' as (a0:int, a1:int, a2:int);
> b = group a by a0;
> c = foreach b { c1 = limit a 10; c2 =  distinct c1.a1; c3 = distinct c1.a2; 
> generate c2, c3;};
> store c into 'output';
> 2011-05-04 12:36:01,720 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2999: Unexpected internal error. Index: 0, Size: 0
> Stack trace:
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> at java.util.ArrayList.get(ArrayList.java:322)
> at 
> org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:279)
> at 
> org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:88)
> at 
> org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
> at 
> org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
> at 
> org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
> at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at 
> org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
> at 
> org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:73)
> at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at org.apache.pig.PigServer$Graph.compile(PigServer.java:1664)
> at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1615)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1586)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:580)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:930)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:176)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:488)
> at org.apache.pig.Main.main(Main.java:109)



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-05-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031274#comment-13031274
 ] 

Alan Gates commented on PIG-1824:
-

I'll start running the tests and such.  I also want to add some end to end 
tests.

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.10
>
> Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch, 
> 1824d.patch
>
>
> Currently, Jython UDF script doesn't support Jython import statement as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
> return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let user explicitly specify the 
> module to ship? 
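
To try the example UDF outside Pig, a minimal sketch follows. The outputSchema 
function here is a stand-in written for this example (an assumption, not Pig's 
implementation); it only records the schema string so that the decorated 
function can run under a plain Python interpreter.

```python
import re


def outputSchema(schema):
    """Stand-in for the decorator Pig supplies to Jython UDFs (assumption)."""
    def decorate(func):
        # Record the schema string on the function; nothing else happens here.
        func.outputSchema = schema
        return func
    return decorate


@outputSchema("word:chararray")
def resplit(content, regex, index):
    # Split content on the regex and return the piece at the given index.
    return re.compile(regex).split(content)[index]


print(resplit("a,b;c", "[,;]", 1))  # b
```

The function depends on the re module at call time, which is why the module 
has to be available wherever the UDF is executed.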



[jira] [Updated] (PIG-1994) e2e test harness deployment implementation for existing cluster

2011-05-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1994:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.

> e2e test harness deployment implementation for existing cluster
> ---
>
> Key: PIG-1994
> URL: https://issues.apache.org/jira/browse/PIG-1994
> Project: Pig
>  Issue Type: Sub-task
>  Components: tools
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.10
>
> Attachments: PIG-1994.patch
>
>




[jira] [Created] (PIG-2055) inconsistentcy behavior in parser generated during build

2011-05-10 Thread Thejas M Nair (JIRA)
inconsistentcy behavior in parser generated during build 
-

 Key: PIG-2055
 URL: https://issues.apache.org/jira/browse/PIG-2055
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Thejas M Nair


On certain builds, i see that pig fails to support this syntax -

{code}
grunt> l = load 'x' using PigStorage(':');   
2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1200:   mismatched input '(' expecting SEMI_COLON
Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log

{code}

I seem to be the only one who has seen this behavior, and I have seen it on 
occasion when I build on a Mac. It could be a problem with the interaction 
between antlr and the Apple JVM. 



[jira] [Updated] (PIG-1938) support project-range as udf argument

2011-05-10 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1938:
---

Attachment: PIG-1938.1.patch

PIG-1938.1.patch - passes unit tests. test-patch showed additional javac 
warning, but that was from the code generated from antlr.


> support project-range as udf argument
> -
>
> Key: PIG-1938
> URL: https://issues.apache.org/jira/browse/PIG-1938
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1938.1.patch
>
>
> With changes in PIG-1693, project-range ('..') is supported in all use cases 
> where '*' (project-star) is supported, except as udf argument. 
> To be consistent with usage of project-star, project-range should be 
> supported as udf argument as well.



[jira] [Commented] (PIG-2021) Parser error while referring a map nested foreach

2011-05-10 Thread Vivek Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031114#comment-13031114
 ] 

Vivek Padmanabhan commented on PIG-2021:


Hi Xuefu,
 Extremely sorry about that. I was just trying to remove the dependencies. 
Please check whether the below is a valid case:

{code}
register mymapudf.jar;
A = load 'temp' as ( s, m, l );
B = foreach A generate *, org.vivek.udfs.mToMapUDF((chararray) s) as mapout;
C = foreach B {
  urlpath = (chararray) mapout#'k1';
  lc_urlpath = org.vivek.udfs.LOWER((chararray) urlpath);
  generate urlpath,lc_urlpath;
};
{code}


Source for org.vivek.udfs.mToMapUDF
{code}
package org.vivek.udfs;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class mToMapUDF extends EvalFunc {
    public Map exec(Tuple arg0) throws IOException {
        Map myMapTResult = new HashMap();
        myMapTResult.put("k1", "SomeString");
        myMapTResult.put("k3", "SomeOtherString");
        return myMapTResult;
    }

    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema("mapout", DataType.MAP));
    }
}
{code}



Source for org.vivek.udfs.LOWER
{code}
package org.vivek.udfs;

import java.io.IOException;
import java.util.List;
import java.util.ArrayList;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.FuncSpec;

public class LOWER extends EvalFunc {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toLowerCase();
        } catch (Exception e) {
            return null;
        }
    }

    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(
                getSchemaName(this.getClass().getName().toLowerCase(), input),
                DataType.CHARARRAY));
    }

    public List getArgToFuncMapping() throws FrontendException {
        List funcList = new ArrayList();
        funcList.add(new FuncSpec(this.getClass().getName(),
                new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
        return funcList;
    }
}
{code}



> Parser error while referring a map nested foreach
> -
>
> Key: PIG-2021
> URL: https://issues.apache.org/jira/browse/PIG-2021
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Vivek Padmanabhan
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The below script is throwing parser errors
> {code}
> register string.jar;
> A = load 'test1'  using MapLoader() as ( s, m, l );   
> B = foreach A generate *, string.URLPARSE((chararray) s#'url') as parsedurl;
> C = foreach B {
>   urlpath = (chararray) parsedurl#'path';
>   lc_urlpath = string.TOLOWERCASE((chararray) urlpath);
>   generate *;
> };
> {code}
> Error message;
> | Failed to generate logical plan.
> |Nested exception: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 
> 2225: Projection with nothing to reference!
> PIG-2002 reports a similar issue, but when i tried with the patch of PIG-2002 
> i was getting the below exception;
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:  11, column 33>  mismatched input '(' expecting SEMI_COLON



[jira] [Commented] (PIG-1983) Clarify requiredFieldList in LoadPushDown.pushProjection is read only

2011-05-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031074#comment-13031074
 ] 

Daniel Dai commented on PIG-1983:
-

Corinne's change is not in javadoc, it's in UDF manual. Yeah, I need to change 
javadoc as well.

> Clarify requiredFieldList in LoadPushDown.pushProjection is read only
> -
>
> Key: PIG-1983
> URL: https://issues.apache.org/jira/browse/PIG-1983
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Corinne Chandel
>Priority: Minor
> Fix For: 0.9.0
>
>
> In the Pig UDF manual, under LoadPushDown.pushProjection(), add a clarification 
> that requiredFieldList is read only and cannot be changed by the LoadFunc.



[jira] [Commented] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-05-10 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031068#comment-13031068
 ] 

Dmitriy V. Ryaboy commented on PIG-1825:


Cool. At this point I don't think we need testStoreToHBase_2_no_WAL() ?

HBase itself doesn't actually test noWAL directly. I'm ok with not testing the 
full path, just testing that we are using the HBase api correctly.

I do almost want to make it "-noSafety" just to be clear about what one is 
doing when invoking this "optimization"


> ability to turn off the write ahead log for pig's HBaseStorage
> --
>
> Key: PIG-1825
> URL: https://issues.apache.org/jira/browse/PIG-1825
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Corbin Hoenes
>Priority: Minor
> Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch, 
> PIG-1825_2.patch
>
>
> Added an option to allow a caller of HBaseStorage to turn off the 
> WriteAheadLog feature while doing bulk loads into hbase.
> From the performance tuning wikipage: 
> http://wiki.apache.org/hadoop/PerformanceTuning
> "To speed up the inserts in a non critical job (like an import job), you can 
> use Put.writeToWAL(false) to bypass writing to the write ahead log."
> We've tested this on HBase 0.20.6 and it helps dramatically.  
> The -noWAL option is passed in just like other options for hbase storage:
> STORE myalias INTO 'MyTable' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1 
> mycolumnfamily:field2','-noWAL');
> This would be my first patch so please educate me with any steps I need to 
> do.  



[jira] [Assigned] (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage

2011-05-10 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy reassigned PIG-1825:
--

Assignee: Bill Graham

> ability to turn off the write ahead log for pig's HBaseStorage
> --
>
> Key: PIG-1825
> URL: https://issues.apache.org/jira/browse/PIG-1825
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Corbin Hoenes
>Assignee: Bill Graham
>Priority: Minor
> Attachments: HBaseStorage_noWAL.patch, PIG-1825_1.patch, 
> PIG-1825_2.patch
>
>
> Added an option to allow a caller of HBaseStorage to turn off the 
> WriteAheadLog feature while doing bulk loads into hbase.
> From the performance tuning wikipage: 
> http://wiki.apache.org/hadoop/PerformanceTuning
> "To speed up the inserts in a non critical job (like an import job), you can 
> use Put.writeToWAL(false) to bypass writing to the write ahead log."
> We've tested this on HBase 0.20.6 and it helps dramatically.  
> The -noWAL option is passed in just like other options for hbase storage:
> STORE myalias INTO 'MyTable' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1 
> mycolumnfamily:field2','-noWAL');
> This would be my first patch so please educate me with any steps I need to 
> do.  



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-05-10 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031065#comment-13031065
 ] 

Dmitriy V. Ryaboy commented on PIG-1946:


This is for column families: 
http://hbase.apache.org/xref/org/apache/hadoop/hbase/HColumnDescriptor.html#278 
(no slashes, colons, or ISOControl chars, and no starting with ".").  I believe 
columns are similar.
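
As a rough sketch of those rules (not the HBase code itself), the check could 
look like the Python below. The function names are hypothetical. Reading 
"slashes" as both "/" and "\" and rejecting empty names are assumptions made 
for the example; the ISO control ranges follow Java's Character.isISOControl.

```python
def is_iso_control(ch):
    """True for code points U+0000..U+001F and U+007F..U+009F."""
    cp = ord(ch)
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_legal_family_name(name):
    """Apply the rules quoted above to a candidate column family name."""
    if not name or name.startswith("."):
        return False
    for ch in name:
        # Assumption: both forward slash and backslash count as "slashes".
        if ch in "/\\:" or is_iso_control(ch):
            return False
    return True


print(is_legal_family_name("info"))    # True
print(is_legal_family_name(".info"))   # False
print(is_legal_family_name("in:fo"))   # False
```

A trailing comma passes every one of these checks, which matches the 
description below of `info:first_name,` being accepted as a column name.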

> HBaseStorage constructor syntax is error prone
> --
>
> Key: PIG-1946
> URL: https://issues.apache.org/jira/browse/PIG-1946
> Project: Pig
>  Issue Type: Improvement
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.10
>
> Attachments: PIG-1946_1.patch
>
>
> Using {{HBaseStorage}} like so seems like a reasonable thing to do, but it 
> will yield unexpected results:
> {code}
> STORE result INTO 'hbase://foo' USING
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>  'info:first_name, info:last_name');
> {code}
> The problem is that a column named {{info:first_name,}} will be created, with 
> the trailing comma included. I've had numerous developers get tripped up on 
> this issue since everywhere else in Pig variables are separated by commas, so 
> I propose we fix it.
> I propose we trim leading/trailing commas from column names, but I'm open to 
> other ideas.
> Also should we accept column names that are comma-delimited without spaces?
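
The proposed trimming can be sketched in a few lines of Python. This is only an 
illustration of the idea, not the HBaseStorage code or the attached patch; 
parse_columns is a hypothetical helper, and splitting the column spec on 
whitespace is assumed from the description above.

```python
def parse_columns(spec):
    """Split a column spec on whitespace and trim commas from each name."""
    names = [token.strip(",") for token in spec.split()]
    # Drop tokens that were nothing but commas.
    return [name for name in names if name]


spec = "info:first_name, info:last_name"

print(spec.split())         # ['info:first_name,', 'info:last_name']
print(parse_columns(spec))  # ['info:first_name', 'info:last_name']
```

A spec such as 'info:first_name,info:last_name' (commas, no spaces) would 
still come back as a single name with this approach, which is the open 
question raised above.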



[jira] [Closed] (PIG-1778) Some dependencies not packaged with Pig 0.8 release

2011-05-10 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy closed PIG-1778.
--


> Some dependencies not packaged with Pig 0.8 release
> ---
>
> Key: PIG-1778
> URL: https://issues.apache.org/jira/browse/PIG-1778
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Dmitriy V. Ryaboy
>
> Some of the libraries required for new Pig features are not included in the 
> built tarball of 0.8 release:
> guava, required for HBaseStorage
> jython, required for Jython UDFs
> We should discuss how to properly package these dependencies.



[jira] [Resolved] (PIG-1778) Some dependencies not packaged with Pig 0.8 release

2011-05-10 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy resolved PIG-1778.


Resolution: Duplicate

resolving as duplicate.

> Some dependencies not packaged with Pig 0.8 release
> ---
>
> Key: PIG-1778
> URL: https://issues.apache.org/jira/browse/PIG-1778
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Dmitriy V. Ryaboy
>
> Some of the libraries required for new Pig features are not included in the 
> built tarball of 0.8 release:
> guava, required for HBaseStorage
> jython, required for Jython UDFs
> We should discuss how to properly package these dependencies.
