Jenkins build is back to normal : Pig-trunk-commit #765

2011-04-25 Thread Apache Jenkins Server
See 




[jira] [Commented] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-04-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025075#comment-13025075
 ] 

Xuefu Zhang commented on PIG-2007:
--

Patch is committed to 0.9 as well.

> Parsing error when map key referred directly from udf in nested foreach 
> 
>
> Key: PIG-2007
> URL: https://issues.apache.org/jira/browse/PIG-2007
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Anitha Raju
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2007.patch
>
>
> The script below fails with a parsing error when executed with version 0.9.
> {code}
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
>  mismatched input '{' expecting GENERATE
> {code}
> Script1
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A {
> C = test.TOMAP('key1',$1)#'key1';
> generate C as C;
> }
> {code}
> This happens when, in a nested foreach, I refer to a map key directly 
> from a UDF result.
> The same expression works when executed without the nested foreach:
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
> dump B1;
> {code}
> Script1 works well with 0.8.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-2003) Using keyward as alias doesn't either emit an error or produce a logical plan.

2011-04-25 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-2003.
--

Resolution: Fixed

> Using keyward as alias doesn't either emit an error or produce a logical plan.
> --
>
> Key: PIG-2003
> URL: https://issues.apache.org/jira/browse/PIG-2003
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2003.patch
>
>
> The following is the symptom:
> grunt> ship = load 'x';
> grunt> describe ship;
> 2011-04-19 13:52:52,809 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1005: No plan for ship to describe
> The correct behavior is to give an error.
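The expected behavior is that binding a reserved word such as `ship` fails immediately instead of silently leaving no plan. A minimal Python model of that check follows; `RESERVED` is only an illustrative subset of Pig Latin keywords (the real list lives in Pig's grammar), and `check_alias`/`bind` are hypothetical helpers, not Pig's parser code.

```python
# Illustrative subset of Pig Latin keywords; NOT the grammar's full list.
RESERVED = {"load", "store", "foreach", "generate", "group", "cogroup",
            "filter", "define", "ship", "cache", "input", "output", "stream"}


class AliasError(ValueError):
    """Raised when a statement tries to bind a reserved word as an alias."""


def check_alias(alias):
    # Pig Latin keywords are case-insensitive, so compare in lower case.
    if alias.lower() in RESERVED:
        raise AliasError("'%s' is a reserved word and cannot be used as an alias" % alias)
    return alias


def bind(statement, plans):
    """Parse 'alias = expr;' and register a placeholder plan, or fail loudly."""
    alias, _, expr = statement.partition("=")
    alias = check_alias(alias.strip())
    plans[alias] = expr.strip().rstrip(";")
    return alias
```

With this check, `bind("ship = load 'x';", plans)` raises `AliasError` and nothing is registered, so a later `describe ship` never reaches the "No plan" state.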



[jira] [Commented] (PIG-2003) Using keyward as alias doesn't either emit an error or produce a logical plan.

2011-04-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025074#comment-13025074
 ] 

Xuefu Zhang commented on PIG-2003:
--

Unit tests passed. The patch is committed to both trunk and 0.9.

> Using keyward as alias doesn't either emit an error or produce a logical plan.
> --
>
> Key: PIG-2003
> URL: https://issues.apache.org/jira/browse/PIG-2003
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2003.patch
>
>
> The following is the symptom:
> grunt> ship = load 'x';
> grunt> describe ship;
> 2011-04-19 13:52:52,809 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1005: No plan for ship to describe
> The correct behavior is to give an error.



[jira] [Updated] (PIG-2004) Incorrect input types passed on to eval function

2011-04-25 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2004:
---

Attachment: PIG-2004.1.patch

PIG-2004.1.patch
- Reset fieldschema of all expressions from TypeCheckingExpVisitor constructor, 
instead of doing it in each visit function.
- Reset target fieldschema in CastExpression, copied LHS fieldschema in 
BinCondExpression so that uid of inner schema is not re-used.
- Fixed an NPE in LogicalSchema that was seen in test cases after this issue was 
fixed.
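The BinCond bullet is about object sharing: if the BinCond result reuses the LHS field schema object, both expressions carry the same uids, and a later change made through one (for example, a cast resetting a type) silently changes the other. The toy Python model below illustrates that general hazard and why copying with fresh uids avoids it. `FieldSchema` and `bincond_schema` are hypothetical stand-ins, not Pig's `LogicalSchema` classes, and the model does not claim to reproduce the exact PIG-2004 failure path.

```python
import itertools

_next_uid = itertools.count(1)


class FieldSchema:
    """Toy field schema: alias, type name, nested fields and a unique id."""

    def __init__(self, alias, type_name, inner=()):
        self.alias = alias
        self.type = type_name
        self.inner = list(inner)
        self.uid = next(_next_uid)

    def copy_with_new_uids(self):
        # Recursively clone so no nested field shares a uid (or object) with the original.
        return FieldSchema(self.alias, self.type,
                           [f.copy_with_new_uids() for f in self.inner])


def bincond_schema(lhs, copy_lhs):
    """Schema for 'cond ? lhs : rhs', either sharing or copying the LHS schema."""
    return lhs.copy_with_new_uids() if copy_lhs else lhs
```

Sharing (`copy_lhs=False`) leaves the nested field's uid identical to the LHS one, so retyping it through the BinCond schema also retypes the LHS; the copied version keeps the two independent.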

> Incorrect input types passed on to eval function
> 
>
> Key: PIG-2004
> URL: https://issues.apache.org/jira/browse/PIG-2004
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Vivek Padmanabhan
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-2004-0.patch, PIG-2004.1.patch
>
>
> The script below fails with a ClassCastException thrown from the MAX UDF. The 
> UDF expects the values in the supplied bag to be DataByteArray, but at run 
> time it receives the actual type, i.e. Double in this case. This causes the 
> script execution to fail with the exception:
> Caused by: java.lang.ClassCastException: java.lang.Double cannot be cast to 
> org.apache.pig.data.DataByteArray
> The same script runs properly with Pig 0.8.
> {code}
> A = LOAD 'myinput' as (f1,f2,f3);
> B = foreach A generate f1,f2+f3/1000.0 as doub;
> C = group B by f1;
> D = foreach C generate (long)(MAX(B.doub)) as f4;
> dump D;
> {code}
> myinput
> ---
> a   100012345
> b   200023456
> c   300034567
> a   150054321
> b   250065432



[jira] [Commented] (PIG-1989) complex type casting should return null on casting failure

2011-04-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025018#comment-13025018
 ] 

Thejas M Nair commented on PIG-1989:


The problem is seen in this case -
{code}
input:
(a,b,3)

a = load 'inp.txt' as (t:tuple(i0, i1, i2));  
b = foreach a generate (tuple(tuple(int)))$0;
dump b;


We get: ((,b,3)), instead of ()
{code}

> complex type casting should return null on casting failure 
> ---
>
> Key: PIG-1989
> URL: https://issues.apache.org/jira/browse/PIG-1989
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
>
> When casting fails for complex objects, Pig currently returns the un-cast 
> object. 
> It should return null instead, which is consistent with the behavior when 
> casting to basic types. 



[jira] [Updated] (PIG-1826) Unexpected data type -1 found in stream error

2011-04-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1826:


Attachment: PIG-1826-1.patch

PIG-1826-1.patch fixes the error message. It also piggybacks a change to the number 
of retries in Hadoop: I decreased it from 4 to 1 to speed up the unit 
tests.

> Unexpected data type -1 found in stream error
> -
>
> Key: PIG-1826
> URL: https://issues.apache.org/jira/browse/PIG-1826
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
> Environment: This is pig 0.8.0 on a linux box
>Reporter: Jonathan Coveney
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1826-1.patch, PIG-1826.tar.gz, numgraph.java
>
>
> When running the attached UDF I get the error in the title. By inserting printlns 
> extensively, I found that the script is functioning properly and returning a DataBag, but 
> for whatever reason, Pig does not detect it as such.



[jira] [Updated] (PIG-1999) Macro alias masker should consider schema context

2011-04-25 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1999:
--

Attachment: PIG-1999_1.patch

> Macro alias masker should consider schema context 
> --
>
> Key: PIG-1999
> URL: https://issues.apache.org/jira/browse/PIG-1999
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1999_1.patch
>
>
> Macro alias masker doesn't consider the current schema context. This results in 
> errors when deciding which alias to mask. Here is an example:
> {code}
> define toBytearray(in, intermediate) returns e { 
>a = load '$in' as (name:chararray, age:long, gpa: float);
>b = group a by  name;
>c = foreach b generate a, (1,2,3);
>store c into '$intermediate' using BinStorage();
>d = load '$intermediate' using BinStorage() as (b:bag{t:tuple(x,y,z)}, 
> t2:tuple(a,b,c));
>$e = foreach d generate COUNT(b), t2.a, t2.b, t2.c;
> };
>  
> f = toBytearray ('data', 'output1');
> {code} 
> Here the alias masker mistakes b in COUNT(b) for an alias instead of the field b in the 
> current schema.
> The workaround is to avoid using alias names as field names in the schema definition. 
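The rule the issue asks for can be modeled in a few lines: inside an expression, a name that follows `.` or that names a field of the input schema is a field reference and must be left alone, even when a macro alias shares its name; only the remaining macro-local aliases get masked. This Python sketch works on a pre-tokenized statement; `mask_statement` and the masking prefix are hypothetical, not Pig's masker implementation.

```python
def mask_statement(tokens, macro_aliases, schema_fields, prefix):
    """Mask macro-local aliases in one tokenized statement.

    tokens:        the statement's tokens, in order
    macro_aliases: aliases defined inside the macro body
    schema_fields: field names in the schema of the statement's input relation
    prefix:        hypothetical masking prefix, unique per macro expansion
    """
    masked = []
    in_expression = False  # becomes True once we are past GENERATE
    for i, tok in enumerate(tokens):
        if tok.lower() == "generate":
            in_expression = True
        after_dot = i > 0 and tokens[i - 1] == "."
        # Inside an expression, a name after '.' or a name in the input schema
        # is a field reference, even if a macro alias has the same name.
        is_field = in_expression and (after_dot or tok in schema_fields)
        if tok in macro_aliases and not is_field:
            masked.append(prefix + tok)
        else:
            masked.append(tok)
    return masked
```

Applied to the macro's last statement with aliases `{a, b, c, d}` and the schema of `d` (`b`, `t2`), only the relation `d` is renamed; `b` in `COUNT(b)` and `t2.a`, `t2.b`, `t2.c` stay untouched.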



[jira] [Commented] (PIG-1989) complex type casting should return null on casting failure

2011-04-25 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025011#comment-13025011
 ] 

Daniel Dai commented on PIG-1989:
-

I tried some test cases; it seems we currently set the inner fields that cannot be 
cast to null:

{code}
input:
(a,b,3)

a = load '1.txt' as (t:tuple(i0, i1, i2));
b = foreach a generate (tuple(int,int,int))t;

We get: ((,,3))
{code}

{code}
input:
{(a,1)}

a = load '1.txt' as (a0:bag{t:tuple(i0,i1)});
b = foreach a generate (bag{tuple(int,int)})a0;

We get: ({(,1)})
{code}

{code}
input:
[key#value]

a = load '1.txt' as (m:map[]);
b = foreach a generate (map[int])m;
dump b;

We get: ([key#])
{code}

Sounds OK to me. Thejas, do you see something else, or do you feel this is not 
proper?
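The per-field behavior in these examples, and the stricter behavior the issue asks for, can be summarized with a small Python model. Target types are encoded as hypothetical Python values (`"int"`, `("tuple", [...])`, `("bag", tuple_target)`, `("map", value_target)`). Treating an arity mismatch as a failed cast that returns None for the whole value is an assumption of this sketch, chosen to model Thejas's nested-tuple case; it is not Pig's implementation.

```python
def cast(value, target):
    """Cast a loaded value to a target type; return None where the cast fails.

    target is "int", ("tuple", [field targets]), ("bag", tuple target)
    or ("map", value target).
    """
    if value is None:
        return None
    if target == "int":
        try:
            return int(value)
        except (TypeError, ValueError):
            return None  # basic types already yield null on failure
    if not isinstance(target, tuple):
        raise ValueError("unsupported target: %r" % (target,))
    kind, inner = target
    if kind == "tuple":
        # Assumption: a structural (arity) mismatch fails the whole cast.
        if not isinstance(value, tuple) or len(value) != len(inner):
            return None
        return tuple(cast(v, t) for v, t in zip(value, inner))
    if kind == "bag":
        if not isinstance(value, list):
            return None
        return [cast(t, inner) for t in value]
    if kind == "map":
        if not isinstance(value, dict):
            return None
        return {k: cast(v, inner) for k, v in value.items()}
    raise ValueError("unsupported target: %r" % (target,))
```

In this model, `cast(("a", "b", "3"), ("tuple", ["int", "int", "int"]))` gives `(None, None, 3)`, matching `((,,3))`, while casting the same tuple to `("tuple", [("tuple", ["int"])])` gives `None` rather than a partially cast value.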

> complex type casting should return null on casting failure 
> ---
>
> Key: PIG-1989
> URL: https://issues.apache.org/jira/browse/PIG-1989
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
>
> When casting fails for complex objects, Pig currently returns the un-cast 
> object. 
> It should return null instead, which is consistent with the behavior when 
> casting to basic types. 



[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-04-25 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Attachment: 1824c.patch

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.9.0
>
> Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch
>
>
> Currently, Jython UDF scripts don't support the Jython import statement, as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let the user explicitly specify the 
> module to ship? 
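One way to approach the first question is to scan the script's import statements and resolve each name to a file that could be shipped. The sketch below does this with CPython 3's `ast` module and `importlib.util.find_spec`. It illustrates the idea only; it is not what Pig does under Jython, and it follows direct imports only (modules imported by those files would need the same treatment).

```python
import ast
import importlib.util


def imported_module_names(source):
    """Return the module names imported by a UDF script's source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def module_files(names):
    """Map each importable module name to the file that would need shipping."""
    files = {}
    for name in sorted(names):
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        # Built-in and frozen modules have no file location to ship.
        if spec is not None and spec.has_location:
            files[name] = spec.origin
    return files
```

For the `resplit` script above, `imported_module_names` yields `{"re"}`, and `module_files` maps it to the interpreter's own `re` source file.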



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-04-25 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025007#comment-13025007
 ] 

Woody Anderson commented on PIG-1824:
-

agree:

re: PYTHON_CACHEDIR: the code behaves as you wish, in that it only deletes 
the dir if it (Pig) created it.
Sorry for not being clear in the comments about that, but if you read the 
code you'll see it.

If we can't write, I (Pig) was creating an alternate directory. It may be 
possible to pre-populate this, and I understand (and had) the desire to have an 
error instead of a new directory, but I was initially experiencing this error:
{code}
*sys-package-mgr*: can't create package cache dir, 
'/grid/0/Releases/pig-0.8.0..1103222002-20110401-000/share/pig-0.8.0..1103222002/lib/cachedir/packages'
{code}

which is why I added the 'is writable' check. But after reviewing it (per your 
comment), it seems that cachedir is not set on the grid (at least at the point 
when the static block runs). If left as null, it seems to default to some grid 
location that is not writable (and thus doesn't work), but if I set it to a 
writable tmp dir first, it works.
So I can safely agree that an error if the dir isn't writable is both 
desirable and works.
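The behavior agreed on here (raise an error if the cache directory is not writable, and only delete it later if Pig created it) can be sketched as follows. This is a Python illustration with hypothetical `prepare_cachedir`/`cleanup_cachedir` helpers; the actual patch is Java code that sets up Jython's package cache directory.

```python
import os
import shutil


def prepare_cachedir(path):
    """Ensure the cache dir exists and is writable.

    Returns True if this call created the directory, so the caller knows it
    may delete it later. Raises instead of silently choosing another location.
    """
    created = False
    if not os.path.isdir(path):
        os.makedirs(path)
        created = True
    if not os.access(path, os.W_OK):
        raise IOError("cache dir %s is not writable" % path)
    return created


def cleanup_cachedir(path, created):
    # Only remove a directory we created ourselves, never a user-supplied one.
    if created:
        shutil.rmtree(path, ignore_errors=True)
```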

As for getScriptAsStream():
I followed the existing code convention on that one, though I didn't like it 
either.
Again, if you read down a bit you'll see that the impl of getScriptAsStream() 
is:
{code}
..
if (is == null) {
    throw new IllegalStateException(
            "Could not initialize interpreter (from file system or classpath) with " + scriptPath);
}
return is;
{code}

So the null check is superfluous, but it does quiet the "not null check" warnings.
I didn't add an additional throw statement in this case because, essentially, my 
code wouldn't add any _new_ errors that the existing code didn't already 
exhibit if somehow the impl of getScriptAsStream changed and could return null.

Anyway, I'll upload a new patch to address the writable issue. If you think it's 
a big deal, we can add an 'else throw' statement around getScriptAsStream.

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.9.0
>
> Attachments: 1824.patch, 1824a.patch, 1824b.patch
>
>
> Currently, Jython UDF scripts don't support the Jython import statement, as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let the user explicitly specify the 
> module to ship? 



[jira] [Commented] (PIG-1622) DEFINE streaming options are ill defined and not properly documented

2011-04-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025006#comment-13025006
 ] 

Thejas M Nair commented on PIG-1622:


+1

> DEFINE streaming options are ill defined and not properly documented
> 
>
> Key: PIG-1622
> URL: https://issues.apache.org/jira/browse/PIG-1622
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Alan Gates
>Assignee: Corinne Chandel
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-1622-1.patch, PIG-1622.patch
>
>
> According to the documentation 
> (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#DEFINE) the 
> syntax for DEFINE when used to define a streaming command is:
> DEFINE cmd INPUT(stdin|path) OUTPUT(stdout|stderr|path) SHIP(path [, path, 
> ...]) CACHE (path [, path, ...])
> However, the actual parser accepts something pretty different.  Consider the 
> following script:
> {code}
> define strm `wc -l` INPUT(stdin) 
> CACHE('/Users/gates/.vimrc#myvim') 
> OUTPUT(stdin)
> INPUT('/tmp/fred') 
> OUTPUT('/tmp/bob')
> SHIP('/Users/gates/.bashrc') 
> SHIP('/Users/gates/.vimrc') 
> CACHE('/Users/gates/.bashrc#mybash')
> stderr('/tmp/errors' limit 10);
> A = load '/Users/gates/test/data/studenttab10';
> B = stream A through strm;
> dump B;
> {code}
> The above actually parses.  I see several issues here:
> # What do multiple INPUT and OUTPUT statements mean in the context of 
> streaming?  These should not be allowed.
> # The documentation implies an order (INPUT, OUTPUT, SHIP, CACHE) that is not 
> enforced by the parser.  We should either enforce the order in the parser or 
> update the documentation.  Most likely the latter to avoid breaking existing 
> scripts.
> # Why are multiple SHIP and CACHE clauses allowed when each can take multiple 
> paths?  It seems we should only allow one of each.
> # The error clause is completely different from what is given in the 
> documentation.  I suspect this is a documentation error and the grammar 
> supported by the parser here is what we want.
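The first and third points amount to one rule: each of INPUT, OUTPUT, SHIP and CACHE may appear at most once in a streaming DEFINE, in any order (per the second point, the documentation rather than the parser would change). Below is a Python sketch of that check over an already-parsed list of clause keywords; `validate_define_clauses` is hypothetical and is not part of Pig's parser.

```python
from collections import Counter

# Clauses the issue proposes to allow at most once per streaming DEFINE.
SINGLE_USE_CLAUSES = ("INPUT", "OUTPUT", "SHIP", "CACHE")


def validate_define_clauses(clauses):
    """Raise ValueError if a single-use clause is repeated.

    clauses: clause keywords in source order, e.g. ["INPUT", "CACHE", "OUTPUT"].
    Order is deliberately not checked, so existing scripts keep parsing.
    """
    counts = Counter(c.upper() for c in clauses)
    repeated = [c for c in SINGLE_USE_CLAUSES if counts[c] > 1]
    if repeated:
        raise ValueError("clause(s) may appear only once: " + ", ".join(repeated))
    return None
```

Run against the clause sequence of the script above, the check rejects it, naming INPUT, OUTPUT, SHIP and CACHE as repeated.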



[jira] [Commented] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-04-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025002#comment-13025002
 ] 

Thejas M Nair commented on PIG-2007:


+1

> Parsing error when map key referred directly from udf in nested foreach 
> 
>
> Key: PIG-2007
> URL: https://issues.apache.org/jira/browse/PIG-2007
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Anitha Raju
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2007.patch
>
>
> The script below fails with a parsing error when executed with version 0.9.
> {code}
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
>  mismatched input '{' expecting GENERATE
> {code}
> Script1
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A {
> C = test.TOMAP('key1',$1)#'key1';
> generate C as C;
> }
> {code}
> This happens when, in a nested foreach, I refer to a map key directly 
> from a UDF result.
> The same expression works when executed without the nested foreach:
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
> dump B1;
> {code}
> Script1 works well with 0.8.



[jira] [Commented] (PIG-2003) Using keyward as alias doesn't either emit an error or produce a logical plan.

2011-04-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024995#comment-13024995
 ] 

Thejas M Nair commented on PIG-2003:


+1 .
The new rule in this patch is logically the same as the old one, but the new one 
fixes the problem. This is probably caused by a bug in ANTLR (3.2).


> Using keyward as alias doesn't either emit an error or produce a logical plan.
> --
>
> Key: PIG-2003
> URL: https://issues.apache.org/jira/browse/PIG-2003
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2003.patch
>
>
> The following is the symptom:
> grunt> ship = load 'x';
> grunt> describe ship;
> 2011-04-19 13:52:52,809 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1005: No plan for ship to describe
> The correct behavior is to give an error.



Congratulations to Aniket Mokashi, Gianmarco De Francisci and Zhijie Shen, our Google Summer of Code students this year!

2011-04-25 Thread Daniel Dai

Pig developers,
We have 3 students accepted into the Google Summer of Code program this 
year. They are:


Aniket Mokashi "Support to 2 level nested foreach" (PIG-1631), mentored 
by Ashutosh Chauhan
Gianmarco De Francisci Morales "Sugar for Pig" (PIG-1904, PIG-1387, 
PIG-1926), mentored by Thejas Nair
Zhijie Shen "Implementation of Nested Cross for Pig Latin" (PIG-1916), 
mentored by Daniel Dai


The project timeline is 5/22 to 8/22. Please give them any help they need and 
wish them great success!


Daniel


Build failed in Jenkins: Pig-trunk-commit #764

2011-04-25 Thread Apache Jenkins Server
See 

Changes:

[daijy] PIG-1814: mapred.output.compress in SET statement does not work

[daijy] PIG-1981: LoadPushDown.pushProjection should pass alias in addition to 
position

[daijy] PIG-1976: One more TwoLevelAccess to remove

[daijy] PIG-1865: BinStorage/PigStorageSchema cannot load data from a different 
namenode

--
[...truncated 36517 lines...]
[junit] (10:LOLoad={name: chararray,details: (age: bytearray,gpa: 
bytearray),field3: (a: bytearray,b: bytearray)}==>15)
[junit] (15:LOForEach={name: chararray,details: (age: bytearray,gpa: 
bytearray),field3: (a: bytearray,b: bytearray)}==>TERMINAL)
[junit] 
[junit] 
[junit] Checking DONE!
[junit] testSUM1
[junit] 11/04/25 22:27:47 INFO executionengine.HExecutionEngine: Connecting 
to hadoop file system at: file:///
[junit] 11/04/25 22:27:47 INFO executionengine.HExecutionEngine: Connecting 
to hadoop file system at: file:///
[junit] 11/04/25 22:27:47 INFO executionengine.HExecutionEngine: Connecting 
to hadoop file system at: file:///
[junit] testSUM2
[junit] 11/04/25 22:27:47 INFO executionengine.HExecutionEngine: Connecting 
to hadoop file system at: file:///
[junit] 11/04/25 22:27:47 INFO executionengine.HExecutionEngine: Connecting 
to hadoop file system at: file:///
[junit] Actual plan after type check:
[junit] *Type Graph***
[junit] (0:LOLoad={null}==>3)
[junit] (3:LOCogroup={group: (bytearray,bytearray),0: {null}}==>13)
[junit] 
[junit] (1:LOProject=bytearray==>TERMINAL)
[junit] 
[junit] (2:LOProject=bytearray==>TERMINAL)
[junit] (13:LOForEach={bytearray,bytearray,double}==>TERMINAL)
[junit] 
[junit] (9:LOProject=(bytearray,bytearray)==>TERMINAL)
[junit] 
[junit] (11:LOProject=(null)==>12)
[junit] (12:LOProject=(bytearray)==>10)
[junit] (10:LOUserFunc=double==>TERMINAL)
[junit] 
[junit] testGenerate1
[junit] 11/04/25 22:27:47 INFO executionengine.HExecutionEngine: Connecting 
to hadoop file system at: file:///
[junit] 11/04/25 22:27:47 INFO executionengine.HExecutionEngine: Connecting 
to hadoop file system at: file:///
[junit] *MessageCollector dump***
[junit] Warning:int is implicitly cast to float under LOAdd Operator
[junit] Warning:long is implicitly cast to float under LOAdd Operator
[junit] Warning:int is implicitly cast to double under LOAdd Operator
[junit] Actual plan after type check:
[junit] *Type Graph***
[junit] (0:LOLoad={name: chararray,age: int,gpa: double}==>18)
[junit] (18:LOForEach={float,double}==>TERMINAL)
[junit] 
[junit] (12:LOConst=int==>19)
[junit] (19:LOCast=float==>10)
[junit] (13:LOConst=float==>10)
[junit] (10:LOAdd=float==>14)
[junit] (11:LOConst=long==>20)
[junit] (20:LOCast=float==>14)
[junit] (14:LOAdd=float==>TERMINAL)
[junit] 
[junit] (17:LOProject=double==>16)
[junit] (15:LOConst=int==>21)
[junit] (21:LOCast=double==>16)
[junit] (16:LOAdd=double==>TERMINAL)
[junit] 
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.781 sec
[junit] Running org.apache.pig.test.TestTypeCheckingValidatorNoSchema
[junit] testUnion1
[junit] 11/04/25 22:27:47 WARN conf.Configuration: DEPRECATED: 
hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. 
Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override 
properties of core-default.xml, mapred-default.xml and hdfs-default.xml 
respectively
[junit] *Type Graph***
[junit] (0:LOLoad={null}==>2)
[junit] (1:LOLoad={null}==>2)
[junit] (2:LOUnion={null}==>TERMINAL)
[junit] 
[junit] testUnion2
[junit] *Type Graph***
[junit] (3:LOLoad={a: int,b: long,c: bytearray}==>5)
[junit] (4:LOLoad={null}==>5)
[junit] (5:LOUnion={null}==>TERMINAL)
[junit] 
[junit] testSplitWithInnerPlan1
[junit] *MessageCollector dump***
[junit] Warning:bytearray is implicitly cast to int under LOLesserThanEqual 
Operator
[junit] *Type Graph***
[junit] (6:LOLoad={null}==>13)
[junit] (13:LOSplit={null}==>14,15)
[junit] (7:LOProject=bytearray==>9)
[junit] (8:LOProject=bytearray==>9)
[junit] (9:LONotEqual=boolean==>TERMINAL)
[junit] (10:LOProject=bytearray==>16)
[junit] (16:LOCast=int==>12)
[junit] (11:LOConst=int==>12)
[junit] (12:LOLesserThanEqual=boolean==>TERMINAL)
[junit] 
[junit] testSplitWithInnerPlan2
[junit] *MessageCollector dump***
[junit] Warning:bytearray is implicitly cast to int under LOAdd Operator
[junit] Error:In alias null, incompatible types in Subtract Operator left 
hand side:int right hand side:chararray
[junit] *Type Graph***
[junit] (1

[jira] [Resolved] (PIG-1814) mapred.output.compress in SET statement does not work

2011-04-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1814.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to both trunk and 0.9 branch.

> mapred.output.compress in SET statement does not work
> -
>
> Key: PIG-1814
> URL: https://issues.apache.org/jira/browse/PIG-1814
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1814-1.patch
>
>
> Setting output compression using "SET" in the script does not work:
> SET mapred.output.compress true;
> SET mapred.output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
> We used a trick to make individual compression settings work for multi-store 
> scripts. Instead of the above parameters, the following works:
> SET output.compression.enabled true;
> SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
> However, this is counterintuitive. We should use 
> mapred.output.compress/mapred.output.compression.codec.
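The request is that the standard Hadoop keys be honored in SET statements. Below is a Python sketch that resolves the effective output-compression settings from a script's SET properties. It assumes the `mapred.*` keys take precedence when both forms are present; that precedence is an illustrative choice, not documented behavior of the committed patch.

```python
def output_compression(props):
    """Return (enabled, codec) from SET properties, accepting both key styles.

    Assumption: the standard mapred.* keys win over Pig's output.compression.*
    keys when both are set.
    """
    enabled = props.get("mapred.output.compress",
                        props.get("output.compression.enabled", "false"))
    codec = props.get("mapred.output.compression.codec",
                      props.get("output.compression.codec"))
    is_enabled = enabled.strip().lower() == "true"
    # A codec only matters when compression is actually switched on.
    return is_enabled, (codec if is_enabled else None)
```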



[jira] [Commented] (PIG-1814) mapred.output.compress in SET statement does not work

2011-04-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024971#comment-13024971
 ] 

Xuefu Zhang commented on PIG-1814:
--

+1 patch looks good.

> mapred.output.compress in SET statement does not work
> -
>
> Key: PIG-1814
> URL: https://issues.apache.org/jira/browse/PIG-1814
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1814-1.patch
>
>
> Setting output compression using "SET" in the script does not work:
> SET mapred.output.compress true;
> SET mapred.output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
> We used a trick to make individual compression settings work for multi-store 
> scripts. Instead of the above parameters, the following works:
> SET output.compression.enabled true;
> SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
> However, this is counterintuitive. We should use 
> mapred.output.compress/mapred.output.compression.codec.



[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-04-25 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024970#comment-13024970
 ] 

Julien Le Dem commented on PIG-1827:


Hi Richard,
thanks for taking care of this.

* In test/org/apache/pig/test/TestScriptLanguage.java,
the test should verify that the values are properly passed to the script:
{code}
"testvar = 'abcd$py'",
"testvar2 = '$'",
"testvar3 = '$'",
"testvar4 = 'abcd\\$py$'",
"testvar5 = 'abcd\\$py'",
{code}
what about this?
{code}
"P = Pig.compile(\"\"\"a = load '$input'; b = foreach a generate 
$0,$1,$testvar,$testvar2,$testvar3,$testvar4,$testvar5; store b into 
'$output';\"\"\")", 
{code}
then you can check that you get the values in the output.

we should check that the transformation is bijective.
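The last point can be made concrete as a round-trip property: whatever escaping is applied on the way in must be exactly undone on the way out, for every input. Below is a minimal Python sketch assuming a simple backslash-before-`$` scheme (not necessarily the one in PIG-1827-1.patch), checked against the test values above.

```python
def escape_dollar(value):
    """Escape every '$' so parameter substitution leaves it alone."""
    return value.replace("$", "\\$")


def unescape_dollar(value):
    """Inverse of escape_dollar: drop the backslash added before each '$'."""
    return value.replace("\\$", "$")


# Test values from TestScriptLanguage (as Python literals).
SAMPLES = ["abcd$py", "$", "abcd\\$py$", "abcd\\$py"]


def round_trips(values):
    """True if unescape(escape(v)) == v for every value."""
    return all(unescape_dollar(escape_dollar(v)) == v for v in values)
```

Every `$` in the escaped string is preceded by an inserted backslash, so unescaping removes exactly those backslashes; values that already contain `\$`, such as `abcd\$py$`, still come back unchanged.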


> When passing a parameter to Pig, if the value contains $ it has to be escaped 
> for no apparent reason
> 
>
> Key: PIG-1827
> URL: https://issues.apache.org/jira/browse/PIG-1827
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1827-1.patch
>
>




[jira] [Resolved] (PIG-1981) LoadPushDown.pushProjection should pass alias in addition to position

2011-04-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1981.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to both trunk and 0.9 branch.

> LoadPushDown.pushProjection should pass alias in addition to position
> -
>
> Key: PIG-1981
> URL: https://issues.apache.org/jira/browse/PIG-1981
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1981-1.patch
>
>
> Currently, in
> pushProjection(RequiredFieldList requiredFieldList)
> requiredFieldList contains only positions. It would be better to also provide the 
> alias whenever it is available.
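Below is a Python model of why the alias helps: when a loader's physical column order differs from the declared schema order, a name lets it pick the right column, with the position as a fallback. `RequiredField` here only borrows the name of Pig's Java class `LoadPushDown.RequiredField`; it is a simplified, hypothetical stand-in, and `project` is an illustrative loader-side helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequiredField:
    index: int                   # position in the declared (Pig) schema
    alias: Optional[str] = None  # field name, when the planner knows it


def project(row: list, columns: List[str], required: List[RequiredField]) -> list:
    """Pick the required values from one physical row.

    row:     values in the loader's physical column order
    columns: the loader's physical column names
    """
    out = []
    for field in required:
        if field.alias is not None and field.alias in columns:
            out.append(row[columns.index(field.alias)])  # resolve by name
        else:
            out.append(row[field.index])  # fall back to position
    return out
```

For a store that keeps columns as `["gpa", "name", "age"]` while the script declares `(name, age, gpa)`, a request for `gpa` at position 2 resolves correctly by name, whereas position alone would pick `age`.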



[jira] [Resolved] (PIG-1976) One more TwoLevelAccess to remove

2011-04-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1976.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to both trunk and 0.9 branch.

> One more TwoLevelAccess to remove
> -
>
> Key: PIG-1976
> URL: https://issues.apache.org/jira/browse/PIG-1976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1976-1.patch
>
>
> We removed two-level access in PIG-847. However, there is another occurrence 
> we missed in ResourceSchema.java:
> {code}
> if (type == DataType.BAG && fieldSchema.schema != null
>         && !fieldSchema.schema.isTwoLevelAccessRequired()) {
>     log.info("Insert two-level access to Resource Schema");
>     FieldSchema fs = new FieldSchema("t", fieldSchema.schema);
>     inner = new Schema(fs);
> }
> {code}
> Although schema.isTwoLevelAccessRequired is false by default, Pig should not use 
> this flag, since a user could still set it in a legacy UDF.
> Thanks to Woody for uncovering this.



[jira] [Resolved] (PIG-1865) BinStorage/PigStorageSchema cannot load data from a different namenode

2011-04-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1865.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to both trunk and 0.9 branch.

> BinStorage/PigStorageSchema cannot load data from a different namenode
> --
>
> Key: PIG-1865
> URL: https://issues.apache.org/jira/browse/PIG-1865
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0, 0.8.0, 0.9.0
>Reporter: Vivek Padmanabhan
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1865-1.patch
>
>
> BinStorage/PigStorageSchema cannot load data from a different namenode. The 
> main reason is that their getSchema methods use 
> org.apache.pig.impl.io.FileLocalizer to check whether the file exists, but the 
> filesystem in HDataStorage refers to the natively configured DFS.
> The test case is simple:
> a = load 'hdfs:///input' using BinStorage();
> dump a;
> If I specify -Dmapreduce.job.hdfs-servers, it should work, but Pig still 
> takes the filesystem from fs.default.name, so to make it work I had to 
> override fs.default.name on the Pig command line.
> Raising this as a bug since the same scenario works with PigStorage.
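The underlying principle is that the filesystem should be resolved from the path being loaded when that path names a namenode, and from the configured default only otherwise. In Hadoop's Java API this corresponds to asking the `Path` for its filesystem (`Path.getFileSystem(conf)`). The Python sketch below models only that resolution rule; `filesystem_for` is a hypothetical helper, not the BinStorage fix itself.

```python
from urllib.parse import urlparse


def filesystem_for(path: str, default_fs: str) -> str:
    """Return 'scheme://authority' of the filesystem that owns `path`.

    An explicit authority in the path (e.g. 'hdfs://nn2:8020/input') wins;
    a missing scheme or authority falls back to the configured default.
    """
    parsed = urlparse(path)
    default = urlparse(default_fs)
    scheme = parsed.scheme or default.scheme
    if parsed.netloc:
        authority = parsed.netloc
    elif scheme == default.scheme:
        # 'hdfs:///input' or '/input': use the default namenode.
        authority = default.netloc
    else:
        # A different scheme with no authority, e.g. 'file:///tmp/x'.
        authority = ""
    return f"{scheme}://{authority}"
```

With a default of `hdfs://nn1:8020`, the path `hdfs://nn2:8020/input` resolves to `hdfs://nn2:8020`, while `hdfs:///input` resolves to the default namenode.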



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-04-25 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024960#comment-13024960
 ] 

Julien Le Dem commented on PIG-1824:


Hi Woody,
This is a great feature. 
I agree with the static block comments, but I don't see how you could do it 
differently without a major refactoring of the existing code.
Here are comments/questions about some details of the implementation.

in JythonScriptEngine.Interpreter static block:
* If _PYTHON_CACHEDIR_ is provided, we will delete it on exit. Shouldn't we 
delete it only if it has been created by Pig? It is dangerous to delete 
something that we have not created. The user could shoot himself in the foot by 
providing something he cares about as the _PYTHON_CACHEDIR_.
* Also, if we can't write to the provided _PYTHON_CACHEDIR_, we create another 
one. Can the user pre-populate the cache dir? If so, we should throw an 
exception here.

in JythonScriptEngine.Interpreter.init():
* Something should fail if _is_ is null.
{code}
InputStream is = getScriptAsStream(path);
 if (is != null) {
{code}
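A hypothetical Python sketch of the policy suggested in the first two bullets: delete the cache directory on exit only if Pig created it, and fail loudly instead of silently substituting a new directory when a user-supplied one is unusable. The function names are illustrative and are not taken from the patch.

```python
import os
import shutil
import tempfile


def prepare_cache_dir(user_dir=None):
    """Return (path, created_by_pig) for the Python cache directory.

    A user-supplied directory is reused if it is writable and is never marked
    for deletion; if it is unusable, raise instead of silently substituting
    another directory (the user may have pre-populated it).
    """
    if user_dir is None:
        return tempfile.mkdtemp(prefix="pig_python_cache_"), True
    if os.path.isdir(user_dir):
        if not os.access(user_dir, os.W_OK):
            raise PermissionError(f"cache dir {user_dir!r} is not writable")
        return user_dir, False
    os.makedirs(user_dir)  # we created it, so we may delete it later
    return user_dir, True


def cleanup_cache_dir(path, created_by_pig):
    """Delete the directory only if this process created it."""
    if created_by_pig:
        shutil.rmtree(path, ignore_errors=True)
```

The cleanup could then be registered with `atexit.register(cleanup_cache_dir, path, created)`, so the ownership decision travels with the path.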

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.9.0
>
> Attachments: 1824.patch, 1824a.patch, 1824b.patch
>
>
> Currently, Jython UDF scripts don't support the Jython import statement, as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let the user explicitly specify the 
> module to ship?
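One possible answer to the first question, sketched in CPython rather than Jython: the standard-library `modulefinder` module can statically discover the modules a script imports, which yields candidate files to ship. `modules_to_ship` is a hypothetical helper; it reports every module reached, including standard-library ones, so a real implementation would still need a policy for which files actually go to the backend.

```python
import modulefinder


def modules_to_ship(script_path):
    """Return {module name: source file} for modules the script imports, directly or transitively."""
    finder = modulefinder.ModuleFinder()
    finder.run_script(script_path)  # static analysis: the script is not executed
    files = {}
    for name, module in finder.modules.items():
        # Skip the script itself and built-in modules, which have no file to ship.
        if name == "__main__" or not module.__file__:
            continue
        files[name] = module.__file__
    return files
```

Because the analysis is static, the `@outputSchema` decorator in the example above does not need to be defined for the scan to find `re`.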



[jira] [Updated] (PIG-2012) Comments at the beginning of the file throw off line numbers in errors

2011-04-25 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2012:
--

Fix Version/s: 0.9.0

> Comments at the beginning of the file throw off line numbers in errors
> --
>
> Key: PIG-2012
> URL: https://issues.apache.org/jira/browse/PIG-2012
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Alan Gates
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: macro.pig
>
>
> The preprocessor does not appear to be handling leading comments properly 
> when calculating line numbers for error messages.  In the attached script, 
> the error is reported to be on line 7.  It is actually on line 10.
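If the drift comes from comments being dropped together with their line breaks, one way for a preprocessor to avoid it is to blank out comments while keeping every newline, so line N of the output is still line N of the input. The Python sketch below is hypothetical and is not how Pig's preprocessor is implemented; it handles `--` and `/* ... */` comments and skips comment markers inside single-quoted strings.

```python
def strip_comments_keep_lines(script: str) -> str:
    """Remove Pig Latin comments ('--' to end of line, '/* ... */') but keep
    every newline, so line numbers in the output match the input."""
    out = []
    i, n = 0, len(script)
    in_string = False
    while i < n:
        ch = script[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(script[i + 1])  # keep escaped character verbatim
                i += 2
                continue
            if ch == "'":
                in_string = False
            i += 1
        elif ch == "'":
            in_string = True
            out.append(ch)
            i += 1
        elif script.startswith("--", i):
            end = script.find("\n", i)
            i = n if end == -1 else end  # the newline itself is kept
        elif script.startswith("/*", i):
            close = script.find("*/", i + 2)
            end = n if close == -1 else close + 2
            out.append("\n" * script.count("\n", i, end))  # preserve line breaks
            i = end
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

An error found on the fourth line of the stripped text is then reported as line 4 of the original script, regardless of how many comment lines precede it.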



[jira] [Commented] (PIG-1976) One more TwoLevelAccess to remove

2011-04-25 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024928#comment-13024928
 ] 

Richard Ding commented on PIG-1976:
---

+1

> One more TwoLevelAccess to remove
> -
>
> Key: PIG-1976
> URL: https://issues.apache.org/jira/browse/PIG-1976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1976-1.patch
>
>
> We removed two-level access in PIG-847. However, there is another occurrence 
> we missed in ResourceSchema.java:
> {code}
> if (type == DataType.BAG && fieldSchema.schema != null
>         && !fieldSchema.schema.isTwoLevelAccessRequired()) {
>     log.info("Insert two-level access to Resource Schema");
>     FieldSchema fs = new FieldSchema("t", fieldSchema.schema);
>     inner = new Schema(fs);
> }
> {code}
> Although schema.isTwoLevelAccessRequired is false by default, Pig should not 
> rely on this flag, because a legacy UDF could still set it.
> Thanks to Woody for uncovering this.



[jira] [Commented] (PIG-1865) BinStorage/PigStorageSchema cannot load data from a different namenode

2011-04-25 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024927#comment-13024927
 ] 

Richard Ding commented on PIG-1865:
---

+1

> BinStorage/PigStorageSchema cannot load data from a different namenode
> --
>
> Key: PIG-1865
> URL: https://issues.apache.org/jira/browse/PIG-1865
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0, 0.8.0, 0.9.0
>Reporter: Vivek Padmanabhan
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1865-1.patch
>
>
> BinStorage/PigStorageSchema cannot load data from a different namenode. The 
> main reason is that their getSchema methods use 
> org.apache.pig.impl.io.FileLocalizer to check whether the file exists, but the 
> filesystem in HDataStorage refers to the natively configured DFS.
> The test case is simple:
> a = load 'hdfs:///input' using BinStorage();
> dump a;
> If I specify -Dmapreduce.job.hdfs-servers, it should work, but Pig still 
> takes the filesystem from fs.default.name, so to make it work I had to 
> override fs.default.name on the Pig command line.
> Raising this as a bug since the same scenario works with PigStorage.



[jira] [Created] (PIG-2013) Penny gets a null pointer when no properties are set

2011-04-25 Thread Benjamin Reed (JIRA)
Penny gets a null pointer when no properties are set


 Key: PIG-2013
 URL: https://issues.apache.org/jira/browse/PIG-2013
 Project: Pig
  Issue Type: Bug
  Components: tools
Reporter: Benjamin Reed
 Attachments: PIG-2013.patch

When you run Penny without setting any properties, you get a null pointer 
exception. Unfortunately, we need to set the properties to run in JUnit, so 
this bug doesn't get test coverage.
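The usual defensive pattern is to treat "no properties" exactly like "empty properties" and merge over defaults, so no code path dereferences a missing object. The Python sketch below is hypothetical (Penny is Java, and the function name is illustrative); note that a unit test calling it with `None` exercises the very path the report says JUnit runs never reach.

```python
from typing import Dict, Mapping, Optional


def effective_properties(user_props: Optional[Mapping[str, str]],
                         defaults: Mapping[str, str]) -> Dict[str, str]:
    """Merge user properties over defaults; None behaves like an empty mapping."""
    merged = dict(defaults)
    if user_props:  # None or empty: nothing to override
        merged.update(user_props)
    return merged
```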



[jira] [Updated] (PIG-2013) Penny gets a null pointer when no properties are set

2011-04-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated PIG-2013:
---

Attachment: PIG-2013.patch

> Penny gets a null pointer when no properties are set
> 
>
> Key: PIG-2013
> URL: https://issues.apache.org/jira/browse/PIG-2013
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Reporter: Benjamin Reed
> Attachments: PIG-2013.patch
>
>
> When you run Penny without setting any properties, you get a null pointer 
> exception. Unfortunately, we need to set the properties to run in JUnit, so 
> this bug doesn't get test coverage.



[jira] [Updated] (PIG-2013) Penny gets a null pointer when no properties are set

2011-04-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated PIG-2013:
---

Status: Patch Available  (was: Open)

> Penny gets a null pointer when no properties are set
> 
>
> Key: PIG-2013
> URL: https://issues.apache.org/jira/browse/PIG-2013
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Reporter: Benjamin Reed
> Attachments: PIG-2013.patch
>
>
> When you run Penny without setting any properties, you get a null pointer 
> exception. Unfortunately, we need to set the properties to run in JUnit, so 
> this bug doesn't get test coverage.



[jira] [Commented] (PIG-1981) LoadPushDown.pushProjection should pass alias in addition to position

2011-04-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024912#comment-13024912
 ] 

Xuefu Zhang commented on PIG-1981:
--

+1

> LoadPushDown.pushProjection should pass alias in addition to position
> -
>
> Key: PIG-1981
> URL: https://issues.apache.org/jira/browse/PIG-1981
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1981-1.patch
>
>
> Currently, the requiredFieldList passed to
> pushProjection(RequiredFieldList requiredFieldList)
> only contains field positions. It would be better to also provide the 
> alias whenever one is available.



[jira] [Commented] (PIG-1981) LoadPushDown.pushProjection should pass alias in addition to position

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024856#comment-13024856
 ] 

jirapos...@reviews.apache.org commented on PIG-1981:



---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/663/
---

Review request for pig.


Summary
---

See PIG-1981


This addresses bug PIG-1981.
https://issues.apache.org/jira/browse/PIG-1981


Diffs
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/ColumnPruneVisitor.java
 1095812 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPruneColumn.java
 1095812 

Diff: https://reviews.apache.org/r/663/diff


Testing
---

Test-patch:
PIG-1981-1.patch
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Unit-test:
all pass


Thanks,

Daniel



> LoadPushDown.pushProjection should pass alias in addition to position
> -
>
> Key: PIG-1981
> URL: https://issues.apache.org/jira/browse/PIG-1981
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1981-1.patch
>
>
> Currently, the requiredFieldList passed to
> pushProjection(RequiredFieldList requiredFieldList)
> only contains field positions. It would be better to also provide the 
> alias whenever one is available.



[jira] [Commented] (PIG-1814) mapred.output.compress in SET statement does not work

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024853#comment-13024853
 ] 

jirapos...@reviews.apache.org commented on PIG-1814:



---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/661/
---

Review request for pig.


Summary
---

See PIG-1814


This addresses bug PIG-1814.
https://issues.apache.org/jira/browse/PIG-1814


Diffs
-

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 
1095577 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 1095577 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestBZip.java
 1095577 

Diff: https://reviews.apache.org/r/661/diff


Testing
---

Test-patch:
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Unit test:
all pass


Thanks,

Daniel



> mapred.output.compress in SET statement does not work
> -
>
> Key: PIG-1814
> URL: https://issues.apache.org/jira/browse/PIG-1814
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1814-1.patch
>
>
> Setting output compression using "SET" in the script does not work:
> SET mapred.output.compress true;
> SET mapred.output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
> We used a workaround to make per-store compression settings work for 
> multi-store scripts. Instead of the above parameters, the following works:
> SET output.compression.enabled true;
> SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
> However, this is counterintuitive. We should use 
> mapred.output.compress/mapred.output.compression.codec.
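A hypothetical Python sketch of one way to honor both spellings: read the standard `mapred.output.*` keys first and fall back to the `output.compression.*` keys the workaround introduced. The precedence order is an assumption for illustration, not necessarily what the PIG-1814 patch does.

```python
def compression_settings(props):
    """Return (enabled, codec) from a dict of SET properties.

    Assumed precedence: mapred.output.* keys win; the older
    output.compression.* keys are only a fallback.
    """
    enabled = props.get("mapred.output.compress",
                        props.get("output.compression.enabled", "false"))
    codec = props.get("mapred.output.compression.codec",
                      props.get("output.compression.codec"))
    return enabled.strip().lower() == "true", codec
```

With the two `SET mapred.output...` statements from the report, this yields `(True, "org.apache.hadoop.io.compress.GzipCodec")`.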



Review Request: LoadPushDown.pushProjection should pass alias in addition to position

2011-04-25 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/663/
---

Review request for pig.


Summary
---

See PIG-1981


This addresses bug PIG-1981.
https://issues.apache.org/jira/browse/PIG-1981


Diffs
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/ColumnPruneVisitor.java
 1095812 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPruneColumn.java
 1095812 

Diff: https://reviews.apache.org/r/663/diff


Testing
---

Test-patch:
PIG-1981-1.patch
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Unit-test:
all pass


Thanks,

Daniel



[jira] [Commented] (PIG-1865) BinStorage/PigStorageSchema cannot load data from a different namenode

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024849#comment-13024849
 ] 

jirapos...@reviews.apache.org commented on PIG-1865:



---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/658/
---

Review request for pig.


Summary
---

See PIG-1865


This addresses bug PIG-1865.
https://issues.apache.org/jira/browse/PIG-1865


Diffs
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/BinStorage.java
 1095143 

Diff: https://reviews.apache.org/r/658/diff


Testing
---

Test-patch:
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no tests are needed for 
this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

No test case is added, because it requires two clusters, which is hard to set 
up in a unit test.

Unit-test:
all pass

Manual-test:
Tested using two clusters; BinStorage can access the remote HDFS.


Thanks,

Daniel



> BinStorage/PigStorageSchema cannot load data from a different namenode
> --
>
> Key: PIG-1865
> URL: https://issues.apache.org/jira/browse/PIG-1865
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0, 0.8.0, 0.9.0
>Reporter: Vivek Padmanabhan
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1865-1.patch
>
>
> BinStorage/PigStorageSchema cannot load data from a different namenode. The 
> main reason is that their getSchema methods use 
> org.apache.pig.impl.io.FileLocalizer to check whether the file exists, but the 
> filesystem in HDataStorage refers to the natively configured DFS.
> The test case is simple:
> a = load 'hdfs:///input' using BinStorage();
> dump a;
> If I specify -Dmapreduce.job.hdfs-servers, it should work, but Pig still 
> takes the filesystem from fs.default.name, so to make it work I had to 
> override fs.default.name on the Pig command line.
> Raising this as a bug since the same scenario works with PigStorage.



Review Request: One more TwoLevelAccess to remove

2011-04-25 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/662/
---

Review request for pig.


Summary
---

See PIG-1976


Diffs
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/ResourceSchema.java
 1095812 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestResourceSchema.java
 1095812 

Diff: https://reviews.apache.org/r/662/diff


Testing
---

Test-patch:
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Unit-test:
all pass


Thanks,

Daniel



Review Request: mapred.output.compress in SET statement does not work

2011-04-25 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/661/
---

Review request for pig.


Summary
---

See PIG-1814


This addresses bug PIG-1814.
https://issues.apache.org/jira/browse/PIG-1814


Diffs
-

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 
1095577 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 1095577 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestBZip.java
 1095577 

Diff: https://reviews.apache.org/r/661/diff


Testing
---

Test-patch:
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Unit test:
all pass


Thanks,

Daniel



Review Request: BinStorage/PigStorageSchema cannot load data from a different namenode

2011-04-25 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/658/
---

Review request for pig.


Summary
---

See PIG-1865


This addresses bug PIG-1865.
https://issues.apache.org/jira/browse/PIG-1865


Diffs
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/BinStorage.java
 1095143 

Diff: https://reviews.apache.org/r/658/diff


Testing
---

Test-patch:
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no tests are needed for 
this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

No test case is added, because it requires two clusters, which is hard to set 
up in a unit test.

Unit-test:
all pass

Manual-test:
Tested using two clusters; BinStorage can access the remote HDFS.


Thanks,

Daniel



[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-04-25 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024833#comment-13024833
 ] 

Richard Ding commented on PIG-1827:
---

Unit tests pass.

> When passing a parameter to Pig, if the value contains $ it has to be escaped 
> for no apparent reason
> 
>
> Key: PIG-1827
> URL: https://issues.apache.org/jira/browse/PIG-1827
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1827-1.patch
>
>




[jira] [Updated] (PIG-1622) DEFINE streaming options are ill defined and not properly documented

2011-04-25 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1622:
-

Attachment: PIG-1622-1.patch

Added a minor change for a test case.

> DEFINE streaming options are ill defined and not properly documented
> 
>
> Key: PIG-1622
> URL: https://issues.apache.org/jira/browse/PIG-1622
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Alan Gates
>Assignee: Corinne Chandel
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-1622-1.patch, PIG-1622.patch
>
>
> According to the documentation 
> (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#DEFINE) the 
> syntax for DEFINE when used to define a streaming command is:
> DEFINE cmd INPUT(stdin|path) OUTPUT(stdout|stderr|path) SHIP(path [, path, 
> ...]) CACHE (path [, path, ...])
> However, the actual parser accepts something pretty different.  Consider the 
> following script:
> {code}
> define strm `wc -l` INPUT(stdin) 
> CACHE('/Users/gates/.vimrc#myvim') 
> OUTPUT(stdin)
> INPUT('/tmp/fred') 
> OUTPUT('/tmp/bob')
> SHIP('/Users/gates/.bashrc') 
> SHIP('/Users/gates/.vimrc') 
> CACHE('/Users/gates/.bashrc#mybash')
> stderr('/tmp/errors' limit 10);
> A = load '/Users/gates/test/data/studenttab10';
> B = stream A through strm;
> dump B;
> {code}
> The above actually parses.  I see several issues here:
> # What do multiple INPUT and OUTPUT statements mean in the context of 
> streaming?  These should not be allowed.
> # The documentation implies an order (INPUT, OUTPUT, SHIP, CACHE) that is not 
> enforced by the parser.  We should either enforce the order in the parser or 
> update the documentation.  Most likely the latter to avoid breaking existing 
> scripts.
> # Why are multiple SHIP and CACHE clauses allowed when each can take multiple 
> paths?  It seems we should only allow one of each.
> # The error clause is completely different from what is given in the 
> documentation.  I suspect this is a documentation error and the grammar 
> supported by the parser here is what we want.
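Points 1 and 3 suggest that each clause should appear at most once, while point 2 suggests leaving clause order unenforced. A hypothetical Python sketch of such a check over the clause keywords of one DEFINE, in source order; the names are illustrative and this is not Pig's parser.

```python
from collections import Counter

# Clauses that, per the proposal above, should appear at most once per DEFINE.
SINGLE_USE = ("INPUT", "OUTPUT", "SHIP", "CACHE", "STDERR")


def duplicate_clauses(clauses):
    """Return the sorted names of single-use clauses that occur more than once.

    Keywords are compared case-insensitively; order is deliberately not checked.
    """
    counts = Counter(c.upper() for c in clauses)
    return sorted(name for name in SINGLE_USE if counts[name] > 1)
```

Applied to the script above (INPUT, CACHE, OUTPUT, INPUT, OUTPUT, SHIP, SHIP, CACHE, stderr), it flags `CACHE`, `INPUT`, `OUTPUT` and `SHIP`.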



[jira] [Updated] (PIG-2012) Comments at the beginning of the file throw off line numbers in errors

2011-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2012:


Summary: Comments at the beginning of the file throw off line numbers in 
errors  (was: Comments at the begging of file throw off line numbers in errors)

> Comments at the beginning of the file throw off line numbers in errors
> --
>
> Key: PIG-2012
> URL: https://issues.apache.org/jira/browse/PIG-2012
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Alan Gates
>Assignee: Richard Ding
> Attachments: macro.pig
>
>
> The preprocessor does not appear to be handling leading comments properly 
> when calculating line numbers for error messages.  In the attached script, 
> the error is reported to be on line 7.  It is actually on line 10.
