[jira] [Commented] (PIG-1429) Add Boolean Data Type to Pig

2011-07-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066365#comment-13066365
 ] 

Daniel Dai commented on PIG-1429:
-

I would vote for treating the strings "true"/"false" (regardless of case) as 
Boolean values, and anything else as null, for Utf8StorageConverter.
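
The rule proposed above can be sketched in a few lines. This is a Python illustration, not Pig's Java code: the function name `bytes_to_boolean` is hypothetical and is not part of Utf8StorageConverter, and treating undecodable bytes as null is an assumption.

```python
from typing import Optional


def bytes_to_boolean(raw: Optional[bytes]) -> Optional[bool]:
    """Hypothetical helper illustrating the proposed conversion rule."""
    if raw is None:
        return None
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # assumption: undecodable bytes are treated as null
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return None  # anything else, including "1" or "yes", becomes null
```

Under this rule a numeric-looking value such as "1" is never read as True, which sidesteps the ambiguity between numeric and string readings of the same bytes.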

> Add Boolean Data Type to Pig
> 
>
> Key: PIG-1429
> URL: https://issues.apache.org/jira/browse/PIG-1429
> Project: Pig
>  Issue Type: New Feature
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: boolean, gsoc2011, pig, type
> Attachments: working_boolean.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
> I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
> plus unit tests to make this work?  
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-1866) Dereference a bag within a tuple does not work

2011-07-15 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated PIG-1866:
---

Attachment: PIG-1866-4-cdh3-0.8.0.patch

Here is a version of patch #4 that applies cleanly to CDH3u0 (pig 0.8.0)

> Dereference a bag within a tuple does not work
> --
>
> Key: PIG-1866
> URL: https://issues.apache.org/jira/browse/PIG-1866
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1866-1.patch, PIG-1866-2.patch, PIG-1866-3.patch, 
> PIG-1866-4-cdh3-0.8.0.patch, PIG-1866-4.patch
>
>
> The following script does not work (both in new and old logical plan):
> {code}
> a = load '1.txt' as (t : tuple(i: int, b1: bag { b_tuple : tuple ( b_str: 
> chararray) }));
> b = foreach a generate t.b1;
> dump b;
> {code}
> 1.txt:
> (1,{(one),(two)})
> Error from old logical plan:
> java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be 
> cast to org.apache.pig.data.DataBag
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:482)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:480)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:339)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Error from new logical plan:
> java.lang.NullPointerException
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.consumeInputBag(POProject.java:246)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:200)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:339)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> If we change "b = foreach a generate t.b1;" to "b = foreach a generate t.i;", 
> it works fine; only referring to a bag does not work.





[jira] [Commented] (PIG-1429) Add Boolean Data Type to Pig

2011-07-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066358#comment-13066358
 ] 

Zhijie Shen commented on PIG-1429:
--

Does anyone have an opinion on casting DataByteArray to Boolean?

1. DataByteArray can represent a numeric value, such that a non-zero value should 
be converted to True.
2. DataByteArray can also represent a string, such that the "true" string 
should be converted to True.

However, these two cases conflict to some extent. A raw DataByteArray can be 
simultaneously read as a non-zero number and as a non-"true" string. Then 
it is hard to say whether it should be converted to True or False.
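
The conflict can be made concrete with a small Python sketch. Both function names are hypothetical, and each one implements only one of the two readings described above; neither reflects how Pig actually casts a DataByteArray.

```python
def as_numeric_boolean(raw: bytes):
    """Reading 1: the bytes hold a number; non-zero means True."""
    try:
        return float(raw.decode("utf-8")) != 0
    except (UnicodeDecodeError, ValueError):
        return None  # not a number under this reading


def as_string_boolean(raw: bytes):
    """Reading 2: the bytes hold a string; only "true" means True."""
    try:
        return raw.decode("utf-8") == "true"
    except UnicodeDecodeError:
        return None


# The same bytes can yield different answers depending on the reading.
for sample in (b"1", b"true", b"0"):
    print(sample, as_numeric_boolean(sample), as_string_boolean(sample))
```

The bytes for "1" are the problematic case: the numeric reading says True while the string reading says False.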






[jira] [Commented] (PIG-1904) Default split destination

2011-07-15 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066287#comment-13066287
 ] 

Thejas M Nair commented on PIG-1904:


The approach you are proposing for @NonDeterministic udf sounds good.

PIG-1904.1.patch looks good. Some comments -

I think it is better to retain the restriction that a split needs at least two 
output aliases. This will prevent split from being used instead of filter, and 
pig from becoming perl ;).

Maybe something like - 
split_clause : SPLIT rel INTO split_branch ( COMMA split_branch )* 
( ( COMMA split_branch ) | ( COMMA split_otherwise ) )


In LogicalPlanBuilder.java, I think it is better to change the assertion to an 
if (root == null) { throw exception; } check, as assertions are not enabled by default.
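
The point about assertions carries over to other languages. The sketch below is a Python analogy, not the code in LogicalPlanBuilder.java: Python's `assert` statements are skipped when the interpreter runs with `-O`, much as Java assertions are skipped unless enabled with `-ea`, so an explicit check is the safer way to reject a null root. The names `build_plan` and `PlanError` are hypothetical.

```python
class PlanError(Exception):
    """Hypothetical exception type for an invalid plan."""


def build_plan(root):
    # An `assert root is not None` here would vanish under `python -O`;
    # the explicit check below always runs.
    if root is None:
        raise PlanError("plan root must not be null")
    return {"root": root}
```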



> Default split destination
> -
>
> Key: PIG-1904
> URL: https://issues.apache.org/jira/browse/PIG-1904
> Project: Pig
>  Issue Type: New Feature
>Reporter: Daniel Dai
>  Labels: gsoc2011
> Fix For: 0.10
>
> Attachments: PIG-1904.1.patch
>
>
> The "split" statement should have a default destination, e.g.:
> {code}
> SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6), OTHER otherwise;
> -- OTHER has all tuples with f1>=7 && f2!=5 && f3==6
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
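
The intended semantics of the default destination can be sketched in Python. This is an illustration only: `split_with_otherwise` is a hypothetical helper, and it assumes, as with Pig's existing SPLIT, that a tuple may land in more than one conditional output.

```python
def split_with_otherwise(tuples, branches):
    """Route each tuple to every branch whose condition holds.

    `branches` is a list of (name, predicate) pairs. Tuples matching no
    predicate go to the default output, named "OTHER" here.
    """
    outputs = {name: [] for name, _ in branches}
    outputs["OTHER"] = []
    for t in tuples:
        matched = False
        for name, predicate in branches:
            if predicate(t):
                outputs[name].append(t)
                matched = True
        if not matched:
            outputs["OTHER"].append(t)
    return outputs


# Conditions from the example: X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6)
branches = [
    ("X", lambda t: t[0] < 7),
    ("Y", lambda t: t[1] == 5),
    ("Z", lambda t: t[2] < 6 or t[2] > 6),
]
result = split_with_otherwise([(1, 5, 6), (8, 4, 6), (9, 5, 7)], branches)
```

Only the tuple (8, 4, 6) fails all three conditions, so it is the only one routed to the default output.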





[jira] [Updated] (PIG-1904) Default split destination

2011-07-15 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1904:
---

Release Note: This feature introduces a new keyword - OTHERWISE, and that 
is not backward compatible - it can break scripts that use it as an alias. 

Adding a note in release notes, about how this feature affects backward 
compatibility. 







[jira] [Commented] (PIG-2143) Improvements for PigStorage

2011-07-15 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066275#comment-13066275
 ] 

Thejas M Nair commented on PIG-2143:


Thanks for adding the comprehensive documentation and fixing the incorrect old 
one! 

Review of PIG-2143.4.patch -
- In PigStorage.getSchema(..) , it should check for (!dontLoadSchema) for 
deciding if the schema file should be read. (instead of (storeschema) ).
- A test case where pig loads schema with the default constructor will be 
useful. One of the new test cases in the patch can be modified for this. I 
think we need one for the -noschema as well. 
- In javadoc for constructor PigStorage(String delimiter, String options), the 
line about "-Dprop=value" can be removed as it's not used right now.
- A nitpick - In the PigStorage class javadoc, I think 'An optional second 
constructor' is a bit misleading. There are 3 constructors including default 
one, and all 3 constructors are 'optional' :) . Maybe calling it 'Another 
constructor' is better.


> Improvements for PigStorage
> ---
>
> Key: PIG-2143
> URL: https://issues.apache.org/jira/browse/PIG-2143
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.10
>
> Attachments: PIG-2143.2.diff, PIG-2143.3.patch, PIG-2143.4.patch, 
> PIG-2143.diff
>
>
> I'd like to propose that we allow for a greater degree of customization in 
> PigStorage.
> An incomplete list features that we might want to add:
> - flag to tell it to overwrite existing output if it exists
> - flag to tell it to compress output using gzip|bzip|lzo (currently this can 
> be achieved by setting the directory name to end in .gz or .bz2, which is a 
> bit awkward)
> - flag to tell it to store the schema and header (perhaps by merging in 
> PigStorageSchema work?)





[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1942:


Status: Patch Available  (was: Open)

marking as SubmitPatch so this can be reviewed and committed.

> script UDF (jython) should utilize the intended output schema to more 
> directly convert Py objects to Pig objects
> 
>
> Key: PIG-1942
> URL: https://issues.apache.org/jira/browse/PIG-1942
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Assignee: Woody Anderson
>Priority: Minor
>  Labels: python, schema, udf
> Fix For: 0.10
>
> Attachments: 1942.patch, 1942_with_junit.patch
>
>
> from https://issues.apache.org/jira/browse/PIG-1824
> {code}
> import re
> @outputSchema("y:bag{t:tuple(word:chararray)}")
> def strsplittobag(content,regex):
>     return re.compile(regex).split(content)
> {code}
> does not work because split returns a list of strings. However, the output 
> schema is known, and it would be quite simple to implicitly promote each 
> string element to a tuple.
> Also, list/array/tuple/set etc. are all equally convertible to a bag, and 
> list/array/tuple are equally convertible to a Tuple; this conversion can be 
> done in a much less rigid way with the use of the schema.
> This allows much easier re-use of existing Python code and less memory 
> overhead from creating intermediate objects during re-conversion.
> I wrote the code to do this a while back as part of my version of the 
> Jython script framework; I'll isolate that and attach.
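
Until such schema-driven conversion exists, the promotion can be done by hand inside the UDF. The sketch below is plain Python and assumes nothing about the attached patch: `to_bag_of_tuples` is a hypothetical helper, and the `@outputSchema` decorator is omitted because it is supplied by Pig's Jython runtime rather than by standard Python.

```python
import re


def to_bag_of_tuples(items):
    """Wrap each non-tuple element in a one-field tuple (hypothetical helper)."""
    return [item if isinstance(item, tuple) else (item,) for item in items]


def strsplittobag(content, regex):
    # The regex split returns a list of strings; promote each string to a
    # tuple so the result matches the schema bag{tuple(word:chararray)}.
    return to_bag_of_tuples(re.compile(regex).split(content))
```

The proposal in the issue would make the manual wrapping step unnecessary by performing the same promotion from the declared schema.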





[jira] [Updated] (PIG-1973) UDFContext.getUDFContext has a thread race condition around it's ThreadLocal

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1973:


Status: Patch Available  (was: Open)

Marking as submitpatch so we can get this reviewed and committed.

> UDFContext.getUDFContext has a thread race condition around it's ThreadLocal
> 
>
> Key: PIG-1973
> URL: https://issues.apache.org/jira/browse/PIG-1973
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Assignee: Woody Anderson
>Priority: Minor
> Attachments: 1973.patch
>
>
> This probably isn't manifesting anywhere, but it's an incorrect use of the 
> ThreadLocal pattern.





[jira] [Resolved] (PIG-1991) Leading Underscore (_) not allowed in schema names

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-1991.
-

Resolution: Won't Fix

The definition of variable names for Pig is:

[a-zA-Z][a-zA-Z0-9]*

I don't see any compelling reason to change that.
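
A quick way to see why `_a` is rejected is to test names against the pattern exactly as quoted above. This Python sketch checks only that quoted pattern; the actual Pig lexer may accept or reject names differently, and `is_valid_name` is a hypothetical helper.

```python
import re

# Pattern quoted in the comment above: a letter, then letters or digits.
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def is_valid_name(name: str) -> bool:
    """Return True if the whole name matches the quoted pattern."""
    return IDENTIFIER.fullmatch(name) is not None
```

With this check, `is_valid_name("_a")` is False because the first character must be a letter, while `is_valid_name("b")` is True.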

> Leading Underscore (_) not allowed in schema names
> --
>
> Key: PIG-1991
> URL: https://issues.apache.org/jira/browse/PIG-1991
> Project: Pig
>  Issue Type: Wish
>  Components: grunt
>Affects Versions: 0.9.0
>Reporter: Viraj Bhat
>
> I have a Pig script which uses underscore in its schema name (_a)
> {code}
> a = load 'test.txt' as (_a:long, b:chararray);
> dump a;
> {code}
> This causes an error in Pig:
> {quote}
>   Unexpected character '_'
> 2011-04-12 11:58:59,624 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1200:   Unexpected character '_'
> {quote}
> Stack trace:
> Pig Stack Trace
> ---
> ERROR 1200:   Unexpected character '_'
> Failed to parse:   Unexpected character '_'
> at 
> org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:83)
> at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1555)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1527)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:582)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:917)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:176)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:489)
> at org.apache.pig.Main.main(Main.java:108)
> 
> Schema names should be allowed to have underscores.
> Viraj





[jira] [Updated] (PIG-2010) Bundle registered jars via distributed cache

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2010:


Status: Patch Available  (was: Open)

Marking this as submitpatch so we can review it.

> Bundle registered jars via distributed cache
> 
>
> Key: PIG-2010
> URL: https://issues.apache.org/jira/browse/PIG-2010
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
> Attachments: pig-2010.patch
>
>
> Currently registered jars get collapsed into a single job megajar that gets 
> submitted to Hadoop.
> A better pattern would be to take advantage of the distributed cache.





[jira] [Updated] (PIG-2027) NPE if Pig don't have permission for log file

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2027:


Status: Patch Available  (was: Open)

Marking as submitpatch so it can get reviewed and committed.

> NPE if Pig don't have permission for log file
> -
>
> Key: PIG-2027
> URL: https://issues.apache.org/jira/browse/PIG-2027
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Trivial
> Fix For: 0.10
>
> Attachments: PIG-2027-1.patch
>
>
> If a log file is specified for Pig but Pig doesn't have write permission, 
> any failure in the Pig script produces an NPE in addition to the script failure:
> 2011-05-02 13:18:36,493 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
> java.lang.NullPointerException
> at org.apache.pig.impl.util.LogUtils.writeLog(LogUtils.java:172)
> at org.apache.pig.impl.util.LogUtils.writeLog(LogUtils.java:79)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:131)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:180)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
> at org.apache.pig.Main.run(Main.java:554)
> at org.apache.pig.Main.main(Main.java:109)





[jira] [Commented] (PIG-2050) Pig shows auto-generated schema name for TOTUPLE in describe

2011-07-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066245#comment-13066245
 ] 

Alan Gates commented on PIG-2050:
-

The issue here is not that the user cannot access the tuple by this name, but 
that the name shows up in describe.  The semantics of Pig Latin are that if the 
user does not name an expression in foreach and it is not a simple column 
expression, then it has no name.  But describe should not show this internal 
name that Pig is using, as that is confusing.

> Pig shows auto-generated schema name for TOTUPLE in describe
> 
>
> Key: PIG-2050
> URL: https://issues.apache.org/jira/browse/PIG-2050
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Priority: Minor
>
> Here is the use case:
> {code}
> grunt> A = load 'data' as (a0, a1, a2); 
> grunt> B = foreach A generate TOTUPLE(a0, a2);  
> grunt> describe B
> B: {org.apache.pig.builtin.totuple_a0_3: (a0: bytearray,a2: bytearray)}
> grunt> C = foreach B generate org.apache.pig.builtin.totuple_a0_3;
> 2011-05-06 14:38:14,635 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Invalid alias: org in 
> {org.apache.pig.builtin.totuple_a0_1: (a0: bytearray,a2: bytearray)}
> {code}
> The workaround is to specify a user-defined schema name:
> {code}
> grunt> A = load 'data' as (a0, a1, a2);   
>
> grunt> B = foreach A generate TOTUPLE(a0, a2) as aa;  
> grunt> describe B 
> B: {aa: (a0: bytearray,a2: bytearray)}
> grunt> C = foreach B generate aa; 
> grunt> 
> {code}





[jira] [Updated] (PIG-2050) Pig shows auto-generated schema name for TOTUPLE in describe

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2050:


Summary: Pig shows auto-generated schema name for TOTUPLE in describe  
(was: Pig can't reference auto-generated schema name for TOTUPLE)






[jira] [Updated] (PIG-2051) new LogicalSchema column prune code does not preserve type information for map subfields

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2051:


Status: Patch Available  (was: Open)

Marking as Submitpatch so we can get this reviewed and committed.

> new LogicalSchema column prune code does not preserve type information for 
> map subfields
> 
>
> Key: PIG-2051
> URL: https://issues.apache.org/jira/browse/PIG-2051
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10
>Reporter: Woody Anderson
>Assignee: Woody Anderson
> Fix For: 0.10
>
> Attachments: 2051.patch
>
>
> The current implementation of ColumnPruneVisitor.visit ignores field type 
> info and passes type BYTEARRAY for all map fields.
> The correct type is pretty easy to fill in, especially since map field info 
> is only attempted one level deep.
> I came across this because I use the type information in the pushProjection 
> call, which previously received the correct type information; the 
> changeover to LogicalSchema caused a regression.





[jira] [Updated] (PIG-2053) PigInputFormat uses class.isAssignableFrom() where instanceof is more appropriate

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2053:


Status: Patch Available  (was: Open)

Submitting patch so this can get reviewed and committed.

> PigInputFormat uses class.isAssignableFrom() where instanceof is more 
> appropriate
> -
>
> Key: PIG-2053
> URL: https://issues.apache.org/jira/browse/PIG-2053
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10
>Reporter: Woody Anderson
>Priority: Minor
>  Labels: newbie
> Fix For: 0.10
>
> Attachments: 2053.patch
>
>
> This is a code style/quality improvement.
> isAssignableFrom is appropriate when the class is not known at compile time, 
> but assignment needs to be checked,
> e.g. foo.getClass().isAssignableFrom(bar.getClass()).
> But if the class of foo is known (e.g. X.class), then instanceof is more 
> appropriate and readable.
> I also made use of De Morgan's laws to simplify the "is combinable" boolean 
> statement, which is hard to grok as written.
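
The same distinction exists in Python, which makes for a compact illustration. This is an analogy only, not the PigInputFormat code: `issubclass` on two runtime class objects plays the role of `isAssignableFrom`, `isinstance` plays the role of `instanceof`, and all names below are hypothetical.

```python
class Base:
    pass


class Derived(Base):
    pass


foo, bar = Base(), Derived()

# Both classes discovered at run time: compare the class objects.
# Rough analogue of foo.getClass().isAssignableFrom(bar.getClass()).
dynamic_check = issubclass(type(bar), type(foo))

# Target class known up front: the direct instance test reads better.
# Rough analogue of `bar instanceof Base`.
static_check = isinstance(bar, Base)


def not_both(a: bool, b: bool) -> bool:
    """De Morgan: not (a and b) is equivalent to (not a) or (not b)."""
    return (not a) or (not b)
```

Both checks agree here; the second simply states the intent more directly when the target class is fixed in the source.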





[jira] [Updated] (PIG-2077) Project UDF output inside a non-foreach statement fail on 0.8

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2077:


Status: Patch Available  (was: Open)

Marking SubmitPatch so we can get this reviewed and committed.

> Project UDF output inside a non-foreach statement fail on 0.8
> -
>
> Key: PIG-2077
> URL: https://issues.apache.org/jira/browse/PIG-2077
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1
>
> Attachments: PIG-2077-1.patch
>
>
> The following script fails on 0.8:
> {code}
> A = load '1.txt' as (tracking_id, day:chararray);
> B = load '2.txt' as (tracking_id, timestamp:chararray);
> C = JOIN A by (tracking_id, day) LEFT OUTER, B by (tracking_id,  
> STRSPLIT(timestamp, ' ').$0);
> explain C;
> {code}
> Error stack:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.ArrayList.get(ArrayList.java:324)
> at 
> org.apache.pig.newplan.logical.expression.ProjectExpression.findReferent(ProjectExpression.java:207)
> at 
> org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:121)
> at 
> org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:193)
> at 
> org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:53)
> at 
> org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:75)
> at 
> org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:83)
> at 
> org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:149)
> at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:262)
> This is not a problem on 0.9, trunk, since LogicalExpPlanMigrationVistor is 
> dropped in 0.9.





[jira] [Updated] (PIG-2124) Script never ending when joining from the same source

2011-07-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2124:


Status: Patch Available  (was: Open)

Marking as ready for review.

> Script never ending when joining from the same source
> -
>
> Key: PIG-2124
> URL: https://issues.apache.org/jira/browse/PIG-2124
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Tristan Croiset
>Assignee: Daniel Dai
> Fix For: 0.10
>
> Attachments: PIG-2124-1.patch
>
>
> The following script either works perfectly fine or never ends, depending 
> on the fields used in the output.
> input ("scores" file) contains:
> --
> test1;0.1
> test2;0.9
> test1;0.3
> --
> --
> score_list = LOAD  'scores' USING PigStorage(';')
>   AS (word: chararray, score: double);
> score_list_ = FOREACH score_list GENERATE
>   word,
>   score,
>   0 AS joinField;
> group_score = GROUP score_list ALL;
> sum_score = FOREACH group_score GENERATE
>   0 AS joinField,
>   SUM(score_list.score) as scoreTotal;
> score_with_sum = JOIN score_list_ BY joinField, sum_score BY joinField;
> out = FOREACH score_with_sum GENERATE word, (score / scoreTotal);
> DUMP out;
> --
> This works fine.
> But if I change "out" to: out = FOREACH score_with_sum GENERATE word;
> then the script never ends and the output keeps repeating lines like:
> 2011-06-15 15:00:22,536 [SpillThread] INFO  org.apache.hadoop.mapred.MapTask 
> - Finished spill 24
> 2011-06-15 15:00:22,889 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> Spilling map output: record full = true
> 2011-06-15 15:00:22,889 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> bufstart = 65535810; bufend = 68157240; bufvoid = 99614720
> 2011-06-15 15:00:22,889 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> kvstart = 327661; kvend = 262124; length = 327680
> 2011-06-15 15:00:22,994 [SpillThread] INFO  org.apache.hadoop.mapred.MapTask 
> - Finished spill 25
> 2011-06-15 15:00:23,345 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> Spilling map output: record full = true
> 2011-06-15 15:00:23,345 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> bufstart = 68157240; bufend = 70778670; bufvoid = 99614720
> 2011-06-15 15:00:23,345 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> kvstart = 262124; kvend = 196587; length = 327680
> 2011-06-15 15:00:23,447 [SpillThread] INFO  org.apache.hadoop.mapred.MapTask 
> - Finished spill 26
> 2011-06-15 15:00:23,794 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> Spilling map output: record full = true
> 2011-06-15 15:00:23,794 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> bufstart = 70778670; bufend = 73400100; bufvoid = 99614720
> 2011-06-15 15:00:23,794 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> kvstart = 196587; kvend = 131050; length = 327680
> 2011-06-15 15:00:23,896 [SpillThread] INFO  org.apache.hadoop.mapred.MapTask 
> - Finished spill 27
> 2011-06-15 15:00:24,243 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> Spilling map output: record full = true
> 2011-06-15 15:00:24,243 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> bufstart = 73400100; bufend = 76021530; bufvoid = 99614720
> 2011-06-15 15:00:24,243 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> kvstart = 131050; kvend = 65513; length = 327680
> 2011-06-15 15:00:24,346 [SpillThread] INFO  org.apache.hadoop.mapred.MapTask 
> - Finished spill 28
> 2011-06-15 15:00:24,692 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> Spilling map output: record full = true
> 2011-06-15 15:00:24,692 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> bufstart = 76021530; bufend = 78642970; bufvoid = 99614720
> 2011-06-15 15:00:24,693 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> kvstart = 65513; kvend = 327657; length = 327680
> 2011-06-15 15:00:24,793 [SpillThread] INFO  org.apache.hadoop.mapred.MapTask 
> - Finished spill 29
> 2011-06-15 15:00:25,144 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> Spilling map output: record full = true
> 2011-06-15 15:00:25,144 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> bufstart = 78642970; bufend = 81264400; bufvoid = 99614720
> 2011-06-15 15:00:25,144 [Thread-13] INFO  org.apache.hadoop.mapred.MapTask - 
> kvstart = 327657; kvend = 262120; length = 327680
> P.S. I know it's possible to refactor the script using casting to scalar ;)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2149) ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space

2011-07-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066222#comment-13066222
 ] 

Alan Gates commented on PIG-2149:
-

It means Pig ran out of memory.  Can you attach your script (or some script 
that replicates the problem)?  Also an idea of how much data you are reading in 
the script would be helpful.

> ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate 
> exception from backed error: Error: Java heap space
> -
>
> Key: PIG-2149
> URL: https://issues.apache.org/jira/browse/PIG-2149
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.8.0
> Environment: hadoop 0.20.2 
> Linux  2.6.18-194.8.1.el5PAE #1 SMP Thu Jul 1 19:46:23 EDT 2010 i686 i686 
> i386 GNU/Linux
>Reporter: Kim Sang hyun
>
> Backend error message
> -
> Error: Java heap space
> Pig Stack Trace
> ---
> ERROR 2997: Unable to recreate exception from backed error: Error: Java heap 
> space
> org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to 
> recreate exception from backed error: Error: Java heap space
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
> at org.apache.pig.PigServer.execute(PigServer.java:1190)
> at org.apache.pig.PigServer.access$100(PigServer.java:128)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
> at org.apache.pig.PigServer.executeBatchEx(PigServer.java:362)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
> at org.apache.pig.Main.run(Main.java:510)
> at org.apache.pig.Main.main(Main.java:107)
> 
> Why does this error occur?





Pig 0.9 release take 2

2011-07-15 Thread Olga Natkovich
Hi guys,

We have fixed several release blockers since the last attempt. I think it is 
time to get this release out!

I am planning to start the release process around 2:30. Please, let me know 
before that if you have concerns about it. Please, don't make any changes to 
0.9 branch while I roll the release candidate.

Thanks,

Olga


[jira] [Created] (PIG-2170) NPE thrown during illustrate

2011-07-15 Thread Mat Kelcey (JIRA)
NPE thrown during illustrate


 Key: PIG-2170
 URL: https://issues.apache.org/jira/browse/PIG-2170
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10
Reporter: Mat Kelcey


working with version
https://svn.apache.org/repos/asf/pig/trunk@1146777
fetched from git 
 git://git.apache.org/pig.git a7e1228a0fdfe76c3cff0e749e252dba8d387052

using file /tmp/data.tsv
id1 123
id1 234
id2 345
id2 456

This is the most cut-down script I can make that illustrates (no pun 
intended) the problem:
grunt> data = load '/tmp/data.tsv'  as (id:chararray, value:long);
grunt> cogrouped = cogroup data by id;
grunt> exists = foreach cogrouped generate (IsEmpty(data.value) ? 0 : 1) as 
exists;

grunt> dump exists 
is ok

but
grunt> illustrate exists
throws
java.lang.NullPointerException
at 
org.apache.pig.pen.IllustratorAttacher.visitBinCond(IllustratorAttacher.java:360)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.visit(POBinCond.java:145)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.visit(POBinCond.java:36)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52)
at 
org.apache.pig.pen.IllustratorAttacher.innerPlanAttach(IllustratorAttacher.java:417)
at 
org.apache.pig.pen.IllustratorAttacher.visitPOForEach(IllustratorAttacher.java:229)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.visit(POForEach.java:117)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.visit(POForEach.java:47)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
at 
org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:246)
at 
org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:238)
at 
org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:103)
at 
org.apache.pig.pen.LineageTrimmingVisitor.<init>(LineageTrimmingVisitor.java:98)
at 
org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:166)
at org.apache.pig.PigServer.getExamples(PigServer.java:1201)
at 
org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:698)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
at org.apache.pig.Main.run(Main.java:487)
at org.apache.pig.Main.main(Main.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

with pig.log containing
java.io.IOException: Exception : null
at org.apache.pig.PigServer.getExamples(PigServer.java:1207)
at 
org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:698)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
  

[jira] [Commented] (PIG-2165) Need a way to deal with params and param_file in embedded pig in python

2011-07-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066084#comment-13066084
 ] 

Julien Le Dem commented on PIG-2165:


I agree, on second thoughts my second option is a bad idea as the script can be 
launched either through the command line or programmatically through the java 
API. In that case the only parameters that are available are the ones passed 
through -p.
There's no good way to reuse sys.argv here; it is not really a good fit, as it 
is a list of strings.

Suggestions:
 - a separate dictionary: Pig.getParameters() ?
 - placed in global variables

> Need a way to deal with params and param_file in embedded pig in python
> ---
>
> Key: PIG-2165
> URL: https://issues.apache.org/jira/browse/PIG-2165
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Supreeth
> Fix For: 0.10
>
>
> I am using embedded pig in python and cannot pass param key value pairs to 
> the python script. The only way to pass params seems to be by passing them in 
> the bind command.
> Is there a plan to have command line parameters to a pig embedded python 
> script? Similar needs for param_file and using the environment variables.
> Thanks
> Supreeth





[jira] [Commented] (PIG-2159) New logical plan uses incorrect class for SUM causing for ClassCastException

2011-07-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066082#comment-13066082
 ] 

Alan Gates commented on PIG-2159:
-

Dmitry, I don't see this as a blocker for 0.9.  It does not produce wrong 
results and users can rewrite their scripts to work around it.  I agree it 
should go on the 0.9 branch and be part of the anticipated 0.9.1 release.  

> New logical plan uses incorrect class for  SUM causing for ClassCastException
> -
>
> Key: PIG-2159
> URL: https://issues.apache.org/jira/browse/PIG-2159
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Vivek Padmanabhan
>Priority: Blocker
> Fix For: 0.9.0
>
> Attachments: PIG-2159-1.patch, PIG-2159-2.patch
>
>
> The below is my script;
> {code}
> A = load 'input1' using PigStorage(',')  as 
> (f1:int,f2:int,f3:int,f4:long,f5:double);
> B = load 'input2' using PigStorage(',')  as 
> (f1:int,f2:int,f3:int,f4:long,f5:double);
> C = load 'input_Main' using PigStorage(',')  as (f1:int,f2:int,f3:int);
> U = UNION ONSCHEMA A,B;
> J = join C by (f1,f2,f3) LEFT OUTER, U by (f1,f2,f3);
> Porj = foreach J generate C::f1 as f1 ,C::f2 as f2,C::f3 as f3,U::f4 as 
> f4,U::f5 as f5;
> G = GROUP Porj by (f1,f2,f3,f5);
> Final = foreach G generate SUM(Porj.f4) as total;
> dump Final;
> {code}
> The script fails while computing the sum, with a ClassCastException:
> Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to 
> java.lang.Double
>  at org.apache.pig.builtin.DoubleSum$Initial.exec(DoubleSum.java:82)
>  ... 19 more
> This is clearly a bug in the logical plan created in 0.9. The sum operation 
> should have been processed using org.apache.pig.builtin.LongSum, but instead 
> the 0.9 logical plan used org.apache.pig.builtin.DoubleSum, which is meant 
> for sums of doubles; hence the ClassCastException.
> The same script works fine with Pig 0.8.





[jira] [Updated] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-07-15 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-1946:
-

Attachment: PIG-1946_3.patch

All good suggestions, here's patch #3.

The {{--no-patch}} option doesn't seem to be valid in git merge. I tried 
{{--no-prefix}} but the format still looks gitish. Any other suggestions for an 
SVN-friendly git patch? I've made manual changes in the past when applying git 
patches to SVN, but I'd love to find a way to not require that.

> HBaseStorage constructor syntax is error prone
> --
>
> Key: PIG-1946
> URL: https://issues.apache.org/jira/browse/PIG-1946
> Project: Pig
>  Issue Type: Improvement
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.10
>
> Attachments: PIG-1946_1.patch, PIG-1946_2.patch, PIG-1946_3.patch
>
>
> Using {{HBaseStorage}} like so seems like a reasonable thing to do, but it 
> will yield unexpected results:
> {code}
> STORE result INTO 'hbase://foo' USING
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>  'info:first_name, info:last_name');
> {code}
> The problem is that a column named {{info:first_name,}} will be created, with 
> the trailing comma included. I've had numerous developers get tripped up on 
> this issue since everywhere else in Pig variables are separated by commas, so 
> I propose we fix it.
> I propose we trim leading/trailing commas from column names, but I'm open to 
> other ideas.
> Also, should we accept column names that are comma-delimited without spaces?
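
A minimal sketch of one way the proposal above could behave, assuming column names never contain commas or whitespace themselves. The class and the `parseColumns` helper are hypothetical names for illustration only and are not taken from HBaseStorage or from the attached patches. The idea is to treat any run of commas and/or whitespace as a single delimiter, so that 'a, b', 'a,b', and 'a b' all yield the same column list:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not the actual HBaseStorage parsing code.
class ColumnListSketch {

    /**
     * Splits a column specification on any run of commas and/or whitespace
     * and drops empty tokens, so a stray leading or trailing comma never
     * ends up inside a column name.
     */
    static List<String> parseColumns(String spec) {
        List<String> columns = new ArrayList<>();
        for (String token : spec.trim().split("[\\s,]+")) {
            if (!token.isEmpty()) {
                columns.add(token);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // All three spellings should produce the same two column names.
        System.out.println(parseColumns("info:first_name, info:last_name"));
        System.out.println(parseColumns("info:first_name,info:last_name"));
        System.out.println(parseColumns("info:first_name info:last_name"));
    }
}
```

The trade-off of this approach is that a column qualifier that legitimately contains a comma can no longer be expressed, which may or may not be acceptable.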





[jira] [Commented] (PIG-2143) Improvements for PigStorage

2011-07-15 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066061#comment-13066061
 ] 

Raghu Angadi commented on PIG-2143:
---

Thanks for the detailed javadoc. noticed the compression changes in the prev 
patch.

PigStorageSchema class does not set "-schema" option. Is that correct?

didn't know .pig_schema didn't store the delimiter.


> Improvements for PigStorage
> ---
>
> Key: PIG-2143
> URL: https://issues.apache.org/jira/browse/PIG-2143
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.10
>
> Attachments: PIG-2143.2.diff, PIG-2143.3.patch, PIG-2143.4.patch, 
> PIG-2143.diff
>
>
> I'd like to propose that we allow for a greater degree of customization in 
> PigStorage.
> An incomplete list of features that we might want to add:
> - flag to tell it to overwrite existing output if it exists
> - flag to tell it to compress output using gzip|bzip|lzo (currently this can 
> be achieved by setting the directory name to end in .gz or .bz2, which is a 
> bit awkward)
> - flag to tell it to store the schema and header (perhaps by merging in 
> PigStorageSchema work?)





[jira] [Commented] (PIG-2165) Need a way to deal with params and param_file in embedded pig in python

2011-07-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066048#comment-13066048
 ] 

Alan Gates commented on PIG-2165:
-

I like the idea of only passing the parameters from either the command line or 
the parameters file.  I don't see why the Python script should care about other 
parameters that will be Pig specific.  Whether those parameters are placed in 
global variables, in sys.argv, or in a separate dictionary (pig_argv?) I don't 
have an opinion on.

> Need a way to deal with params and param_file in embedded pig in python
> ---
>
> Key: PIG-2165
> URL: https://issues.apache.org/jira/browse/PIG-2165
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Supreeth
> Fix For: 0.10
>
>
> I am using embedded pig in python and cannot pass param key value pairs to 
> the python script. The only way to pass params seems to be by passing them in 
> the bind command.
> Is there a plan to have command line parameters to a pig embedded python 
> script? Similar needs for param_file and using the environment variables.
> Thanks
> Supreeth





[jira] [Commented] (PIG-1904) Default split destination

2011-07-15 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066046#comment-13066046
 ] 

Gianmarco De Francisci Morales commented on PIG-1904:
-

Created PIG-2169 for this.
Anyway given the benefit/cost ratio I wouldn't try to fix it.
A Nondeterministic UDF in a Split is probably better expressed as a Sample.
Anyway I think this simple workaround should work:
{code}
a = LOAD 'a.txt' AS (f1,f2,f3);
b = FOREACH a GENERATE f1, f2, f3, NonDetUDF(f1,f2,f3) AS f4;
SPLIT b INTO c IF f4 < 0.5, D OTHERWISE;
{code}

> Default split destination
> -
>
> Key: PIG-1904
> URL: https://issues.apache.org/jira/browse/PIG-1904
> Project: Pig
>  Issue Type: New Feature
>Reporter: Daniel Dai
>  Labels: gsoc2011
> Fix For: 0.10
>
> Attachments: PIG-1904.1.patch
>
>
> "split" statement is better to have a default destination, eg:
> {code}
> SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6), OTHER otherwise;
> -- OTHER has all tuples with f1>=7 && f2!=5 && f3==6
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011





[jira] [Created] (PIG-2169) Allow Nondeterministic UDFs in Split-Otherwise

2011-07-15 Thread Gianmarco De Francisci Morales (JIRA)
Allow Nondeterministic UDFs in Split-Otherwise
--

 Key: PIG-2169
 URL: https://issues.apache.org/jira/browse/PIG-2169
 Project: Pig
  Issue Type: Wish
Reporter: Gianmarco De Francisci Morales
Priority: Trivial


PIG-1904 allows an Otherwise option in Split.
Because of how it is implemented, Nondeterministic UDFs are not allowed.






[jira] [Updated] (PIG-2060) Fix errors in pig grammars reported by ANTLRWorks

2011-07-15 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2060:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk.
Thanks Gianmarco!

> Fix errors in pig grammars reported by ANTLRWorks
> -
>
> Key: PIG-2060
> URL: https://issues.apache.org/jira/browse/PIG-2060
> Project: Pig
>  Issue Type: Bug
>Reporter: Gianmarco De Francisci Morales
>Assignee: Gianmarco De Francisci Morales
>Priority: Minor
> Attachments: PIG-2060.1.patch, PIG-2060.patch
>
>
> There are various errors in pig's grammar files highlighted by ANTLRWorks.
> In particular, on token MATCHES, ANY and EVAL.
> The first one should be removed, as there is already STR_OP_MATCHES,
> the second one is an imaginary token that should be defined in the 
> appropriate section.
> On the third one I am not sure.
> I have been told it is from the old parsers but it is not used anywhere. Is 
> it correct?
> Is it reserved for future uses? Has it anything to do with FUNC_EVAL?





[jira] [Commented] (PIG-2167) CUBE operation in Pig

2011-07-15 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066023#comment-13066023
 ] 

Dmitriy V. Ryaboy commented on PIG-2167:


I believe there is value to providing the naive solution and improving on it 
later, rather than trying to build the optimal plan from the get-go.

Initial (naive) implementation plan:

Add an optional "WITH CUBE" clause to the group operator.

In LogicalPlanBuilder, if "WITH CUBE" is present, insert operators equivalent 
to the following above the group operator:

{code}
relation = foreach relation generate
   FLATTEN(CubeDimensions(dim1, dim2, dim3))
 as (dim1, dim2, dim3),
   other_fields;
{code}

It may be desirable in some cases to group by a superset of dimensions one 
wants to cube on: group by dim1, dim2, dim3 with cube on (dim1, dim2). If we 
want to support that use case, we simply need to know to call the UDF on (dim1, 
dim2) and push dim3 into the other_fields list.

Note also that there's a bit of a problem if null values are legitimate values 
for the dimensions, as we use null to indicate "all". The UDF provided in 
PIG-2168 allows one to use custom strings instead of null for the "all" marker. 
We can optionally support this in the grammar, as well.
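
The null-marker concern above can be illustrated with a small standalone sketch. It is not the CubeDimensions UDF from PIG-2168 and does not use any Pig classes; `NaiveCubeSketch`, `expand`, and `cubeSum` are hypothetical names, and the "sum a long measure" aggregate is an assumption chosen to keep the example short. Each row's dimensions are expanded into all 2^n roll-up keys (what the FLATTEN step would emit), then the measure is summed per key (what the group would do):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain-Java stand-in for "FLATTEN(CubeDimensions(...))" + GROUP + SUM.
class NaiveCubeSketch {

    /** Returns all 2^n roll-up keys for one row; rolled-up positions hold allMarker. */
    static List<List<String>> expand(List<String> dims, String allMarker) {
        int n = dims.size();
        List<List<String>> keys = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> key = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                // Bit i set means dimension i is rolled up to "all".
                key.add((mask & (1 << i)) != 0 ? allMarker : dims.get(i));
            }
            keys.add(key);
        }
        return keys;
    }

    /** Sums the measure of every row under each of that row's roll-up keys. */
    static Map<List<String>, Long> cubeSum(List<List<String>> rows,
                                           List<Long> measures,
                                           String allMarker) {
        Map<List<String>, Long> totals = new LinkedHashMap<>();
        for (int r = 0; r < rows.size(); r++) {
            for (List<String> key : expand(rows.get(r), allMarker)) {
                totals.merge(key, measures.get(r), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("us", null),   // null is a legitimate dimension value here
                Arrays.asList("us", "web"));
        List<Long> measures = Arrays.asList(5L, 3L);

        System.out.println(cubeSum(rows, measures, null)); // null doubles as "all"
        System.out.println(cubeSum(rows, measures, "*"));  // custom "all" marker
    }
}
```

With null as the marker, the first row produces the key (us, null) twice, once as itself and once as a roll-up, so its measure is counted twice under that key. With a distinct marker such as "*", the row's own key (us, null) and the roll-up key (us, *) stay separate.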

> CUBE operation in Pig
> -
>
> Key: PIG-2167
> URL: https://issues.apache.org/jira/browse/PIG-2167
> Project: Pig
>  Issue Type: New Feature
>Reporter: Dmitriy V. Ryaboy
> Fix For: 0.10
>
>
> Computing aggregates over a cube of several dimensions is a common operation 
> in data warehousing.
> The standard SQL syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE" -- 
> which in addition to all dim1-2-3, produces aggregations for just dim1, just 
> dim1 and dim2, etc. NULL is generally used to represent "all".
> A presentation by Arnab Nandi describes how one might implement efficient 
> cubing in Map-Reduce here: http://pdf.cx/44wrk
> We can start with the naive solution which only works for algebraic measures, 
> and work up from there.





[jira] [Updated] (PIG-2060) Fix errors in pig grammars reported by ANTLRWorks

2011-07-15 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2060:
---

Attachment: PIG-2060.1.patch

+1 
Regenerated patch for latest svn trunk (PIG-2060.1.patch).

> Fix errors in pig grammars reported by ANTLRWorks
> -
>
> Key: PIG-2060
> URL: https://issues.apache.org/jira/browse/PIG-2060
> Project: Pig
>  Issue Type: Bug
>Reporter: Gianmarco De Francisci Morales
>Assignee: Gianmarco De Francisci Morales
>Priority: Minor
> Attachments: PIG-2060.1.patch, PIG-2060.patch
>
>
> There are various errors in pig's grammar files highlighted by ANTLRWorks.
> In particular, on token MATCHES, ANY and EVAL.
> The first one should be removed, as there is already STR_OP_MATCHES,
> the second one is an imaginary token that should be defined in the 
> appropriate section.
> On the third one I am not sure.
> I have been told it is from the old parsers but it is not used anywhere. Is 
> it correct?
> Is it reserved for future uses? Has it anything to do with FUNC_EVAL?





[jira] [Commented] (PIG-2161) TOTUPLE should use no-copy tuple creation

2011-07-15 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066016#comment-13066016
 ] 

Thejas M Nair commented on PIG-2161:


+1. I don't see a reason why the tuple should be copied.

> TOTUPLE should use no-copy tuple creation
> -
>
> Key: PIG-2161
> URL: https://issues.apache.org/jira/browse/PIG-2161
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Trivial
> Attachments: pig_2161.patch
>
>
> TOTUPLE udf gets an input tuple, creates a new list, puts every field from 
> the tuple into the list, and creates a new tuple by calling 
> TupleFactory.newTuple(List) method -- which in turn allocates 
> *another* list and copies everything in there.
> Simply returning the input tuple should be sufficient -- Pig already did the 
> work of putting the arguments into a tuple.





[jira] [Updated] (PIG-2168) CubeDimensions UDF

2011-07-15 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2168:
---

Attachment: PIG-2168.patch

> CubeDimensions UDF
> --
>
> Key: PIG-2168
> URL: https://issues.apache.org/jira/browse/PIG-2168
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.10
>
> Attachments: PIG-2168.patch
>
>
> A prerequisite for a naive cubing implementation:
> A UDF that, given a set of dimensions (a, b, c) generates all the points on 
> the cube:
> (a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, 
> null, null), (null, b, null), (null, null, null).





[jira] [Updated] (PIG-2168) CubeDimensions UDF

2011-07-15 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2168:
---

Status: Patch Available  (was: Open)

> CubeDimensions UDF
> --
>
> Key: PIG-2168
> URL: https://issues.apache.org/jira/browse/PIG-2168
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.10
>
> Attachments: PIG-2168.patch
>
>
> A prerequisite for a naive cubing implementation:
> A UDF that, given a set of dimensions (a, b, c) generates all the points on 
> the cube:
> (a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, 
> null, null), (null, b, null), (null, null, null).





[jira] [Created] (PIG-2168) CubeDimensions UDF

2011-07-15 Thread Dmitriy V. Ryaboy (JIRA)
CubeDimensions UDF
--

 Key: PIG-2168
 URL: https://issues.apache.org/jira/browse/PIG-2168
 Project: Pig
  Issue Type: Sub-task
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy


A prerequisite for a naive cubing implementation:
A UDF that, given a set of dimensions (a, b, c) generates all the points on the 
cube:
(a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, null, 
null), (null, b, null), (null, null, null).





[jira] [Created] (PIG-2167) CUBE operation in Pig

2011-07-15 Thread Dmitriy V. Ryaboy (JIRA)
CUBE operation in Pig
-

 Key: PIG-2167
 URL: https://issues.apache.org/jira/browse/PIG-2167
 Project: Pig
  Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
 Fix For: 0.10


Computing aggregates over a cube of several dimensions is a common operation in 
data warehousing.

The standard SQL syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE" -- 
which in addition to all dim1-2-3, produces aggregations for just dim1, just 
dim1 and dim2, etc. NULL is generally used to represent "all".

A presentation by Arnab Nandi describes how one might implement efficient 
cubing in Map-Reduce here: http://pdf.cx/44wrk

We can start with the naive solution which only works for algebraic measures, 
and work up from there.





[jira] [Commented] (PIG-1904) Default split destination

2011-07-15 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065966#comment-13065966
 ] 

Dmitriy V. Ryaboy commented on PIG-1904:


Nice catch about @NonDeterministic. Seems like it doesn't work due to the 
implementation details; the issue isn't fundamental. I'm cool with the partial 
solution for now, but please file a jira to fix this later.

> Default split destination
> -
>
> Key: PIG-1904
> URL: https://issues.apache.org/jira/browse/PIG-1904
> Project: Pig
>  Issue Type: New Feature
>Reporter: Daniel Dai
>  Labels: gsoc2011
> Fix For: 0.10
>
> Attachments: PIG-1904.1.patch
>
>
> "split" statement is better to have a default destination, eg:
> {code}
> SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6), OTHER otherwise;
> -- OTHER has all tuples with f1>=7 && f2!=5 && f3==6
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011





[jira] [Commented] (PIG-1914) Support load/store JSON data in Pig

2011-07-15 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065961#comment-13065961
 ] 

Dmitriy V. Ryaboy commented on PIG-1914:


Very cool.

Some quick code review notes:

Tiny typo here:
"e = foreach d generate flatten(men#'value') as val;" -- that should read 
menu#'value'


{code}
boolean notDone = in.nextKeyValue();
if (!notDone) {
    return null;
}
{code}

Better: {code}
if (!in.nextKeyValue()) {
    return null;
}
{code}

Parse exceptions: it's better to increment a counter and move on than to break 
on a bad input string. Throwing an exception kills the whole job. So maybe 
something like 
{code}
t = null;
while (t == null && in.nextKeyValue()) {
 ...
}
return t;
{code}
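
To make the skip-and-count suggestion concrete, here is a self-contained sketch under stated assumptions. It does not use Pig's loader or counter APIs: an `Iterator<String>` stands in for the record reader, `Integer.valueOf` stands in for the JSON parse, and a plain `long` field stands in for a job counter. The names `SkipBadRecordsSketch`, `parse`, and `getBadRecords` are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

// Illustrative only: the loop shape, not the actual JSON loader.
class SkipBadRecordsSketch {
    private final Iterator<String> in;
    private long badRecords = 0;

    SkipBadRecordsSketch(Iterator<String> in) {
        this.in = in;
    }

    /** Stand-in parser: returns null for a malformed line instead of throwing. */
    static Integer parse(String line) {
        try {
            return Integer.valueOf(line.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Returns the next well-formed record, or null once the input is exhausted. */
    Integer getNext() {
        Integer t = null;
        while (t == null && in.hasNext()) {
            t = parse(in.next());
            if (t == null) {
                badRecords++; // count the bad line and move on to the next one
            }
        }
        return t;
    }

    long getBadRecords() {
        return badRecords;
    }

    public static void main(String[] args) {
        SkipBadRecordsSketch loader =
                new SkipBadRecordsSketch(Arrays.asList("1", "oops", "2", "{").iterator());
        Integer record;
        while ((record = loader.getNext()) != null) {
            System.out.println("record: " + record);
        }
        System.out.println("bad records: " + loader.getBadRecords());
    }
}
```

One design consequence worth noting: because null signals end of input, the parser must never return null for a valid record, which is also why skipped lines need a separate counter to stay visible.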

In flatten_array, if the value is an array, you allocate a new bag, populate it 
recursively, and add the contents of the new bag to the old bag. Why not skip 
the object allocation and copy, and simply pass the original bag into the 
recursive call?

Also: are null values for keys just plain unsupported? You skip them.

setLocation: not that it really matters, but for consistency, you should use 
PigTextInputFormat instead of PigFileInputFormat here.

schema: probably makes sense to implement getSchema?

> Support load/store JSON data in Pig
> ---
>
> Key: PIG-1914
> URL: https://issues.apache.org/jira/browse/PIG-1914
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Chao Tian
> Attachments: PIG-1914.patch
>
>
> The JSON is a commonly used data storage format. It is popular for storing 
> structured data, especially for JavaScript data exchange. 
> Pig should have the ability to load/store JSON format data. I plan to write 
> one for the piggy bank.





[jira] [Updated] (PIG-1904) Default split destination

2011-07-15 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-1904:


Attachment: PIG-1904.1.patch

PIG-1904.1.patch contains the first working implementation of the feature.

The grammar now recognizes statements like:
SPLIT a INTO b IF x1 < 0, c OTHERWISE;
but also like:
SPLIT a INTO b IF x1 < 0;
This is a side-effect of making the otherwise branch optional and is a change 
from past behavior.
It shouldn't be a problem as the Split maps to a Filter in any case.

Implemented by copying the other LOSplitOutput plans and building a negated 
disjunction (OR) of their expressions.
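
To illustrate the logic only (this is not the code in the patch), the otherwise condition is NOT (c1 OR c2 OR ... OR cn). A minimal sketch with java.util.function.Predicate standing in for the expression plans; the class and method names are assumptions, and Pig's handling of null values in conditions is not modeled:

```java
import java.util.List;
import java.util.function.Predicate;

public class OtherwiseSketch {
    // Builds the predicate for the OTHERWISE branch: a value goes there
    // only if none of the explicit branch conditions accept it.
    static <T> Predicate<T> otherwise(List<Predicate<T>> branches) {
        Predicate<T> any = t -> false;      // identity element for OR
        for (Predicate<T> p : branches) {
            any = any.or(p);                // c1 OR c2 OR ... OR cn
        }
        return any.negate();                // NOT (c1 OR ... OR cn)
    }
}
```

With an empty list of branches the result accepts everything, since there is no condition to negate.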

Added a unit test for Split-Otherwise.

TODO:
Disable the feature if the expression contains a @NonDeterministic UDF.
I plan to do it by spawning a visitor on the expression.
The visitor will throw an error and explain the reason in the error message.
Is this a reasonable approach?
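
Roughly, the check I have in mind looks like the sketch below. Everything in it is a simplified stand-in written for this comment: the Expr tree, the UdfCall node, and the locally declared annotation are hypothetical and are not Pig's logical expression classes, and a plain recursive walk stands in for the visitor.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.List;

public class NondeterminismCheck {
    // Local stand-in for the marker annotation (assumption, not Pig's class).
    @Retention(RetentionPolicy.RUNTIME)
    @interface NonDeterministic {}

    // Hypothetical expression tree.
    interface Expr { List<Expr> children(); }

    static class Const implements Expr {
        public List<Expr> children() { return List.of(); }
    }

    static class UdfCall implements Expr {
        final Object udf;
        final List<Expr> args;
        UdfCall(Object udf, List<Expr> args) { this.udf = udf; this.args = args; }
        public List<Expr> children() { return args; }
    }

    // Two toy UDF classes, one of them marked as nondeterministic.
    static class Upper {}
    @NonDeterministic static class RandomUdf {}

    // Walks the expression and fails with an explanatory message as soon
    // as a call to an annotated UDF is found.
    static void check(Expr e) {
        if (e instanceof UdfCall) {
            Class<?> c = ((UdfCall) e).udf.getClass();
            if (c.isAnnotationPresent(NonDeterministic.class)) {
                throw new IllegalStateException(
                    "OTHERWISE cannot be used with nondeterministic UDF " + c.getSimpleName());
            }
        }
        for (Expr child : e.children()) {
            check(child);
        }
    }
}
```

The check would run once per branch expression before the otherwise plan is built.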

> Default split destination
> -
>
> Key: PIG-1904
> URL: https://issues.apache.org/jira/browse/PIG-1904
> Project: Pig
>  Issue Type: New Feature
>Reporter: Daniel Dai
>  Labels: gsoc2011
> Fix For: 0.10
>
> Attachments: PIG-1904.1.patch
>
>
> The "split" statement should have a default destination, e.g.:
> {code}
> SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6), OTHER otherwise;
> -- OTHER has all tuples with f1>=7 && f2!=5 && f3==6
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011





[jira] [Assigned] (PIG-1429) Add Boolean Data Type to Pig

2011-07-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned PIG-1429:


Assignee: Zhijie Shen  (was: Russell Jurney)

> Add Boolean Data Type to Pig
> 
>
> Key: PIG-1429
> URL: https://issues.apache.org/jira/browse/PIG-1429
> Project: Pig
>  Issue Type: New Feature
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: boolean, gsoc2011, pig, type
> Attachments: working_boolean.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
> I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
> plus unit tests to make this work?  
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
