[jira] [Created] (PIG-2091) Improve Pig's progress indicator by keeping track of jobs more precisely

2011-05-23 Thread Laukik Chitnis (JIRA)
Improve Pig's progress indicator by keeping track of jobs more precisely 
-

 Key: PIG-2091
 URL: https://issues.apache.org/jira/browse/PIG-2091
 Project: Pig
  Issue Type: Improvement
Reporter: Laukik Chitnis


Parallax/Paratimer seeks to improve the progress estimation of Pig by 
identifying the processing speed at different steps in the processing pipeline 
for each of the jobs. For that, it considers the following factors:
1. (Estimated) Per tuple processing cost (a)
2. (Estimated) Total Number of tuples to be processed (N)
3. The number of tuples processed so far (K)

It also accounts for the dynamic changes to runtime environment by considering:
4. The (calculated) slowdown factor (s) to the per-tuple processing cost, and
5. The current width (w) of the pipeline (number of running mappers/reducers)

Given these parameters, the time remaining for a particular stage in the 
pipeline can be computed as:

s*a*(N-K)/w

Of these, 'a' and 'N' are either estimated from a sample, or learned from a 
previous "debug" run; while 's' and 'w' are dynamically read or calculated.
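A minimal Java sketch of the per-stage formula, assuming costs are expressed in milliseconds per tuple. The record `StageEstimate` and its component names are hypothetical and not part of Pig or Hadoop. Throwing when w = 0 is a choice made for this sketch, since the formula is undefined while no task is running.

```java
/**
 * Sketch of the per-stage estimate: remaining = s * a * (N - K) / w.
 * Names and the use of milliseconds are assumptions for illustration.
 */
record StageEstimate(
        double perTupleCostMs, // a: estimated processing cost of one tuple
        long totalTuples,      // N: estimated number of tuples for this stage
        long processedTuples,  // K: tuples processed so far, read from counters
        double slowdown,       // s: observed slowdown relative to the estimate of a
        int width) {           // w: mappers/reducers currently running this stage

    /** Estimated time left for this stage, in milliseconds. */
    double remainingMs() {
        // K can overshoot an estimated N; treat that as "nothing left".
        long left = Math.max(0L, totalTuples - processedTuples);
        if (left == 0) {
            return 0.0;
        }
        if (width <= 0) {
            // Work remains but no task is running, so the formula is undefined.
            throw new IllegalStateException("w must be positive while work remains");
        }
        return slowdown * perTupleCostMs * left / width;
    }
}
```

For a = 0.5 ms, N = 1,000,000, K = 200,000, s = 1.5 and w = 10, `remainingMs()` gives 1.5 * 0.5 * 800,000 / 10 = 60,000 ms, i.e. one minute.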

Paratimer also breaks down each MR job into the following (groups of) stages:
(1) Record reader - Map - Combine
(2) Copy
(3) Sort, and
(4) Reduce

'K' is observed while the job is in progress for each of these stages by 
monitoring the following counters in Hadoop:
(1) MAP_INPUT_RECORDS (available in Hadoop)
(2) REDUCE_INPUT_GROUPS (available in Hadoop)
(3) REDUCE_INPUT_RECORDS (available in Hadoop)
(4) REDUCE_COPY_OUTPUT_RECORDS (new counter to be added in Hadoop)

The sum of these remaining-time estimates for each stage of every job along 
the critical path of the execution plan, plus an overhead time for MR jobs 
that have not yet been initialized, gives us a more precise estimate of the 
time remaining, and thus a better overall progress estimate.
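One way to sketch this aggregation in Java, assuming the critical path is already available as a list of jobs (the PIG-1883 work) and that each not-yet-initialized job is charged a flat overhead on top of its own stage estimates (computed with K = 0 and its planned width). `Stage`, `CriticalPathJob` and `ScriptEstimator` are hypothetical names used only for illustration.

```java
import java.util.List;

/** One stage of an MR job (hypothetical model); remaining = s * a * (N - K) / w. */
record Stage(double perTupleCostMs, long totalTuples, long processedTuples,
             double slowdown, int width) {
    double remainingMs() {
        long left = Math.max(0L, totalTuples - processedTuples);
        if (left == 0) {
            return 0.0;
        }
        if (width <= 0) {
            throw new IllegalStateException("w must be positive while work remains");
        }
        return slowdown * perTupleCostMs * left / width;
    }
}

/** A job on the critical path. For jobs not yet initialized, K = 0 and w is the planned width. */
record CriticalPathJob(boolean initialized, List<Stage> stages) {}

final class ScriptEstimator {
    private ScriptEstimator() {}

    /**
     * Sum of the stage estimates of every job on the critical path, plus a flat
     * overhead for each job that has not been initialized yet (assumed constant).
     */
    static double remainingMs(List<CriticalPathJob> criticalPath, double uninitializedOverheadMs) {
        double total = 0.0;
        for (CriticalPathJob job : criticalPath) {
            for (Stage stage : job.stages()) {
                total += stage.remainingMs();
            }
            if (!job.initialized()) {
                total += uninitializedOverheadMs;
            }
        }
        return total;
    }
}
```

Keeping the overhead as a single parameter makes it easy to replace later with a learned value from a previous run.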

The critical path calculation is targeted as part of PIG-1883; I also propose 
that the estimation of parameters such as 'N' (cardinality estimate) and 'a' be 
handled separately (and tracked in a different jira).

Assuming that the estimates are available, the following action items emerge:
1. The estimated values need to be propagated to the specific operators in the 
pipeline. This can be accomplished by piggy-backing (pun unintended ;) ) on the 
mechanism used for keeping track of line numbers for error reporting.
2. Using these and other observed counters and values, estimate the time 
remaining for each stage, and
3. Calculate the percentage of the Pig script's execution that is complete by 
estimating the progress of the jobs along the critical path.
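For item 3, one simple choice (an assumption; the proposal does not specify the formula) is time-based progress: elapsed time divided by elapsed plus estimated remaining time. A sketch with a hypothetical helper:

```java
final class ProgressEstimator {
    private ProgressEstimator() {}

    /**
     * Percentage complete, assuming progress is measured in time:
     * 100 * elapsed / (elapsed + estimated remaining).
     */
    static double percentComplete(double elapsedMs, double remainingMs) {
        if (elapsedMs < 0 || remainingMs < 0) {
            throw new IllegalArgumentException("times must be non-negative");
        }
        double total = elapsedMs + remainingMs;
        if (total == 0.0) {
            return 0.0; // nothing has run and nothing is estimated yet
        }
        return 100.0 * elapsedMs / total;
    }
}
```

For example, 900 ms elapsed with 2,700 ms estimated remaining reports 25%. Because the remaining-time estimate is recomputed as s and w change, this percentage can move backwards if the job slows down.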


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2083) bincond ERROR 1025: Invalid field projection when null is used

2011-05-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038409#comment-13038409
 ] 

Xuefu Zhang commented on PIG-2083:
--

This is probably not caused by ANTLR, but by Pig's syntactic rule: 

IDENTIFIER : ( ID DCOLON ) => ( ID DCOLON IDENTIFIER )
   | ID
;

When ANTLR sees "null", it cannot make a decision until it sees the next 
character, which is a colon. Because of the colon, the above rule applies, so it 
tries to match IDENTIFIER. As a result of this matching, it ends up with two 
tokens, IDENTIFIER and ':'. When there is a space, the above rule doesn't apply.

I don't know a good solution yet. A potential solution is to add a predicate for 
all reserved keywords so that matching stops whenever the next character is 
not a letter. That way, 'null:' will be matched as the keyword NULL followed 
by ':'.
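To illustrate the proposed behaviour, here is a toy hand-written lexer in Java. It is not Pig's ANTLR grammar, and its token spellings are made up. Keywords are decided as soon as the run of identifier characters ends, so `null:2` yields the keyword `NULL` followed by `':'`, while `x::y` still splits around `'::'`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Toy lexer (not Pig's ANTLR lexer) showing the proposed rule: a word is
 * checked against the reserved keywords as soon as its run of letters,
 * digits and underscores ends, so a following ':' cannot turn it into an
 * identifier. Keywords are compared case-insensitively in this toy.
 */
final class ToyLexer {
    private static final Set<String> KEYWORDS = Set.of("null", "foreach", "generate");

    private ToyLexer() {}

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                String word = input.substring(start, i);
                String lower = word.toLowerCase(Locale.ROOT);
                // Keyword decision is made here, before looking at any ':' that follows.
                tokens.add(KEYWORDS.contains(lower) ? lower.toUpperCase(Locale.ROOT) : "IDENTIFIER(" + word + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("INTEGER(" + input.substring(start, i) + ")");
            } else if (c == ':' && i + 1 < input.length() && input.charAt(i + 1) == ':') {
                tokens.add("'::'");
                i += 2;
            } else {
                tokens.add("'" + c + "'");
                i++;
            }
        }
        return tokens;
    }
}
```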





> bincond ERROR 1025: Invalid field projection when null is used
> --
>
> Key: PIG-2083
> URL: https://issues.apache.org/jira/browse/PIG-2083
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.9.0
> Environment: Linux 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> Hadoop 0.20.203.3.1104011556 -r 96519d04f65e22ffadf89b225d0d44ef1741d126
> Compiled on Fri Apr  1 16:29:09 PDT 2011
>Reporter: Araceli Henley
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
>
> This is a regression in 0.9.
> a = load '1.txt' as (a0, a1);
> b = foreach a generate (a0==0?null:2);
> explain b;
> ERROR 1025:
> Invalid field projection. Projected field [null] does not exist in schema


[jira] [Commented] (PIG-2086) grunt parser fails for: load .. as \n (b:bag{});

2011-05-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038338#comment-13038338
 ] 

Thejas M Nair commented on PIG-2086:


When I substitute com.zzz.Storage() with PigStorage(), it fails in 0.8 as well:
{code}
cat t.pig
IN4 = load '$in' using
PigStorage() as
( inpt:bag{} );

java -Xmx500m -cp pig.jar org.apache.pig.Main -x local -param in=in -c t.pig
...
2011-05-23 17:51:32,845 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " ";" "; "" at line 3, column 17.
Was expecting one of:
")" ...
"," ...

{code}


> grunt parser fails for: load .. as \n (b:bag{}); 
> -
>
> Key: PIG-2086
> URL: https://issues.apache.org/jira/browse/PIG-2086
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.9.0
> Environment: mac 10.5.8
>Reporter: Woody Anderson
> Fix For: 0.9.0
>
>
> This snippet fails:
> {code}
> IN4 = load '$in' using
> com.zzz.Storage() as
> ( inpt:bag{} );
> {code}
> This works ('as' on the same line as the semicolon):
> {code}
> IN4 = load '$in' using
> com.zzz.Storage()
> as ( inpt:bag{} );
> {code}
> This is the grunt error:
> 2011-05-20 20:19:34,934 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1200:   mismatched input ';' 
> expecting RIGHT_PAREN
> This only happens when the field types are complex, e.g. bags or tuples. 
> For example, change the type of _inpt_ to _chararray_ and it parses.
> This is very strange! I spent hours debugging my schema-writing skills 
> and reading QueryParser.g before simply trying "as (expr);" on the same line.
> _All_ of my scripts had been written with the lines split the other way (with 
> lots of constructor args and as-clause elements, hence the line breaks). This 
> is not an issue if I don't load complex types, but it fails in this particular 
> case.
> This is quite unexpected, seems to be undocumented, and is a bug IMHO.
> I don't know enough about ANTLR (I was a JavaCC person) to make sense of why 
> this would be an issue for the parser, because the grammar looks fine assuming 
> newline is basically whitespace.
> I can't figure out how newlines are treated in the grammar; there does not 
> seem to be a newline routine as in 
> https://supportweb.cs.bham.ac.uk/documentation/tutorials/docsystem/build/tutorials/antlr/antlr.html
> I'm going to assume the grammar author is much more sophisticated than that 
> tutorial and knows how to fix this.


[jira] [Commented] (PIG-2083) bincond ERROR 1025: Invalid field projection when null is used

2011-05-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038334#comment-13038334
 ] 

Thejas M Nair commented on PIG-2083:


In the above query, the tokenizer is matching null as an identifier instead of 
as the NULL token. 
The query works when there is a space after the null. The parser generated by 
ANTLR is supposed to match the NULL token, since that rule comes before the 
identifier rule. This looks like a bug in ANTLR.




[jira] [Resolved] (PIG-2089) Javadoc for ResourceFieldSchema.getSchema() is wrong

2011-05-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-2089.
-

Resolution: Fixed

Patch committed to 0.9 and trunk.

> Javadoc for ResourceFieldSchema.getSchema() is wrong
> 
>
> Key: PIG-2089
> URL: https://issues.apache.org/jira/browse/PIG-2089
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-2089-1.patch
>
>
> Javadoc says: "Only fields of type tuple should have a schema". Actually, 
> bags and maps (starting from 0.9) can also have a schema.


[jira] [Updated] (PIG-2089) Javadoc for ResourceFieldSchema.getSchema() is wrong

2011-05-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2089:


Attachment: PIG-2089-1.patch


[jira] [Commented] (PIG-2051) new LogicalSchema column prune code does not preserve type information for map subfields

2011-05-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038322#comment-13038322
 ] 

Daniel Dai commented on PIG-2051:
-

+1. Will commit if tests pass.

> new LogicalSchema column prune code does not preserve type information for 
> map subfields
> 
>
> Key: PIG-2051
> URL: https://issues.apache.org/jira/browse/PIG-2051
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10
>Reporter: Woody Anderson
>Assignee: Woody Anderson
> Fix For: 0.10
>
> Attachments: 2051.patch
>
>
> The current implementation of ColumnPruneVisitor.visit ignores field type info 
> and passes type BYTEARRAY for all map fields.
> The correct type is easy to fill in, especially since map field info is only 
> attempted one level deep.
> I came across this because I use the type information in the pushProjection 
> call. It previously carried the 'correct' type information, so the changeover 
> to LogicalSchema caused a regression.


Jenkins build is back to normal : Pig-trunk #1021

2011-05-23 Thread Apache Jenkins Server
See 




[jira] [Commented] (PIG-2085) HBaseStorage fails with multiple STORE statements

2011-05-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038271#comment-13038271
 ] 

Dmitriy V. Ryaboy commented on PIG-2085:


We will likely be upgrading to 0.93 this week, I'll test once we do.

> HBaseStorage fails with multiple STORE statements
> -
>
> Key: PIG-2085
> URL: https://issues.apache.org/jira/browse/PIG-2085
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Attachments: PIG-2085_example_input.txt, PIG-2085_example_script.pig, 
> PIG-2085_schema.hbase
>
>
> Scripts with multiple STORE statements using HBaseStorage fail when run 
> against a cluster (they succeed in local mode). Below is an example script:
> {code}
> raw = LOAD 'hbase_split_load_bug.txt' AS
>   (f1: chararray, f2:chararray);
> SPLIT raw INTO apples IF (f2 == 'apple'), oranges IF (f2 == 'orange');
> STORE apples INTO 'hbase://test_table'
>USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:apple');
> STORE oranges INTO 'hbase://test_table'
>USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:orange');
> {code}
> The server throws the following exception after {{apples}} is successfully 
> stored:
> {code}
> Backend error message
> -
> java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@6273305c closed
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:566)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1113)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1233)
>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
>     at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:106)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReducePOStoreImpl.tearDown(MapReducePOStoreImpl.java:96)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.tearDown(POStore.java:122)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.cleanup(PigMapBase.java:128)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> {code}


[jira] [Commented] (PIG-2085) HBaseStorage fails with multiple STORE statements

2011-05-23 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038268#comment-13038268
 ] 

Bill Graham commented on PIG-2085:
--

From discussions on the HBase list, I think this could be an issue in 
TableOutputFormat in 0.90, where closing the connection on one table killed 
the connections for all tables:

http://mail-archives.apache.org/mod_mbox/hbase-user/201105.mbox/%3cbanlktimcxkvtpaqi-hy2ut-h434xub8...@mail.gmail.com%3e

If anyone has an HBase cluster running off the trunk to test this theory on 
(we're still on 0.90), please do so with the attached scripts and report back. 
HBASE-3777 is the relevant fix.
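A toy Java model of the suspected failure mode, assuming both STORE statements end up sharing one cached connection (they write to the same table in the example script). It is not HBase code, and `SharedConnectionCache` is hypothetical. Closing unconditionally reproduces the "closed" error for the second writer, while reference counting keeps the connection alive until the last writer releases it. Reference counting is assumed here as the remedy; check HBASE-3777 itself for what the actual fix changes.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (not HBase code): writers share one cached connection per key.
 * With refCounted == false, the first release() closes the shared connection
 * for everyone, mimicking the "HConnectionImplementation ... closed" failure.
 */
final class SharedConnectionCache {
    static final class Connection {
        private boolean closed;

        void put(String row) {
            if (closed) {
                throw new IllegalStateException("connection closed");
            }
            // A real connection would buffer or send the row here.
        }
    }

    private final Map<String, Connection> connections = new HashMap<>();
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final boolean refCounted;

    SharedConnectionCache(boolean refCounted) {
        this.refCounted = refCounted;
    }

    Connection acquire(String key) {
        refCounts.merge(key, 1, Integer::sum);
        return connections.computeIfAbsent(key, k -> new Connection());
    }

    void release(String key) {
        int remaining = refCounts.merge(key, -1, Integer::sum);
        // Without reference counting, any release tears the shared connection down.
        if (!refCounted || remaining <= 0) {
            Connection c = connections.remove(key);
            if (c != null) {
                c.closed = true;
            }
            refCounts.remove(key);
        }
    }
}
```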


Jenkins build is back to normal : Pig-trunk-commit #825

2011-05-23 Thread Apache Jenkins Server
See 




[jira] [Commented] (PIG-2082) bincond ERROR 1050: Unsupported typ for BindCond: left hand side: tuple

2011-05-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038234#comment-13038234
 ] 

Daniel Dai commented on PIG-2082:
-

Python has a similar issue. Python treats (1) as the integer 1; if users want a 
one-element tuple constant, they need to write (1,). In Pig, (1,) might be 
confused with (1,null), so I don't insist on supporting (1,). But treating (1) 
as the integer 1 seems reasonable, and matches Python's behavior. 

> bincond ERROR 1050: Unsupported typ for BindCond: left hand side: tuple 
> 
>
> Key: PIG-2082
> URL: https://issues.apache.org/jira/browse/PIG-2082
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.9.0
> Environment: Linux 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> Hadoop 0.20.203.3.1104011556 Compiled on Fri Apr  1 16:29:09 PDT 2011
>Reporter: Araceli Henley
>Assignee: Thejas M Nair
> Fix For: 0.10
>
>
> Regression in 0.9; this passes on 0.8.
> 2011-05-19 00:28:31,619 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1050 Unsupported input type for BinCond: left hand side: tuple; right hand side: double
> a = load '1.txt' as (a0, a1);
> b = foreach a generate (a0==0?(1):(a1/5));
> explain b;


[jira] [Commented] (PIG-2085) HBaseStorage fails with multiple STORE statements

2011-05-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038225#comment-13038225
 ] 

Dmitriy V. Ryaboy commented on PIG-2085:


I bet HBaseOutputFormat gets confused when Pig does its optimizations and 
tries to do 2 stores in 1 reduce phase.


[jira] [Created] (PIG-2090) re-enable TestGrunt test cases

2011-05-23 Thread Thejas M Nair (JIRA)
re-enable TestGrunt test cases
--

 Key: PIG-2090
 URL: https://issues.apache.org/jira/browse/PIG-2090
 Project: Pig
  Issue Type: Task
Affects Versions: 0.8.0, 0.9.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.9.0


Some test cases in TestGrunt.java were commented out in PIG-928, but this seems 
to have been done by mistake. I re-enabled a few of the working ones as part of 
the changes in PIG-2084. The rest should be fixed or, if what they test is no 
longer valid, removed from the test file.


[jira] [Resolved] (PIG-2084) pig is running validation for a statement at a time batch mode, instead of running it for whole script

2011-05-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-2084.


Resolution: Fixed

Unit test and test-patch passed. Patch committed to trunk and 0.9 branch. 

> pig is running validation for a statement at a time batch mode, instead of 
> running it for whole script
> --
>
> Key: PIG-2084
> URL: https://issues.apache.org/jira/browse/PIG-2084
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-2084.1.patch
>
>
> In PIG-2059, a change was made to run validation for each statement instead 
> of running it once for the whole script.
> This slows down the validation phase, and it ends up taking tens of seconds.


[jira] [Updated] (PIG-2082) bincond ERROR 1050: Unsupported typ for BindCond: left hand side: tuple

2011-05-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2082:


Fix Version/s: (was: 0.9.0)
   0.10


[jira] [Commented] (PIG-1772) Pig 090 Documentation

2011-05-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038190#comment-13038190
 ] 

Olga Natkovich commented on PIG-1772:
-

patch committed to trunk

> Pig 090 Documentation
> -
>
> Key: PIG-1772
> URL: https://issues.apache.org/jira/browse/PIG-1772
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Corinne Chandel
>Assignee: Corinne Chandel
> Fix For: 0.9.0
>
> Attachments: Pig-1772-GA-1.patch, penny-archt.jpg, pig-1772-1.patch, 
> pig-1772-2.patch, pig-1772-3.patch, pig-1772-GA-1-2.patch, 
> pig-1772-GA-1-3.patch, pig-1772-beta-1-2.patch, pig-1772-beta-1-3.patch, 
> pig-1772-beta-1.patch, pig-1772-beta2-1.patch, pig-1772-beta2-2.patch, 
> pig-index.xml
>
>
> Pig 090 documentation 
> Will include multiple patches as documentation is released with Pig  090 
> milestones (M1, M2, M3, ... )


[jira] [Resolved] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2088.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Return alias validation failed when there is single line comment in the macro
> -
>
> Key: PIG-2088
> URL: https://issues.apache.org/jira/browse/PIG-2088
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2088.patch
>
>
> The following script
> {code}
> define test() returns b { 
>a = load 'data' as (name, age, gpa);
> -- message 
>$b = filter a by (int)age > 40; 
> };
> beta = test();
> store beta into 'output';
> {code}
> results in a validation failure:
> {code}
> ERROR 1200 "Macro test missing return alias b"
> {code}


[jira] [Created] (PIG-2089) Javadoc for ResourceFieldSchema.getSchema() is wrong

2011-05-23 Thread Daniel Dai (JIRA)
Javadoc for ResourceFieldSchema.getSchema() is wrong


 Key: PIG-2089
 URL: https://issues.apache.org/jira/browse/PIG-2089
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.9.0


Javadoc says: "Only fields of type tuple should have a schema". Actually, bags 
and maps (starting from 0.9) can also have a schema.


Re: Welcome to Aniket Mokashi

2011-05-23 Thread Daniel Dai
Though the grammar doesn't support nested foreach, Pig has some limited 
support for it in the logical plan and at runtime. For example, the following 
script contains a nested foreach:


a = load '1.txt' as (a0, a1, a2);
b = group a by a0;
c = foreach b {
c0 = a.a0;
generate c0;
};
explain c;

So I believe the basic pieces needed to make nested foreach work are already 
there. We still need to:
1. Allow the parser to handle real nested foreach statements, and define the 
limitations of the nested foreach we support

2. Make sure Pig handles the extended scope of nested foreach

Daniel

On 05/22/2011 11:52 AM, Aniket Mokashi wrote:

Hi,

Thank you everyone for all your support. It has been a very enjoyable
experience to work with the Pig community.

I plan to get involved through the GSoC platform to contribute to the Pig 
project. I will be working on adding support for nested foreach. I will also 
try to work on JIRAs related to this support (please assign related JIRAs to
me). My proposal to GSoC can be found at:
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/aniket486/1

I worked on a couple of interesting projects at Yahoo last summer, learning
about the internals of the Pig parser, logical plan building, and the 
construction of physical and MR plans from the logical plan. While working on 
support for scalars, I learnt about the various passes Pig uses to restructure 
plans to optimize execution, and their limitations. In Pig 0.9, a few things 
have changed in the parser and optimizers, so I would appreciate any comments 
and remarks on my approach.

Here are my current thoughts on supporting nested foreach
(https://issues.apache.org/jira/browse/PIG-1631):
Pig currently supports nested_proj, which internally streams the bag. This
support can be extended by assigning an inner plan to this streaming with
nested_foreach. The first step is to add parser support for this. But this
would need further changes to restrict the generic support for the inner plan
according to Pig's limitations. Currently, I am exploring various
possibilities for adding buildNestedForeachOp to LogicalPlanBuilder, with or
without using the existing "generate_clause". I will upload a patch to JIRA 
once I get projection support working through nested foreach.
Please let me know your comments.

Thanks,
Aniket

On Thu, May 19, 2011 at 1:19 PM, Ashutosh Chauhan wrote:


Congratulations, Aniket!
Hoping to see many more contributions in Pig from you.

Ashutosh
On Thu, May 19, 2011 at 10:08, Alan Gates  wrote:

Please join me in welcoming Aniket Mokashi as a new committer on Pig.
Aniket has been contributing to Pig since last summer.  He wrote or helped
shepherd several major features in 0.8, including the Python UDF work, the
new mapreduce functionality, and the custom partitioner.  We look forward to
more great work from him in the future.

Alan.








[jira] [Updated] (PIG-1926) Sample/Limit should take scalar

2011-05-23 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-1926:


Attachment: PIG-1926.patch

First implementation of the feature.
Focusing on LIMIT for now.

- Modified the grammar to *also* allow an expression in LIMIT.
Retained the old behavior and code path for constant expressions.
The rule is split into two fragments to work around problems with ANTLR's syntactic 
predicates and local variables (see The Definitive ANTLR Reference, 14.7 
"Issues with Actions and Syntactic Predicates", p. 353).
- Modified LOLimit to also hold an expression plan.
Constant expressions take priority over expression plans (although both 
should never be set at the same time).
- Modified LogicalPlanBuilder to allow creation of LOLimit in two steps.
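
The priority rule above can be illustrated with a minimal plain-Java sketch. LimitSpec, fromConstant, fromExpression and resolve are hypothetical names. They are not the LOLimit or POLimit API, and the expression plan is modelled as a LongSupplier only for brevity. The factories never set both fields, which mirrors the note that both should never be set at once, but resolve() still checks the constant first.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a LIMIT operator holding either a constant or an expression. */
public final class LimitSpec {
    private final long constant;            // -1 means "no constant set" (assumption of this sketch)
    private final LongSupplier expression;  // null means "no expression plan set"

    private LimitSpec(long constant, LongSupplier expression) {
        this.constant = constant;
        this.expression = expression;
    }

    /** Old code path: LIMIT with a literal, e.g. "limit d 10". */
    public static LimitSpec fromConstant(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        return new LimitSpec(n, null);
    }

    /** New code path: LIMIT with an expression, e.g. "limit d c.sum/100". */
    public static LimitSpec fromExpression(LongSupplier expr) {
        return new LimitSpec(-1, expr);
    }

    /** The constant takes priority; the expression is evaluated only when no constant is set. */
    public long resolve() {
        if (constant >= 0) {
            return constant;
        }
        if (expression == null) {
            throw new IllegalStateException("neither a constant nor an expression was set");
        }
        return expression.getAsLong();
    }
}
```

Take the example from the issue with a hypothetical sum of 250. Then fromExpression(() -> sum / 100) resolves to 2, because the sketch uses long (integer) division.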

Expression evaluation does not work yet because POLimit still needs to be modified 
accordingly, but the patch compiles, the changes do not disrupt the old behaviour, and 
no runtime exception is thrown when the new syntax is used.

Next steps:
- Modify POLimit to evaluate the expression.
- Determine which changes LogToPhyTranslationVisitor needs to create POLimit 
correctly.
- Determine which changes TypeCheckingRelVisitor needs to ensure type safety.
- Determine which changes MRCompiler needs to keep compiling POLimit correctly 
  (LimitOptimizer currently breaks the modifications).

There are some whitespace/format changes in the current patch due to automatic 
formatting.
Should I try to remove them?

> Sample/Limit should take scalar
> ---
>
> Key: PIG-1926
> URL: https://issues.apache.org/jira/browse/PIG-1926
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>  Labels: gsoc2011
> Attachments: PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we could 
> use a scalar in place of the constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google Summer of Code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-1772) Pig 090 Documentation

2011-05-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038164#comment-13038164
 ] 

Olga Natkovich commented on PIG-1772:
-

Patch 3 committed to the 0.9 branch. Trunk is next.

> Pig 090 Documentation
> -
>
> Key: PIG-1772
> URL: https://issues.apache.org/jira/browse/PIG-1772
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Corinne Chandel
>Assignee: Corinne Chandel
> Fix For: 0.9.0
>
> Attachments: Pig-1772-GA-1.patch, penny-archt.jpg, pig-1772-1.patch, 
> pig-1772-2.patch, pig-1772-3.patch, pig-1772-GA-1-2.patch, 
> pig-1772-GA-1-3.patch, pig-1772-beta-1-2.patch, pig-1772-beta-1-3.patch, 
> pig-1772-beta-1.patch, pig-1772-beta2-1.patch, pig-1772-beta2-2.patch, 
> pig-index.xml
>
>
> Pig 090 documentation 
> Will include multiple patches as documentation is released with Pig  090 
> milestones (M1, M2, M3, ... )



[jira] [Commented] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038145#comment-13038145
 ] 

Thejas M Nair commented on PIG-2088:


+1

> Return alias validation failed when there is single line comment in the macro
> -
>
> Key: PIG-2088
> URL: https://issues.apache.org/jira/browse/PIG-2088
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2088.patch
>
>
> The following script
> {code}
> define test() returns b { 
>a = load 'data' as (name, age, gpa);
> -- message 
>$b = filter a by (int)age > 40; 
> };
> beta = test();
> store beta into 'output';
> {code}
> results in a validation failure:
> {code}
> ERROR 1200 "Macro test missing return alias b"
> {code}



[jira] [Updated] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2088:
--

Attachment: PIG-2088.patch




[jira] [Created] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)
Return alias validation failed when there is single line comment in the macro
-

 Key: PIG-2088
 URL: https://issues.apache.org/jira/browse/PIG-2088
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0
 Attachments: PIG-2088.patch

The following script

{code}
define test() returns b { 
   a = load 'data' as (name, age, gpa);
-- message 
   $b = filter a by (int)age > 40; 
};

beta = test();
store beta into 'output';
{code}

results in a validation failure:

{code}
ERROR 1200 "Macro test missing return alias b"
{code}
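
The following is a hypothetical illustration only. It is not Pig's macro-expansion code, and the real cause of PIG-2088 may differ. It shows how a return-alias check can be fooled by a "--" comment. If the macro body's lines are joined without their newlines before comments are stripped, everything after "-- message" is treated as comment, including the "$b = ..." assignment. Stripping comments line by line avoids this. MacroReturnCheck, stripComments and hasReturnAlias are hypothetical names.

```java
import java.util.regex.Pattern;

public final class MacroReturnCheck {

    /** Removes "--" single-line comments one line at a time (string literals are ignored for brevity). */
    static String stripComments(String body) {
        StringBuilder out = new StringBuilder();
        for (String line : body.split("\n", -1)) {
            int idx = line.indexOf("--");
            out.append(idx >= 0 ? line.substring(0, idx) : line).append('\n');
        }
        return out.toString();
    }

    /** True if the comment-free body assigns to $alias, e.g. "$b = ...". */
    static boolean hasReturnAlias(String body, String alias) {
        Pattern assign = Pattern.compile("\\$" + Pattern.quote(alias) + "\\s*=");
        return assign.matcher(stripComments(body)).find();
    }

    public static void main(String[] args) {
        String body = "   a = load 'data' as (name, age, gpa);\n"
                + "-- message \n"
                + "   $b = filter a by (int)age > 40; \n";
        System.out.println(hasReturnAlias(body, "b"));                    // line-by-line stripping
        System.out.println(hasReturnAlias(body.replace("\n", " "), "b")); // newlines lost first
    }
}
```

With the macro body from this issue, the line-by-line check finds $b. The variant whose newlines were replaced by spaces does not find it, which matches the "missing return alias b" symptom.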



[jira] [Commented] (PIG-2087) Not able to project into 'group' tuple from FILTER

2011-05-23 Thread Daniel Eklund (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038122#comment-13038122
 ] 

Daniel Eklund commented on PIG-2087:


Apache Pig version 0.8.0-cdh3u0 (rexported) 

It looks like this was introduced by the Cloudera version of 0.8.0. Apologies 
for that. I had assumed that their distribution followed Apache's to a 
tee.

> Not able to project into 'group' tuple from FILTER 
> ---
>
> Key: PIG-2087
> URL: https://issues.apache.org/jira/browse/PIG-2087
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Eklund
>Priority: Minor
>  Labels: pig
>
> GROUP creates the 'group' tuple, but subsequent FILTER statements cannot 
> project into it without throwing ClassCastExceptions.
> Example:
> ---
> my_data = LOAD 'test.txt' using PigStorage(',')
>   as (name:chararray, age:int, eye_color:chararray, height:int);
> by_age_and_color = GROUP my_data BY (age, eye_color);
> OUT2 = FILTER by_age_and_color by group.age is not null; 
> dump OUT2;
> -- I get a similar problem even if I do something like:
> OUT3 = FILTER by_age_and_color by group.age > 9;
> dump OUT3;
> -  sample test.txt -
> ravi,33,blue,43
> brendan,33,green,53
> ravichandra,15,blue,43
> leonor,15,brown,46
> caeser,18,blue,23
> JCVD,,blue,23
> anthony,33,blue,46
> xavier,23,blue,13
> patrick,18,blue,33
> sang,33,brown,44 
> ---
> java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr.getNext(GreaterThanExpr.java:72)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)



[jira] [Commented] (PIG-2087) Not able to project into 'group' tuple from FILTER

2011-05-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038111#comment-13038111
 ] 

Daniel Dai commented on PIG-2087:
-

I cannot reproduce the issue in 0.8.0, 0.8.1, or the 0.9 branch. Here is what I 
get with 0.8.0:

((15,blue),{(ravichandra,15,blue,43)})
((15,brown),{(leonor,15,brown,46)})
((18,blue),{(caeser,18,blue,23),(patrick,18,blue,33)})
((23,blue),{(xavier,23,blue,13)})
((33,blue),{(ravi,33,blue,43),(anthony,33,blue,46)})
((33,brown),{(sang,33,brown,44)})
((33,green),{(brendan,33,green,53)})

Can you double check?
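
The expected result can be made concrete with a plain-Java sketch. It is not Pig code, and GroupFilterSketch and groupAndFilter are hypothetical names. It groups the sample test.txt rows by (age, eye_color), drops the group whose age is null (as "FILTER by_age_and_color by group.age is not null" should), and formats each group in the dump notation used above. Sorted keys are an assumption chosen so that the order matches the listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class GroupFilterSketch {

    // The sample test.txt rows from the issue: name,age,eye_color,height
    static final String[] ROWS = {
        "ravi,33,blue,43", "brendan,33,green,53", "ravichandra,15,blue,43",
        "leonor,15,brown,46", "caeser,18,blue,23", "JCVD,,blue,23",
        "anthony,33,blue,46", "xavier,23,blue,13", "patrick,18,blue,33",
        "sang,33,brown,44"
    };

    /** Groups rows by (age, eye_color), keeps groups with a non-null age, formats them like a dump. */
    static List<String> groupAndFilter(String[] rows) {
        // age -> eye_color -> member tuples; TreeMaps keep both key levels sorted.
        Map<Integer, Map<String, List<String>>> groups = new TreeMap<>();
        for (String row : rows) {
            String[] f = row.split(",", -1);   // limit -1 keeps JCVD's empty age field
            if (f[1].isEmpty()) {
                // An empty int field loads as null, so the whole (null, color) group fails
                // "group.age is not null"; dropping the row up front is equivalent.
                continue;
            }
            int age = Integer.parseInt(f[1]);
            groups.computeIfAbsent(age, k -> new TreeMap<>())
                  .computeIfAbsent(f[2], k -> new ArrayList<>())
                  .add("(" + row + ")");
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, List<String>>> byAge : groups.entrySet()) {
            for (Map.Entry<String, List<String>> byColor : byAge.getValue().entrySet()) {
                out.add("((" + byAge.getKey() + "," + byColor.getKey() + "),{"
                        + String.join(",", byColor.getValue()) + "})");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        groupAndFilter(ROWS).forEach(System.out::println);
    }
}
```

Running main prints seven groups, the same ones listed above. JCVD's row is the only one with an empty age, so it is the only row excluded.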




[jira] [Updated] (PIG-1772) Pig 090 Documentation

2011-05-23 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-1772:
-

Attachment: pig-1772-GA-1-3.patch

Third version of the GA-1 patch.
Fixes the build problem.

> Pig 090 Documentation
> -
>
> Key: PIG-1772
> URL: https://issues.apache.org/jira/browse/PIG-1772
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Corinne Chandel
>Assignee: Corinne Chandel
> Fix For: 0.9.0
>
> Attachments: Pig-1772-GA-1.patch, penny-archt.jpg, pig-1772-1.patch, 
> pig-1772-2.patch, pig-1772-3.patch, pig-1772-GA-1-2.patch, 
> pig-1772-GA-1-3.patch, pig-1772-beta-1-2.patch, pig-1772-beta-1-3.patch, 
> pig-1772-beta-1.patch, pig-1772-beta2-1.patch, pig-1772-beta2-2.patch, 
> pig-index.xml
>
>
> Pig 090 documentation 
> Will include multiple patches as documentation is released with Pig  090 
> milestones (M1, M2, M3, ... )



[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-05-23 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038100#comment-13038100
 ] 

Ken Goodhope commented on PIG-1890:
---

Right now, in this test, AvroStorage attempts to pass back a single array of
floats with one call to next. To be consistent with the intent of how the data
is stored, we want this array returned as a single unit (a DataBag) with each
foreach call. In other words, we don't want foreach to return the elements of
that array one at a time. If I am understanding the code correctly, that appears
to be what it is trying to do. Am I missing something? Is there a way to control
this behavior?
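
The shape described above can be sketched roughly in plain Java. Ordinary lists stand in for Pig's Tuple and DataBag; the Pig API is not used, and ArrayAsBag and wrapArrayAsBag are hypothetical names. An array of floats becomes one bag of single-field tuples, held in a single field of the record. Each call therefore yields exactly one bag rather than one row per array element.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class ArrayAsBag {

    /**
     * Wraps one float array as a single record: a one-field "tuple" whose only field
     * is a "bag" holding one single-field "tuple" per array element.
     * Plain lists stand in for Pig's Tuple and DataBag in this sketch.
     */
    static List<Object> wrapArrayAsBag(float[] values) {
        List<List<Object>> bag = new ArrayList<>();
        for (float v : values) {
            bag.add(Collections.<Object>singletonList(v));   // element v -> (v)
        }
        List<Object> record = new ArrayList<>();
        record.add(bag);                                     // whole array -> one field {(v1),(v2),...}
        return record;
    }

    public static void main(String[] args) {
        // One call yields one record carrying the whole array, not one record per element.
        System.out.println(wrapArrayAsBag(new float[] {1.5f, 2.5f}));
    }
}
```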



> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Jakob Homan
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] [Commented] (PIG-2084) pig is running validation for a statement at a time batch mode, instead of running it for whole script

2011-05-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038097#comment-13038097
 ] 

Richard Ding commented on PIG-2084:
---

+1

> pig is running validation for a statement at a time batch mode, instead of 
> running it for whole script
> --
>
> Key: PIG-2084
> URL: https://issues.apache.org/jira/browse/PIG-2084
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-2084.1.patch
>
>
> In PIG-2059, a change was made to run validation for each statement instead 
> of running it once for the whole script.
> This slows down the validation phase, and it ends up taking tens of seconds.
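
Here is a back-of-the-envelope sketch of why per-statement validation can hurt. It rests on an assumption that the issue does not state: each per-statement pass re-validates the whole plan built so far. For n statements that costs 1 + 2 + ... + n = n(n+1)/2 statement visits instead of n. ValidationCost and its methods are hypothetical names that only count visits.

```java
public final class ValidationCost {

    /** Statement visits when the whole script is validated once at the end. */
    static long validateOnce(int statements) {
        return statements;
    }

    /**
     * Statement visits when, after each new statement, the whole plan built so far
     * is validated again (the assumption of this sketch): 1 + 2 + ... + n.
     */
    static long validatePerStatement(int statements) {
        long visits = 0;
        for (int i = 1; i <= statements; i++) {
            visits += i;   // pass i re-walks statements 1..i
        }
        return visits;
    }

    public static void main(String[] args) {
        int n = 200;
        System.out.println(validateOnce(n) + " vs " + validatePerStatement(n));
    }
}
```

Under that assumption, a 200-statement script needs 20,100 statement visits instead of 200.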



[jira] [Updated] (PIG-1772) Pig 090 Documentation

2011-05-23 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-1772:
-

Attachment: pig-1772-GA-1-2.patch

pig-1772-GA-1-2.patch

Second version of GA-1 patch.
Added the new pig-index.xml file.

> Pig 090 Documentation
> -
>
> Key: PIG-1772
> URL: https://issues.apache.org/jira/browse/PIG-1772
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Corinne Chandel
>Assignee: Corinne Chandel
> Fix For: 0.9.0
>
> Attachments: Pig-1772-GA-1.patch, penny-archt.jpg, pig-1772-1.patch, 
> pig-1772-2.patch, pig-1772-3.patch, pig-1772-GA-1-2.patch, 
> pig-1772-beta-1-2.patch, pig-1772-beta-1-3.patch, pig-1772-beta-1.patch, 
> pig-1772-beta2-1.patch, pig-1772-beta2-2.patch, pig-index.xml
>
>
> Pig 090 documentation 
> Will include multiple patches as documentation is released with Pig  090 
> milestones (M1, M2, M3, ... )



[jira] [Commented] (PIG-1772) Pig 090 Documentation

2011-05-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038046#comment-13038046
 ] 

Olga Natkovich commented on PIG-1772:
-

I tried to use the current patch, adding pig-index.xml in the same location as 
the rest of the XML files, and the build failed.

> Pig 090 Documentation
> -
>
> Key: PIG-1772
> URL: https://issues.apache.org/jira/browse/PIG-1772
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Corinne Chandel
>Assignee: Corinne Chandel
> Fix For: 0.9.0
>
> Attachments: Pig-1772-GA-1.patch, penny-archt.jpg, pig-1772-1.patch, 
> pig-1772-2.patch, pig-1772-3.patch, pig-1772-beta-1-2.patch, 
> pig-1772-beta-1-3.patch, pig-1772-beta-1.patch, pig-1772-beta2-1.patch, 
> pig-1772-beta2-2.patch, pig-index.xml
>
>
> Pig 090 documentation 
> Will include multiple patches as documentation is released with Pig  090 
> milestones (M1, M2, M3, ... )



[jira] [Updated] (PIG-2084) pig is running validation for a statement at a time batch mode, instead of running it for whole script

2011-05-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2084:
---

Attachment: PIG-2084.1.patch




[jira] [Updated] (PIG-2086) grunt parser fails for: load .. as \n (b:bag{});

2011-05-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2086:


Affects Version/s: (was: 0.10)
   0.9.0
Fix Version/s: 0.9.0

> grunt parser fails for: load .. as \n (b:bag{}); 
> -
>
> Key: PIG-2086
> URL: https://issues.apache.org/jira/browse/PIG-2086
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.9.0
> Environment: mac 10.5.8
>Reporter: Woody Anderson
> Fix For: 0.9.0
>
>
> This snippet fails:
> {code}
> IN4 = load '$in' using
> com.zzz.Storage() as
> ( inpt:bag{} );
> {code}
> This works ("as" on the same line as the semicolon):
> {code}
> IN4 = load '$in' using
> com.zzz.Storage()
> as ( inpt:bag{} );
> {code}
> This is the grunt error:
> 2011-05-20 20:19:34,934 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1200:   mismatched input ';' 
> expecting RIGHT_PAREN
> This only happens when the field types are complex, e.g. bags/tuples.
> E.g. change the type of _inpt_ to _chararray_ and it will parse.
> This is very strange! I spent hours debugging my schema-writing skills 
> and reading QueryParser.g before simply trying "as (expr);" on the same line.
> _All_ of my scripts had been written with the lines split the other way (with 
> lots of constructor args and as-clause elements, hence the line breaks). This is 
> not an issue if I don't load complex types, but it fails in this particular 
> case.
> This is quite unexpected, seems to be undocumented, and is a bug IMHO.
> I don't know enough about ANTLR (I was a JavaCC person) to make sense of why 
> this would be an issue for the parser, because the grammar looks fine assuming 
> newlines are basically whitespace.
> Though I can't figure out how newlines are treated in the grammar, there does 
> not seem to be a newline routine as in 
> https://supportweb.cs.bham.ac.uk/documentation/tutorials/docsystem/build/tutorials/antlr/antlr.html
> I'm going to assume the grammar author is much more sophisticated than that 
> tutorial and knows how to fix this.



[jira] [Commented] (PIG-1772) Pig 090 Documentation

2011-05-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037982#comment-13037982
 ] 

Olga Natkovich commented on PIG-1772:
-

Corinne,

Can you include pig-index.xml in your patch? All you need to do is svn add it 
for it to show up. Or at least tell me which directory it needs to go into.

> Pig 090 Documentation
> -
>
> Key: PIG-1772
> URL: https://issues.apache.org/jira/browse/PIG-1772
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Corinne Chandel
>Assignee: Corinne Chandel
> Fix For: 0.9.0
>
> Attachments: Pig-1772-GA-1.patch, penny-archt.jpg, pig-1772-1.patch, 
> pig-1772-2.patch, pig-1772-3.patch, pig-1772-beta-1-2.patch, 
> pig-1772-beta-1-3.patch, pig-1772-beta-1.patch, pig-1772-beta2-1.patch, 
> pig-1772-beta2-2.patch, pig-index.xml
>
>
> Pig 090 documentation 
> Will include multiple patches as documentation is released with Pig  090 
> milestones (M1, M2, M3, ... )



[jira] [Updated] (PIG-1926) Sample/Limit should take scalar

2011-05-23 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-1926:


Description: 
Currently, Limit and Sample only take a constant. It would be better if we could 
use a scalar in place of the constant. E.g.:

{code}
a = load 'a.txt';
b = group a all;
c = foreach b generate COUNT(a) as sum;
d = order a by $0;
e = limit d c.sum/100;
{code}

This is a candidate project for Google Summer of Code 2011. More information 
about the program can be found at http://wiki.apache.org/pig/GSoc2011

  was:
Currently, Limit and Sample only take a constant. It would be better if we could 
use a scalar in place of the constant. E.g.:

{code}
a = load 'a.txt';
b = group a by all;
c = foreach b generate COUNT(*) as sum;
d = order a by $0;
e = limit d c.sum/100;
{code}

This is a candidate project for Google Summer of Code 2011. More information 
about the program can be found at http://wiki.apache.org/pig/GSoc2011


> Sample/Limit should take scalar
> ---
>
> Key: PIG-1926
> URL: https://issues.apache.org/jira/browse/PIG-1926
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>  Labels: gsoc2011
>
> Currently, Limit and Sample only take a constant. It would be better if we could 
> use a scalar in place of the constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google Summer of Code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011



[jira] [Updated] (PIG-2060) Fix errors in pig grammars reported by ANTLRWorks

2011-05-23 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2060:


Attachment: PIG-2060.patch

Removed MATCHES and EVAL, added ANY as a token. Fixed a naming issue in AST 
grammar files (substituted STDERROR to ERROR).

> Fix errors in pig grammars reported by ANTLRWorks
> -
>
> Key: PIG-2060
> URL: https://issues.apache.org/jira/browse/PIG-2060
> Project: Pig
>  Issue Type: Bug
>Reporter: Gianmarco De Francisci Morales
>Assignee: Gianmarco De Francisci Morales
>Priority: Minor
> Attachments: PIG-2060.patch
>
>
> There are various errors in Pig's grammar files highlighted by ANTLRWorks,
> in particular on the tokens MATCHES, ANY and EVAL.
> The first should be removed, as STR_OP_MATCHES already exists;
> the second is an imaginary token that should be defined in the 
> appropriate section.
> I am not sure about the third.
> I have been told it comes from the old parsers but is not used anywhere. Is 
> that correct?
> Is it reserved for future use? Does it have anything to do with FUNC_EVAL?



[jira] [Updated] (PIG-2060) Fix errors in pig grammars reported by ANTLRWorks

2011-05-23 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2060:


Status: Patch Available  (was: Open)

