[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-06 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061035#comment-13061035
 ] 

Ken Goodhope commented on PIG-2153:
---

In my LoadFunc, I modified getSchema to check for a single-element wrapping 
tuple and return the inner ResourceSchema when one is found.  This fixed the 
errors I was getting from POProject.java.  The unit tests for my LoadFunc are 
still breaking, because the output has changed.  However, I suspect the new 
output is correct, so after some more investigation I will probably change the 
unit tests.  Why including the wrapping tuple in the schema used to work is 
still a mystery.  Maybe someone currently working on the project can answer 
that question.
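
The unwrapping check described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: Field, Kind, and unwrapSingleTuple are hypothetical stand-ins invented for the sketch, not Pig's ResourceSchema classes and not the actual AvroStorage change.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in model of a schema tree. These are NOT Pig's ResourceSchema classes;
// every name here is hypothetical and exists only for illustration.
final class SchemaUnwrapSketch {
    enum Kind { TUPLE, BAG, FLOAT }

    static final class Field {
        final String name;
        final Kind kind;
        final List<Field> children; // inner schema; empty for simple types

        Field(String name, Kind kind, Field... children) {
            this.name = name;
            this.kind = kind;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    // If the schema consists of exactly one tuple-typed field, return that
    // tuple's inner fields; otherwise return the schema unchanged.
    static List<Field> unwrapSingleTuple(List<Field> schema) {
        if (schema.size() == 1 && schema.get(0).kind == Kind.TUPLE) {
            return schema.get(0).children;
        }
        return schema;
    }

    public static void main(String[] args) {
        Field elem = new Field("elem", Kind.FLOAT);
        Field inner = new Field("inner", Kind.TUPLE, elem);
        Field values = new Field("values", Kind.BAG, inner);
        Field wrapper = new Field("wrapper", Kind.TUPLE, values);

        List<Field> unwrapped = unwrapSingleTuple(Collections.singletonList(wrapper));
        System.out.println(unwrapped.get(0).name);
    }
}
```

A schema with several top-level fields, or a single non-tuple field, passes through untouched in this sketch, which matches the "when one is found" wording above.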

> POProject throws an error with tuples containing a single non-tuple field
> -
>
> Key: PIG-2153
> URL: https://issues.apache.org/jira/browse/PIG-2153
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is 
> pulled out.  If that field is not a tuple, a cast exception is thrown.  This 
> is happening in the following block of code at line 401.
>     if(columns.size() == 1) {
>         try{
>             ret = inpValue.get(columns.get(0));
>         ...
>         res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.
> The LoadFunc converts the array to a bag and wraps the bag in a tuple:
> ({(3.3),(1.2),(5.6)})
> This results in POProject attempting to cast the bag to a tuple.  Looking at
> the code, it appears that if I wrapped the previous tuple in another tuple,
> then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner 
> tuple and return it.  But this would require the LoadFunc to check for tuples 
> with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap 
> another tuple.
>     if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the 
> appropriate fix or not.  Hoping someone with more Pig experience can help 
> here.  Right now this is preventing the unit tests in AvroStorage from 
> working.  I can change the unit test, but I think in this case the unit test 
> is catching a real bug.
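
To make the failure mode in the description concrete, here is a minimal, self-contained sketch. SketchTuple and SketchBag are hypothetical stand-ins for Pig's tuple and bag types, and the fallback in projectGuarded (returning the input unchanged) is purely an assumption, since the report itself does not say whether the guard is the right fix or what the other branch should do.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the single-column branch; none of this is Pig's API.
final class ProjectSketch {
    /** Models a tuple: an ordered list of fields. */
    static final class SketchTuple {
        final List<Object> fields;
        SketchTuple(Object... fields) { this.fields = Arrays.asList(fields); }
    }

    /** Models a bag: a collection of tuples. */
    static final class SketchBag {
        final List<SketchTuple> tuples;
        SketchBag(SketchTuple... tuples) { this.tuples = Arrays.asList(tuples); }
    }

    // Mirrors the unconditional "(Tuple) ret" cast: throws ClassCastException
    // when the single field is a bag rather than a tuple.
    static SketchTuple projectUnguarded(SketchTuple input, int column) {
        Object field = input.fields.get(column);
        return (SketchTuple) field;
    }

    // Mirrors the proposed type check: unwrap only when the field is a tuple.
    // ASSUMPTION: when it is not, this sketch hands back the input tuple as is.
    static SketchTuple projectGuarded(SketchTuple input, int column) {
        Object field = input.fields.get(column);
        if (field instanceof SketchTuple) {
            return (SketchTuple) field;
        }
        return input;
    }

    public static void main(String[] args) {
        SketchBag bag = new SketchBag(
                new SketchTuple(3.3f), new SketchTuple(1.2f), new SketchTuple(5.6f));
        SketchTuple record = new SketchTuple(bag); // models ({(3.3),(1.2),(5.6)})
        try {
            projectUnguarded(record, 0);
        } catch (ClassCastException e) {
            System.out.println("unguarded cast failed");
        }
        System.out.println(projectGuarded(record, 0) == record);
    }
}
```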

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2110) NullPointerException in piggybank.evaluation.util.apachelogparser.SearchTermExtractor

2011-07-06 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060978#comment-13060978
 ] 

Daniel Dai commented on PIG-2110:
-

Patch committed to trunk. Thanks Dale for contributing!

> NullPointerException in 
> piggybank.evaluation.util.apachelogparser.SearchTermExtractor
> -
>
> Key: PIG-2110
> URL: https://issues.apache.org/jira/browse/PIG-2110
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Michael Brauwerman
>Assignee: Dale Jin
> Fix For: 0.10
>
> Attachments: SearchTermExtractor.diff
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When processing a large log file, I get an exception in
> SearchTermExtractor.exec.
> I don't have a specific log line with a repro yet, but I assume the error
> occurs when the input URL is null, or maybe just has no query string.
> I think a fix would be to add a guard after creating queryString:
> String queryString = urlObject.getQuery();
> if (queryString == null) { return null; }
> Stack Trace:
>
> Caused by: java.io.IOException: Caught exception processing input row
>     at org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchTermExtractor.exec(SearchTermExtractor.java:195)
>     at org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchTermExtractor.exec(SearchTermExtractor.java:64)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> Caused by: java.lang.NullPointerException
>     at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
>     at java.util.regex.Matcher.reset(Matcher.java:291)
>     at java.util.regex.Matcher.reset(Matcher.java:311)
>     at org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchTermExtractor.exec(SearchTermExtractor.java:170)
>
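
The guard proposed in the description can be tried out in isolation. The sketch below is not the SearchTermExtractor source or the committed patch: the class name, the method, and the `q=` pattern are assumptions made for illustration. Only java.net.URL.getQuery(), which returns null when a URL has no query part, is standard library behavior.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryGuardSketch {
    // Hypothetical pattern; the real extractor's patterns are not shown in the report.
    private static final Pattern Q_PARAM = Pattern.compile("(?:^|&)q=([^&]*)");

    // Returns the value of the "q" parameter, or null when there is nothing to match.
    static String extractSearchTerm(String url) {
        if (url == null) {
            return null; // null input URL
        }
        final URL urlObject;
        try {
            urlObject = new URL(url);
        } catch (MalformedURLException e) {
            return null; // unparseable URL
        }
        String queryString = urlObject.getQuery();
        if (queryString == null) {
            return null; // the guard from the report: no query string at all
        }
        Matcher m = Q_PARAM.matcher(queryString);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractSearchTerm("http://example.com/search?q=pig"));
        System.out.println(extractSearchTerm("http://example.com/"));
    }
}
```

Without the null check, handing a null query string to the regex matcher is the kind of call that would end in the NullPointerException shown in the stack trace.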





[jira] [Resolved] (PIG-2110) NullPointerException in piggybank.evaluation.util.apachelogparser.SearchTermExtractor

2011-07-06 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-2110.
-

   Resolution: Fixed
Fix Version/s: 0.10
 Assignee: Dale Jin
 Hadoop Flags: [Reviewed]





[jira] [Commented] (PIG-2142) Allow registering multiple jars from DFS via single statement

2011-07-06 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060976#comment-13060976
 ] 

jirapos...@reviews.apache.org commented on PIG-2142:



---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/976/#review983
---



trunk/src/org/apache/pig/impl/io/FileLocalizer.java


Is it possible to avoid a listStatus call if we are not using globbing?


- Daniel
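
One way to picture the short-circuit asked about in this review comment is sketched below. It is only an assumption about the shape of such a check, not the FileLocalizer code: looksLikeGlob and resolve are hypothetical helpers, the set of glob characters is simplified, and the expander function stands in for whatever file-system listing call would otherwise be made.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class GlobShortCircuitSketch {
    // Simplified check: treats *, ?, [ and { as glob metacharacters.
    static boolean looksLikeGlob(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '*' || c == '?' || c == '[' || c == '{') {
                return true;
            }
        }
        return false;
    }

    // Calls the (potentially expensive) expander only when the path contains
    // a glob character; a plain path is returned as a single-element list.
    static List<String> resolve(String path, Function<String, List<String>> expander) {
        if (!looksLikeGlob(path)) {
            return Collections.singletonList(path);
        }
        return expander.apply(path);
    }

    public static void main(String[] args) {
        Function<String, List<String>> expander = p -> {
            System.out.println("listing file system for " + p);
            return Collections.emptyList();
        };
        resolve("hdfs://user/me/lib/lucene.jar", expander);   // no listing call
        resolve("hdfs://user/me/lib/*lucene*.jar", expander); // listing call made
    }
}
```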


On 2011-07-04 04:45:47, Dmitriy Ryaboy wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/976/
bq.  ---
bq.  
bq.  (Updated 2011-07-04 04:45:47)
bq.  
bq.  
bq.  Review request for pig and Daniel Dai.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Posting on behalf of Raghu. 
bq.  Daniel, can you take a look since this collides directly with your changes 
bq.  in PIG-1566?
bq.  
bq.  Argh, reviewboard apparently does not get git diffs. We'll try to make 
bq.  something it gets and add later.. please see the patch on the Jira in the 
bq.  meantime.
bq.  
bq.  
bq.  This addresses bug PIG-2142.
bq.  https://issues.apache.org/jira/browse/PIG-2142
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    trunk/src/org/apache/pig/PigServer.java 1142512 
bq.    trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1142512 
bq.    trunk/test/org/apache/pig/test/TestPigServer.java 1142512 
bq.  
bq.  Diff: https://reviews.apache.org/r/976/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Dmitriy
bq.  
bq.



> Allow registering multiple jars from DFS via single statement
> -
>
> Key: PIG-2142
> URL: https://issues.apache.org/jira/browse/PIG-2142
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Assignee: Raghu Angadi
> Fix For: 0.10
>
> Attachments: PIG-2142-branch-0.8.patch, PIG-2142.patch
>
>
> Pig currently allows users to register jars from local and remote 
> filesystems, but only one jar can be specified at a time. It would be great 
> to be able to say something along the lines of "register 
> hdfs://user/me/lib/*lucene*.jar" and get all the jars registered in one go.





Re: Review Request: PIG-2142: Allow registering multiple jars from DFS via single statement

2011-07-06 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/976/#review983
---



trunk/src/org/apache/pig/impl/io/FileLocalizer.java


Is it possible to avoid a listStatus call if we are not using globbing?


- Daniel


On 2011-07-04 04:45:47, Dmitriy Ryaboy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/976/
> ---
> 
> (Updated 2011-07-04 04:45:47)
> 
> 
> Review request for pig and Daniel Dai.
> 
> 
> Summary
> ---
> 
> Posting on behalf of Raghu. 
> Daniel, can you take a look since this collides directly with your changes in 
> PIG-1566?
> 
> Argh, reviewboard apparently does not get git diffs. We'll try to make 
> something it gets and add later.. please see the patch on the Jira in the 
> meantime.
> 
> 
> This addresses bug PIG-2142.
> https://issues.apache.org/jira/browse/PIG-2142
> 
> 
> Diffs
> -
> 
>   trunk/src/org/apache/pig/PigServer.java 1142512 
>   trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1142512 
>   trunk/test/org/apache/pig/test/TestPigServer.java 1142512 
> 
> Diff: https://reviews.apache.org/r/976/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Dmitriy
> 
>



[jira] [Updated] (PIG-1926) Sample/Limit should take scalar

2011-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1926:
---

   Resolution: Fixed
Fix Version/s: 0.10
   Status: Resolved  (was: Patch Available)

> Sample/Limit should take scalar
> ---
>
> Key: PIG-1926
> URL: https://issues.apache.org/jira/browse/PIG-1926
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Gianmarco De Francisci Morales
>  Labels: gsoc2011
> Fix For: 0.10
>
> Attachments: PIG-1926.10.patch, PIG-1926.11.patch, 
> PIG-1926.12.1.patch, PIG-1926.12.patch, PIG-1926.7.patch, PIG-1926.8.patch, 
> PIG-1926.9.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, 
> PIG-1926.patch, PIG-1926.patch, PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we
> could use a scalar in place of the constant, e.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011





[jira] [Updated] (PIG-1926) Sample/Limit should take scalar

2011-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1926:
---

Attachment: PIG-1926.12.1.patch

+1 for PIG-1926.12.patch. Two new files were missing the Apache license 
header; I have added it (PIG-1926.12.1.patch) and committed the patch to trunk.

Gianmarco, thanks for the contribution!





[jira] [Commented] (PIG-2142) Allow registering multiple jars from DFS via single statement

2011-07-06 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060947#comment-13060947
 ] 

Daniel Dai commented on PIG-2142:
-

I am reviewing it.





[jira] [Updated] (PIG-2142) Allow registering multiple jars from DFS via single statement

2011-07-06 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-2142:
--

Attachment: PIG-2142-branch-0.8.patch

Patch for 0.8. This may not be committed to branch-0.8, but could be useful for 
some users.





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-06 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060860#comment-13060860
 ] 

Ken Goodhope commented on PIG-2153:
---

That makes sense, and if it is still the case it would mean the fix needs to 
occur in the LoadFunc and not POProject.  This is also consistent with the 
original comments by Daniel Dai for PIG-1890. AvroStorage has always included 
the wrapping tuple as part of the schema. In most cases the outer tuple isn't 
really a wrapper but a record with multiple fields, and that works fine.  Later 
tonight I will take a look and see what changes I need to make at the LoadFunc 
level.  I am still perplexed why the incorrect behavior used to work.  Thanks 
again Pradeep.





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-06 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060840#comment-13060840
 ] 

Pradeep Kamath commented on PIG-2153:
-

I am also wondering whether any fix belongs in the appropriate LoadFunc 
rather than in POProject (assuming my initial hypothesis that the cast is 
valid is correct).





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-06 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060835#comment-13060835
 ] 

Pradeep Kamath commented on PIG-2153:
-

Based on my (old) knowledge, the tuple returned by LoadFunc (LoadFunc always 
has to return a tuple) simply stands for the record and the schema deals with 
the types of the fields inside it. So if the schema is A: 
{t:tuple(i:int,c:char)}, that means each record contains one field of type tuple 
(which has an int and a char). I would think this means the LoadFunc returns an 
outer tuple (for the record), with a tuple inside (standing for the field) 
which has int and char subfields. I will let the more active committers comment 
on whether anything with respect to LoadFunc tuple handling has changed. 
Hopefully I am not giving wrong information here based on my old knowledge; 
apologies in advance if so.





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-06 Thread Mads Moeller (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060767#comment-13060767
 ] 

Mads Moeller commented on PIG-1890:
---

Hi Ken,

With the latest patch the UNION behaves as expected for me.


Thanks,
Mads

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Jakob Homan
> Attachments: PIG-1890-1.patch, PIG-1890-2.patch, PIG-1890-3.patch, 
> pig_setloc_avro.txt
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD:
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This
> issue was hidden until PIG-1188 was checked in.





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-06 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060762#comment-13060762
 ] 

Ken Goodhope commented on PIG-1890:
---

A recent change in Pig causes setLocation to be called twice, and if 
setLocation isn't idempotent, then you get twice the output.  My suspicion is 
that UNION is further exacerbating the problem, leading to the input being 
added 4X.  Did you still see the problem with the last patch I added?
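
The idempotency point can be illustrated with a small stand-alone sketch. The two loader classes below are hypothetical and are not AvroStorage or the LoadFunc API; they only contrast a setLocation that appends on every call with one that ignores repeats.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical loaders used only to contrast the two behaviors.
final class SetLocationSketch {
    /** Not idempotent: every call registers the location again. */
    static final class AppendingLoader {
        private final List<String> inputs = new ArrayList<>();
        void setLocation(String location) { inputs.add(location); }
        int inputCount() { return inputs.size(); }
    }

    /** Idempotent: repeated calls with the same location have no further effect. */
    static final class IdempotentLoader {
        private final Set<String> inputs = new LinkedHashSet<>();
        void setLocation(String location) { inputs.add(location); }
        int inputCount() { return inputs.size(); }
    }

    public static void main(String[] args) {
        AppendingLoader appending = new AppendingLoader();
        IdempotentLoader idempotent = new IdempotentLoader();
        // Simulate the framework calling setLocation twice for the same input.
        for (int i = 0; i < 2; i++) {
            appending.setLocation("in.avro");
            idempotent.setLocation("in.avro");
        }
        System.out.println(appending.inputCount());  // 2: the input would be read twice
        System.out.println(idempotent.inputCount()); // 1
    }
}
```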





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-06 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060758#comment-13060758
 ] 

Ken Goodhope commented on PIG-2153:
---

Thanks Pradeep, that is actually very helpful.  If I understand you correctly, 
the outer tuple isn't part of the schema returned by LoadFunc.getSchema().  Is 
it possible that the result of LoadFunc.getNext used to be wrapped in an 
implicit tuple, and that is no longer happening?  

The results of the unit tests with the fix I suggested in my last comment 
showed 11 tests now working that were broken before, and 11 tests now breaking 
that used to work.  This makes me wonder if some of the tests have been written 
with the expectation that there is an implicit wrapping tuple, and some have 
been written with the expectation that there is no implicit wrapper.  Am I 
missing something?

Here are the test results.

Tests that were broken and now work.
> [junit] Test org.apache.pig.test.TestBestFitCast
> [junit] Test org.apache.pig.test.TestCounters
> [junit] Test org.apache.pig.test.TestDataBagAccess
> [junit] Test org.apache.pig.test.TestEmptyInputDir
> [junit] Test org.apache.pig.test.TestImplicitSplit
> [junit] Test org.apache.pig.test.TestInvoker
> [junit] Test org.apache.pig.test.TestPigRunner
> [junit] Test org.apache.pig.test.TestPigSplit
> [junit] Test org.apache.pig.test.TestScriptLanguage
> [junit] Test org.apache.pig.test.TestScriptUDF
> [junit] Test org.apache.pig.test.TestSkewedJoin

Tests that used to work, but break with the fix I tried.
< [junit] Test org.apache.pig.test.TestCombiner FAILED
< [junit] Test org.apache.pig.test.TestCommit FAILED
< [junit] Test org.apache.pig.test.TestEvalPipeline2 FAILED
< [junit] Test org.apache.pig.test.TestEvalPipelineLocal FAILED
< [junit] Test org.apache.pig.test.TestForEachNestedPlanLocal FAILED
< [junit] Test org.apache.pig.test.TestLimitAdjuster FAILED
< [junit] Test org.apache.pig.test.TestMergeJoinOuter FAILED
< [junit] Test org.apache.pig.test.TestProject FAILED
< [junit] Test org.apache.pig.test.TestProjectRange FAILED
< [junit] Test org.apache.pig.test.TestPruneColumn FAILED
< [junit] Test org.apache.pig.test.TestUnionOnSchema FAILED





[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-06 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated PIG-1890:
--

Attachment: pig_setloc_avro.txt

Demonstrates the setLocation calls on AvroStorage.





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-06 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060700#comment-13060700
 ] 

Patrick Hunt commented on PIG-1890:
---

@Dmitriy thanks.

bq. Patrick, can you show some debug output that has the sequence of calls?

Sure. I didn't save the original, so I re-ran it; see the attached 
pig_setloc_avro.txt for full details using the UNION example (this is with 
current trunk; notice that there are 6 tuples output rather than 2). I 
mis-remembered one detail: it's calling setLoc for the same job, with 
different files, but on _different_ AvroStorage objects (see the first two 
lines of the setLocation debug message). 

Why are 8 AvroStorage objects being created? Shouldn't there be just 2, 
one for loading each of the two input files?
