[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-09 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062666#comment-13062666
 ] 

Ken Goodhope commented on PIG-2153:
---

The behavior of POProject is correct.  LoadFuncs need to make sure the Pig 
schema they return does not include the implicit wrapping tuple.  The schema 
should only reflect the contents inside the wrapping tuple.  

I am not 100% sure how this relates to the issue with ElephantBird, but I am 
reasonably convinced the problem there would lie in either how the schema is 
built, or possibly how the logical plan is being executed.  Regardless I 
believe this jira can be closed, since POProject is no longer suspect.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>              Key: PIG-2153
>              URL: https://issues.apache.org/jira/browse/PIG-2153
>          Project: Pig
>       Issue Type: Bug
> Affects Versions: 0.8.1
>         Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is 
> pulled out.  If that field is not a tuple, a cast exception is thrown.  This 
> is happening in the following block of code at line 401.
>     if(columns.size() == 1) {
>         try{
>             ret = inpValue.get(columns.get(0));
>             ...
>             res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  
> The LoadFunc is converting the array to a bag, and wrapping the bag in a 
> tuple.
>     ({(3.3),(1.2),(5.6)})
> This results in POProject attempting to cast the bag to a tuple.  Looking at 
> the code, it appears that if I wrapped the previous tuple in another tuple, 
> then it would work.
>     (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner 
> tuple and return it.  But this would require the LoadFunc to check for tuples 
> with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap 
> another tuple.
>     if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {
>         ...
> I don't know the original intent of this code well enough to say this is the 
> appropriate fix or not.  Hoping someone with more Pig experience can help 
> here.  Right now this is preventing the unit tests in AvroStorage from 
> working.  I can change the unit test, but I think in this case the unit test 
> is catching a real bug.
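
The cast failure in the description can be reproduced without Pig.  The 
following is a minimal sketch only: TupleSketch and BagSketch are hypothetical 
stand-ins, not Pig's Tuple and DataBag, and projectSingleColumn merely mirrors 
the shape of the quoted snippet rather than the actual POProject code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the reported failure. None of these types are Pig classes.
final class SingleColumnProjectSketch {

    /** Hypothetical tuple stand-in: an ordered list of fields. */
    static final class TupleSketch {
        final List<Object> fields;
        TupleSketch(Object... fields) { this.fields = Arrays.asList(fields); }
        Object get(int i) { return fields.get(i); }
    }

    /** Hypothetical bag stand-in: a collection of tuples. */
    static final class BagSketch {
        final List<TupleSketch> tuples = new ArrayList<>();
        BagSketch add(TupleSketch t) { tuples.add(t); return this; }
    }

    /** Pulls out the single column and casts it to a tuple, as in the quoted code. */
    static TupleSketch projectSingleColumn(TupleSketch input, int column) {
        Object ret = input.get(column);
        return (TupleSketch) ret; // ClassCastException when ret is a bag
    }

    public static void main(String[] args) {
        BagSketch floats = new BagSketch()
                .add(new TupleSketch(3.3f))
                .add(new TupleSketch(1.2f))
                .add(new TupleSketch(5.6f));

        TupleSketch wrappedOnce = new TupleSketch(floats);       // ({(3.3),(1.2),(5.6)})
        TupleSketch wrappedTwice = new TupleSketch(wrappedOnce); // (({(3.3),(1.2),(5.6)}))

        // Double wrapping succeeds because the single field is itself a tuple.
        System.out.println(projectSingleColumn(wrappedTwice, 0) == wrappedOnce);

        try {
            projectSingleColumn(wrappedOnce, 0);
        } catch (ClassCastException e) {
            System.out.println("single field is a bag, cast to tuple failed");
        }
    }
}
```

In this toy model the failure depends only on the runtime type of the single 
field, which is why the double-wrapped input appears to work.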

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-09 Thread Ken Goodhope (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Goodhope updated PIG-1890:
--

  Labels: patch  (was: )
Release Note: Fixed AvroStorage unit tests.
  Status: Patch Available  (was: Open)

PIG-1890-4.patch is ready for review.  All unit tests are now working against 
trunk.

> Fix piggybank unit test TestAvroStorage
> ---------------------------------------
>
>              Key: PIG-1890
>              URL: https://issues.apache.org/jira/browse/PIG-1890
>          Project: Pig
>       Issue Type: Bug
>       Components: impl
> Affects Versions: 0.9.0
>         Reporter: Daniel Dai
>         Assignee: Jakob Homan
>           Labels: patch
>      Attachments: PIG-1890-1.patch, PIG-1890-2.patch, PIG-1890-3.patch, 
>                   PIG-1890-4.patch, pig_setloc_avro.txt
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first 
> test case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: 
> (FIELD: {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is 
> redundant. This issue was hidden until PIG-1188 was checked in.





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-09 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062664#comment-13062664
 ] 

Ken Goodhope commented on PIG-1890:
---

Removing the blocker for PIG-2153.  Turns out the problem, as first asserted, 
was in AvroStorage.  The new logical plan must handle implicit wrapping tuples 
differently than used to be the case.  In order to make this work, I removed 
the wrapping tuple from the schema produced by getSchema.  getNext still 
returns its result in the wrapping tuple.  I also had to modify putNext to 
expect a Pig schema without the implicit wrapping tuple.





[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-09 Thread Ken Goodhope (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Goodhope updated PIG-1890:
--

Attachment: PIG-1890-4.patch

Uploading new patch that contains the same fixes to setLocation contained in 
the previous patch.  New patch adds fixes to the schema that resolve the issues 
around the unit tests.





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-06 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061035#comment-13061035
 ] 

Ken Goodhope commented on PIG-2153:
---

In my LoadFunc, I modified getSchema to check for a single element wrapping 
tuple and return the inner ResourceSchema when one is found.  This fixed the 
errors I was getting from POProject.java.  The unit tests for my LoadFunc are 
still breaking, because the output has changed.  However I suspect the new 
output is correct, so after some more investigation I will probably change the 
unit tests.  Why including the wrapping tuple in the schema used to work is 
still a mystery.  Maybe someone currently working on the project can answer 
that question.
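
The getSchema change described above can be sketched as follows.  This is an 
illustration under stated assumptions: Field and Kind are hypothetical 
stand-ins rather than Pig's ResourceSchema API, and the unwrapping condition 
(exactly one top-level field, of tuple type) is one reading of "single 
element wrapping tuple"; the actual patch may test something different.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy schema model. This does not use or reproduce Pig's ResourceSchema.
final class UnwrapSchemaSketch {

    enum Kind { FLOAT, TUPLE, BAG }

    /** Hypothetical schema node: a named field with optional child fields. */
    static final class Field {
        final String name;
        final Kind kind;
        final List<Field> children;

        Field(String name, Kind kind, Field... children) {
            this.name = name;
            this.kind = kind;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    /**
     * Assumed rule: if the top level is exactly one TUPLE field, report its
     * children instead; otherwise return the top-level fields unchanged.
     */
    static List<Field> stripWrapper(List<Field> topLevel) {
        boolean singleTuple = topLevel.size() == 1 && topLevel.get(0).kind == Kind.TUPLE;
        return singleTuple ? topLevel.get(0).children : topLevel;
    }

    public static void main(String[] args) {
        // PIG_WRAPPER: (FIELD: {PIG_WRAPPER: (ARRAY_ELEM: float)})
        Field elem = new Field("ARRAY_ELEM", Kind.FLOAT);
        Field innerTuple = new Field("PIG_WRAPPER", Kind.TUPLE, elem);
        Field bag = new Field("FIELD", Kind.BAG, innerTuple);
        Field wrapper = new Field("PIG_WRAPPER", Kind.TUPLE, bag);

        for (Field f : stripWrapper(Collections.singletonList(wrapper))) {
            System.out.println(f.name + ": " + f.kind);
        }
    }
}
```

A schema whose top level already has several fields passes through untouched, 
which matches the observation that multi-field records were never affected.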





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-06 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060860#comment-13060860
 ] 

Ken Goodhope commented on PIG-2153:
---

That makes sense, and if it is still the case it would mean the fix needs to 
occur in the LoadFunc and not POProject.  This is also consistent with the 
original comments by Daniel Dai for PIG-1890. AvroStorage has always included 
the wrapping tuple as part of the schema. In most cases the outer tuple isn't 
really a wrapper, but a record with multiple fields and that works fine.  Later 
tonight I will take a look and see what changes I need to make at the LoadFunc 
level.  I am still perplexed why the incorrect behavior used to work.  Thanks 
again Pradeep.





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-06 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060762#comment-13060762
 ] 

Ken Goodhope commented on PIG-1890:
---

A recent change in Pig causes setLocation to be called twice, and if 
setLocation isn't idempotent, then you get twice the output.  My suspicion is 
that UNION is further exacerbating the problem, leading to the input being 
added 4X.  
Did you still see the problem with the last patch I added?
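
The idempotency problem can be illustrated with a small sketch.  The names 
here (JobPathsSketch, setLocationAppending, setLocationIdempotent) are 
hypothetical and are not the AvroStorage or Hadoop APIs; the sketch only 
assumes that a non-idempotent setLocation appends its path on every call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of a setLocation that is called more than once per load.
final class IdempotentLocationSketch {

    /** Stand-in for the job's input-path state; not Hadoop's Job or FileInputFormat. */
    static final class JobPathsSketch {
        private final List<String> inputPaths = new ArrayList<>();

        void addInputPath(String path) { inputPaths.add(path); } // true "add"

        void setInputPaths(Set<String> paths) {                  // replaces the state
            inputPaths.clear();
            inputPaths.addAll(paths);
        }

        List<String> inputPaths() { return Collections.unmodifiableList(inputPaths); }
    }

    /** Non-idempotent: every call appends, so two calls read the input twice. */
    static void setLocationAppending(JobPathsSketch job, String location) {
        job.addInputPath(location);
    }

    /** Idempotent: merges into a set first, so repeated calls change nothing. */
    static void setLocationIdempotent(JobPathsSketch job, String location) {
        Set<String> merged = new LinkedHashSet<>(job.inputPaths());
        merged.add(location);
        job.setInputPaths(merged);
    }

    public static void main(String[] args) {
        JobPathsSketch appending = new JobPathsSketch();
        setLocationAppending(appending, "/data/in");
        setLocationAppending(appending, "/data/in");

        JobPathsSketch idempotent = new JobPathsSketch();
        setLocationIdempotent(idempotent, "/data/in");
        setLocationIdempotent(idempotent, "/data/in");

        System.out.println(appending.inputPaths().size());  // path registered twice
        System.out.println(idempotent.inputPaths().size()); // path registered once
    }
}
```

The set-based variant still behaves as an add for distinct locations, so 
callers that rely on accumulating several paths keep working.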





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-06 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060758#comment-13060758
 ] 

Ken Goodhope commented on PIG-2153:
---

Thanks Pradeep, that is actually very helpful.  If I understand you correctly, 
the outer tuple isn't part of the schema returned by LoadFunc.getSchema().  Is 
it possible that the result of LoadFunc.getNext used to be wrapped in an 
implicit tuple, and that is no longer happening?  

The results of the unit tests with the fix I suggested in my last comment 
showed 11 tests now working that were broken before, and 11 tests now breaking 
that used to work.  This makes me wonder if some of the tests have been written 
with the expectation that there is an implicit wrapping tuple, and some have 
been written with the expectation that there is no implicit wrapper.  Am I 
missing something?

Here are the test results.

Tests that were broken and now work.
> [junit] Test org.apache.pig.test.TestBestFitCast
> [junit] Test org.apache.pig.test.TestCounters
> [junit] Test org.apache.pig.test.TestDataBagAccess
> [junit] Test org.apache.pig.test.TestEmptyInputDir
> [junit] Test org.apache.pig.test.TestImplicitSplit
> [junit] Test org.apache.pig.test.TestInvoker
> [junit] Test org.apache.pig.test.TestPigRunner
> [junit] Test org.apache.pig.test.TestPigSplit
> [junit] Test org.apache.pig.test.TestScriptLanguage
> [junit] Test org.apache.pig.test.TestScriptUDF
> [junit] Test org.apache.pig.test.TestSkewedJoin

Tests that used to work, but break with the fix I tried.
< [junit] Test org.apache.pig.test.TestCombiner FAILED
< [junit] Test org.apache.pig.test.TestCommit FAILED
< [junit] Test org.apache.pig.test.TestEvalPipeline2 FAILED
< [junit] Test org.apache.pig.test.TestEvalPipelineLocal FAILED
< [junit] Test org.apache.pig.test.TestForEachNestedPlanLocal FAILED
< [junit] Test org.apache.pig.test.TestLimitAdjuster FAILED
< [junit] Test org.apache.pig.test.TestMergeJoinOuter FAILED
< [junit] Test org.apache.pig.test.TestProject FAILED
< [junit] Test org.apache.pig.test.TestProjectRange FAILED
< [junit] Test org.apache.pig.test.TestPruneColumn FAILED
< [junit] Test org.apache.pig.test.TestUnionOnSchema FAILED





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-05 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060251#comment-13060251
 ] 

Ken Goodhope commented on PIG-2153:
---

I am the first to admit this is ugly, and if someone has a better idea I would 
be thrilled.  I am currently running unit tests with this possible fix.

    if(columns.size() == 1
            && ((!overloaded && inpValue.getType(0) == DataType.TUPLE)
                || (overloaded && inpValue.getType(0) == DataType.BAG))) {
        ...

My current thinking is that the previous fix broke so many unit tests because 
single-element tuples containing a databag are acceptable if overloaded is set.  
I will post the results of the tests when complete.
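
The proposed condition is easier to check when pulled out as a standalone 
predicate.  This is a sketch only: FieldType is a hypothetical enum standing in 
for Pig's DataType constants, and shouldUnwrap is an illustrative name, not a 
method in POProject.

```java
// Restates the experimental guard as a pure function so its truth table is easy to inspect.
final class OverloadedGuardSketch {

    /** Hypothetical stand-in for the field types the guard distinguishes. */
    enum FieldType { TUPLE, BAG, OTHER }

    /**
     * True when there is exactly one projected column and its type matches
     * what the mode expects: a bag when overloaded, a tuple otherwise.
     */
    static boolean shouldUnwrap(int columnCount, boolean overloaded, FieldType firstField) {
        if (columnCount != 1) {
            return false;
        }
        return overloaded ? firstField == FieldType.BAG : firstField == FieldType.TUPLE;
    }

    public static void main(String[] args) {
        for (boolean overloaded : new boolean[] {false, true}) {
            for (FieldType type : FieldType.values()) {
                System.out.println("overloaded=" + overloaded + " type=" + type
                        + " unwrap=" + shouldUnwrap(1, overloaded, type));
            }
        }
    }
}
```

The ternary is logically equivalent to the two-branch expression in the 
comment: each branch of the original is guarded by the value of overloaded.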

This might fix the issue in ElephantBird, but I haven't had time to investigate 
that.





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-05 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060241#comment-13060241
 ] 

Ken Goodhope commented on PIG-1890:
---

Dmitry, when I inherited the code it was already doing the traversal in 
setLocation, and I didn't consider doing it in the InputFormat.  To be honest, I 
am not crazy about adding all the subdirs by default, since this is 
inconsistent with the way a standard map-reduce job works.  But, our users 
expect this behavior, and have pig jobs that depend on it.

If the current patch works, I am inclined to leave it, until I get time to do a 
better re-factoring.





[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-05 Thread Ken Goodhope (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Goodhope updated PIG-1890:
--

Attachment: PIG-1890-3.patch

There are places where we use addInputDir as a true add, not a set.  Otherwise 
your solution would work.  I did incorporate the use of a set in 
addAllSubDirs.  Since the method name was no longer descriptive, I changed it 
to getAllSubDirs.  This new patch passed unit tests, but currently there isn't 
a test for UNION.  Let me know if this works.





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-05 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060165#comment-13060165
 ] 

Ken Goodhope commented on PIG-1890:
---

Hi Patrick, for our purposes we need setLocation to add all sub-directories, 
including directories more than 2 levels deep.  A common use case for us is to 
have directories organized by time, /MM/dd/hh/mm.  In that case if you want 
to load all the data from a particular month, then you need to add all the 
subdirs.  You're right that a UNION can accomplish this, but it can be painful to 
add the directories that way.  I will take a look at why this is still breaking 
in your case.
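
A recursive collection of sub-directories at arbitrary depth might look like 
the sketch below.  It is an assumption-laden illustration: it walks the local 
filesystem with java.nio.file rather than HDFS, the directory names in main are 
made up, and only the method name getAllSubDirs is taken from this thread; the 
body is not the code from the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Local-filesystem sketch of "add every sub-directory, however deep".
final class SubDirWalkSketch {

    /** Returns root and every directory beneath it, at any depth, as a sorted set. */
    static Set<Path> getAllSubDirs(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            Set<Path> dirs = new TreeSet<>();
            walk.filter(Files::isDirectory).forEach(dirs::add);
            return dirs;
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative time-partitioned layout; these names are invented for the demo.
        Path root = Files.createTempDirectory("subdirs");
        Files.createDirectories(root.resolve("2011/07/05/10"));
        Files.createDirectories(root.resolve("2011/07/06/11"));

        for (Path dir : getAllSubDirs(root)) {
            System.out.println(root.relativize(dir));
        }
    }
}
```

Returning a set rather than appending to shared state means a second traversal 
of the same root yields the same result, which fits the idempotency requirement 
discussed for setLocation.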







[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-05 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060062#comment-13060062
 ] 

Ken Goodhope commented on PIG-2153:
---

It looks like the last time this code was touched it was for PIG-1369 by 
Pradeep Kamath.





[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-05 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060034#comment-13060034
 ] 

Ken Goodhope commented on PIG-2153:
---

I ran unit tests with the change I recommended in the description.  The good 
news is that several tests that failed before now work; they are listed below.
org.apache.pig.test.TestBestFitCast 
org.apache.pig.test.TestDataBagAccess 
org.apache.pig.test.TestGrunt 
org.apache.pig.test.TestImplicitSplit 
org.apache.pig.test.TestMapSideCogroup 
org.apache.pig.test.TestPigRunner 
org.apache.pig.test.TestPigSplit 
org.apache.pig.test.TestScriptUDF 

The bad news is several tests that were working now fail.
org.apache.pig.test.TestBuiltin 
org.apache.pig.test.TestCollectedGroup 
org.apache.pig.test.TestCombiner 
org.apache.pig.test.TestCommit 
org.apache.pig.test.TestEvalPipeline2 
org.apache.pig.test.TestEvalPipelineLocal 
org.apache.pig.test.TestFRJoin2 
org.apache.pig.test.TestFilter 
org.apache.pig.test.TestForEach 
org.apache.pig.test.TestForEachNestedPlanLocal 
org.apache.pig.test.TestJoin 
org.apache.pig.test.TestJoinSmoke 
org.apache.pig.test.TestLimitAdjuster 
org.apache.pig.test.TestLocalRearrange 
org.apache.pig.test.TestNativeMapReduce 
org.apache.pig.test.TestNewPlanImplicitSplit 
org.apache.pig.test.TestProject 
org.apache.pig.test.TestStore 
org.apache.pig.test.TestStoreInstances 
org.apache.pig.test.TestUnionOnSchema 

Obviously, there are more tests that break than get fixed.  





[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-04 Thread Ken Goodhope (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Goodhope updated PIG-1890:
--

Attachment: PIG-1890-2.patch

Attached patch.  Only works if PIG-2153 is fixed.  Until then the unit tests 
still break.  This patch fixes setLocation.

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Jakob Homan
> Attachments: PIG-1890-1.patch, PIG-1890-2.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.
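
To make the redundancy concrete, here is a small Java sketch that models the reported schema and drops the outermost wrapper. It assumes the redundant PIG_WRAPPER is the outer tuple that merely wraps the record's fields, which the report does not state explicitly. SketchField, render and stripOuterWrapper are hypothetical names used for illustration only; they are not Pig's Schema API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical schema model; Pig's real Schema/FieldSchema classes are not used here.
class SketchField {
    final String alias;
    final String type; // "tuple", "bag", or a scalar type name such as "float"
    final List<SketchField> children;

    SketchField(String alias, String type, SketchField... children) {
        this.alias = alias;
        this.type = type;
        this.children = Arrays.asList(children);
    }

    // Renders the field in the notation quoted in the report:
    // tuples use parentheses, bags use braces, scalars print their type name.
    String render() {
        String inner = children.stream()
                .map(SketchField::render)
                .collect(Collectors.joining(", "));
        switch (type) {
            case "tuple": return alias + ": (" + inner + ")";
            case "bag":   return alias + ": {" + inner + "}";
            default:      return alias + ": " + type;
        }
    }

    // Assumption: a top-level tuple only wraps the record's fields, so it is
    // dropped and its children become the top-level schema. Any other
    // top-level field is returned unchanged.
    static List<SketchField> stripOuterWrapper(SketchField top) {
        return "tuple".equals(top.type) ? top.children : Arrays.asList(top);
    }
}
```

Under that assumption, the schema for testArrayDefault would reduce to a single field, a bag whose inner tuple holds one float.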





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-07-01 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058813#comment-13058813
 ] 

Ken Goodhope commented on PIG-1890:
---

The fix for this jira involves two parts: making setLocation idempotent, and a 
fix in POProject.  I have filed PIG-2153 for the POProject issue.  I will 
try to get a patch for the setLocation issue added this weekend.  I have made 
some other changes to the version of AvroStorage we are using at LinkedIn and 
want to separate those changes from any patch I submit for this.
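
As an illustration of what idempotent means for setLocation, the sketch below models a loader whose setLocation can be called any number of times with the same argument and still leave the same state. The class, its single-argument setLocation, and the counter are assumptions made for the example. The real LoadFunc.setLocation also takes a Job, and this is not the AvroStorage patch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical loader, not the real AvroStorage. It only models the property
// "calling setLocation twice has the same effect as calling it once".
class IdempotentLocationLoader {
    private final Set<String> inputPaths = new LinkedHashSet<>();
    private int initializations = 0; // counts simulated one-time setup work

    void setLocation(String location) {
        // Set.add returns false when the location is already registered,
        // so a repeated call neither duplicates the path nor redoes setup.
        if (inputPaths.add(location)) {
            initializations++;
        }
    }

    Set<String> getInputPaths() {
        return inputPaths;
    }

    int getInitializations() {
        return initializations;
    }
}
```

With this shape, an extra call from the framework, such as the one described after PIG-1680, would not change the loader's state.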

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Jakob Homan
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.





[jira] [Created] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

2011-07-01 Thread Ken Goodhope (JIRA)
POProject throws an error with tuples containing a single non-tuple field
-

 Key: PIG-2153
 URL: https://issues.apache.org/jira/browse/PIG-2153
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Ken Goodhope


When POProject.getNext(tuple) processes a tuple with one field, the field is 
pulled out.  If that field is not a tuple, a cast exception is thrown.  This is 
happening in the following block of code at line 401.

   if(columns.size() == 1) {
       try{
           ret = inpValue.get(columns.get(0));
           ...
   res.result = (Tuple)ret;

I am seeing this error in a unit test that is loading an array of floats.  The 
LoadFunc is converting the array to a bag and wrapping the bag in a tuple.  

({(3.3),(1.2),(5.6)})

This results in POProject attempting to cast the bag to a tuple.  Looking at 
the code, it appears that if I wrapped the previous tuple in another tuple, 
then it would work.

(({(3.3),(1.2),(5.6)}))

In this case it would work because POProject would extract the first inner 
tuple and return it.  But this would require the LoadFunc to check for tuples 
with a single non-tuple field and only wrap those.

This could be fixed by first checking that the tuple does actually wrap another 
tuple.

   if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...

I don't know the original intent of this code well enough to say whether this 
is the appropriate fix.  I am hoping someone with more Pig experience can help 
here.  Right now this is preventing the unit tests in AvroStorage from working.  
I can change the unit test, but I think in this case the unit test is catching 
a real bug.
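
The failure and the proposed guard can be reproduced outside Pig with a minimal Java sketch. SketchTuple, SketchBag and SingleColumnProjection are hypothetical stand-ins so the example runs without Pig on the classpath; they are not POProject, and the guard uses instanceof where the proposal above uses inpValue.getType(0). Whether the guard matches the intended contract of POProject is a separate question that this sketch does not settle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Pig tuple: an ordered list of fields.
class SketchTuple {
    final List<Object> fields;

    SketchTuple(Object... fields) {
        this.fields = Arrays.asList(fields);
    }

    Object get(int i) {
        return fields.get(i);
    }

    int size() {
        return fields.size();
    }
}

// Hypothetical stand-in for a Pig bag: a collection of tuples.
class SketchBag {
    final List<SketchTuple> tuples = new ArrayList<>();

    SketchBag add(SketchTuple t) {
        tuples.add(t);
        return this;
    }
}

class SingleColumnProjection {
    // Mirrors the reported behavior: the single field is always cast to a tuple.
    static SketchTuple projectUnguarded(SketchTuple input) {
        Object ret = input.get(0);
        return (SketchTuple) ret; // throws ClassCastException when ret is a bag
    }

    // Mirrors the proposed guard: unwrap only when the single field is a tuple,
    // otherwise hand back the input tuple unchanged.
    static SketchTuple projectGuarded(SketchTuple input) {
        if (input.size() == 1 && input.get(0) instanceof SketchTuple) {
            return (SketchTuple) input.get(0);
        }
        return input;
    }

    // Builds ({(3.3),(1.2),(5.6)}): a tuple whose only field is a bag of floats.
    static SketchTuple bagInTuple() {
        SketchBag bag = new SketchBag()
                .add(new SketchTuple(3.3f))
                .add(new SketchTuple(1.2f))
                .add(new SketchTuple(5.6f));
        return new SketchTuple(bag);
    }
}
```

Calling projectUnguarded on the result of bagInTuple() fails with a ClassCastException, while projectGuarded returns the tuple as is. Wrapping that tuple in one more tuple makes both variants return the inner tuple, which matches the (({(3.3),(1.2),(5.6)})) case described above.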






[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-05-31 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041915#comment-13041915
 ] 

Ken Goodhope commented on PIG-1890:
---

I need some clarification on the contract for POProject.getNext(Tuple).  Right 
now, if it receives a tuple with a single element, it extracts that element, 
attempts to cast it to a tuple, and returns it.  This breaks for any 
single-element tuple whose single element is not a tuple.  The code could be 
modified to not extract non-tuple elements.

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Jakob Homan
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-05-23 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038100#comment-13038100
 ] 

Ken Goodhope commented on PIG-1890:
---

Right now, in this test, AvroStorage is attempting to pass back a single array 
of floats with one call to next. To be consistent with the intent of how the 
data is stored, we want this array returned as a single unit (a databag) with 
each foreach call. In other words, we don't want foreach to return each element 
of that array one at a time. If I am understanding the code right, that appears 
to be what it is trying to do. Am I missing something? Is there a way to 
control this behavior?
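
To show the intended shape of the data, here is a minimal Java sketch of a reader that hands back one tuple per stored array, with the whole array inside it as a single bag. ArrayRecordReader is a hypothetical class, not AvroStorage, and tuples and bags are modeled as plain Lists so the example runs without Pig.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical reader, not AvroStorage itself. A tuple is modeled as
// List<Object> and a bag as a List of single-field tuples.
class ArrayRecordReader {
    private final Iterator<float[]> records;

    ArrayRecordReader(List<float[]> records) {
        this.records = records.iterator();
    }

    // Returns one tuple per stored array, or null at end of input.
    // The tuple has exactly one field: a bag with one tuple per array element.
    List<Object> getNext() {
        if (!records.hasNext()) {
            return null;
        }
        List<List<Object>> bag = new ArrayList<>();
        for (float value : records.next()) {
            List<Object> elementTuple = new ArrayList<>();
            elementTuple.add(value);
            bag.add(elementTuple);
        }
        List<Object> tuple = new ArrayList<>();
        tuple.add(bag);
        return tuple;
    }
}
```

For an input holding the single array {3.3, 1.2, 5.6}, the first call returns one tuple containing a three-element bag and the second call returns null, so the array is never split across calls.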



> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Jakob Homan
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-05-15 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033822#comment-13033822
 ] 

Ken Goodhope commented on PIG-1890:
---

For testArrayDefault, we are attempting to return an entire Avro array, which 
is consistent with the schema.  The result is a tuple with one column, a bag of 
floats.  In POProject.getNext(Tuple), tuples with one column have their single 
column extracted, cast to a tuple, and then returned.  In this case, that means 
trying to cast the bag of floats to a tuple, and an exception is thrown.

Does anyone know why this is being done in POProject?

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Jakob Homan
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-05-02 Thread Ken Goodhope (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027862#comment-13027862
 ] 

Ken Goodhope commented on PIG-1890:
---

I have been working on some fixes to AvroStorage already.  I should be able to 
make sure this issue gets addressed in those fixes as well.  I will have it 
done sometime this week.

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Jakob Homan
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.
