[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062666#comment-13062666 ]

Ken Goodhope commented on PIG-2153:
-----------------------------------

The behavior of POProject is correct. LoadFuncs need to make sure the Pig schema they return does not include the implicit wrapping tuple. The schema should only reflect the contents inside the wrapping tuple. I am not 100% sure how this relates to the issue with ElephantBird, but I am reasonably convinced the problem there lies either in how the schema is built or in how the logical plan is being executed. Regardless, I believe this jira can be closed, since POProject is no longer suspect.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is
> pulled out. If that field is not a tuple, a cast exception is thrown. This
> is happening in the following block of code at line 401.
>     if(columns.size() == 1) {
>         try{
>             ret = inpValue.get(columns.get(0));
>         ...
>         res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.
> The LoadFunc is converting the array to a bag, and wrapping the bag in a tuple.
>
> ({(3.3),(1.2),(5.6)})
> This results in POProject attempting to cast the bag to a tuple. Looking at
> the code, it appears that if I wrapped the previous tuple in another tuple,
> then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner
> tuple and return it. But this would require the LoadFunc to check for tuples
> with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap
> another tuple.
>     if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE)
>     {...
> I don't know the original intent of this code well enough to say whether this
> is the appropriate fix or not. Hoping someone with more Pig experience can help
> here. Right now this is preventing the unit tests in AvroStorage from
> working. I can change the unit test, but I think in this case the unit test
> is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Goodhope updated PIG-1890:
------------------------------

          Labels: patch  (was: )
    Release Note: Fixed AvroStorage unit tests.
          Status: Patch Available  (was: Open)

PIG-1890-4.patch is ready for review. All unit tests are now working against trunk.

> Fix piggybank unit test TestAvroStorage
> ---------------------------------------
>
>                 Key: PIG-1890
>                 URL: https://issues.apache.org/jira/browse/PIG-1890
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Jakob Homan
>              Labels: patch
>         Attachments: PIG-1890-1.patch, PIG-1890-2.patch, PIG-1890-3.patch,
>                      PIG-1890-4.patch, pig_setloc_avro.txt
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD:
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This
> issue was hidden until PIG-1188 was checked in.
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062664#comment-13062664 ]

Ken Goodhope commented on PIG-1890:
-----------------------------------

Removing the blocker for PIG-2153. It turns out the problem, as first asserted, was in AvroStorage. The new logical plan must handle implicit wrapping tuples differently than used to be the case. To make this work, I removed the wrapping tuple from the schema produced by getSchema. getNext still returns its result in the wrapping tuple. I also had to modify putNext to expect a Pig schema without the implicit wrapping tuple.
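A minimal sketch of the schema change described in the comment above, for readers who want to see its shape. It deliberately does not use Pig's real ResourceSchema classes: FieldSketch and stripWrapper are hypothetical stand-ins, and the rule "a schema with exactly one tuple field is the implicit wrapper" is an assumption taken from the PIG_WRAPPER example in the issue description, not the contents of the actual patch.

```java
import java.util.List;

/** Hypothetical stand-in for a Pig field schema; NOT the real ResourceSchema API. */
final class FieldSketch {
    final String name;
    final String type;              // e.g. "tuple", "bag", "float"
    final List<FieldSketch> inner;  // nested fields; empty for scalar types

    FieldSketch(String name, String type, List<FieldSketch> inner) {
        this.name = name;
        this.type = type;
        this.inner = inner;
    }
}

final class WrapperSchemaSketch {
    /**
     * Assumption: a schema made of exactly one tuple field is the implicit
     * wrapper, so only its contents are reported. Anything else is returned
     * unchanged.
     */
    static List<FieldSketch> stripWrapper(List<FieldSketch> schema) {
        if (schema.size() == 1 && "tuple".equals(schema.get(0).type)) {
            return schema.get(0).inner;
        }
        return schema;
    }

    public static void main(String[] args) {
        // Models "PIG_WRAPPER: (FIELD: {PIG_WRAPPER: (ARRAY_ELEM: float)})".
        FieldSketch elem = new FieldSketch("ARRAY_ELEM", "float", List.of());
        FieldSketch elemTuple = new FieldSketch("PIG_WRAPPER", "tuple", List.of(elem));
        FieldSketch bag = new FieldSketch("FIELD", "bag", List.of(elemTuple));
        FieldSketch wrapper = new FieldSketch("PIG_WRAPPER", "tuple", List.of(bag));

        List<FieldSketch> stripped = stripWrapper(List.of(wrapper));
        // The outer wrapper is gone; the bag field is now the top-level field.
        System.out.println(stripped.get(0).name + ": " + stripped.get(0).type);
    }
}
```

Only the outermost wrapper is removed here; the tuple inside the bag stays, since a bag's elements are tuples. A record that legitimately consists of one tuple-typed field would also be unwrapped by this rule, so a real loader would probably need a more specific test.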
[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Goodhope updated PIG-1890:
------------------------------

    Attachment: PIG-1890-4.patch

Uploading a new patch that contains the same fixes to setLocation as the previous patch. The new patch also adds fixes to the schema that resolve the issues around the unit tests.
[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061035#comment-13061035 ]

Ken Goodhope commented on PIG-2153:
-----------------------------------

In my LoadFunc, I modified getSchema to check for a single-element wrapping tuple and return the inner ResourceSchema when one is found. This fixed the errors I was getting from POProject.java. The unit tests for my LoadFunc are still breaking, because the output has changed. However, I suspect the new output is correct, so after some more investigation I will probably change the unit tests. Why including the wrapping tuple in the schema used to work is still a mystery. Maybe someone currently working on the project can answer that question.
[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060860#comment-13060860 ]

Ken Goodhope commented on PIG-2153:
-----------------------------------

That makes sense, and if it is still the case, it would mean the fix needs to occur in the LoadFunc and not in POProject. This is also consistent with the original comments by Daniel Dai for PIG-1890. AvroStorage has always included the wrapping tuple as part of the schema. In most cases the outer tuple isn't really a wrapper but a record with multiple fields, and that works fine. Later tonight I will take a look and see what changes I need to make at the LoadFunc level. I am still perplexed why the incorrect behavior used to work. Thanks again, Pradeep.
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060762#comment-13060762 ]

Ken Goodhope commented on PIG-1890:
-----------------------------------

A recent change in Pig causes setLocation to be called twice, and if setLocation isn't idempotent, you get twice the output. My suspicion is that UNION is further exacerbating the problem, leading to the input being added 4X. Did you still see the problem with the last patch I added?
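To illustrate the idempotency point in the comment above, here is a small sketch under stated assumptions. IdempotentLocationSketch is a hypothetical stand-in that does not extend Pig's LoadFunc, and keeping the input paths in a set is one possible way to make a repeated call harmless; it is not claimed to be what the attached patches do.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical loader stand-in; it does NOT extend Pig's LoadFunc. */
final class IdempotentLocationSketch {
    // A set ignores repeated additions of the same path, so calling
    // setLocation twice with the same argument registers the input once.
    private final Set<String> inputPaths = new LinkedHashSet<>();

    /** Safe to call any number of times with the same location. */
    void setLocation(String location) {
        inputPaths.add(location);
    }

    Set<String> inputPaths() {
        return inputPaths;
    }

    public static void main(String[] args) {
        IdempotentLocationSketch loader = new IdempotentLocationSketch();
        loader.setLocation("/data/in");
        loader.setLocation("/data/in"); // second call, as the framework may make
        System.out.println(loader.inputPaths().size()); // prints 1
    }
}
```

Had the paths been appended to a list instead, the second call would register the same input again, which matches the doubled output described in the comment.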
[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060758#comment-13060758 ]

Ken Goodhope commented on PIG-2153:
-----------------------------------

Thanks Pradeep, that is actually very helpful. If I understand you correctly, the outer tuple isn't part of the schema returned by LoadFunc.getSchema(). Is it possible that the result of LoadFunc.getNext used to be wrapped in an implicit tuple, and that is no longer happening?

The results of the unit tests with the fix I suggested in my last comment showed 11 tests now working that were broken before, and 11 tests now breaking that used to work. This makes me wonder if some of the tests were written with the expectation that there is an implicit wrapping tuple, and some with the expectation that there is no implicit wrapper. Am I missing something? Here are the test results.

Tests that were broken and now work:

> [junit] Test org.apache.pig.test.TestBestFitCast
> [junit] Test org.apache.pig.test.TestCounters
> [junit] Test org.apache.pig.test.TestDataBagAccess
> [junit] Test org.apache.pig.test.TestEmptyInputDir
> [junit] Test org.apache.pig.test.TestImplicitSplit
> [junit] Test org.apache.pig.test.TestInvoker
> [junit] Test org.apache.pig.test.TestPigRunner
> [junit] Test org.apache.pig.test.TestPigSplit
> [junit] Test org.apache.pig.test.TestScriptLanguage
> [junit] Test org.apache.pig.test.TestScriptUDF
> [junit] Test org.apache.pig.test.TestSkewedJoin

Tests that used to work, but break with the fix I tried:

< [junit] Test org.apache.pig.test.TestCombiner FAILED
< [junit] Test org.apache.pig.test.TestCommit FAILED
< [junit] Test org.apache.pig.test.TestEvalPipeline2 FAILED
< [junit] Test org.apache.pig.test.TestEvalPipelineLocal FAILED
< [junit] Test org.apache.pig.test.TestForEachNestedPlanLocal FAILED
< [junit] Test org.apache.pig.test.TestLimitAdjuster FAILED
< [junit] Test org.apache.pig.test.TestMergeJoinOuter FAILED
< [junit] Test org.apache.pig.test.TestProject FAILED
< [junit] Test org.apache.pig.test.TestProjectRange FAILED
< [junit] Test org.apache.pig.test.TestPruneColumn FAILED
< [junit] Test org.apache.pig.test.TestUnionOnSchema FAILED
[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060251#comment-13060251 ]

Ken Goodhope commented on PIG-2153:
-----------------------------------

I am the first to admit this is ugly, and if someone has a better idea I would be thrilled. I am currently running unit tests with this possible fix.

    if(columns.size() == 1 && ((!overloaded && inpValue.getType(0) == DataType.TUPLE) || (overloaded && inpValue.getType(0) == DataType.BAG))) {
    ...

My current thinking is that the previous fix broke so many unit tests because single-element tuples containing a databag are acceptable if overloaded is set. I will post the results of the tests when complete. This might fix the issue in ElephantBird, but I haven't had time to investigate that.
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060241#comment-13060241 ]

Ken Goodhope commented on PIG-1890:
-----------------------------------

Dmitry, when I inherited the code it was already doing the traversal in setLocation, and I didn't consider doing it in the InputFormat. To be honest, I am not crazy about adding all the subdirs by default, since this is inconsistent with the way a standard map-reduce job works. But our users expect this behavior, and have Pig jobs that depend on it. If the current patch works, I am inclined to leave it until I get time to do a better refactoring.
[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Goodhope updated PIG-1890:
------------------------------

    Attachment: PIG-1890-3.patch

There are places where we use addInputDir as a true add, not a set. Otherwise your solution would work. I did incorporate the use of a set in addAllSubDirs. Since the method name was no longer descriptive, I changed it to getAllSubDirs. This new patch passed unit tests, but currently there isn't a test for UNION. Let me know if this works.
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060165#comment-13060165 ]

Ken Goodhope commented on PIG-1890:
-----------------------------------

Hi Patrick, for our purposes we need setLocation to add all sub-directories, including directories more than 2 levels deep. A common use case for us is to have directories organized by time, /MM/dd/hh/mm. In that case, if you want to load all the data from a particular month, you need to add all the subdirs. You're right that a UNION can accomplish this, but it can be painful to add the directories that way. I will take a look at why this is still breaking in your case.
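The sub-directory traversal discussed above can be sketched roughly as follows. This is an illustration only: it walks the local file system with java.nio.file instead of Hadoop's FileSystem API, SubDirSketch is a hypothetical class, and the method borrows the name getAllSubDirs from the thread without claiming to match the patched AvroStorage code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Hypothetical sketch; uses the local file system, not Hadoop's FileSystem. */
final class SubDirSketch {
    /** Returns root and every directory below it, at any depth, without duplicates. */
    static Set<Path> getAllSubDirs(Path root) throws IOException {
        // Files.walk visits the starting directory and everything beneath it.
        try (Stream<Path> walk = Files.walk(root)) {
            Set<Path> dirs = new TreeSet<>();
            walk.filter(Files::isDirectory).forEach(dirs::add);
            return dirs;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small time-partitioned tree, four levels deep.
        Path root = Files.createTempDirectory("subdirs");
        Files.createDirectories(root.resolve("07/04/12/30"));
        // The root plus four nested directories.
        System.out.println(getAllSubDirs(root).size()); // prints 5
    }
}
```

Returning a set rather than appending to a list means the caller can merge the result into its inputs more than once without registering any directory twice, which is the property the setLocation discussion depends on.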
[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060062#comment-13060062 ]

Ken Goodhope commented on PIG-2153:
-----------------------------------

It looks like the last time this code was touched it was for PIG-1369 by Pradeep Kamath.
[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060034#comment-13060034 ]

Ken Goodhope commented on PIG-2153:
-----------------------------------

I ran unit tests with the change I recommended in the description. The good news is that several tests that failed before now work; they are listed below.

org.apache.pig.test.TestBestFitCast
org.apache.pig.test.TestDataBagAccess
org.apache.pig.test.TestGrunt
org.apache.pig.test.TestImplicitSplit
org.apache.pig.test.TestMapSideCogroup
org.apache.pig.test.TestPigRunner
org.apache.pig.test.TestPigSplit
org.apache.pig.test.TestScriptUDF

The bad news is that several tests that were working now fail.

org.apache.pig.test.TestBuiltin
org.apache.pig.test.TestCollectedGroup
org.apache.pig.test.TestCombiner
org.apache.pig.test.TestCommit
org.apache.pig.test.TestEvalPipeline2
org.apache.pig.test.TestEvalPipelineLocal
org.apache.pig.test.TestFRJoin2
org.apache.pig.test.TestFilter
org.apache.pig.test.TestForEach
org.apache.pig.test.TestForEachNestedPlanLocal
org.apache.pig.test.TestJoin
org.apache.pig.test.TestJoinSmoke
org.apache.pig.test.TestLimitAdjuster
org.apache.pig.test.TestLocalRearrange
org.apache.pig.test.TestNativeMapReduce
org.apache.pig.test.TestNewPlanImplicitSplit
org.apache.pig.test.TestProject
org.apache.pig.test.TestStore
org.apache.pig.test.TestStoreInstances
org.apache.pig.test.TestUnionOnSchema

Obviously, more tests break than get fixed.
[jira] [Updated] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Goodhope updated PIG-1890:
------------------------------

    Attachment: PIG-1890-2.patch

Attached patch. Only works if PIG-2153 is fixed. Until then the unit tests still break. This patch fixes setLocation.
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058813#comment-13058813 ]

Ken Goodhope commented on PIG-1890:
-----------------------------------

The fix for this jira involves two parts: making setLocation idempotent, and a fix in POProject. I have added a jira for the POProject issue, PIG-2153. I will try to get a patch for the setLocation issue added this weekend. I have made some other changes to the version of AvroStorage we are using at LinkedIn and want to separate those changes from any patch I submit for this.
[jira] [Created] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field
POProject throws an error with tuples containing a single non-tuple field
-

                 Key: PIG-2153
                 URL: https://issues.apache.org/jira/browse/PIG-2153
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.1
            Reporter: Ken Goodhope

When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out. If that field is not a tuple, a cast exception is thrown. This is happening in the following block of code at line 401.

    if(columns.size() == 1) {
        try{
            ret = inpValue.get(columns.get(0));
            ...
        res.result = (Tuple)ret;

I am seeing this error in a unit test that is loading an array of floats. The LoadFunc is converting the array to a bag, and wrapping the bag in a tuple.

    ({(3.3),(1.2),(5.6)})

This results in POProject attempting to cast the bag to a tuple. Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.

    (({(3.3),(1.2),(5.6)}))

In this case it would work because POProject would extract the first inner tuple and return it. But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.

This could be fixed by first checking that the tuple does actually wrap another tuple.

    if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...

I don't know the original intent of this code well enough to say whether this is the appropriate fix. Hoping someone with more Pig experience can help here. Right now this is preventing the unit tests in AvroStorage from working. I can change the unit test, but I think in this case the unit test is catching a real bug.
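The difference between the unconditional cast and the proposed check can be sketched outside of Pig. The Tuple and Bag classes below are hypothetical stand-ins, not Pig's types, and instanceof stands in for the inpValue.getType(0) == DataType.TUPLE test. What the guarded path should return when the field is not a tuple is not stated in the report; returning the input unchanged is an assumption made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SingleColumnProjectSketch {

    /** Hypothetical stand-in for Pig's Tuple: an ordered list of fields. */
    public static final class Tuple {
        private final List<Object> fields;
        public Tuple(Object... fields) { this.fields = Arrays.asList(fields); }
        public Object get(int i) { return fields.get(i); }
        public int size() { return fields.size(); }
    }

    /** Hypothetical stand-in for Pig's DataBag: a collection of tuples. */
    public static final class Bag {
        private final List<Tuple> tuples;
        public Bag(Tuple... tuples) { this.tuples = Arrays.asList(tuples); }
        public int size() { return tuples.size(); }
    }

    /** Mirrors the reported behavior: pull out the column and cast it unconditionally. */
    public static Tuple projectUnguarded(Tuple input, int column) {
        Object ret = input.get(column);
        return (Tuple) ret; // throws ClassCastException when ret is a Bag
    }

    /** Applies the proposed check: unwrap only when the column really holds a tuple. */
    public static Tuple projectGuarded(Tuple input, int column) {
        Object ret = input.get(column);
        if (ret instanceof Tuple) {
            return (Tuple) ret;
        }
        return input; // assumption: leave the tuple as-is rather than unwrap it
    }

    public static void main(String[] args) {
        Bag floats = new Bag(new Tuple(3.3f), new Tuple(1.2f), new Tuple(5.6f));
        Tuple wrapped = new Tuple(floats);        // ({(3.3),(1.2),(5.6)})
        Tuple doubleWrapped = new Tuple(wrapped); // (({(3.3),(1.2),(5.6)}))

        try {
            projectUnguarded(wrapped, 0);
        } catch (ClassCastException e) {
            System.out.println("unguarded cast failed on a bag field");
        }
        // Guarded projection leaves the bag-holding tuple alone...
        System.out.println(projectGuarded(wrapped, 0) == wrapped);
        // ...and still unwraps a tuple that really wraps another tuple.
        System.out.println(projectGuarded(doubleWrapped, 0) == wrapped);
    }
}
```

This only illustrates the check being proposed; whether POProject or the LoadFunc schema is the right place for a fix is a separate question.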
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041915#comment-13041915 ]

Ken Goodhope commented on PIG-1890:
---

I need some clarification on the contract for POProject.getNext(Tuple). Right now, if it receives a tuple with a single element, it extracts that element, casts it to a tuple, and returns it. This breaks for any single-element tuple whose element is not a tuple. The code could be modified to not extract non-tuple elements.

> Fix piggybank unit test TestAvroStorage
> ---
>
>                 Key: PIG-1890
>                 URL: https://issues.apache.org/jira/browse/PIG-1890
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Jakob Homan
>             Fix For: 0.9.0
>
>         Attachments: PIG-1890-1.patch
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038100#comment-13038100 ]

Ken Goodhope commented on PIG-1890:
---

Right now, in this test, AvroStorage is attempting to pass back a single array of floats with one call to next. To be consistent with the intent of how the data is stored, we want this array returned as a single unit (a databag) with each foreach call. In other words, we don't want foreach to return the elements of that array one at a time. If I am reading the code right, returning the elements one at a time appears to be what it is trying to do. Am I missing something? Is there a way to control this behavior?

> Fix piggybank unit test TestAvroStorage
> ---
>
>                 Key: PIG-1890
>                 URL: https://issues.apache.org/jira/browse/PIG-1890
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Jakob Homan
>             Fix For: 0.9.0
>
>         Attachments: PIG-1890-1.patch
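The record shape described in the comment above, one array returned as a single bag inside a single tuple, can be sketched with plain strings. This does not use Pig's data classes; the class and method names are hypothetical, and the output only mimics Pig's textual notation for tuples "( )" and bags "{ }".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the record shape only; not AvroStorage code.
public class ArrayToBagSketch {

    // Render one float as a single-field tuple, e.g. "(3.3)".
    public static String tupleOf(float value) {
        return "(" + value + ")";
    }

    // Render the whole array as one bag of single-field tuples.
    public static String bagOf(float[] values) {
        List<String> tuples = new ArrayList<>();
        for (float v : values) {
            tuples.add(tupleOf(v));
        }
        return "{" + String.join(",", tuples) + "}";
    }

    // Wrap the bag in one tuple: one array in, one record out.
    public static String wrap(float[] values) {
        return "(" + bagOf(values) + ")";
    }

    public static void main(String[] args) {
        // prints ({(3.3),(1.2),(5.6)})
        System.out.println(wrap(new float[] {3.3f, 1.2f, 5.6f}));
    }
}
```

The whole array maps to one record here, rather than one record per element, which is the behavior the comment asks for.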
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033822#comment-13033822 ]

Ken Goodhope commented on PIG-1890:
---

For testArrayDefault, we are attempting to return an entire avro array, which is consistent with the schema. The result is a tuple with one column, a bag of floats. In POProject.getNext(Tuple), tuples with one column have their single column extracted, cast to a tuple, and then returned. In this case, that means trying to cast the bag of floats to a tuple, and an exception is thrown. Does anyone know why this is being done in POProject?

> Fix piggybank unit test TestAvroStorage
> ---
>
>                 Key: PIG-1890
>                 URL: https://issues.apache.org/jira/browse/PIG-1890
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Jakob Homan
>             Fix For: 0.9.0
>
>         Attachments: PIG-1890-1.patch
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027862#comment-13027862 ]

Ken Goodhope commented on PIG-1890:
---

I have been working on some fixes to AvroStorage already. I should be able to make sure this issue gets addressed in those fixes as well. I will have it done sometime this week.

> Fix piggybank unit test TestAvroStorage
> ---
>
>                 Key: PIG-1890
>                 URL: https://issues.apache.org/jira/browse/PIG-1890
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Jakob Homan
>             Fix For: 0.9.0
>
>         Attachments: PIG-1890-1.patch