[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638351#comment-13638351 ]

Cheolsoo Park commented on PIG-2767:

[~daijy], thank you very much for the clarification. In that case, I have no objection.

> Pig creates wrong schema after dereferencing nested tuple fields
>
>                 Key: PIG-2767
>                 URL: https://issues.apache.org/jira/browse/PIG-2767
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0
>         Environment: Amazon EMR, patched to use Pig 0.10.0
>            Reporter: Jonathan Packer
>            Assignee: Daniel Dai
>             Fix For: 0.12
>
>         Attachments: PIG-2767-1.patch, test_data.txt
>
>
> The following script fails:
> data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3: int, f4: int);
> nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;
> dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
> DESCRIBE dereferenced;
> uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
> DESCRIBE uses_dereferenced;
> The schema of "dereferenced" should be {f1: int, nested_tuple: (f2: int, f3: int)}.
> DESCRIBE reports it as {f1: int, f2: int} instead. When DUMP is used, however,
> the data actually has the form of the correct schema, e.g.
> (1,(2,3))
> (5,(6,7))
> ...
> This is not just a problem with DESCRIBE. Because the schema is incorrect,
> the reference to "nested_tuple" in the "uses_dereferenced" statement is
> considered invalid, and the script fails to run. The error is:
> Invalid field projection. Projected field [nested_tuple] does not exist in
> schema: f1:int,f2:int.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638228#comment-13638228 ]

Daniel Dai commented on PIG-2767:

[~cheolsoo], thanks for testing more. I believe that's a different issue. In "((field2, field3))", the inner parentheses translate to TOTUPLE, and the outer ones do not. There is an ambiguity in the grammar. I remember there was a discussion in which we agreed to translate parentheses into TOTUPLE only when there is more than one item inside. This is equivalent to the Python style.
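For readers unfamiliar with the "Python style" mentioned in the comment above, here is a minimal Python sketch of the analogy. It only illustrates Python's own parenthesis rule and is not Pig's parser; the variable names are hypothetical and chosen to mirror the fields in this thread.

```python
# Python analogy only (an illustration, not Pig code): parentheses around a
# single expression just group it, while a comma-separated list of two or
# more items builds a tuple.
field2, field3 = 2, 3  # hypothetical values standing in for the Pig fields

single = (field2)                       # grouping only: still an int
pair = (field2, field3)                 # two items: a tuple
double_parens = ((field2, field3))      # outer parentheses only group
explicit_nested = ((field2, field3),)   # trailing comma forces an outer tuple

print(type(single).__name__)   # int
print(pair == double_parens)   # True
print(explicit_nested)         # ((2, 3),)
```

Under the rule as described in the comment, "((field2, field3))" would therefore produce one level of tuple nesting, which matches the schema reported for relation c elsewhere in this thread, while the explicit TOTUPLE(TOTUPLE(field2, field3)) form produces two levels, as in relation b. The trailing-comma form is a Python-only detail and is not claimed to exist in Pig.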
[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637385#comment-13637385 ]

Cheolsoo Park commented on PIG-2767:

Wait. I think there is a case that is not covered by the patch. What if we have multiple levels of nesting? Consider the following example:
{code}
a = load 'a' as (field1: int, field2: int, field3: int);
b = foreach a generate field1, TOTUPLE(TOTUPLE(field2, field3));
c = foreach a generate field1, ((field2, field3));
describe b;
describe c;
{code}
With the patch, this still gives me incorrect results:
{code}
b: {field1: int,org.apache.pig.builtin.totuple_org.apache.pig.builtin.totuple_field3_13_14: (org.apache.pig.builtin.totuple_field3_13: (field2: int,field3: int))}
c: {field1: int,org.apache.pig.builtin.totuple_field3_23: (field2: int,field3: int)}
{code}
[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637368#comment-13637368 ]

Prashant Kommireddi commented on PIG-2767:

With Alan's +1, can this be committed?
[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633111#comment-13633111 ]

Alan Gates commented on PIG-2767:

+1.