[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields

2013-04-22 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638351#comment-13638351
 ] 

Cheolsoo Park commented on PIG-2767:


[~daijy], thank you very much for the clarification. In that case, I have no 
objection.

> Pig creates wrong schema after dereferencing nested tuple fields
> 
>
> Key: PIG-2767
> URL: https://issues.apache.org/jira/browse/PIG-2767
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0
> Environment: Amazon EMR, patched to use Pig 0.10.0
> Reporter: Jonathan Packer
> Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-2767-1.patch, test_data.txt
>
>
> The following script fails:
> data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3: int, f4: int);
> nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;
> dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
> DESCRIBE dereferenced;
> uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
> DESCRIBE uses_dereferenced;
> The schema of "dereferenced" should be {f1: int, nested_tuple: (f2: int,
> f3: int)}. DESCRIBE reports {f1: int, f2: int} instead. When DUMP is
> used, however, the data actually has the form of the correct schema, e.g.:
> (1,(2,3))
> (5,(6,7))
> ...
> This is not just a problem with DESCRIBE. Because the schema is incorrect,
> the reference to "nested_tuple" in the "uses_dereferenced" statement is
> considered to be invalid, and the script fails to run. The error is:
> Invalid field projection. Projected field [nested_tuple] does not exist in
> schema: f1:int,f2:int.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields

2013-04-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638228#comment-13638228
 ] 

Daniel Dai commented on PIG-2767:

[~cheolsoo], thanks for testing more. I believe that's a different issue. In 
"((field2, field3))", the inner parentheses translate to TOTUPLE, and the outer 
ones do not. There is ambiguity in the grammar. I remember there was a 
discussion in which we agreed to translate parentheses into TOTUPLE only when 
there is more than one item inside. This is equivalent to the Python style.
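
For readers unfamiliar with the Python rule referred to above, here is a minimal sketch in plain Python (not Pig; the variable names are made up for illustration). In Python, parentheses around a single expression only group it, so a second pair of parentheses around a pair changes nothing. A one-element tuple has to be requested explicitly with a trailing comma, which is roughly the role an explicit TOTUPLE call plays in Pig under the rule described above.

```python
# Illustration only: Python's parenthesis rule, which the comment above
# compares Pig's grammar to. All names here are hypothetical.
field2, field3 = 2, 3

pair = (field2, field3)        # two items inside: a tuple
grouped = ((field2, field3))   # outer parentheses only group; still the same pair
wrapped = ((field2, field3),)  # trailing comma: a 1-tuple that holds the pair
single = (field2)              # one item, no comma: just the int itself

print(pair == grouped)               # True
print(type(single).__name__)         # int
print(len(wrapped), wrapped[0] == pair)  # 1 True
```

By analogy, `((field2, field3))` in the Pig example corresponds to `grouped`, while `TOTUPLE(TOTUPLE(field2, field3))` corresponds to `wrapped`.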



[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields

2013-04-20 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637385#comment-13637385
 ] 

Cheolsoo Park commented on PIG-2767:


Wait. I think there is a case that is not covered by the patch. What if we have 
multiple levels of nesting?

Consider the following example:
{code}
a = load 'a' as (field1: int, field2: int, field3: int );
b = foreach a generate field1, TOTUPLE(TOTUPLE(field2, field3));
c = foreach a generate field1, ((field2, field3));
describe b;
describe c;
{code}
With the patch, this still gives me incorrect results:
{code}
b: {field1: int,org.apache.pig.builtin.totuple_org.apache.pig.builtin.totuple_field3_13_14: (org.apache.pig.builtin.totuple_field3_13: (field2: int,field3: int))}
c: {field1: int,org.apache.pig.builtin.totuple_field3_23: (field2: int,field3: int)}
{code}



[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields

2013-04-20 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637368#comment-13637368
 ] 

Prashant Kommireddi commented on PIG-2767:

With Alan's +1, can this be committed?



[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields

2013-04-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633111#comment-13633111
 ] 

Alan Gates commented on PIG-2767:

+1.
