Jonathan Packer created PIG-2767:
------------------------------------

             Summary: Pig creates wrong schema after dereferencing nested tuple fields
                 Key: PIG-2767
                 URL: https://issues.apache.org/jira/browse/PIG-2767
             Project: Pig
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.10.0
         Environment: Amazon EMR, patched to use Pig 0.10.0
            Reporter: Jonathan Packer


The following script fails:

data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3: int, f4: int);

nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;

dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
DESCRIBE dereferenced;

uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
DESCRIBE uses_dereferenced;

The schema of "dereferenced" should be {f1: int, nested_tuple: (f2: int,
f3: int)}, but DESCRIBE reports {f1: int, f2: int} instead. DUMP, however,
shows that the data itself follows the correct schema, e.g.:

(1,(2,3))
(5,(6,7))
...

This is not just a problem with DESCRIBE. Because the schema is incorrect,
the reference to "nested_tuple" in the "uses_dereferenced" statement is
considered to be invalid, and the script fails to run. The error is:

Invalid field projection. Projected field [nested_tuple] does not exist in
schema: f1:int,f2:int.
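
A possible workaround, offered only as an untested sketch: dereference the
fields one at a time and rebuild the tuple with the same constructor syntax
used for "nested" above. This assumes that single-field dereferences such as
nested_tuple.f2 keep their schema, which this report has not verified.

-- Untested sketch; assumes single-field dereference is not affected by the bug.
dereferenced = FOREACH nested GENERATE f1, (nested_tuple.f2, nested_tuple.f3) AS nested_tuple;
DESCRIBE dereferenced;

If the assumption holds, "nested_tuple" should remain a named field in the
schema of "dereferenced", so later statements can still project it.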
