Matthew Rathbone created HIVE-3753:
--------------------------------------

             Summary: 'CTAS' and INSERT OVERWRITE send different column names to the underlying SerDe
                 Key: HIVE-3753
                 URL: https://issues.apache.org/jira/browse/HIVE-3753
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.9.0
            Reporter: Matthew Rathbone


A good example is a JSON SerDe
(https://github.com/rathboma/Hive-JSON-Serde-1).
Here is a simple example of how the two results differ:
CREATE TABLE foo ROW FORMAT SERDE '....JsonSerDe' AS SELECT host FROM table1;
generates => {"_col0": "localhost"}

CREATE TABLE foo(host string) ROW FORMAT SERDE '....JsonSerDe';
INSERT OVERWRITE TABLE foo SELECT host FROM table1;
generates => {"host": "localhost"}


The SerDe is passed column names in two places:
1) The table property Constants.LIST_COLUMNS
2) The StructObjectInspector passed to serialize()

In the CTAS example above, both of these contain '_col0' as the column name. 
In the INSERT OVERWRITE example this is not the case: the LIST_COLUMNS 
property contains the real column names ('host').
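To make the symptom concrete, here is a minimal Java sketch of a serializer that keys its JSON output on the LIST_COLUMNS names by position, which is what the two outputs above suggest the JSON SerDe does. It is hypothetical and does not use Hive's SerDe API: the class and method names are made up for illustration, a plain comma-separated string stands in for the LIST_COLUMNS property value, and a list of strings stands in for the row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hive's SerDe API: models a JSON serializer that
// takes its key names from the LIST_COLUMNS property, matched to values by position.
final class ListColumnsSketch {

    // Splits a comma-separated column-name string (standing in for the
    // value of the LIST_COLUMNS property) into individual names.
    static List<String> parseListColumns(String property) {
        List<String> names = new ArrayList<>();
        if (property == null || property.isEmpty()) {
            return names;
        }
        for (String name : property.split(",")) {
            names.add(name.trim());
        }
        return names;
    }

    // Serializes one row as a flat JSON object, pairing each value with the
    // column name at the same position. No escaping is done (sketch only).
    static String serializeRow(List<String> columnNames, List<String> values) {
        if (columnNames.size() != values.size()) {
            throw new IllegalArgumentException(
                "expected " + columnNames.size() + " values, got " + values.size());
        }
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('"').append(columnNames.get(i)).append("\": \"")
              .append(values.get(i)).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("localhost");

        // CTAS case as reported: the property holds the internal name.
        System.out.println(serializeRow(parseListColumns("_col0"), row));
        // INSERT OVERWRITE case: the property holds the declared column name.
        System.out.println(serializeRow(parseListColumns("host"), row));
    }
}
```

Feeding the same row through it with '_col0' and then 'host' as the property value reproduces the two outputs in the report. Switching the sketch to key on the StructObjectInspector's field names instead would not help the CTAS case, since per the report both sources carry '_col0' there.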

I'd be happy to help out with this change, but I fear that the solution lies 
somewhere in SemanticAnalyzer.java, and I'm having a hard time finding my way 
around.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
