Matthew Rathbone created HIVE-3753:
--------------------------------------

             Summary: 'CTAS' and INSERT OVERWRITE send different column names to the underlying SerDe
                 Key: HIVE-3753
                 URL: https://issues.apache.org/jira/browse/HIVE-3753
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.9.0
            Reporter: Matthew Rathbone

A good example is with a JSON SerDe (https://github.com/rathboma/Hive-JSON-Serde-1).

Here is a simple example of how the two results differ:

    CREATE TABLE foo ROW FORMAT SERDE '....JsonSerDe' AS SELECT host FROM table1;

generates => {"_col0": "localhost"}

    CREATE TABLE foo(host string) ROW FORMAT SERDE '....JsonSerDe';
    INSERT OVERWRITE TABLE foo SELECT host FROM table1;

generates => {"host": "localhost"}

The SerDe gets passed column names in two places:

1) The property Constants.LIST_COLUMNS
2) The StructObjectInspector it is passed on serialize

In the CTAS example above, both of these contain '_col0' as the column name. This is not true in the second example, where the LIST_COLUMNS property contains the real column names.

I'd be happy to help out with this change, but I suspect the solution lies somewhere in SemanticAnalyzer.java, and I'm having a hard time finding my way around.
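
To make the difference concrete, here is a minimal, self-contained Java sketch of how a JSON-style SerDe might pick up its column names from the LIST_COLUMNS table property and why the CTAS path then emits '_col0'. It is an illustration only, not code from the linked SerDe and not a fix: the class ColumnNameSources and its methods are hypothetical, and plain java.util.Properties stands in for the table properties Hive hands to the SerDe. In a real SerDe the second source of names would be StructObjectInspector.getAllStructFieldRefs() together with StructField.getFieldName(), which is left out here so the example runs without Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class ColumnNameSources {
    // String value of Constants.LIST_COLUMNS in Hive's serde package.
    static final String LIST_COLUMNS = "columns";

    // Source 1: the comma-separated column list found in the table
    // properties that a SerDe receives when it is initialized.
    static List<String> namesFromProperties(Properties tbl) {
        List<String> names = new ArrayList<>();
        for (String name : tbl.getProperty(LIST_COLUMNS, "").split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    // True for auto-generated names of the form _col0, _col1, ...
    static boolean isInternalName(String name) {
        return name.matches("_col\\d+");
    }

    // Hypothetical serializer: emits one JSON object keyed by the given names.
    // No escaping is done; it only shows which names reach the output.
    static String toJson(List<String> names, List<String> values) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('"').append(names.get(i)).append("\": \"")
              .append(values.get(i)).append('"');
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("localhost");

        // Property contents as reported in this issue for each statement.
        Properties ctas = new Properties();
        ctas.setProperty(LIST_COLUMNS, "_col0");
        Properties insert = new Properties();
        insert.setProperty(LIST_COLUMNS, "host");

        System.out.println(toJson(namesFromProperties(ctas), row));   // {"_col0": "localhost"}
        System.out.println(toJson(namesFromProperties(insert), row)); // {"host": "localhost"}
        System.out.println(isInternalName(namesFromProperties(ctas).get(0))); // true
    }
}
```

Because both sources carry '_col0' in the CTAS case, a check such as isInternalName can only detect the problem from inside the SerDe; the real names would still have to be supplied by Hive.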