[
https://issues.apache.org/jira/browse/HIVE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Navis updated HIVE-6198:
------------------------
Attachment: HIVE-6198.3.patch.txt
> ORC file and struct column names are case sensitive
> ---------------------------------------------------
>
> Key: HIVE-6198
> URL: https://issues.apache.org/jira/browse/HIVE-6198
> Project: Hive
> Issue Type: Bug
> Components: CLI, File Formats
> Affects Versions: 0.11.0, 0.12.0
> Reporter: Viraj Bhat
> Assignee: Navis
> Attachments: HIVE-6198.1.patch.txt, HIVE-6198.2.patch.txt,
> HIVE-6198.3.patch.txt
>
>
> HiveQL document states that the "Table names and column names are case
> insensitive". But the struct behavior for ORC file is different.
> Consider a sample text file:
> {code}
> $ cat data.txt
> line1|key11:value11,key12:value12,key13:value13|a,b,c|one,two
> line2|key21:value21,key22:value22,key23:value23|d,e,f|three,four
> line3|key31:value31,key32:value32,key33:value33|g,h,i|five,six
> {code}
> Creating a table stored as txt and then using this to create a table stored
> as orc
> {code}
> CREATE TABLE orig (
> str STRING,
> mp MAP<STRING,STRING>,
> lst ARRAY<STRING>,
> strct STRUCT<A:STRING,B:STRING>
> ) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':';
> LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE orig;
> CREATE TABLE tableorc (
> str STRING,
> mp MAP<STRING,STRING>,
> lst ARRAY<STRING>,
> strct STRUCT<A:STRING,B:STRING>
> ) STORED AS ORC;
> INSERT OVERWRITE TABLE tableorc SELECT * FROM orig;
> {code}
> Suppose we project columns or read the *strct* columns for both table types,
> here are the results. I have also tested the same with *RC*. The behavior is
> similar to *txt* files.
> {code}
> hive> SELECT * FROM orig;
> line1 {"key11":"value11","key12":"value12","key13":"value13"} ["a","b","c"]
>
> {"a":"one","b":"two"}
> line2 {"key21":"value21","key22":"value22","key23":"value23"} ["d","e","f"]
>
> {"a":"three","b":"four"}
> line3 {"key31":"value31","key32":"value32","key33":"value33"} ["g","h","i"]
>
> {"a":"five","b":"six"}
> Time taken: 0.126 seconds, Fetched: 3 row(s)
> hive> SELECT * FROM tableorc;
> line1 {"key12":"value12","key11":"value11","key13":"value13"} ["a","b","c"]
>
> {"A":"one","B":"two"}
> line2 {"key21":"value21","key23":"value23","key22":"value22"} ["d","e","f"]
>
> {"A":"three","B":"four"}
> line3 {"key33":"value33","key31":"value31","key32":"value32"} ["g","h","i"]
>
> {"A":"five","B":"six"}
> Time taken: 0.178 seconds, Fetched: 3 row(s)
> hive> SELECT strct FROM tableorc;
> {"a":"one","b":"two"}
> {"a":"three","b":"four"}
> {"a":"five","b":"six"}
> hive>SELECT strct.A FROM orig;
> one
> three
> five
> hive>SELECT strct.a FROM orig;
> one
> three
> five
> hive>SELECT strct.A FROM tableorc;
> one
> three
> five
> hive>SELECT strct.a FROM tableorc;
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched:
> Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
> {code}
> So it seems that ORC behaves differently for struct columns. Also why are we
> storing the column names for struct for the other types as CASE SENSITIVE?
> What is the standard for Hive QL with respect to structs?
> Regards
> Viraj
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)