[jira] [Updated] (HIVE-6198) ORC file and struct column names are case sensitive

Navis (JIRA) Tue, 07 Oct 2014 01:24:12 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Navis updated HIVE-6198:
------------------------
    Description: 
HiveQL document states that the "Table names and column names are case 
insensitive". But the struct behavior for ORC file is different. 
Consider a sample text file:
{code}
$ cat data.txt
line1|key11:value11,key12:value12,key13:value13|a,b,c|one,two
line2|key21:value21,key22:value22,key23:value23|d,e,f|three,four
line3|key31:value31,key32:value32,key33:value33|g,h,i|five,six
{code}
Creating a table stored as txt and then using this to create a table stored as 
orc 
{code}
CREATE TABLE orig (
  str STRING,
  mp  MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>
) ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
    COLLECTION ITEMS TERMINATED BY ','
    MAP KEYS TERMINATED BY ':';
LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE orig;

CREATE TABLE tableorc (
  str STRING,
  mp  MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>
) STORED AS ORC;
INSERT OVERWRITE TABLE tableorc SELECT * FROM orig;
{code}

Suppose we project columns or read the *strct* columns for both table types, 
here are the results. I have also tested the same with *RC*. The behavior is 
similar to *txt* files.

{code}
hive> SELECT * FROM orig;
line1   {"key11":"value11","key12":"value12","key13":"value13"} ["a","b","c"]  
{"a":"one","b":"two"}
line2   {"key21":"value21","key22":"value22","key23":"value23"} ["d","e","f"]  
{"a":"three","b":"four"}
line3   {"key31":"value31","key32":"value32","key33":"value33"} ["g","h","i"]  
{"a":"five","b":"six"}
Time taken: 0.126 seconds, Fetched: 3 row(s)

hive> SELECT * FROM tableorc;
line1   {"key12":"value12","key11":"value11","key13":"value13"} ["a","b","c"]  
{"A":"one","B":"two"}
line2   {"key21":"value21","key23":"value23","key22":"value22"} ["d","e","f"]  
{"A":"three","B":"four"}
line3   {"key33":"value33","key31":"value31","key32":"value32"} ["g","h","i"]  
{"A":"five","B":"six"}
Time taken: 0.178 seconds, Fetched: 3 row(s)

hive> SELECT strct FROM tableorc;
{"a":"one","b":"two"}
{"a":"three","b":"four"}
{"a":"five","b":"six"}

hive>SELECT strct.A FROM orig;
one
three
five

hive>SELECT strct.a FROM orig;
one
three
five

hive>SELECT strct.A FROM tableorc;
one
three
five

hive>SELECT strct.a FROM tableorc;
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
{code}

So it seems that ORC behaves differently for struct columns. Also why are we 
storing the column names for struct for the other types as CASE SENSITIVE? What 
is the standard for Hive QL with respect to structs?

Regards
Viraj




  was:
HiveQL document states that the "Table names and column names are case 
insensitive". But the struct behavior for ORC file is different. 
Consider a sample text file:
{code}
$ cat data.txt
line1|key11:value11,key12:value12,key13:value13|a,b,c|one,two
line2|key21:value21,key22:value22,key23:value23|d,e,f|three,four
line3|key31:value31,key32:value32,key33:value33|g,h,i|five,six
{code}
Creating a table stored as txt and then using this to create a table stored as 
orc 
{code}
CREATE TABLE orig (
  str STRING,
  mp  MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>
) ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
    COLLECTION ITEMS TERMINATED BY ','
    MAP KEYS TERMINATED BY ':';
LOAD DATA LOCAL 'test.txt' INTO TABLE orig;

CREATE TABLE tableorc (
  str STRING,
  mp  MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>
) STORED AS ORC;
INSERT OVERWRITE TABLE tableorc SELECT * FROM orig;
{code}

Suppose we project columns or read the *strct* columns for both table types, 
here are the results. I have also tested the same with *RC*. The behavior is 
similar to *txt* files.

{code}
hive> SELECT * FROM orig;
line1   {"key11":"value11","key12":"value12","key13":"value13"} ["a","b","c"]  
{"a":"one","b":"two"}
line2   {"key21":"value21","key22":"value22","key23":"value23"} ["d","e","f"]  
{"a":"three","b":"four"}
line3   {"key31":"value31","key32":"value32","key33":"value33"} ["g","h","i"]  
{"a":"five","b":"six"}
Time taken: 0.126 seconds, Fetched: 3 row(s)

hive> SELECT * FROM tableorc;
line1   {"key12":"value12","key11":"value11","key13":"value13"} ["a","b","c"]  
{"A":"one","B":"two"}
line2   {"key21":"value21","key23":"value23","key22":"value22"} ["d","e","f"]  
{"A":"three","B":"four"}
line3   {"key33":"value33","key31":"value31","key32":"value32"} ["g","h","i"]  
{"A":"five","B":"six"}
Time taken: 0.178 seconds, Fetched: 3 row(s)

hive> SELECT strct FROM tableorc;
{"a":"one","b":"two"}
{"a":"three","b":"four"}
{"a":"five","b":"six"}

hive>SELECT strct.A FROM orig;
one
three
five

hive>SELECT strct.a FROM orig;
one
three
five

hive>SELECT strct.A FROM tableorc;
one
three
five

hive>SELECT strct.a FROM tableorc;
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
{code}

So it seems that ORC behaves differently for struct columns. Also why are we 
storing the column names for struct for the other types as CASE SENSITIVE? What 
is the standard for Hive QL with respect to structs?

Regards
Viraj





> ORC file and struct column names are case sensitive
> ---------------------------------------------------
>
>                 Key: HIVE-6198
>                 URL: https://issues.apache.org/jira/browse/HIVE-6198
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI, File Formats
>    Affects Versions: 0.11.0, 0.12.0
>            Reporter: Viraj Bhat
>            Assignee: Navis
>         Attachments: HIVE-6198.1.patch.txt, HIVE-6198.2.patch.txt
>
>
> HiveQL document states that the "Table names and column names are case 
> insensitive". But the struct behavior for ORC file is different. 
> Consider a sample text file:
> {code}
> $ cat data.txt
> line1|key11:value11,key12:value12,key13:value13|a,b,c|one,two
> line2|key21:value21,key22:value22,key23:value23|d,e,f|three,four
> line3|key31:value31,key32:value32,key33:value33|g,h,i|five,six
> {code}
> Creating a table stored as txt and then using this to create a table stored 
> as orc 
> {code}
> CREATE TABLE orig (
>   str STRING,
>   mp  MAP<STRING,STRING>,
>   lst ARRAY<STRING>,
>   strct STRUCT<A:STRING,B:STRING>
> ) ROW FORMAT DELIMITED
>     FIELDS TERMINATED BY '|'
>     COLLECTION ITEMS TERMINATED BY ','
>     MAP KEYS TERMINATED BY ':';
> LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE orig;
> CREATE TABLE tableorc (
>   str STRING,
>   mp  MAP<STRING,STRING>,
>   lst ARRAY<STRING>,
>   strct STRUCT<A:STRING,B:STRING>
> ) STORED AS ORC;
> INSERT OVERWRITE TABLE tableorc SELECT * FROM orig;
> {code}
> Suppose we project columns or read the *strct* columns for both table types, 
> here are the results. I have also tested the same with *RC*. The behavior is 
> similar to *txt* files.
> {code}
> hive> SELECT * FROM orig;
> line1   {"key11":"value11","key12":"value12","key13":"value13"} ["a","b","c"] 
>  
> {"a":"one","b":"two"}
> line2   {"key21":"value21","key22":"value22","key23":"value23"} ["d","e","f"] 
>  
> {"a":"three","b":"four"}
> line3   {"key31":"value31","key32":"value32","key33":"value33"} ["g","h","i"] 
>  
> {"a":"five","b":"six"}
> Time taken: 0.126 seconds, Fetched: 3 row(s)
> hive> SELECT * FROM tableorc;
> line1   {"key12":"value12","key11":"value11","key13":"value13"} ["a","b","c"] 
>  
> {"A":"one","B":"two"}
> line2   {"key21":"value21","key23":"value23","key22":"value22"} ["d","e","f"] 
>  
> {"A":"three","B":"four"}
> line3   {"key33":"value33","key31":"value31","key32":"value32"} ["g","h","i"] 
>  
> {"A":"five","B":"six"}
> Time taken: 0.178 seconds, Fetched: 3 row(s)
> hive> SELECT strct FROM tableorc;
> {"a":"one","b":"two"}
> {"a":"three","b":"four"}
> {"a":"five","b":"six"}
> hive>SELECT strct.A FROM orig;
> one
> three
> five
> hive>SELECT strct.a FROM orig;
> one
> three
> five
> hive>SELECT strct.A FROM tableorc;
> one
> three
> five
> hive>SELECT strct.a FROM tableorc;
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched: 
> Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
> {code}
> So it seems that ORC behaves differently for struct columns. Also why are we 
> storing the column names for struct for the other types as CASE SENSITIVE? 
> What is the standard for Hive QL with respect to structs?
> Regards
> Viraj



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-6198) ORC file and struct column names are case sensitive

Reply via email to