Venugopal Reddy K created HIVE-26861:
----------------------------------------
Summary: Skewed column table load do not work as expected if the
user data for skewed column is not in lowercase.
Key: HIVE-26861
URL: https://issues.apache.org/jira/browse/HIVE-26861
Project: Hive
Issue Type: Bug
Reporter: Venugopal Reddy K
Attachments: data
*[Description]*
Loading data into a skewed table does not work as expected when the data in the
skewed column is not in lowercase. Skewed values are stored in lowercase, and
the user data is expected to be in the same lowercase form (i.e., the comparison
is case sensitive). Otherwise the rows are not placed in the directory for
their skewed value.
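The mismatch described above can be modelled in a few lines. This is only an
illustrative sketch, not Hive's actual code: the function names
stored_skewed_values and is_skewed_value are hypothetical, and the only
behaviour assumed is what this report states (declared skewed values are
lowercased when stored, and row values are compared as-is).

```python
# Illustrative model only; these names are hypothetical, not Hive internals.

def stored_skewed_values(declared):
    """Assumption from the report: declared skewed values are stored in lowercase."""
    return {value.lower() for value in declared}


def is_skewed_value(row_value, stored):
    """Assumption from the report: the row value is compared case-sensitively."""
    return row_value in stored


# Values declared in: skewed by(category) on ('Fruit','Vegetable')
stored = stored_skewed_values(["Fruit", "Vegetable"])

print(sorted(stored))                        # ['fruit', 'vegetable']
print(is_skewed_value("Fruit", stored))      # False: 'Fruit' != 'fruit'
print(is_skewed_value("vegetable", stored))  # True
```

Under these assumptions a row with category 'Fruit' is never recognised as a
skewed value, even though 'Fruit' was declared in the DDL.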
*[Steps to reproduce]*
1. Create a stage table, load some data into it, then create a table with a
skewed column and load data into that table from the stage table. The data file
is attached.
{code:java}
0: jdbc:hive2://localhost:10000> create database mydb;
0: jdbc:hive2://localhost:10000> use mydb;
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table stage(num int, name string,
category string) row format delimited fields terminated by ',' stored as
textfile;{code}
{code:java}
0: jdbc:hive2://localhost:10000> load data local inpath 'data' into table
stage;{code}
{code:java}
0: jdbc:hive2://localhost:10000> select * from stage;
+------------+-------------+-----------------+
| stage.num  | stage.name  | stage.category  |
+------------+-------------+-----------------+
| 1          | apple       | Fruit           |
| 2          | banana      | Fruit           |
| 3          | carrot      | vegetable       |
| 4          | cherry      | Fruit           |
| 5          | potato      | vegetable       |
| 6          | mango       | Fruit           |
| 7          | tomato      | vegetable       |
+------------+-------------+-----------------+
7 rows selected (2.688 seconds)
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table skew(num int, name string,
category string) skewed by(category) on ('Fruit','Vegetable') stored as
directories row format delimited fields terminated by ',' stored as
textfile;{code}
{code:java}
0: jdbc:hive2://localhost:10000> insert into skew select * from stage;{code}
2. Check the skew table data in the warehouse directory. The table was created
with the *skewed by(category) on ('Fruit','Vegetable')* clause, but *there is no
directory created for category=fruit.* Data for the Fruit category is present
in the HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME directory itself. Internally,
skewed values are stored in lowercase, and the user data is expected to be in
the same lowercase form (i.e., the comparison is case sensitive). Thus, the
directory for fruit is not created.
{code:java}
kvenureddy@192 mydb.db % cd skew
kvenureddy@192 skew % ls
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME category=vegetable
kvenureddy@192 skew % pwd
/tmp/warehouse/external/mydb.db/skew
kvenureddy@192 skew % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % ls
000000_0
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0
1,apple,Fruit
2,banana,Fruit
4,cherry,Fruit
6,mango,Fruit
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../
kvenureddy@192 skew % cd category=vegetable
kvenureddy@192 category=vegetable % ls
000000_0
kvenureddy@192 category=vegetable % cat 000000_0
3,carrot,vegetable
5,potato,vegetable
7,tomato,vegetable
kvenureddy@192 category=vegetable %
{code}
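The directory layout shown above can be reproduced with a small model. This is
a hedged sketch and not Hive's implementation: target_directory and layout are
hypothetical helpers, and the routing rule (lowercased stored values, exact
string match, fall back to the default list bucketing directory) is an
assumption taken from the behaviour described in this report.

```python
from collections import defaultdict

# Directory name as it appears in the listing above.
DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME"


def target_directory(column, row_value, stored_values):
    """Hypothetical routing rule: exact (case-sensitive) match or default dir."""
    if row_value in stored_values:
        return f"{column}={row_value}"
    return DEFAULT_DIR


def layout(rows, column_index, column, declared_values):
    """Group rows by the directory they would land in under the assumed rule."""
    # Assumption from the report: declared skewed values are stored lowercased.
    stored = {value.lower() for value in declared_values}
    directories = defaultdict(list)
    for row in rows:
        directories[target_directory(column, row[column_index], stored)].append(row)
    return dict(directories)


# Rows from the stage table in step 1.
rows = [
    (1, "apple", "Fruit"),
    (2, "banana", "Fruit"),
    (3, "carrot", "vegetable"),
    (4, "cherry", "Fruit"),
    (5, "potato", "vegetable"),
    (6, "mango", "Fruit"),
    (7, "tomato", "vegetable"),
]

result = layout(rows, 2, "category", ["Fruit", "Vegetable"])
for directory, contents in sorted(result.items()):
    print(directory, [row[0] for row in contents])
```

With the stage data, this model yields only two directories,
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME holding rows 1, 2, 4 and 6, and
category=vegetable holding rows 3, 5 and 7, which matches the listing above.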