Kim Jaechang created HIVE-22371:
-----------------------------------
Summary: CTAS not working with non-ACID managed tables
Key: HIVE-22371
URL: https://issues.apache.org/jira/browse/HIVE-22371
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 4.0.0
Reporter: Kim Jaechang
I used Hive commit HIVE-21344 (f16509a5c9187f592c48c253ee001fc3a5e0d508) in the
master branch, which was committed on 12 Oct.
When I submitted the query below, it finished without any errors.
{code:sql}
create table call_center
stored as orc
as select * from tpcds_text_2.call_center;
{code}
However, "select count( * ) from call_center" returned 0, and the data in HDFS
looks strange:
* Two tables were created, one in the warehouse directory and another in the
external warehouse directory.
* Table `call_center` in the external warehouse is empty.
{code:java}
> hdfs dfs -du -h $WAREHOUSE_PATH
5.0 K 14.9 K $WAREHOUSE_PATH/call_center
0 0 $WAREHOUSE_PATH/tpcds_text_2.db
> hdfs dfs -du -h $EXTERNAL_WAREHOUSE_PATH
2.1 G 2.1 G $EXTERNAL_WAREHOUSE_PATH/2
0 0 $EXTERNAL_WAREHOUSE_PATH/call_center
{code}
After a few hours of digging, I suspect this bug was introduced by HIVE-22158,
which creates every non-ACID managed table in the external warehouse directory
by default. In the example above, call_center is intended as a managed table
but is not explicitly specified as ACID. Hence, it should be created in the
external warehouse directory.
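To make the expected placement concrete, here is a minimal sketch in Java of the rule as I understand it from HIVE-22158. It is not Hive's actual code: the class name, the method name, and the assumption that only managed ACID tables stay in the warehouse directory are mine, and the two path constants are just the placeholders used in the listing above.

```java
// Hypothetical sketch of the default-location rule; not taken from Hive.
final class TableLocationSketch {
    // Placeholders for the two roots shown in the hdfs listing above.
    static final String WAREHOUSE = "$WAREHOUSE_PATH";
    static final String EXTERNAL_WAREHOUSE = "$EXTERNAL_WAREHOUSE_PATH";

    // Assumption: only managed (non-external) ACID tables stay in the
    // warehouse directory; everything else goes to the external warehouse.
    static String defaultLocation(String table, boolean external, boolean acid) {
        boolean managedAcid = !external && acid;
        String root = managedAcid ? WAREHOUSE : EXTERNAL_WAREHOUSE;
        return root + "/" + table;
    }

    public static void main(String[] args) {
        // call_center: no "external" keyword, not declared ACID.
        System.out.println(defaultLocation("call_center", false, false));
    }
}
```

Under this rule, call_center from the query above resolves to the external warehouse directory, which is where the table was registered but not where the data ended up.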
However, the table call_center created in the external warehouse directory is
empty, while another non-empty table of the same name is created in the
warehouse directory. This is because in the current implementation, the (buggy)
compiled query plan proceeds as follows:
1. Write data to a temporary directory
2. Move the data to the warehouse directory ($WAREHOUSE_PATH/call_center)
3. Create a table using data in the warehouse directory
Without the bug, step 2 would move the data to the external warehouse
directory, and step 3 would create a table using the data in the external
warehouse directory. The crux of the problem is that the query compiler checks
only whether the query includes the keyword "external". In other words, the
query compiler should also be made aware of the changes in HIVE-22158 and be
updated accordingly.
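The mismatch can be illustrated with a small Java sketch that contrasts the check described above with the one I would expect. This only models the behaviour reported here; the class and method names are hypothetical, the paths are the placeholders from the listing above, and the real compiler code may be structured quite differently.

```java
// Hypothetical model of how the CTAS move target is chosen; not Hive code.
final class CtasMoveTargetSketch {
    static final String WAREHOUSE = "$WAREHOUSE_PATH";
    static final String EXTERNAL_WAREHOUSE = "$EXTERNAL_WAREHOUSE_PATH";

    // Behaviour described in this report: only the "external" keyword is
    // consulted, so a non-ACID managed table is moved to the warehouse dir.
    static String keywordOnlyTarget(String table, boolean hasExternalKeyword) {
        String root = hasExternalKeyword ? EXTERNAL_WAREHOUSE : WAREHOUSE;
        return root + "/" + table;
    }

    // Expected behaviour (assumption): also honour HIVE-22158, so any
    // non-ACID table is moved to the external warehouse directory.
    static String expectedTarget(String table, boolean hasExternalKeyword,
                                 boolean acid) {
        boolean useExternalRoot = hasExternalKeyword || !acid;
        String root = useExternalRoot ? EXTERNAL_WAREHOUSE : WAREHOUSE;
        return root + "/" + table;
    }

    public static void main(String[] args) {
        // The query in this report: no "external" keyword, not ACID.
        System.out.println("keyword-only check: "
                + keywordOnlyTarget("call_center", false));
        System.out.println("expected:           "
                + expectedTarget("call_center", false, false));
    }
}
```

For the query in this report, the two methods return different directories, which matches the listing above: the data sits under the warehouse directory while the table itself points at the empty directory in the external warehouse.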