Kim Jaechang created HIVE-22371:
-----------------------------------

             Summary: CTAS not working with non-ACID managed tables
                 Key: HIVE-22371
                 URL: https://issues.apache.org/jira/browse/HIVE-22371
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 4.0.0
            Reporter: Kim Jaechang


I used Hive commit HIVE-21344 (f16509a5c9187f592c48c253ee001fc3a5e0d508) in the 
master branch, which was committed on 12 Oct.

When I submitted the query below, it finished without any errors.
{code:sql}
create table call_center
stored as orc 
 as select * from tpcds_text_2.call_center;
{code}
However, "select count( * ) from call_center" returned 0, and the data in HDFS 
looks strange:
 * Two call_center directories were created, one in the warehouse directory and 
another in the external warehouse directory.
 * The `call_center` directory in the external warehouse is empty.

{noformat}
> hdfs dfs -du -h $WAREHOUSE_PATH
5.0 K 14.9 K $WAREHOUSE_PATH/call_center
0 0 $WAREHOUSE_PATH/tpcds_text_2.db

> hdfs dfs -du -h $EXTERNAL_WAREHOUSE_PATH
2.1 G 2.1 G $EXTERNAL_WAREHOUSE_PATH/2
0 0 $EXTERNAL_WAREHOUSE_PATH/call_center
{noformat}
After a few hours of digging, I believe this bug was introduced by HIVE-22158, 
which creates every non-ACID managed table in the external warehouse directory 
by default. In the example above, call_center is intended as a managed table 
but is not explicitly declared as ACID. Hence, it should be created in the 
external warehouse directory.

However, the table call_center created in the external warehouse directory is 
empty, while another non-empty table of the same name is created in the 
warehouse directory. This is because in the current implementation, the (buggy) 
compiled query plan proceeds as follows:

 1. Write data to a temporary directory
 2. Move the data to the warehouse directory ($WAREHOUSE_PATH/call_center)
 3. Create a table using data in the warehouse directory

Without the bug, step 2 would move the data to the external warehouse 
directory, and step 3 would create a table using the data in the external 
warehouse directory. The crux of the problem is that the query compiler checks 
only whether the query includes the keyword "external". In other words, the 
query compiler should also account for the changes made in HIVE-22158 and be 
updated accordingly.
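
To make the mismatch concrete, here is a minimal toy model of the two decisions 
described above. It is not Hive code: the class and function names are 
hypothetical, the paths are the placeholder variables from the listing above, 
and the rules are only my reading of the behaviour (the compiler looks at the 
"external" keyword alone, while after HIVE-22158 a non-ACID managed table is 
placed in the external warehouse directory).

```python
from dataclasses import dataclass

# Placeholder roots, mirroring the variables used in the hdfs listing.
WAREHOUSE = "$WAREHOUSE_PATH"
EXTERNAL_WAREHOUSE = "$EXTERNAL_WAREHOUSE_PATH"


@dataclass(frozen=True)
class CtasRequest:
    """Hypothetical summary of a CTAS statement (not a Hive class)."""
    table: str
    has_external_keyword: bool
    is_acid: bool


def compiler_move_target(req: CtasRequest) -> str:
    """Where step 2 moves the data: only the "external" keyword is checked."""
    root = EXTERNAL_WAREHOUSE if req.has_external_keyword else WAREHOUSE
    return f"{root}/{req.table}"


def table_location(req: CtasRequest) -> str:
    """Where the table ends up, assuming the HIVE-22158 rule as I understand it:
    non-ACID managed tables go to the external warehouse directory."""
    in_external = req.has_external_keyword or not req.is_acid
    root = EXTERNAL_WAREHOUSE if in_external else WAREHOUSE
    return f"{root}/{req.table}"


def is_consistent(req: CtasRequest) -> bool:
    """True when the moved data and the table location agree."""
    return compiler_move_target(req) == table_location(req)


# The query from this report: no "external" keyword, not declared ACID.
req = CtasRequest("call_center", has_external_keyword=False, is_acid=False)
print(compiler_move_target(req))
print(table_location(req))
print(is_consistent(req))
```

In this model the two paths disagree only for a table that is neither declared 
external nor ACID, which matches the case reported here. A fix along these 
lines would make the compiler's move target follow the same rule as the table 
location.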



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
