Venugopal Reddy K created HIVE-29587:
----------------------------------------

             Summary: Managed table directory not deleted after DROP DATABASE when hive.acid.lockless.reads.enabled=true
                 Key: HIVE-29587
                 URL: https://issues.apache.org/jira/browse/HIVE-29587
             Project: Hive
          Issue Type: Bug
          Components: Hive, Metastore, Standalone Metastore
            Reporter: Venugopal Reddy K


*[Description]*

When {{hive.acid.lockless.reads.enabled}} is set to *true*, DROP TABLE on an ACID table becomes a non-blocking operation in the Hive Metastore (HMS).

In this mode, deletion of ACID table data is handled *asynchronously* by a background cleaner thread. The table data and the corresponding table directory are removed as expected on the cleaner thread's next scheduled run.

However, when DROP TABLE on an ACID table is followed by DROP DATABASE ... CASCADE before the cleaner thread has run, the *dropped ACID table's directory is never deleted*. As a consequence, *the managed database directory is never deleted either*.

This results in:
# *Orphaned table and database directories* that remain in the filesystem indefinitely, leading to potential storage bloat and inconsistencies between HMS metadata and filesystem state.
# A failure to recreate the database, with the following error: {{Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Unable to create database managed directory file:/tmp/warehouse/managed/mydb.db, failed to create database mydb)}}

*[Steps to Reproduce]* 

1. Run the following commands:

 
{code:java}
0: jdbc:hive2://localhost:10000> set hive.acid.lockless.reads.enabled=true;
0: jdbc:hive2://localhost:10000> create database mydb;
0: jdbc:hive2://localhost:10000> use mydb;
0: jdbc:hive2://localhost:10000> create table t1(i int);
0: jdbc:hive2://localhost:10000> create table t2(i int);
0: jdbc:hive2://localhost:10000> insert into t1 values(1),(2),(3);
0: jdbc:hive2://localhost:10000> insert into t2 values(1),(2),(3);
# Make sure the cleaner thread does not run between the DROP TABLE and DROP DATABASE statements below.
0: jdbc:hive2://localhost:10000> drop table t2;
0: jdbc:hive2://localhost:10000> show compactions;
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
| compactionid  | dbname  | tabname  | partname  |  type  |        state        | workerhost  | workerid  |  enqueuetime   | starttime  | duration  | hadoopjobid  | errormessage  | initiatorhost  | initiatorid  | poolname  | txnid  | nexttxnid  |   committime   | hightestwriteid  |
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
| 9             | mydb    | t2       |  ---      | MAJOR  | ready for cleaning  |  ---        |  ---      | 1777458738637  |  ---       |  ---      | None         |  ---          |  ---           |  ---         | default   | 32     | 32         | 1777458738667  |  ---             |
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
0: jdbc:hive2://localhost:10000> drop database mydb cascade;
0: jdbc:hive2://localhost:10000> show compactions;
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
| compactionid  | dbname  | tabname  | partname  |  type  |        state        | workerhost  | workerid  |  enqueuetime   | starttime  | duration  | hadoopjobid  | errormessage  | initiatorhost  | initiatorid  | poolname  | txnid  | nexttxnid  |   committime   | hightestwriteid  |
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
| 10            | mydb    | t1       |  ---      | MAJOR  | ready for cleaning  |  ---        |  ---      | 1777458753447  |  ---       |  ---      | None         |  ---          |  ---           |  ---         | default   | 33     | 33         | 1777458754019  |  ---             |
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
{code}
2. Wait for the cleaner thread to finish its next scheduled run, then check the filesystem for table directories. The t1 directory is removed because t1 was dropped as part of DROP DATABASE CASCADE, but the t2 directory remains forever. *The issue occurs because DROP DATABASE CASCADE removes all compaction queue entries for the database that do not belong to the current transaction, including the pending cleanup entry created by the earlier DROP TABLE t2 (compaction id 9 in the output above).*

 
{code:java}
user@Q044GWJL71 apache-hive-4.3.0-SNAPSHOT-bin % ls -lt /tmp/warehouse/managed/mydb.db
total 0
drwxr-xr-x  3 user  wheel  96 29 Apr 15:41 t2.v0000022
{code}
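The root cause described in step 2 can be illustrated with a deliberately simplified, hypothetical Java model of the compaction queue. This is *not* the actual HMS code: the class {{CompactionQueueModel}}, its methods, and the in-memory queue and "filesystem" are assumptions made for illustration only. The one rule taken from this report is the purge performed by DROP DATABASE CASCADE (remove every queue entry for the database whose txnid differs from the current transaction); the txnids 32 and 33 come from the {{show compactions}} output above.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the HMS compaction queue and the cleaner.
// Directory names are simplified (e.g. "t2" instead of "t2.v0000022").
class CompactionQueueModel {

    // One "ready for cleaning" entry, tagged with the transaction that created it.
    record QueueEntry(String db, String table, long txnId) {}

    private final List<QueueEntry> queue = new ArrayList<>();
    // Table directories that currently exist on the modeled filesystem.
    private final Set<String> tableDirs = new LinkedHashSet<>();

    void createTable(String table) {
        tableDirs.add(table);
    }

    // Lockless-reads DROP TABLE: metadata goes away now; the directory is left
    // for the cleaner via a queue entry tagged with the dropping transaction.
    void dropTable(String db, String table, long txnId) {
        queue.add(new QueueEntry(db, table, txnId));
    }

    // DROP DATABASE CASCADE as described in this report: purge all queue entries
    // for the database that do not belong to the current transaction, then
    // enqueue cleanup entries for the tables dropped by the cascade itself.
    void dropDatabaseCascade(String db, long txnId, List<String> liveTables) {
        queue.removeIf(e -> e.db().equals(db) && e.txnId() != txnId);
        for (String table : liveTables) {
            queue.add(new QueueEntry(db, table, txnId));
        }
    }

    // Cleaner run: delete the directory of every queued table, then drain the queue.
    void runCleaner() {
        for (QueueEntry e : queue) {
            tableDirs.remove(e.table());
        }
        queue.clear();
    }

    // Replays the reproduction steps and returns the directories left behind.
    static Set<String> orphanedDirectories() {
        CompactionQueueModel hms = new CompactionQueueModel();
        hms.createTable("t1");
        hms.createTable("t2");
        hms.dropTable("mydb", "t2", 32);                     // cleanup entry for t2, txnid 32
        hms.dropDatabaseCascade("mydb", 33, List.of("t1"));  // purges t2's entry, enqueues t1
        hms.runCleaner();                                    // only t1's directory is removed
        return hms.tableDirs;
    }

    public static void main(String[] args) {
        System.out.println("Orphaned table directories: " + orphanedDirectories());
    }
}
{code}

Running {{main}} prints {{Orphaned table directories: [t2]}}, mirroring the leftover {{t2.v0000022}} directory in the {{ls}} output: in this model, t2's cleanup entry (txnid 32) is purged by the cascade running in txnid 33, so the cleaner never sees it.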
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
