Venugopal Reddy K created HIVE-29587:
----------------------------------------
Summary: Managed table directory not deleted after DROP DATABASE
when hive.acid.lockless.reads.enabled=true
Key: HIVE-29587
URL: https://issues.apache.org/jira/browse/HIVE-29587
Project: Hive
Issue Type: Bug
Components: Hive, Metastore, Standalone Metastore
Reporter: Venugopal Reddy K
*Description]*
When {{hive.acid.lockless.reads.enabled}} is set to *true,* drop acid table
becomes a non-blocking operation in Hive Metastore (HMS).
In this mode, Deletion of ACID table data is handled *asynchronously* by a
background cleaner thread. Table data and corresponding table directories are
eventually removed as expected after cleaner thread scheduled run.
However, when drop acid table is followed by drop database cascade operation,
the *acid table directory itself is never deleted. Hence* {*}managed database
directory is also never deleted.{*}{*}{*}
This results in
# {*}orphaned table and database directories remaining indefinitely in the
filesystem{*}, leading to potential storage bloat and inconsistencies between
HMS metadata and filesystem state.
# Fails to recreate the database again with error -> Execution Error, return
code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask.
MetaException(message:{*}Unable to create database managed directory{*}
file:/tmp/warehouse/managed/mydb.db, failed to create database mydb);
*[Steps to Reproduce]*
1. Run the following commands:
{code:java}
0: jdbc:hive2://localhost:10000> set hive.acid.lockless.reads.enabled=true;
0: jdbc:hive2://localhost:10000> create database mydb;
0: jdbc:hive2://localhost:10000> use mydb;
0: jdbc:hive2://localhost:10000> create table t1(i int);
0: jdbc:hive2://localhost:10000> create table t2(i int);
0: jdbc:hive2://localhost:10000> insert into t1 values(1),(2),(3);
0: jdbc:hive2://localhost:10000> insert into t2 values(1),(2),(3);
# Make sure cleaner thead is not scheduled to run between below 2 operations.
0: jdbc:hive2://localhost:10000> drop table t2;
0: jdbc:hive2://localhost:10000> show compactions;
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
| compactionid | dbname | tabname | partname | type | state
| workerhost | workerid | enqueuetime | starttime | duration |
hadoopjobid | errormessage | initiatorhost | initiatorid | poolname |
txnid | nexttxnid | committime | hightestwriteid |
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
| 9 | mydb | t2 | --- | MAJOR | ready for cleaning
| --- | --- | 1777458738637 | --- | --- | None
| --- | --- | --- | default | 32 | 32
| 1777458738667 | --- |
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
0: jdbc:hive2://localhost:10000> drop database mydb cascade;
0: jdbc:hive2://localhost:10000> show compactions;
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
| compactionid | dbname | tabname | partname | type | state
| workerhost | workerid | enqueuetime | starttime | duration |
hadoopjobid | errormessage | initiatorhost | initiatorid | poolname |
txnid | nexttxnid | committime | hightestwriteid |
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
| 10 | mydb | t1 | --- | MAJOR | ready for cleaning
| --- | --- | 1777458753447 | --- | --- | None
| --- | --- | --- | default | 33 | 33
| 1777458754019 | --- |
+---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
{code}
2. Wait for cleaner thread to finish the next scheduled run and check the
filesystem for table directories. t1 directory is removed since it is part of
drop database cascade. But t2 directory remains forever. *Issue is happening
because drop database cascade has removed all the compaction queue entries for
the database that do not belong to current transaction*
{code:java}
user@Q044GWJL71 apache-hive-4.3.0-SNAPSHOT-bin % ls -lt
/tmp/warehouse/managed/mydb.db
total 0
drwxr-xr-x 3 user wheel 96 29 Apr 15:41 t2.v0000022
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)