Delta files that are no longer needed are deleted asynchronously.
For example, a query may be reading delta_002_002 when a minor
compaction runs concurrently and creates delta_001_003. The compaction
leaves delta_001_001, delta_002_002 and delta_003_003 in place to be
cleaned up later. A query that starts after the compaction uses
delta_001_003 and ignores delta_001_001, delta_002_002 and
delta_003_003, so it has fewer files to read and merge. Those three
directories are deleted once the system determines that no query can
still be using them.
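To make the example concrete, here is a rough Python sketch of the idea. It is only a simplified model and not Hive's actual selection logic: the function names are made up for illustration, and it assumes directory names of the form delta_<min>_<max> as written in this thread. A delta is treated as obsolete when another delta covers a wider transaction range that contains it.

```python
import re

DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")

def parse_delta(name):
    """Return (min_txn, max_txn) for a name like 'delta_001_003'."""
    m = DELTA_RE.match(name)
    if m is None:
        raise ValueError(f"not a delta directory name: {name!r}")
    return int(m.group(1)), int(m.group(2))

def split_deltas(names):
    """Split delta directories into (current, obsolete).

    Hypothetical helper: a delta counts as obsolete when some other
    delta covers a different, wider range that fully contains it.
    """
    ranges = {name: parse_delta(name) for name in names}
    current, obsolete = [], []
    for name, (lo, hi) in ranges.items():
        covered = any(
            o_lo <= lo and hi <= o_hi and (o_lo, o_hi) != (lo, hi)
            for o_lo, o_hi in ranges.values()
        )
        (obsolete if covered else current).append(name)
    return sorted(current), sorted(obsolete)

dirs = ["delta_001_001", "delta_002_002", "delta_003_003", "delta_001_003"]
current, obsolete = split_deltas(dirs)
print(current)   # ['delta_001_003']
print(obsolete)  # ['delta_001_001', 'delta_002_002', 'delta_003_003']
```

A new query would read only the "current" list; the "obsolete" directories stay on disk until the cleaner decides no running query can still need them.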
Judging by the directory listing you sent, no major or minor compactions have
run.
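As a quick way to check a listing for signs of compaction, something like the following Python sketch could be used. It is a heuristic under stated assumptions, not an official check: it assumes a major compaction leaves a base_<txn> directory and a minor compaction leaves a delta_<min>_<max> directory with min different from max (other writers may also produce such deltas, so treat a match as a hint only). The function name is hypothetical.

```python
import re

def compaction_outputs(dir_names):
    """Return directory names that look like compaction output.

    Assumption: base_<txn> suggests a major compaction, and a
    delta_<min>_<max> with min != max suggests a minor compaction.
    """
    found = []
    for name in dir_names:
        if re.fullmatch(r"base_\d+", name):
            found.append(name)
            continue
        m = re.fullmatch(r"delta_(\d+)_(\d+)", name)
        if m and int(m.group(1)) != int(m.group(2)):
            found.append(name)
    return found

# The eight directories from the listing in this thread.
listing = [f"delta_{i:03d}_{i:03d}" for i in range(1, 9)]
print(compaction_outputs(listing))  # []
```

An empty result for the listing in this thread matches the observation above: every delta covers a single transaction and there is no base directory.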
From: r7raul1...@163.com
Reply-To: user@hive.apache.org
Date: Thursday, June 11, 2015 at 12:53 AM
To: user@hive.apache.org
Subject: Re: Re: delta file compact take no effect
SHOW COMPACTIONS;
I can see some info:
Database  Table       Partition  Type   State      Worker  Start Time
default   u_data_txn  NULL       MAJOR  initiated  NULL    0
Time taken: 0.024 seconds, Fetched: 2 row(s)
But after that I still see many delta files.
r7raul1...@163.com
From: Elliot West <tea...@gmail.com>
Date: 2015-06-11 15:25
To: user@hive.apache.org
Subject: Re: delta file compact take no effect
What do you see if you issue:
SHOW COMPACTIONS;
On Thursday, 11 June 2015, r7raul1...@163.com wrote:
I use Hive 1.1.0 on Hadoop 2.5.0.
After I ran some update operations on table u_data_txn,
the table has many delta files, like:
drwxr-xr-x - hdfs hive 0 2015-02-06 22:52 /user/hive/warehouse/u_data_txn/delta_001_001
-rw-r--r-- 3 hdfs supergroup 346453 2015-02-06 22:52 /user/hive/warehouse/u_data_txn/delta_001_001/bucket_0
-rw-r--r-- 3 hdfs supergroup 415924 2015-02-06 22:52 /user/hive/warehouse/u_data_txn/delta_001_001/bucket_1
drwxr-xr-x - hdfs hive 0 2015-02-06 22:58 /user/hive/warehouse/u_data_txn/delta_002_002
-rw-r--r-- 3 hdfs supergroup 807 2015-02-06 22:58 /user/hive/warehouse/u_data_txn/delta_002_002/bucket_0
-rw-r--r-- 3 hdfs supergroup 779 2015-02-06 22:58 /user/hive/warehouse/u_data_txn/delta_002_002/bucket_1
drwxr-xr-x - hdfs hive 0 2015-02-06 22:59 /user/hive/warehouse/u_data_txn/delta_003_003
-rw-r--r-- 3 hdfs supergroup 817 2015-02-06 22:59 /user/hive/warehouse/u_data_txn/delta_003_003/bucket_0
-rw-r--r-- 3 hdfs supergroup 767 2015-02-06 22:59 /user/hive/warehouse/u_data_txn/delta_003_003/bucket_1
drwxr-xr-x - hdfs hive 0 2015-02-06 23:01 /user/hive/warehouse/u_data_txn/delta_004_004
-rw-r--r-- 3 hdfs supergroup 817 2015-02-06 23:01 /user/hive/warehouse/u_data_txn/delta_004_004/bucket_0
-rw-r--r-- 3 hdfs supergroup 779 2015-02-06 23:01 /user/hive/warehouse/u_data_txn/delta_004_004/bucket_1
drwxr-xr-x - hdfs hive 0 2015-02-06 23:03 /user/hive/warehouse/u_data_txn/delta_005_005
-rw-r--r-- 3 hdfs supergroup 817 2015-02-06 23:03 /user/hive/warehouse/u_data_txn/delta_005_005/bucket_0
-rw-r--r-- 3 hdfs supergroup 779 2015-02-06 23:03 /user/hive/warehouse/u_data_txn/delta_005_005/bucket_1
drwxr-xr-x - hdfs hive 0 2015-02-10 21:34 /user/hive/warehouse/u_data_txn/delta_006_006
-rw-r--r-- 3 hdfs supergroup 821 2015-02-10 21:34 /user/hive/warehouse/u_data_txn/delta_006_006/bucket_0
drwxr-xr-x - hdfs hive 0 2015-02-10 21:35 /user/hive/warehouse/u_data_txn/delta_007_007
-rw-r--r-- 3 hdfs supergroup 821 2015-02-10 21:35 /user/hive/warehouse/u_data_txn/delta_007_007/bucket_0
drwxr-xr-x - hdfs hive 0 2015-03-24 01:16 /user/hive/warehouse/u_data_txn/delta_008_008
-rw-r--r-- 3 hdfs supergroup 1670 2015-03-24 01:16 /user/hive/warehouse/u_data_txn/delta_008_008/bucket_0
-rw-r--r-- 3 hdfs supergroup 1767 2015-03-24 01:16 /user/hive/warehouse/u_data_txn/delta_008_008/bucket_1
I tried ALTER TABLE u_data_txn COMPACT 'MAJOR';
The deltas still exist.
Then I tried ALTER TABLE u_data_txn COMPACT 'MINOR';
The deltas still exist.
How do I merge the delta files?
My config is:
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exe.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>4</value>