[ 
https://issues.apache.org/jira/browse/HIVE-22255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128320#comment-17128320
 ] 

Peter Vary commented on HIVE-22255:
-----------------------------------

[~Rajkumar Singh]: There is a customer who is using {{INSERT OVERWRITE}} to 
overwrite an existing ACID table. They do not use any other DML to change the 
data, so they end up with only base directories in the table. On S3 it can be 
costly to list all those directories, so we end up wasting 
resources here. Do you have time to work on this, or should we take over?

Thanks,
Peter

> Hive don't trigger Major Compaction automatically if table contains only base 
> files 
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-22255
>                 URL: https://issues.apache.org/jira/browse/HIVE-22255
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Transactions
>    Affects Versions: 3.1.2
>         Environment: Hive-3.1.1
>            Reporter: Rajkumar Singh
>            Assignee: Rajkumar Singh
>            Priority: Major
>         Attachments: HIVE-22255.01.patch, HIVE-22255.patch
>
>
> A user may run into this issue when a table consists only of base files and 
> has no delta files: the following condition then evaluates to false and 
> automatic major compaction is skipped.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]
>  
> Steps to Reproduce:
>  # Create an ACID table 
> {code:sql}
> create table myacid(id int);
> {code}
>  # Run multiple INSERT OVERWRITE statements 
> {code:sql}
> insert overwrite table myacid values(1);
> insert overwrite table myacid values(2),(3),(4);
> {code}
>  # Check the DFS ls output
> {code:java}
> dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
> +----------------------------------------------------+
> |                     DFS Output                     |
> +----------------------------------------------------+
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 /warehouse/tablespace/managed/hive/myacid/base_0000001 |
> | -rw-rw----+  3 hive hadoop          1 2019-09-27 16:42 /warehouse/tablespace/managed/hive/myacid/base_0000001/_orc_acid_version |
> | -rw-rw----+  3 hive hadoop        610 2019-09-27 16:42 /warehouse/tablespace/managed/hive/myacid/base_0000001/bucket_00000 |
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:43 /warehouse/tablespace/managed/hive/myacid/base_0000002 |
> | -rw-rw----+  3 hive hadoop          1 2019-09-27 16:43 /warehouse/tablespace/managed/hive/myacid/base_0000002/_orc_acid_version |
> | -rw-rw----+  3 hive hadoop        633 2019-09-27 16:43 /warehouse/tablespace/managed/hive/myacid/base_0000002/bucket_00000 |
> +----------------------------------------------------+{code}
>  
> You will see that major compaction is not triggered until you manually run 
> {{ALTER TABLE myacid COMPACT 'major'}}.
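
For illustration only, here is a minimal sketch of why a base-only table can be skipped. It is not the code from Initiator.java: {{pick_compaction}} is a hypothetical helper, and the two thresholds are assumptions modeled on {{hive.compactor.delta.pct.threshold}} and {{hive.compactor.delta.num.threshold}}. The sketch assumes the decision looks only at the delta-to-base size ratio and at the number of deltas.

```python
from typing import List, Optional

# Assumed thresholds (illustrative values, not read from any Hive config).
DELTA_PCT_THRESHOLD = 0.1   # delta bytes / base bytes that triggers a major compaction
DELTA_NUM_THRESHOLD = 10    # number of delta directories that triggers a compaction


def pick_compaction(base_size: int, delta_sizes: List[int]) -> Optional[str]:
    """Hypothetical, simplified stand-in for the initiator's decision.

    Returns "MAJOR", "MINOR", or None when no compaction would be queued.
    """
    delta_size = sum(delta_sizes)

    # Size-based check: deltas are large relative to the current base.
    if base_size > 0 and delta_size / base_size > DELTA_PCT_THRESHOLD:
        return "MAJOR"

    # Count-based check: many small deltas have piled up.
    if len(delta_sizes) > DELTA_NUM_THRESHOLD:
        return "MAJOR" if base_size == 0 else "MINOR"

    # Neither check fires when there are no deltas at all,
    # no matter how many base directories exist.
    return None


if __name__ == "__main__":
    # Table from the reproduction steps: newest base is 633 bytes, no deltas.
    print(pick_compaction(633, []))
```

Under these assumptions both checks depend on deltas being present, so a table written only with INSERT OVERWRITE never qualifies, and older directories such as base_0000001 are never considered by the automatic path.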



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
