[ https://issues.apache.org/jira/browse/HIVE-22255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941540#comment-16941540 ]
Rajkumar Singh commented on HIVE-22255:
---------------------------------------

INSERT OVERWRITE (IOW) creates a new base file every time and never produces a delta, so automatic compaction is never triggered. Without compaction the table never reaches the "Ready to clean" state, and the Cleaner thread therefore never tries to clean it.

I am planning to try one of the following approaches:
# Introduce one more parameter, num_base_threshold. If the number of base files exceeds this threshold, mark the table for major compaction. Since there are no deltas at this point, the compactor Worker thread will skip running the MR job and mark the compaction as complete, and the Cleaner thread will do the rest.
# Based on num_base_threshold with zero deltas, mark the table as compacted so that the Cleaner thread cleans it in its next run.

I think #2 will be better for code readability. Please let me know your thoughts.

Thanks

> Hive don't trigger Major Compaction automatically if table contains only base files
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-22255
>                 URL: https://issues.apache.org/jira/browse/HIVE-22255
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Transactions
>    Affects Versions: 3.1.2
>         Environment: Hive-3.1.1
>            Reporter: Rajkumar Singh
>            Assignee: Rajkumar Singh
>            Priority: Major
>
> A user may run into this issue if a table consists only of base files and
> has no deltas: the following condition then evaluates to false and automatic
> major compaction is skipped.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]
>
> Steps to Reproduce:
> # Create an ACID table
> {code:java}
> // create table myacid(id int);
> {code}
> # Run multiple insert overwrite statements
> {code:java}
> // insert overwrite table myacid values(1);insert overwrite table myacid values(2),(3),(4){code}
> # DFS ls output
> {code:java}
> // dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
> +----------------------------------------------------+
> | DFS Output |
> +----------------------------------------------------+
> | drwxrwx---+ - hive hadoop 0 2019-09-27 16:42 /warehouse/tablespace/managed/hive/myacid/base_0000001 |
> | -rw-rw----+ 3 hive hadoop 1 2019-09-27 16:42 /warehouse/tablespace/managed/hive/myacid/base_0000001/_orc_acid_version |
> | -rw-rw----+ 3 hive hadoop 610 2019-09-27 16:42 /warehouse/tablespace/managed/hive/myacid/base_0000001/bucket_00000 |
> | drwxrwx---+ - hive hadoop 0 2019-09-27 16:43 /warehouse/tablespace/managed/hive/myacid/base_0000002 |
> | -rw-rw----+ 3 hive hadoop 1 2019-09-27 16:43 /warehouse/tablespace/managed/hive/myacid/base_0000002/_orc_acid_version |
> | -rw-rw----+ 3 hive hadoop 633 2019-09-27 16:43 /warehouse/tablespace/managed/hive/myacid/base_0000002/bucket_00000 |
> +----------------------------------------------------+{code}
>
> You will see that major compaction is not triggered until you explicitly run
> ALTER TABLE ... COMPACT 'major'.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
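
For illustration only, the two options discussed in the comment can be sketched as a single decision function. This is a minimal sketch and not Hive code: the class BaseOnlyCompactionSketch, the Action enum, the decide method and the skipWorker flag are all hypothetical names, and the threshold parameter only stands in for the proposed num_base_threshold setting. It assumes the caller has already counted the base and delta directories of the table.

```java
// Hypothetical sketch; none of these names exist in Hive.
final class BaseOnlyCompactionSketch {

    /** Possible outcomes of the check. */
    enum Action { NONE, REQUEST_MAJOR_COMPACTION, MARK_READY_FOR_CLEANING }

    /**
     * @param numBases      number of base_N directories found for the table
     * @param numDeltas     number of delta directories found for the table
     * @param baseThreshold assumed threshold (stand-in for num_base_threshold)
     * @param skipWorker    false = option 1 (queue a major compaction that the
     *                      Worker completes without running a job),
     *                      true  = option 2 (go straight to the cleaning state)
     */
    static Action decide(int numBases, int numDeltas, int baseThreshold, boolean skipWorker) {
        // Tables that have deltas are left to the existing delta-based checks.
        if (numDeltas > 0) {
            return Action.NONE;
        }
        // Act only once the number of bases exceeds the threshold.
        if (numBases <= baseThreshold) {
            return Action.NONE;
        }
        return skipWorker ? Action.MARK_READY_FOR_CLEANING : Action.REQUEST_MAJOR_COMPACTION;
    }
}
```

With the two base directories from the listing above and an assumed threshold of 1, decide(2, 0, 1, false) corresponds to option 1 and decide(2, 0, 1, true) to option 2. How either outcome would be recorded in the compaction queue is left open here.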