Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

Denys Kuzmenko via Review Board Thu, 21 Nov 2019 06:15:36 -0800


> On Nov. 20, 2019, 3:19 p.m., Denys Kuzmenko wrote:
> > Not ready. Need to handle aborted and currently active compactions.
> 
> Denys Kuzmenko wrote:
>     Handling the above cases would complicate the Initiator logic and make 
> the preliminary check longer. I'm not sure how critical it is that, after an 
> unsuccessful compaction attempt, the next run won't retry unless there is 
> some change to the selected table/partition. Any thoughts on this?


Changed findPotentialCompactions query to: 

select distinct ctc_database, ctc_table, ctc_partition
from COMPLETED_TXN_COMPONENTS
where (select CC_STATE
       from COMPLETED_COMPACTIONS
       where ctc_database = CC_DATABASE
         and ctc_table = CC_TABLE
         and (ctc_partition is null or ctc_partition = cc_partition)
       order by cc_id desc limit 1) IN ('a', 'f')
   || ctc_timestamp < current_timestamp

However, this still won't cover compactions that were skipped because one was 
already running.
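
For readers who want to see the subquery's selection rule outside SQL, here is 
a minimal in-memory sketch. It is not the code from the patch. The class 
PotentialCompactionSketch, its nested row types, and findPotential are 
hypothetical names, and the reading of 'a' and 'f' as "attempted" and 
"failed" is an assumption. Only the first condition of the query (latest 
compaction state IN ('a', 'f')) is modelled; the ctc_timestamp condition is 
left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PotentialCompactionSketch {

    /** Hypothetical stand-in for one COMPLETED_COMPACTIONS row. */
    static final class Compaction {
        final long id;
        final String db, table, partition;
        final char state;

        Compaction(long id, String db, String table, String partition, char state) {
            this.id = id;
            this.db = db;
            this.table = table;
            this.partition = partition;
            this.state = state;
        }
    }

    /** Hypothetical stand-in for one COMPLETED_TXN_COMPONENTS row. */
    static final class TxnComponent {
        final String db, table, partition;

        TxnComponent(String db, String table, String partition) {
            this.db = db;
            this.table = table;
            this.partition = partition;
        }
    }

    /**
     * Returns the distinct components whose most recent matching compaction
     * (highest id) ended in state 'a' or 'f'. Components with no compaction
     * history are not selected, mirroring a NULL subquery result in SQL.
     */
    static List<String> findPotential(List<TxnComponent> components,
                                      List<Compaction> history) {
        Set<String> result = new LinkedHashSet<>();
        for (TxnComponent c : components) {
            Compaction latest = null;
            for (Compaction h : history) {
                // Same match as the subquery: a null ctc_partition matches any row.
                boolean samePartition =
                        c.partition == null || c.partition.equals(h.partition);
                if (c.db.equals(h.db) && c.table.equals(h.table) && samePartition
                        && (latest == null || h.id > latest.id)) {
                    latest = h; // "order by cc_id desc limit 1"
                }
            }
            if (latest != null && (latest.state == 'a' || latest.state == 'f')) {
                result.add(c.db + "." + c.table
                        + (c.partition == null ? "" : "/" + c.partition));
            }
        }
        return new ArrayList<>(result);
    }
}
```

In this sketch a table whose latest compaction succeeded is skipped even if an 
earlier attempt failed, because only the row with the highest id is inspected.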


- Denys


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218723
-----------------------------------------------------------


On Nov. 20, 2019, 12:20 p.m., Denys Kuzmenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71792/
> -----------------------------------------------------------
> 
> (Updated Nov. 20, 2019, 12:20 p.m.)
> 
> 
> Review request for hive, Laszlo Pinter and Peter Vary.
> 
> 
> Bugs: HIVE-21917
>     https://issues.apache.org/jira/browse/HIVE-21917
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The Initiator thread in the metastore repeatedly loops over entries in the 
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
> need to be compacted. However, entries are never removed from this table 
> except by a completed Compactor run.
> 
> In a cluster where most tables / partitions are write-once read-many, this 
> results in stale entries in this table never being cleaned up. In a small 
> test cluster, we have observed approximately 45k entries in this table 
> (virtually equal to the number of partitions in the cluster) while < 100 of 
> these tables have delta files at all. Since most of the tables will never get 
> enough writes to trigger a compaction (and in fact have only ever been 
> written to once), the initiator thread keeps trying to evaluate them on every 
> loop.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 610cf05204 
>   ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java b28b57779b 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 8253ccb9c9 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 6281208247 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java e840758c9d 
> 
> 
> Diff: https://reviews.apache.org/r/71792/diff/1/
> 
> 
> Testing
> -------
> 
> Unit tests
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>
