> On Nov. 20, 2019, 3:19 p.m., Denys Kuzmenko wrote:
> > Not ready. Need to handle aborted and currently active compactions.
>
> Denys Kuzmenko wrote:
> Handling the above cases would complicate the Initiator logic and make the
> preliminary check longer. Not sure how critical it is that, after an
> unsuccessful compaction attempt, the next run won't retry unless there is
> some change to the selected table/partition. Any thoughts on this?
Changed the findPotentialCompactions query to:

    select distinct ctc_database, ctc_table, ctc_partition
    from COMPLETED_TXN_COMPONENTS
    where (select CC_STATE from COMPLETED_COMPACTIONS
           where ctc_database = CC_DATABASE and ctc_table = CC_TABLE
             and (ctc_partition is null or ctc_partition = cc_partition)
           order by cc_id desc limit 1) IN ('a', 'f')
       || ctc_timestamp < current_timestamp

However, this still won't cover compactions that were skipped because one
was already running.
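
To make the intent of the query above easier to discuss, here is a minimal
in-memory sketch of the same predicate in Java. It is not the code in the
patch. The Component and Compaction classes, the lastState helper and the
"now" parameter are hypothetical stand-ins for the table rows and for
current_timestamp, and it assumes 'a' and 'f' mark attempted and failed
compactions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class PotentialCompactionsSketch {

    /** Hypothetical stand-in for a COMPLETED_TXN_COMPONENTS row; partition may be null. */
    static final class Component {
        final String db, table, partition;
        final long timestamp;

        Component(String db, String table, String partition, long timestamp) {
            this.db = db;
            this.table = table;
            this.partition = partition;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical stand-in for a COMPLETED_COMPACTIONS row. */
    static final class Compaction {
        final long id;
        final String db, table, partition;
        final char state;

        Compaction(long id, String db, String table, String partition, char state) {
            this.id = id;
            this.db = db;
            this.table = table;
            this.partition = partition;
            this.state = state;
        }
    }

    /** Mirrors the subquery: state of the matching compaction with the highest id. */
    static Optional<Character> lastState(Component c, List<Compaction> history) {
        return history.stream()
            .filter(h -> h.db.equals(c.db) && h.table.equals(c.table)
                && (c.partition == null || c.partition.equals(h.partition)))
            .max(Comparator.comparingLong((Compaction h) -> h.id))
            .map(h -> Character.valueOf(h.state));
    }

    /** Mirrors the outer query; the set gives the "distinct" behaviour. */
    static Set<String> findPotentialCompactions(
            List<Component> components, List<Compaction> history, long now) {
        Set<String> result = new LinkedHashSet<>();
        for (Component c : components) {
            // '-' is a placeholder meaning "no compaction recorded yet".
            char state = lastState(c, history).orElse('-');
            boolean lastRunUnsuccessful = state == 'a' || state == 'f';
            if (lastRunUnsuccessful || c.timestamp < now) {
                result.add(c.db + "." + c.table
                    + (c.partition == null ? "" : "/" + c.partition));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Component> components = Arrays.asList(
            new Component("db", "t1", "p=1", 200L),
            new Component("db", "t2", "p=1", 200L),
            new Component("db", "t3", null, 50L));
        List<Compaction> history = Arrays.asList(
            new Compaction(1L, "db", "t1", "p=1", 's'),
            new Compaction(2L, "db", "t1", "p=1", 'f'),
            new Compaction(3L, "db", "t2", "p=1", 's'));
        System.out.println(findPotentialCompactions(components, history, 100L));
    }
}
```

In this made-up data, t1 is selected because its latest compaction failed,
t3 because of its timestamp, and t2 is left out. Like the query, the sketch
only looks at completed compactions, so it has the same gap for compactions
that were skipped.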
- Denys
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218723
-----------------------------------------------------------
On Nov. 20, 2019, 12:20 p.m., Denys Kuzmenko wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71792/
> -----------------------------------------------------------
>
> (Updated Nov. 20, 2019, 12:20 p.m.)
>
>
> Review request for hive, Laszlo Pinter and Peter Vary.
>
>
> Bugs: HIVE-21917
> https://issues.apache.org/jira/browse/HIVE-21917
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> The Initiator thread in the metastore repeatedly loops over entries in the
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might
> need to be compacted. However, entries are never removed from this table
> except by a completed Compactor run.
>
> In a cluster where most tables / partitions are write-once read-many, this
> results in stale entries in this table never being cleaned up. In a small
> test cluster, we have observed approximately 45k entries in this table
> (virtually equal to the number of partitions in the cluster) while < 100 of
> these tables have delta files at all. Since most of the tables will never get
> enough writes to trigger a compaction (and in fact have only ever been
> written to once), the initiator thread keeps trying to evaluate them on every
> loop.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
> 610cf05204
>
> ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java
> b28b57779b
>
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
> 8253ccb9c9
>
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> 6281208247
>
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
> e840758c9d
>
>
> Diff: https://reviews.apache.org/r/71792/diff/1/
>
>
> Testing
> -------
>
> Unit tests
>
>
> Thanks,
>
> Denys Kuzmenko
>
>