[ https://issues.apache.org/jira/browse/HIVE-27332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sourabh Badhya resolved HIVE-27332.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> Add retry backoff mechanism for abort cleanup
> ---------------------------------------------
>
>                 Key: HIVE-27332
>                 URL: https://issues.apache.org/jira/browse/HIVE-27332
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-27019 and HIVE-27020 added the functionality to directly clean data
> directories from aborted transactions without using Initiator & Worker.
> However, in the event of continuous failure during cleanup, the retry
> mechanism is initiated every single time. We need to add a retry backoff
> mechanism to control the time that must pass before a retry is initiated
> again, so that cleanup is not retried continuously.
> There are broadly 3 cases in which retry due to abort cleanup is impacted -
> *1. Abort cleanup on the table failed + Compaction on the table failed.*
> *2. Abort cleanup on the table failed + Compaction on the table passed.*
> *3. Abort cleanup on the table failed + No compaction on the table.*
>
> *Solution -*
> *We reuse the COMPACTION_QUEUE table to store the retry metadata -*
> *Advantage: Most of the fields with respect to retry are present in
> COMPACTION_QUEUE. Hence we can use the same table for storing retry
> metadata. A compaction type called ABORT_CLEANUP ('c') is introduced. The
> CQ_STATE will remain ready for cleaning for such records.*
>
> *Actions performed by TaskHandler in the case of failure -*
> *AbortTxnCleaner -*
> Action: Just add retry details in the queue table during the abort failure.
> *CompactionCleaner -*
> Action: If compaction on the same table is successful, delete the retry
> entry in markCleaned when removing any TXN_COMPONENTS entries, except when
> there are no uncompacted aborts. We do not want to be in a situation where
> there is a queue entry for a table but there is no record in TXN_COMPONENTS
> associated with the same table.
> *Advantage: Expecting no performance issues with this approach, since we
> delete 1 record most of the time for the associated table/partition.*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
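
The retry backoff idea described in the issue can be illustrated with a minimal Java sketch. It is not the code merged for HIVE-27332: the class, field, and parameter names (`AbortCleanupBackoff`, `RetryEntry`, `base`, `max`, `maxAttempts`) are hypothetical, and the exponential doubling policy is an assumption chosen for illustration. The sketch only shows how stored retry metadata (attempt count and time of the last failure) can gate whether an abort cleanup is attempted again.

```java
import java.time.Duration;

// Illustrative only: names and the doubling policy are assumptions, not Hive's implementation.
final class AbortCleanupBackoff {

    // Hypothetical retry metadata, as might be kept alongside a queue entry.
    static final class RetryEntry {
        final int attempts;        // failed cleanup attempts so far
        final long lastAttemptMs;  // epoch millis of the most recent failed attempt

        RetryEntry(int attempts, long lastAttemptMs) {
            this.attempts = attempts;
            this.lastAttemptMs = lastAttemptMs;
        }
    }

    // Delay before the next retry: base * 2^(attempts - 1), capped at max.
    static long backoffMs(int attempts, Duration base, Duration max) {
        if (attempts <= 0) {
            return 0L; // nothing has failed yet, so no waiting is needed
        }
        long baseMs = base.toMillis();
        long maxMs = max.toMillis();
        int shift = Math.min(attempts - 1, 62);
        // Compare against the shifted cap first so the left shift below cannot overflow.
        if (baseMs > (maxMs >> shift)) {
            return maxMs;
        }
        return baseMs << shift;
    }

    // A retry is allowed only if attempts remain and the backoff window has elapsed.
    static boolean isEligible(RetryEntry entry, long nowMs,
                              Duration base, Duration max, int maxAttempts) {
        if (entry.attempts >= maxAttempts) {
            return false;
        }
        return nowMs - entry.lastAttemptMs >= backoffMs(entry.attempts, base, max);
    }
}
```

With a 5 minute base and a 1 hour cap, the first failure delays the next attempt by 5 minutes, the third by 20 minutes, and later failures are held at the 1 hour cap, so a table whose cleanup keeps failing is not retried on every cleaner cycle.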