[ 
https://issues.apache.org/jira/browse/HIVE-27332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730892#comment-17730892
 ] 

Sourabh Badhya commented on HIVE-27332:
---------------------------------------

Thanks [~veghlaci05] and [~dkuzmenko] for the reviews.

> Add retry backoff mechanism for abort cleanup
> ---------------------------------------------
>
>                 Key: HIVE-27332
>                 URL: https://issues.apache.org/jira/browse/HIVE-27332
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-27019 and HIVE-27020 added the functionality to directly clean data 
> directories from aborted transactions without using Initiator & Worker. 
> However, when cleanup fails continuously, a retry is initiated every 
> single time. We need to add a retry backoff mechanism that controls how 
> long to wait before initiating the next retry, so that cleanup is not 
> retried continuously.
> There are broadly 3 cases in which retry of abort cleanup is affected - 
> *1. Abort cleanup on the table failed + Compaction on the table failed.*
> *2. Abort cleanup on the table failed + Compaction on the table passed.*
> *3. Abort cleanup on the table failed + No compaction on the table.*
> *Solution -* 
> *We reuse the COMPACTION_QUEUE table to store the retry metadata -* 
> *Advantage: Most of the fields needed for retry are already present in 
> COMPACTION_QUEUE, so we can use the same table to store retry metadata. A 
> compaction type called ABORT_CLEANUP ('c') is introduced. The CQ_STATE will 
> remain "ready for cleaning" for such records.*
> *Actions performed by TaskHandler in the case of failure -* 
> *AbortTxnCleaner -* 
> Action: On abort cleanup failure, just add the retry details to the queue table.
> *CompactionCleaner -* 
> Action: If compaction on the same table is successful, delete the retry entry 
> in markCleaned when removing any TXN_COMPONENTS entries except when there are 
> no uncompacted aborts. We do not want to be in a situation where there is a 
> queue entry for a table but there is no record in TXN_COMPONENTS associated 
> with the same table.
> *Advantage: No performance issues are expected with this approach, since we 
> delete 1 record most of the time for the associated table/partition.*
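
The backoff idea in the description above can be sketched as follows. This is 
only an illustration under stated assumptions, not the code from the pull 
request: the issue text does not say which backoff policy is used, so an 
exponential policy is assumed here, and the class AbortCleanupBackoff, its 
methods, and its parameters (baseDelay, maxAttempts) are hypothetical names 
that do not come from Hive.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not part of Hive): decides whether a failed abort
 * cleanup may be retried yet, assuming an exponential backoff policy.
 */
final class AbortCleanupBackoff {
    private final Duration baseDelay; // assumed wait after the first failure
    private final int maxAttempts;    // assumed cap on failed attempts

    AbortCleanupBackoff(Duration baseDelay, int maxAttempts) {
        if (baseDelay.isZero() || baseDelay.isNegative()) {
            throw new IllegalArgumentException("baseDelay must be positive");
        }
        if (maxAttempts < 1 || maxAttempts > 30) {
            throw new IllegalArgumentException("maxAttempts must be in 1..30");
        }
        this.baseDelay = baseDelay;
        this.maxAttempts = maxAttempts;
    }

    /** Wait required after the given number of failures: base * 2^(n-1). */
    Duration delayAfter(int failedAttempts) {
        if (failedAttempts < 1 || failedAttempts > maxAttempts) {
            throw new IllegalArgumentException("failedAttempts out of range");
        }
        return baseDelay.multipliedBy(1L << (failedAttempts - 1));
    }

    /**
     * Returns true if cleanup may run now. The failure count and the time of
     * the last failure stand in for retry metadata kept in a queue record.
     */
    boolean shouldRetry(int failedAttempts, long lastFailureMillis, long nowMillis) {
        if (failedAttempts <= 0) {
            return true;  // never failed, nothing to wait for
        }
        if (failedAttempts >= maxAttempts) {
            return false; // retries exhausted, stop retrying
        }
        long elapsed = nowMillis - lastFailureMillis;
        return elapsed >= delayAfter(failedAttempts).toMillis();
    }
}
```

With a base delay of 5 minutes, a record that has failed once becomes 
eligible again after 5 minutes, after two failures it waits 10 minutes, and 
so on, which avoids retrying on every cleaner run.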



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
