[jira] [Updated] (HIVE-29572) ACID Compaction: Cleaner should mark a compaction failed when its txn is aborted

Marta Kuczora (Jira) Fri, 22 May 2026 01:07:14 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-29572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Marta Kuczora updated HIVE-29572:
---------------------------------
    Description: 
It can happen that a compaction is marked as finished and gets into the "ready 
for cleaning" state, but the compaction txn stays open. When the timeout is 
reached, the txn gets aborted. 

With min.history.level, a compaction like this can block the cleaning for all 
subsequent compactions. 
This is what happens:
 * The cleaner picks compaction 1 and finds nothing to delete, because it 
doesn't find a valid base (which is correct, as this cleaner should only see 
what compaction 1 did and its txn is not committed).
 * It deletes nothing but still finds obsolete deltas (because here the txn 
range is cleared and the base is found), so it puts the compaction back in the 
queue in the "ready for cleaning" state.
 * The other compactions are not fetched by the cleaner.
 * The problem is that the same happens even after the txn of compaction 1 is 
aborted, so the cleaner is blocked forever.

To avoid this blocking, the cleaner should check the state of the compaction 
txn and if it is already aborted, mark the compaction as failed and delete 
nothing.
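
The proposed check could look roughly like the sketch below. It is only an 
illustration of the idea, not Hive's actual Cleaner code: the class 
CleanerSketch, the enums TxnState and CompactionState, and the method 
runCleanerCycle are all hypothetical names, and the sketch does not model 
min.history.level or how the cleaner selects entries from the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CleanerSketch {
    // Hypothetical txn states; not Hive's real enum.
    enum TxnState { OPEN, COMMITTED, ABORTED }

    // Hypothetical compaction queue states.
    enum CompactionState { READY_FOR_CLEANING, FAILED, SUCCEEDED }

    // Hypothetical queue entry: a compaction and the txn it ran in.
    static final class Compaction {
        final long id;
        final long txnId;
        CompactionState state = CompactionState.READY_FOR_CLEANING;

        Compaction(long id, long txnId) {
            this.id = id;
            this.txnId = txnId;
        }
    }

    // Runs one simplified cleaner cycle and returns the ids of the
    // compactions that were actually cleaned.
    static List<Long> runCleanerCycle(List<Compaction> queue,
                                      Map<Long, TxnState> txnStates) {
        List<Long> cleaned = new ArrayList<>();
        for (Compaction c : queue) {
            if (c.state != CompactionState.READY_FOR_CLEANING) {
                continue;
            }
            TxnState txn = txnStates.get(c.txnId);
            if (txn == TxnState.ABORTED) {
                // Proposed behaviour: mark as failed and delete nothing,
                // so this entry is never put back in the queue.
                c.state = CompactionState.FAILED;
            } else if (txn == TxnState.COMMITTED) {
                // Normal path: the obsolete files would be removed here.
                c.state = CompactionState.SUCCEEDED;
                cleaned.add(c.id);
            }
            // OPEN or unknown txn: leave the entry for a later cycle.
        }
        return cleaned;
    }
}
```

In this sketch a compaction whose txn is aborted leaves the "ready for 
cleaning" state on the first cycle that sees it, so it cannot be picked up 
again and again.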

  was:We ran into some situations when the compaction was marked as finished 
and was in ready for cleaning state, but the compaction txn was still open. 
This inconsistency led to data loss. There were some improvements in the 
cleaner to avoid these situations, but we should consider checking the txn 
state when the cleaner selects a compaction to clean.


> ACID Compaction: Cleaner should mark a compaction failed when its txn is 
> aborted
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-29572
>                 URL: https://issues.apache.org/jira/browse/HIVE-29572
>             Project: Hive
>          Issue Type: Task
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>              Labels: pull-request-available
>
> It can happen that a compaction is marked as finished and gets into the 
> "ready for cleaning" state, but the compaction txn stays open. When the 
> timeout is reached, the txn gets aborted. 
> With min.history.level, a compaction like this can block the cleaning for all 
> subsequent compactions. 
> This is what happens:
>  * The cleaner picks compaction 1 and finds nothing to delete, because it 
> doesn't find a valid base (which is correct, as this cleaner should only see 
> what compaction 1 did and its txn is not committed).
>  * It deletes nothing but still finds obsolete deltas (because here the txn 
> range is cleared and the base is found), so it puts the compaction back in 
> the queue in the "ready for cleaning" state.
>  * The other compactions are not fetched by the cleaner.
>  * The problem is that the same happens even after the txn of compaction 1 is 
> aborted, so the cleaner is blocked forever.
> To avoid this blocking, the cleaner should check the state of the compaction 
> txn and if it is already aborted, mark the compaction as failed and delete 
> nothing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
