[ https://issues.apache.org/jira/browse/HIVE-29210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marta Kuczora resolved HIVE-29210.
----------------------------------
Fix Version/s: 4.2.0
Resolution: Fixed
Thanks a lot [~tanishqchugh] for the fix.
> Minor compaction produces duplicates conditionally in case of HMS instance
> running initiator crash
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-29210
> URL: https://issues.apache.org/jira/browse/HIVE-29210
> Project: Hive
> Issue Type: Bug
> Reporter: tanishqchugh
> Assignee: tanishqchugh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.2.0
>
>
> In a deployment with multiple HiveServer2 (HS2) instances, one of the HS2
> instances may run on the same host as the Hive Metastore (HMS). In this
> setup, the compaction initiator runs within HMS, while the compaction worker
> threads run within HS2.
> If the HMS instance crashes unexpectedly, the method revokeFromLocalWorkers()
> is invoked. This method resets all compaction jobs whose workers were running
> on the same host back to the initiated state. We believe this behavior is by
> design: if HMS and the HS2 instance running the workers both crashed
> simultaneously and the jobs were not reset, those compactions could remain
> stalled until revokeTimedoutWorkers() eventually reclaims them.
> However, if HMS crashes but the HS2 instance on the same host survives, the
> reset still occurs. The job is then available for reassignment even though
> the original HS2 worker is still actively processing it, so another HS2
> worker can pick up the same compaction task, leaving two workers running the
> same minor compaction job concurrently. This race condition can
> intermittently result in duplicate records being written to the table.
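The sequence described above can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not Hive's code: the `CompactionRequest` class, the helpers `claim_next` and `run_minor_compaction`, the state codes `"i"`/`"w"`, and the `<hostname>-<thread>` worker ID format are all hypothetical. `revoke_from_local_workers` only mirrors the behavior described for revokeFromLocalWorkers(): it resets working jobs by hostname, without checking whether the worker is still alive. Duplicates are reduced to two workers appending to the same list; the real file-level mechanism is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical state codes, loosely modeled on compaction queue states.
INITIATED = "i"
WORKING = "w"


@dataclass
class CompactionRequest:
    req_id: int
    state: str = INITIATED
    worker_id: Optional[str] = None  # assumed format: "<hostname>-<thread>"


def claim_next(queue: List[CompactionRequest], worker_id: str) -> Optional[CompactionRequest]:
    """Hypothetical helper: hand the first initiated request to worker_id."""
    for req in queue:
        if req.state == INITIATED:
            req.state = WORKING
            req.worker_id = worker_id
            return req
    return None


def revoke_from_local_workers(queue: List[CompactionRequest], hostname: str) -> int:
    """Reset every working request whose worker ran on hostname to initiated.

    Only the hostname is checked; whether the worker is still alive is not.
    Returns the number of requests reset.
    """
    reset = 0
    for req in queue:
        if req.state == WORKING and req.worker_id and req.worker_id.startswith(hostname + "-"):
            req.state = INITIATED
            req.worker_id = None
            reset += 1
    return reset


def run_minor_compaction(source_rows: List[tuple], output: List[tuple]) -> None:
    """Hypothetical stand-in for a worker writing its compacted rows.

    Nothing deduplicates output across workers.
    """
    output.extend(source_rows)


queue = [CompactionRequest(1)]
delta_rows = [("k1", "v1"), ("k2", "v2")]
table: List[tuple] = []

# 1. A worker in the HS2 that shares host1 with HMS claims the request.
job_a = claim_next(queue, "host1-worker-0")
# 2. HMS on host1 crashes; the revoke resets host1's work even though
#    that HS2 worker is still running.
revoked = revoke_from_local_workers(queue, "host1")
# 3. A worker in another HS2 instance claims the same request.
job_b = claim_next(queue, "host2-worker-0")

# 4. Both workers finish and write their output.
run_minor_compaction(delta_rows, table)
run_minor_compaction(delta_rows, table)

print(revoked, job_a is job_b, len(table), len(set(table)))  # 1 True 4 2
```

Step 2 is where the model goes wrong: the revoke decides purely by hostname, so it cannot distinguish a worker that died together with HMS from one still running in a surviving HS2 instance.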
--
This message was sent by Atlassian Jira
(v8.20.10#820010)