[ https://issues.apache.org/jira/browse/HIVE-22420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aron Hamvas updated HIVE-22420:
-------------------------------
    Status: In Progress  (was: Patch Available)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
>                 Key: HIVE-22420
>                 URL: https://issues.apache.org/jira/browse/HIVE-22420
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Aron Hamvas
>            Assignee: Aron Hamvas
>            Priority: Major
>         Attachments: HIVE-22420.1.patch
>
>
> When a transactional query is being executed and is interrupted via an HS2
> close operation request, both the background pool thread executing the query
> and the HttpHandler thread running the close operation logic will eventually
> call the method below:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(boolean commit)
> {noformat}
> Since this method is invoked several times in both threads, the two threads
> can invoke it at the same time. Due to a race condition, the txnId field of
> the DbTxnManager shared by both threads can then be set to 0 without the
> transaction actually being aborted.
> The root cause is that the stopHeartbeat() method in DbTxnManager is not
> thread-safe:
> When Thread-1 and Thread-2 enter stopHeartbeat() within a very short time of
> each other, Thread-1 might successfully cancel the heartbeat task and set the
> heartbeatTask field to null while Thread-2 is still trying to observe its
> state. Thread-1 returns to the calling rollbackTxn() method and continues
> execution there, while Thread-2 is thrown back to the same method with a
> NullPointerException. Thread-2 then sets txnId to 0, and Thread-1 sends this
> 0 value to HMS. As a result, the transaction is not aborted, and its locks
> cannot be released later on either.
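
One way to close the race described above is to make the read-and-clear of the heartbeat task a single atomic step, so that only one thread can ever observe a non-null task. The following is a minimal Java sketch under that assumption; it is not the attached patch and not Hive's actual code. The class HeartbeatHolder, the startHeartbeat() helper, and the boolean return value are hypothetical and exist only for illustration; only the names stopHeartbeat() and heartbeatTask are taken from the description.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the heartbeat handling in DbTxnManager.
// All names except stopHeartbeat() and heartbeatTask are assumptions.
class HeartbeatHolder {
    // Holding the task in an AtomicReference lets callers swap it out atomically.
    private final AtomicReference<ScheduledFuture<?>> heartbeatTask =
            new AtomicReference<>();

    // Hypothetical helper: schedules a periodic heartbeat and remembers its handle.
    void startHeartbeat(ScheduledExecutorService pool, Runnable beat, long periodMs) {
        ScheduledFuture<?> task =
                pool.scheduleAtFixedRate(beat, 0L, periodMs, TimeUnit.MILLISECONDS);
        ScheduledFuture<?> previous = heartbeatTask.getAndSet(task);
        if (previous != null) {
            previous.cancel(true); // do not leak an earlier heartbeat
        }
    }

    // Returns true only for the single caller that actually stopped the task.
    boolean stopHeartbeat() {
        // Atomically take the task and clear the field in one step;
        // a concurrent caller receives null here instead of a stale reference.
        ScheduledFuture<?> task = heartbeatTask.getAndSet(null);
        if (task == null) {
            return false; // already stopped by another thread, nothing to do
        }
        task.cancel(true);
        return true;
    }
}
```

With this shape, a thread that loses the race returns false instead of dereferencing a field that another thread has just nulled, so a caller such as rollbackTxn() would not be pushed into an exception path by stopHeartbeat() itself. Marking the existing method synchronized would be another way to get the same mutual exclusion; the atomic swap is shown here only because it keeps the critical section to one operation.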