John Sherman created HIVE-26875:
-----------------------------------
Summary: Transaction conflict retry loop only executes once
Key: HIVE-26875
URL: https://issues.apache.org/jira/browse/HIVE-26875
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: John Sherman
Assignee: John Sherman
Currently the "conflict retry loop" only executes once.
[https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264]
The intent of this loop is to detect whether a conflicting transaction has
committed while we were waiting to acquire locks. If one has, the loop
invalidates the snapshot, rolls back the transaction, opens a new transaction,
and tries to re-acquire locks (and then recompiles). It then checks again
whether a conflicting transaction has committed and, if so, repeats the above
steps, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times.
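
For illustration, the intended behaviour can be sketched roughly as below. This is a simplified model, not the code in Driver.java: the class name, the method name, the two callbacks, and the choice to throw once the retry limit is reached are all assumptions made for the sketch.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of the intended loop; not Hive's actual Driver code.
final class ConflictRetrySketch {

    /**
     * Repeats "roll back, reopen, re-lock, recompile" until the snapshot is
     * valid or maxRetries is exhausted. Returns the number of retries performed.
     */
    static int runWithRetries(BooleanSupplier snapshotIsValid,
                              Runnable rollbackReopenAndRelock,
                              int maxRetries) {
        int retries = 0;
        while (!snapshotIsValid.getAsBoolean()) {
            if (retries == maxRetries) {
                // Assumption: the sketch simply gives up with an exception here.
                throw new IllegalStateException(
                    "snapshot still invalid after " + maxRetries + " retries");
            }
            retries++;
            rollbackReopenAndRelock.run();
        }
        return retries;
    }

    public static void main(String[] args) {
        // Two conflicting commits arrive; each retry observes one of them.
        int[] pendingConflicts = {2};
        int retries = runWithRetries(
            () -> pendingConflicts[0] == 0,
            () -> pendingConflicts[0]--,
            5);
        System.out.println("retries = " + retries); // prints "retries = 2"
    }
}
```

The point of the sketch is that the validity check has to be re-evaluated, and has to be able to fail, on every pass through the loop.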
However, isValidTxnState relies on getNonSharedLockedTables():
[https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422]
which does:
{code:java}
private Set<String> getNonSharedLockedTables() {
  if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) {
    return Collections.emptySet(); // Nothing to check
  }
{code}
getHiveLocks gets populated by lockAndRespond. However, compileInternal ends up
calling compile, which ends up calling prepareForCompile, which ends up calling
prepareContext, which destroys the context holding the information that
lockAndRespond populated. So when the loop comes around again after all of
this, it will never detect a second conflict, because isValidTxnState will
always return true (it thinks there are no locked objects).
This manifests as duplicate records being created during concurrent UPDATEs if
a transaction hits a conflict twice.
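
To make the failure mode concrete, here is a toy model of it. Everything in it is hypothetical (the class, the Context stand-in, the table name, and the keepLocksOnRecompile switch); it only mimics the sequence described above and is not the actual Hive code or the actual fix.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the failure mode; all names are hypothetical, not Hive's API.
final class LostLocksSketch {

    /** Stand-in for the driver context that records acquired locks. */
    static final class Context {
        final Set<String> locks = new HashSet<>();
    }

    private Context context = new Context();
    private int conflictsPending;               // conflicting commits not yet observed
    private final boolean keepLocksOnRecompile; // false models the reported behaviour

    LostLocksSketch(int conflictsPending, boolean keepLocksOnRecompile) {
        this.conflictsPending = conflictsPending;
        this.keepLocksOnRecompile = keepLocksOnRecompile;
    }

    private void acquireLocks() {
        context.locks.add("db.tbl");
    }

    /** Recompiling replaces the context; the recorded locks are lost unless copied. */
    private void recompile() {
        Context fresh = new Context();
        if (keepLocksOnRecompile) {
            fresh.locks.addAll(context.locks);
        }
        context = fresh;
    }

    /** With no recorded locks there is "nothing to check", so the state looks valid. */
    private boolean isValidTxnState() {
        if (context.locks.isEmpty()) {
            return true;
        }
        return conflictsPending == 0;
    }

    /** Runs the retry loop and returns how many conflicts it detected. */
    int run(int maxRetries) {
        acquireLocks();
        int detected = 0;
        while (detected < maxRetries && !isValidTxnState()) {
            detected++;
            conflictsPending--; // this conflict has now been observed
            acquireLocks();     // the new transaction re-acquires locks
            recompile();        // and then the query is recompiled
        }
        return detected;
    }

    public static void main(String[] args) {
        System.out.println(new LostLocksSketch(2, false).run(5)); // prints 1
        System.out.println(new LostLocksSketch(2, true).run(5));  // prints 2
    }
}
```

With two pending conflicts, the first variant stops after detecting one, which corresponds to the second conflict going unnoticed; the second variant detects both because the lock information survives the recompile.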
--
This message was sent by Atlassian Jira
(v8.20.10#820010)