[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=836960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836960 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 04/Jan/23 17:24 Start Date: 04/Jan/23 17:24 Worklog Time Spent: 10m Work Description: deniskuzZ merged PR #3887: URL: https://github.com/apache/hive/pull/3887 Issue Time Tracking --- Worklog Id: (was: 836960) Time Spent: 2h (was: 1h 50m) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the > above steps again, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times. > However - isValidTxnState relies on getNonSharedLockedTable(): > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422] > which does: > {code:java} > private Set getNonSharedLockedTables() { > if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) { > return Collections.emptySet(); // Nothing to check > }{code} > getHiveLocks gets populated by lockAndRespond... HOWEVER - > compileInternal ends up calling compile which ends up calling > preparForCompile which ends up calling prepareContext which ends up > destroying the context with the information lockAndRespond populated. So when > the loop executes after all of this, it will never detect a 2nd conflict > because isValidTxnState will always return true (because it thinks there are > no locked objects). > This manifests as duplicate records being created during concurrent UPDATEs > if a transaction get conflicted twice. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=836316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836316 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 31/Dec/22 18:29 Start Date: 31/Dec/22 18:29 Worklog Time Spent: 10m Work Description: jfsii commented on PR #3887: URL: https://github.com/apache/hive/pull/3887#issuecomment-1368262791 > LGTM +1, thanks for the refactor, John! > Pending green build. Thanks @deniskuzZ for the review, it looks like I got a green build now (after a retry or two due to the oracle metastore tests) Issue Time Tracking --- Worklog Id: (was: 836316) Time Spent: 1h 50m (was: 1h 40m) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the > above steps again, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times. > However - isValidTxnState relies on getNonSharedLockedTable(): > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422] > which does: > {code:java} > private Set getNonSharedLockedTables() { > if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) { > return Collections.emptySet(); // Nothing to check > }{code} > getHiveLocks gets populated by lockAndRespond... HOWEVER - > compileInternal ends up calling compile which ends up calling > preparForCompile which ends up calling prepareContext which ends up > destroying the context with the information lockAndRespond populated. So when > the loop executes after all of this, it will never detect a 2nd conflict > because isValidTxnState will always return true (because it thinks there are > no locked objects). > This manifests as duplicate records being created during concurrent UPDATEs > if a transaction get conflicted twice. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=835365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-835365 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 22/Dec/22 17:17 Start Date: 22/Dec/22 17:17 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3887: URL: https://github.com/apache/hive/pull/3887#issuecomment-1363121050 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3887) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3887&metric=coverage&view=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3887&metric=duplicated_lines_density&view=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 835365) Time Spent: 1h 40m (was: 1.5h) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=835310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-835310 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 22/Dec/22 13:28 Start Date: 22/Dec/22 13:28 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3887: URL: https://github.com/apache/hive/pull/3887#issuecomment-1362841406 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3887) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3887&metric=coverage&view=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3887&metric=duplicated_lines_density&view=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 835310) Time Spent: 1.5h (was: 1h 20m) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=835299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-835299 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 22/Dec/22 12:39 Start Date: 22/Dec/22 12:39 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3887: URL: https://github.com/apache/hive/pull/3887#discussion_r1055417803 ## ql/src/java/org/apache/hadoop/hive/ql/Driver.java: ## @@ -261,6 +274,8 @@ private boolean validateTxnList() throws CommandProcessorException { String userFromUGI = DriverUtils.getUserFromUGI(driverContext); driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType()); +// this is used by the test framework to more easily generate conflicting transactions +if (testConflictCB != null) testConflictCB.run(); Review Comment: looks good! Issue Time Tracking --- Worklog Id: (was: 835299) Time Spent: 1h 20m (was: 1h 10m) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the > above steps again, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times. > However - isValidTxnState relies on getNonSharedLockedTable(): > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422] > which does: > {code:java} > private Set getNonSharedLockedTables() { > if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) { > return Collections.emptySet(); // Nothing to check > }{code} > getHiveLocks gets populated by lockAndRespond... HOWEVER - > compileInternal ends up calling compile which ends up calling > preparForCompile which ends up calling prepareContext which ends up > destroying the context with the information lockAndRespond populated. So when > the loop executes after all of this, it will never detect a 2nd conflict > because isValidTxnState will always return true (because it thinks there are > no locked objects). > This manifests as duplicate records being created during concurrent UPDATEs > if a transaction get conflicted twice. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=835122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-835122 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 21/Dec/22 18:17 Start Date: 21/Dec/22 18:17 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3887: URL: https://github.com/apache/hive/pull/3887#issuecomment-1361809343 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3887) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3887&metric=coverage&view=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3887&metric=duplicated_lines_density&view=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 835122) Time Spent: 1h 10m (was: 1h) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=835114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-835114 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 21/Dec/22 17:37 Start Date: 21/Dec/22 17:37 Worklog Time Spent: 10m Work Description: jfsii commented on PR #3887: URL: https://github.com/apache/hive/pull/3887#issuecomment-1361732696 Thanks @deniskuzZ for the review, I have updated the test. I didn't have enough experience with mocking style tests to understand that what I was trying to do was somewhat doable. Hopefully it seems better to you, it does seem much better to me. Issue Time Tracking --- Worklog Id: (was: 835114) Time Spent: 1h (was: 50m) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the > above steps again, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times. > However - isValidTxnState relies on getNonSharedLockedTable(): > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422] > which does: > {code:java} > private Set getNonSharedLockedTables() { > if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) { > return Collections.emptySet(); // Nothing to check > }{code} > getHiveLocks gets populated by lockAndRespond... HOWEVER - > compileInternal ends up calling compile which ends up calling > preparForCompile which ends up calling prepareContext which ends up > destroying the context with the information lockAndRespond populated. So when > the loop executes after all of this, it will never detect a 2nd conflict > because isValidTxnState will always return true (because it thinks there are > no locked objects). > This manifests as duplicate records being created during concurrent UPDATEs > if a transaction get conflicted twice. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=835112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-835112 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 21/Dec/22 17:28 Start Date: 21/Dec/22 17:28 Worklog Time Spent: 10m Work Description: jfsii commented on code in PR #3887: URL: https://github.com/apache/hive/pull/3887#discussion_r1054647497 ## ql/src/java/org/apache/hadoop/hive/ql/Driver.java: ## @@ -261,6 +274,8 @@ private boolean validateTxnList() throws CommandProcessorException { String userFromUGI = DriverUtils.getUserFromUGI(driverContext); driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType()); +// this is used by the test framework to more easily generate conflicting transactions +if (testConflictCB != null) testConflictCB.run(); Review Comment: Thanks for the feedback. I re-wrote the test using mocking techniques, it turned out much better. The approach I took was mostly because I didn't have strong enough experience with mocking style tests to understand I could do it this way. Issue Time Tracking --- Worklog Id: (was: 835112) Time Spent: 50m (was: 40m) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the > above steps again, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times. > However - isValidTxnState relies on getNonSharedLockedTable(): > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422] > which does: > {code:java} > private Set getNonSharedLockedTables() { > if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) { > return Collections.emptySet(); // Nothing to check > }{code} > getHiveLocks gets populated by lockAndRespond... HOWEVER - > compileInternal ends up calling compile which ends up calling > preparForCompile which ends up calling prepareContext which ends up > destroying the context with the information lockAndRespond populated. So when > the loop executes after all of this, it will never detect a 2nd conflict > because isValidTxnState will always return true (because it thinks there are > no locked objects). > This manifests as duplicate records being created during concurrent UPDATEs > if a transaction get conflicted twice. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=835004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-835004 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 21/Dec/22 09:00 Start Date: 21/Dec/22 09:00 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3887: URL: https://github.com/apache/hive/pull/3887#discussion_r1054134017 ## ql/src/java/org/apache/hadoop/hive/ql/Driver.java: ## @@ -261,6 +274,8 @@ private boolean validateTxnList() throws CommandProcessorException { String userFromUGI = DriverUtils.getUserFromUGI(driverContext); driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType()); +// this is used by the test framework to more easily generate conflicting transactions +if (testConflictCB != null) testConflictCB.run(); Review Comment: Strong NO for this approach. Use mocking techniques and refactor the code if needed. There shouldn't be any test hacks in the production code. Issue Time Tracking --- Worklog Id: (was: 835004) Time Spent: 40m (was: 0.5h) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the > above steps again, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times. > However - isValidTxnState relies on getNonSharedLockedTable(): > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422] > which does: > {code:java} > private Set getNonSharedLockedTables() { > if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) { > return Collections.emptySet(); // Nothing to check > }{code} > getHiveLocks gets populated by lockAndRespond... HOWEVER - > compileInternal ends up calling compile which ends up calling > preparForCompile which ends up calling prepareContext which ends up > destroying the context with the information lockAndRespond populated. So when > the loop executes after all of this, it will never detect a 2nd conflict > because isValidTxnState will always return true (because it thinks there are > no locked objects). > This manifests as duplicate records being created during concurrent UPDATEs > if a transaction get conflicted twice. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=834944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-834944 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 20/Dec/22 23:29 Start Date: 20/Dec/22 23:29 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3887: URL: https://github.com/apache/hive/pull/3887#issuecomment-1360464170 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3887) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3887&resolved=false&types=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [2 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3887&resolved=false&types=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3887&metric=coverage&view=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3887&metric=duplicated_lines_density&view=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 834944) Time Spent: 0.5h (was: 20m) > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the > a
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=834941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-834941 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 20/Dec/22 22:52 Start Date: 20/Dec/22 22:52 Worklog Time Spent: 10m Work Description: jfsii commented on code in PR #3887: URL: https://github.com/apache/hive/pull/3887#discussion_r1053821695 ## ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java: ## @@ -42,6 +42,7 @@ import org.apache.hadoop.hive.ql.Driver; import org.apache.hadoop.hive.ql.QueryState; import org.apache.hadoop.hive.ql.io.AcidUtils; +import org.apache.hadoop.hive.ql.metadata.HiveException; Review Comment: I don't think this is used, remove it. ## ql/src/java/org/apache/hadoop/hive/ql/Driver.java: ## @@ -555,7 +551,14 @@ private void prepareContext() throws CommandProcessorException { String originalCboInfo = context != null ? context.cboInfo : null; if (context != null && context.getExplainAnalyze() != AnalyzeState.RUNNING) { // close the existing ctx etc before compiling a new query, but does not destroy driver - closeInProcess(false); + if (!driverContext.isRetrial()) { +closeInProcess(false); + } else { +// On retrail we need to maintain information from the prior context. Such Review Comment: fix typo ## ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java: ## @@ -2465,6 +2468,113 @@ private void testConcurrent2MergeUpdatesConflict(boolean slowCompile) throws Exc Assert.assertTrue("Lost Update", isEqualCollection(res, asList("earl\t10", "amy\t10"))); } + // The intent of this test is to cause multiple conflicts to the same query to test the conflict retry functionality. + @Test + public void testConcurrentConflictRetry() throws Exception { +dropTable(new String[]{"target"}); + +driver2.getConf().setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, true); +driver.run("create table target(i int) stored as orc tblproperties ('transactional'='true')"); +driver.run("insert into target values (1),(1)"); + +DbTxnManager txnMgr2 = (DbTxnManager) TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf); +swapTxnManager(txnMgr2); +// Start a query on driver2 +driver2.compileAndRespond("update target set i = 1 where i = 1"); + +swapTxnManager(txnMgr); +// this should cause a conflict which should cause driver2.run to invoke its conflict lambda +driver.run("update target set i = 1 where i = 1"); +swapTxnManager(txnMgr2); + +// This run should conflict with the above query and cause the "conflict lambda" to be execute, +// which will then also conflict with the driver2 query and cause it to retry. +AtomicInteger conflictCount = new AtomicInteger(); +driver2.run(() -> { + conflictCount.getAndIncrement(); + // on first conflict, + // we execute another query to cause an additional conflict + if (conflictCount.get() == 1) { +swapTxnManager(txnMgr); +// this should cause a conflict +try { + driver.run("update target set i = 1 where i = 1"); +} catch (Exception e) { + // do nothing +} +swapTxnManager(txnMgr2); + } +}); + +// we expected two conflicts +Assert.assertEquals(2, conflictCount.get()); +swapTxnManager(txnMgr); +// we expect two rows +driver.run("select * from target"); +List res = new ArrayList<>(); +driver.getFetchTask().fetch(res); +Assert.assertEquals(2, res.size()); + } + + @Rule + public ExpectedException exceptionRule = ExpectedException.none(); + + @Test + public void testConcurrentConflictMaxRetryCount() throws Exception { +dropTable(new String[]{"target"}); + +driver2.getConf().setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, true); + +final int maxRetries = 4; + driver2.getConf().setIntVar(HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT, maxRetries); + +driver.run("create table target(i int) stored as orc tblproperties ('transactional'='true')"); +driver.run("insert into target values (1),(1)"); + +DbTxnManager txnMgr2 = (DbTxnManager) TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf); +swapTxnManager(txnMgr2); +// Start a query on driver2, we expect this query to never execute because of HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT. +// We verify that it is never executed by counting the number of rows returned that have i = 1. +driver2.compileAndRespond("update target set i = 2 where i = 1"); + +swapTxnManager(txnMgr); +// this should cause a conflict which should cause driver2.run to invoke its conflict lambda +driver.run("update target set i = 1 where i = 1"); +swapTxnM
[jira] [Work logged] (HIVE-26875) Transaction conflict retry loop only executes once
[ https://issues.apache.org/jira/browse/HIVE-26875?focusedWorklogId=834940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-834940 ] ASF GitHub Bot logged work on HIVE-26875: - Author: ASF GitHub Bot Created on: 20/Dec/22 22:45 Start Date: 20/Dec/22 22:45 Worklog Time Spent: 10m Work Description: jfsii opened a new pull request, #3887: URL: https://github.com/apache/hive/pull/3887 ### What changes were proposed in this pull request? The functional change is to not clear the query context completely when re-compiling the query. ### Why are the changes needed? It is a bug that can cause duplicate records or other incorrect behavior. ### Does this PR introduce _any_ user-facing change? No - except fixing the bug ### How was this patch tested? Added test cases and tested on a 3 node cluster. Issue Time Tracking --- Worklog Id: (was: 834940) Remaining Estimate: 0h Time Spent: 10m > Transaction conflict retry loop only executes once > -- > > Key: HIVE-26875 > URL: https://issues.apache.org/jira/browse/HIVE-26875 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: John Sherman >Assignee: John Sherman >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Currently the "conflict retry loop" only executes once. > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L264] > The intent of this loop is to detect if a conflicting transaction has > committed while we were waiting to acquire locks. If there is a conflicting > transaction, it invalidates the snapshot, rolls-back the transaction, opens a > new transaction and tries to re-acquire locks (and then recompile). It then > checks again if a conflicting transaction has committed and if so, redoes the > above steps again, up to HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT times. > However - isValidTxnState relies on getNonSharedLockedTable(): > [https://github.com/apache/hive/blob/ab4c53de82d4aaa33706510441167f2df55df15e/ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java#L422] > which does: > {code:java} > private Set getNonSharedLockedTables() { > if (CollectionUtils.isEmpty(driver.getContext().getHiveLocks())) { > return Collections.emptySet(); // Nothing to check > }{code} > getHiveLocks gets populated by lockAndRespond... HOWEVER - > compileInternal ends up calling compile which ends up calling > preparForCompile which ends up calling prepareContext which ends up > destroying the context with the information lockAndRespond populated. So when > the loop executes after all of this, it will never detect a 2nd conflict > because isValidTxnState will always return true (because it thinks there are > no locked objects). > This manifests as duplicate records being created during concurrent UPDATEs > if a transaction get conflicted twice. -- This message was sent by Atlassian Jira (v8.20.10#820010)