[ https://issues.apache.org/jira/browse/HIVE-27117?focusedWorklogId=849091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-849091 ]
ASF GitHub Bot logged work on HIVE-27117:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Mar/23 09:00
            Start Date: 04/Mar/23 09:00
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on code in PR #4096:
URL: https://github.com/apache/hive/pull/4096#discussion_r1125422392


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##########
@@ -3647,9 +3647,12 @@ long generateCompactionQueueId(Statement stmt) throws SQLException, MetaExceptio
           + "no record found in next_compaction_queue_id");
     }
     long id = rs.getLong(1);
-    s = "UPDATE \"NEXT_COMPACTION_QUEUE_ID\" SET \"NCQ_NEXT\" = " + (id + 1);
+    s = "UPDATE \"NEXT_COMPACTION_QUEUE_ID\" SET \"NCQ_NEXT\" = " + (id + 1) + " WHERE \"NCQ_NEXT\" = " + id;
     LOG.debug("Going to execute update <{}>", s);
-    stmt.executeUpdate(s);
+    if (stmt.executeUpdate(s) != 1) {

Review Comment:
This looks like a hack in production code, while what we're targeting here are test fixes. That is a bit concerning.

This method is already very confusing. It looks like we cannot guarantee synchronization on NCQ_NEXT, which is expected if multiple instances run this code, right? The reason is that we select the value, increment it in Java code, and then try to write it back. Isn't there a way to use an auto-incremented value across all supported RDBMS types?
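The diff changes the unconditional UPDATE into a compare-and-set. The write succeeds only if NCQ_NEXT still holds the value that was just read, and the row count from executeUpdate shows whether that happened. Below is a minimal in-memory sketch of this optimistic pattern. It assumes an AtomicLong as a stand-in for the single NCQ_NEXT row, and the class and method names are hypothetical. It uses a bounded retry loop in place of the recursive retry the review objects to. It is not the code from PR #4096.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: an in-memory model of optimistic ID allocation,
// not the actual TxnHandler code.
public class CompactionIdSketch {
    // Assumption: stand-in for the single NCQ_NEXT value in NEXT_COMPACTION_QUEUE_ID.
    private final AtomicLong ncqNext;

    public CompactionIdSketch(long initialValue) {
        this.ncqNext = new AtomicLong(initialValue);
    }

    // Models: UPDATE ... SET NCQ_NEXT = expected + 1 WHERE NCQ_NEXT = expected
    // Returns the "updated row count": 1 if this writer won, 0 if another writer changed the value first.
    private int conditionalUpdate(long expected) {
        return ncqNext.compareAndSet(expected, expected + 1) ? 1 : 0;
    }

    // Reads the current value and tries to claim it. Retries in a bounded loop
    // instead of recursing, and fails loudly once the retry budget is spent.
    public long generateId(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long id = ncqNext.get();           // models SELECT NCQ_NEXT
            if (conditionalUpdate(id) == 1) {  // models executeUpdate(s) == 1
                return id;
            }
        }
        throw new IllegalStateException(
            "Could not allocate a compaction queue id after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws InterruptedException {
        CompactionIdSketch generator = new CompactionIdSketch(1);
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // 100 concurrent allocations; losing a race only costs another attempt.
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> ids.add(generator.generateId(1_000)));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Every allocation should yield a distinct id.
        System.out.println("distinct ids: " + ids.size());
    }
}
```

Against a real database, the same check relies on the int row count returned by java.sql.Statement.executeUpdate. The row-lock alternative raised in the review would instead read the value with a locking read (for example SELECT ... FOR UPDATE, whose syntax and support vary across the RDBMSs Hive supports). That approach trades retries for blocking.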
I mean, I don't really like this recursive call. We should find a better synchronization pattern. A compaction queue ID increment is not something that happens 100 times per second, so I guess even an RDBMS-level lock on this table would be fine (I don't know this area well, though).

Don't get me wrong, I'm more than happy to see stability fixes. But simply retrying in production code reminds me of a good old pattern in Hive code: "we have no idea what's going on, so let's retry :)"




Issue Time Tracking
-------------------

    Worklog Id:     (was: 849091)
    Time Spent: 50m  (was: 40m)

> Fix compaction related flaky tests
> ----------------------------------
>
>                 Key: HIVE-27117
>                 URL: https://issues.apache.org/jira/browse/HIVE-27117
>             Project: Hive
>          Issue Type: Task
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The following tests turned out to be flaky recently:
> * org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDropTableAndCompactionConcurrent
> * org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.testInitiatorFailuresCountedCorrectly
> * org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMajorCompactionNotPartitionedWithoutBuckets
> * org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testCompactionWithCreateTableProps



--
This message was sent by Atlassian Jira
(v8.20.10#820010)