[ https://issues.apache.org/jira/browse/HIVE-27117?focusedWorklogId=849091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-849091 ]

ASF GitHub Bot logged work on HIVE-27117:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Mar/23 09:00
            Start Date: 04/Mar/23 09:00
    Worklog Time Spent: 10m 
      Work Description: abstractdog commented on code in PR #4096:
URL: https://github.com/apache/hive/pull/4096#discussion_r1125422392


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##########
@@ -3647,9 +3647,12 @@ long generateCompactionQueueId(Statement stmt) throws SQLException, MetaException
             + "no record found in next_compaction_queue_id");
       }
       long id = rs.getLong(1);
-      s = "UPDATE \"NEXT_COMPACTION_QUEUE_ID\" SET \"NCQ_NEXT\" = " + (id + 1);
+      s = "UPDATE \"NEXT_COMPACTION_QUEUE_ID\" SET \"NCQ_NEXT\" = " + (id + 1) + " WHERE \"NCQ_NEXT\" = " + id;
       LOG.debug("Going to execute update <{}>", s);
-      stmt.executeUpdate(s);
+      if (stmt.executeUpdate(s) != 1) {

Review Comment:
   This looks like a hack in production code while we're targeting test fixes here, which is a bit concerning.
   This method is already very confusing: it looks like we cannot guarantee synchronization on NCQ_NEXT (which is to be expected if more than one instance runs this code, right?), since we select the value, increment it here in Java code, and then try to write it back. Isn't there a way to use an AUTOINCREMENTed value across the supported RDBMS types?
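   For reference, a minimal sketch of the optimistic compare-and-set pattern this diff seems to implement: read NCQ_NEXT, then only advance it if no other instance changed it in the meantime, otherwise retry with the freshly read value. The class and method names below are illustrative only, not the actual TxnHandler code.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch only; not the real TxnHandler#generateCompactionQueueId.
public class CompactionQueueIdCasSketch {

  static long nextQueueId(Connection conn) throws SQLException {
    while (true) {
      try (Statement stmt = conn.createStatement()) {
        long id;
        try (ResultSet rs = stmt.executeQuery(
            "SELECT \"NCQ_NEXT\" FROM \"NEXT_COMPACTION_QUEUE_ID\"")) {
          if (!rs.next()) {
            throw new SQLException("no record found in next_compaction_queue_id");
          }
          id = rs.getLong(1);
        }
        // The WHERE clause turns the update into a compare-and-set: it only
        // succeeds if NCQ_NEXT still holds the value we just read.
        int updated = stmt.executeUpdate(
            "UPDATE \"NEXT_COMPACTION_QUEUE_ID\" SET \"NCQ_NEXT\" = " + (id + 1)
                + " WHERE \"NCQ_NEXT\" = " + id);
        if (updated == 1) {
          return id; // we won the race, this id is ours
        }
        // Another metastore instance advanced NCQ_NEXT first; loop and retry.
      }
    }
  }
}

   A loop (or a bounded retry count) would keep the same semantics without the recursion.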
   I mean, I don't really like this recursive call; we should find a better synchronization pattern. A compaction queue id increment is not something that happens 100 times per second, so even an RDBMS-level lock on this table would be fine, I guess (I don't know this area well, though).
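   A minimal sketch of that pessimistic alternative, assuming a SELECT ... FOR UPDATE row lock inside a transaction (the exact locking syntax varies a bit across the supported RDBMSs, and the names below are again purely illustrative):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch of the row-lock approach; not actual Hive metastore code.
public class CompactionQueueIdLockSketch {

  static long nextQueueIdWithRowLock(Connection conn) throws SQLException {
    boolean oldAutoCommit = conn.getAutoCommit();
    conn.setAutoCommit(false);
    try (Statement stmt = conn.createStatement()) {
      long id;
      // FOR UPDATE holds a row lock until commit/rollback, so concurrent callers
      // queue up here instead of racing on the subsequent UPDATE.
      try (ResultSet rs = stmt.executeQuery(
          "SELECT \"NCQ_NEXT\" FROM \"NEXT_COMPACTION_QUEUE_ID\" FOR UPDATE")) {
        if (!rs.next()) {
          throw new SQLException("no record found in next_compaction_queue_id");
        }
        id = rs.getLong(1);
      }
      stmt.executeUpdate(
          "UPDATE \"NEXT_COMPACTION_QUEUE_ID\" SET \"NCQ_NEXT\" = " + (id + 1));
      conn.commit();
      return id;
    } catch (SQLException e) {
      conn.rollback();
      throw e;
    } finally {
      conn.setAutoCommit(oldAutoCommit);
    }
  }
}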
   
   Don't get me wrong, I'm more than happy to see stability fixes, but simply retrying in production code reminds me of a good old pattern in Hive code: "we have no idea what's going on, so let's retry :)"





Issue Time Tracking
-------------------

    Worklog Id:     (was: 849091)
    Time Spent: 50m  (was: 40m)

> Fix compaction related flaky tests
> ----------------------------------
>
>                 Key: HIVE-27117
>                 URL: https://issues.apache.org/jira/browse/HIVE-27117
>             Project: Hive
>          Issue Type: Task
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The following tests turned out to be flaky recently:
>  * org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDropTableAndCompactionConcurrent
>  * org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.testInitiatorFailuresCountedCorrectly
>  * org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMajorCompactionNotPartitionedWithoutBuckets
>  * org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testCompactionWithCreateTableProps



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
