[ https://issues.apache.org/jira/browse/HIVE-26107?focusedWorklogId=760707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760707 ]
ASF GitHub Bot logged work on HIVE-26107:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Apr/22 09:25
            Start Date: 22/Apr/22 09:25
    Worklog Time Spent: 10m
      Work Description: klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r855867715


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      // Inside a compaction transaction only stats gathering runs, which does not require a new
+      // write id, and duplicate compaction detection requires that the write id is not incremented.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   I think you can use driverContext.getTxnType() instead (TxnType.COMPACTION)

##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java:
##########
@@ -797,7 +797,6 @@ public void testCompactStatsGather() throws Exception {
     int[][] targetVals2 = {{5, 1, 1}, {5, 2, 2}, {5, 3, 1}, {5, 4, 2}};
     runStatementOnDriver("insert into T partition(p=1,q) " + makeValuesClause(targetVals2));
-

Review Comment:
   Nit: Unnecessary change to this file

##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java:
##########
@@ -122,6 +125,56 @@ public void testCompactionShouldNotFailOnPartitionsWithBooleanField() throws Exc
         "ready for cleaning", compacts.get(0).getState());
   }

+  @Test
+  public void secondCompactionShouldBeRefusedBeforeEnqueueing() throws Exception {
+    conf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, true);
+    // Set delta number threshold to 2 to avoid skipping compaction because of too few deltas
+    conf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD, 2);
+    // Set delta percentage to a high value to suppress selecting major compaction based on that
+    conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 1000f);

Review Comment:
   These 2 settings aren't necessary since the Initiator uses these thresholds, but in the test we always queue compaction manually

##########
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java:
##########
@@ -38,6 +38,7 @@
 import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsResponse;
 import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
 import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionResponse;

Review Comment:
   Nit: Unused import

##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java:
##########
@@ -235,18 +235,18 @@ private void loadData(boolean isVectorized) throws Exception {
     runStatementOnDriver("export table Tstage to '" + getWarehouseDir() +"/2'");
     runStatementOnDriver("load data inpath '" + getWarehouseDir() + "/2/data' overwrite into table T");
     String[][] expected3 = new String[][] {
-        {"{\"writeid\":5,\"bucketid\":536870912,\"rowid\":0}\t5\t6", "t/base_0000005/000000_0"},

Review Comment:
   Just trying to understand – Why was the writeid 5 originally? And if there was no compaction in the meantime, why is it 4 now?

##########
ql/src/test/queries/clientpositive/acid_insert_overwrite_update.q:
##########
@@ -26,7 +26,6 @@ insert overwrite table sequential_update values(current_timestamp, 0, current_ti
 delete from sequential_update where seq=2;
 select distinct IF(seq==0, 'LOOKS OKAY', 'BROKEN'), regexp_extract(INPUT__FILE__NAME, '.*/(.*)/[^/]*', 1) from sequential_update;
-alter table sequential_update compact 'major';

Review Comment:
   Why change this?
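The first review comment above suggests deriving the "inside a compaction transaction" check from the transaction type rather than a session-variable flag. A minimal standalone sketch of that idea follows; `TxnType` and `DriverContext` here are illustrative stubs modeled loosely on the code under review, not Hive's actual classes:

```java
public class CompactionTxnCheck {
    // Illustrative stand-ins for Hive's TxnType and DriverContext (assumptions, not Hive's API).
    enum TxnType { DEFAULT, READ_ONLY, COMPACTION }

    static class DriverContext {
        private final TxnType txnType;
        DriverContext(TxnType txnType) { this.txnType = txnType; }
        TxnType getTxnType() { return txnType; }
    }

    // The reviewer's suggestion: a compaction transaction only gathers stats, so no
    // new write id is needed; checking the txn type avoids a session-variable flag.
    static boolean isWithinCompactionTxn(DriverContext ctx) {
        return ctx.getTxnType() == TxnType.COMPACTION;
    }

    public static void main(String[] args) {
        System.out.println(isWithinCompactionTxn(new DriverContext(TxnType.COMPACTION))); // prints true
        System.out.println(isWithinCompactionTxn(new DriverContext(TxnType.DEFAULT)));    // prints false
    }
}
```

Checking the transaction type has the advantage that the signal travels with the transaction itself instead of relying on session state being set and cleared correctly.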
Issue Time Tracking
-------------------

    Worklog Id:     (was: 760707)
    Time Spent: 20m  (was: 10m)

> Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26107
>                 URL: https://issues.apache.org/jira/browse/HIVE-26107
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> How to reproduce:
> 1) Create an acid table and load some data.
> 2) Manually trigger compaction for the table several times.
> 3) Inspect the compaction queue: there are multiple entries in 'ready for cleaning' state for the same table.
>
> Expected behavior: all compaction requests after the first one should be rejected until the table is changed again.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
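The expected behavior in the ticket can be modeled as a queue that refuses a new compaction request while an earlier one for the same table is still in "ready for cleaning" state, and accepts one again after the table changes. This is a toy sketch of that contract only; class and method names are illustrative and do not correspond to Hive's implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the fix this ticket asks for: a second compaction request is
// rejected while an earlier one is still "ready for cleaning", and a table
// becomes eligible again once new writes land. Names are illustrative.
public class CompactionQueueSketch {
    private final Map<String, String> stateByTable = new HashMap<>();

    public boolean enqueue(String table) {
        if ("ready for cleaning".equals(stateByTable.get(table))) {
            return false; // duplicate request: nothing new to compact yet
        }
        stateByTable.put(table, "ready for cleaning"); // compaction done, awaiting cleaner
        return true;
    }

    public void markTableChanged(String table) {
        stateByTable.remove(table); // new writes make the table eligible again
    }

    public static void main(String[] args) {
        CompactionQueueSketch q = new CompactionQueueSketch();
        System.out.println(q.enqueue("t1"));  // prints true
        System.out.println(q.enqueue("t1"));  // prints false: already ready for cleaning
        q.markTableChanged("t1");
        System.out.println(q.enqueue("t1"));  // prints true
    }
}
```

The key point is that the duplicate check happens before enqueueing, which is why the PR title says the Worker "shouldn't inject duplicate entries" rather than cleaning them up afterwards.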