[ https://issues.apache.org/jira/browse/HIVE-26107?focusedWorklogId=760707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760707 ]
ASF GitHub Bot logged work on HIVE-26107:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Apr/22 09:25
            Start Date: 22/Apr/22 09:25
    Worklog Time Spent: 10m
      Work Description: klcopp commented on code in PR #3172:
URL: https://github.com/apache/hive/pull/3172#discussion_r855867715


##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -303,8 +303,15 @@ void setWriteIdForAcidFileSinks() throws SemanticException, LockException {
   private void allocateWriteIdForAcidAnalyzeTable() throws LockException {
     if (driverContext.getPlan().getAcidAnalyzeTable() != null) {
+      // Inside a compaction transaction only stats gathering runs, which does not require a new
+      // write id, and duplicate compaction detection requires that the write id is not incremented.
+      boolean isWithinCompactionTxn = Boolean.parseBoolean(SessionState.get().getHiveVariables().get(Constants.INSIDE_COMPACTION_TRANSACTION_FLAG));

Review Comment:
   I think you can use driverContext.getTxnType() instead (TxnType.COMPACTION)

##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java:
##########
@@ -797,7 +797,6 @@ public void testCompactStatsGather() throws Exception {
     int[][] targetVals2 = {{5, 1, 1}, {5, 2, 2}, {5, 3, 1}, {5, 4, 2}};
     runStatementOnDriver("insert into T partition(p=1,q) " + makeValuesClause(targetVals2));
-

Review Comment:
   Nit: Unnecessary change to this file

##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java:
##########
@@ -122,6 +125,56 @@ public void testCompactionShouldNotFailOnPartitionsWithBooleanField() throws Exc
         "ready for cleaning", compacts.get(0).getState());
   }

+  @Test
+  public void secondCompactionShouldBeRefusedBeforeEnqueueing() throws Exception {
+    conf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, true);
+    // Set delta number threshold to 2 to avoid skipping compaction because of too few deltas
+    conf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD, 2);
+    // Set delta percentage to a high value to suppress selecting major compaction based on that
+    conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 1000f);

Review Comment:
   These 2 settings aren't necessary since the Initiator uses these thresholds, but in the test we always queue compaction manually

##########
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java:
##########
@@ -38,6 +38,7 @@
 import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsResponse;
 import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
 import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionResponse;

Review Comment:
   Nit: Unused import

##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java:
##########
@@ -235,18 +235,18 @@ private void loadData(boolean isVectorized) throws Exception {
     runStatementOnDriver("export table Tstage to '" + getWarehouseDir() +"/2'");
     runStatementOnDriver("load data inpath '" + getWarehouseDir() + "/2/data' overwrite into table T");
     String[][] expected3 = new String[][] {
-        {"{\"writeid\":5,\"bucketid\":536870912,\"rowid\":0}\t5\t6", "t/base_0000005/000000_0"},

Review Comment:
   Just trying to understand – Why was the writeid 5 originally? And if there was no compaction in the meantime, why is it 4 now?

##########
ql/src/test/queries/clientpositive/acid_insert_overwrite_update.q:
##########
@@ -26,7 +26,6 @@ insert overwrite table sequential_update values(current_timestamp, 0, current_ti
 delete from sequential_update where seq=2;
 select distinct IF(seq==0, 'LOOKS OKAY', 'BROKEN'), regexp_extract(INPUT__FILE__NAME, '.*/(.*)/[^/]*', 1) from sequential_update;
-alter table sequential_update compact 'major';

Review Comment:
   Why change this?
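The first review comment above suggests deriving the "inside a compaction transaction" check from the transaction type rather than a session-variable flag. A minimal standalone sketch of that idea follows; `TxnType` and `DriverContext` here are illustrative stubs modeled loosely on the code under review, not Hive's actual classes:

```java
public class CompactionTxnCheck {
    // Illustrative stand-ins for Hive's TxnType and DriverContext (assumptions, not Hive's API).
    enum TxnType { DEFAULT, READ_ONLY, COMPACTION }

    static class DriverContext {
        private final TxnType txnType;
        DriverContext(TxnType txnType) { this.txnType = txnType; }
        TxnType getTxnType() { return txnType; }
    }

    // The reviewer's suggestion: a compaction transaction only gathers stats, so no
    // new write id is needed; checking the txn type avoids a session-variable flag.
    static boolean isWithinCompactionTxn(DriverContext ctx) {
        return ctx.getTxnType() == TxnType.COMPACTION;
    }

    public static void main(String[] args) {
        System.out.println(isWithinCompactionTxn(new DriverContext(TxnType.COMPACTION))); // prints true
        System.out.println(isWithinCompactionTxn(new DriverContext(TxnType.DEFAULT)));    // prints false
    }
}
```

Checking the transaction type has the advantage that the signal travels with the transaction itself instead of relying on session state being set and cleared correctly.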
Issue Time Tracking
-------------------

    Worklog Id:     (was: 760707)
    Time Spent: 20m  (was: 10m)

> Worker shouldn't inject duplicate entries in `ready for cleaning` state into the compaction queue
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26107
>                 URL: https://issues.apache.org/jira/browse/HIVE-26107
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> How to reproduce:
> 1) Create an acid table and load some data.
> 2) Manually trigger compaction for the table several times.
> 3) Inspect the compaction queue: there are multiple entries in 'ready for cleaning' state for the same table.
>
> Expected behavior: all compaction requests after the first one should be rejected until the table is changed again.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
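The expected behavior in the ticket can be modeled as a queue that refuses a new compaction request while an earlier one for the same table is still in "ready for cleaning" state, and accepts one again after the table changes. This is a toy sketch of that contract only; class and method names are illustrative and do not correspond to Hive's implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the fix this ticket asks for: a second compaction request is
// rejected while an earlier one is still "ready for cleaning", and a table
// becomes eligible again once new writes land. Names are illustrative.
public class CompactionQueueSketch {
    private final Map<String, String> stateByTable = new HashMap<>();

    public boolean enqueue(String table) {
        if ("ready for cleaning".equals(stateByTable.get(table))) {
            return false; // duplicate request: nothing new to compact yet
        }
        stateByTable.put(table, "ready for cleaning"); // compaction done, awaiting cleaner
        return true;
    }

    public void markTableChanged(String table) {
        stateByTable.remove(table); // new writes make the table eligible again
    }

    public static void main(String[] args) {
        CompactionQueueSketch q = new CompactionQueueSketch();
        System.out.println(q.enqueue("t1"));  // prints true
        System.out.println(q.enqueue("t1"));  // prints false: already ready for cleaning
        q.markTableChanged("t1");
        System.out.println(q.enqueue("t1"));  // prints true
    }
}
```

The key point is that the duplicate check happens before enqueueing, which is why the PR title says the Worker "shouldn't inject duplicate entries" rather than cleaning them up afterwards.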