veghlaci05 commented on code in PR #3775:
URL: https://github.com/apache/hive/pull/3775#discussion_r1063433513


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java:
##########
@@ -504,6 +504,47 @@ private CompactionType determineCompactionType(CompactionInfo ci, AcidDirectory
       if (initiateMajor) return CompactionType.MAJOR;
     }
 
+    // bucket size calculation can be resource intensive if there are numerous deltas, so we check for rebalance
+    // compaction only if the table is in an acceptable shape: no major compaction required. This means the number of
+    // files shouldn't be too high
+    if ("tez".equalsIgnoreCase(HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE)) &&

Review Comment:
   Yes, running a rebalance compaction on uncompacted tables could be resource
intensive due to the high number of files and folders, so I decided to schedule
rebalance compactions only on tables that have already been major compacted.
This ensures that the number of deltas is relatively low.
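
   To illustrate the ordering described above, here is a minimal,
self-contained sketch. It is not the code from this PR: `RebalanceGate`,
`Decision`, `decide`, and the `bucketsImbalanced` supplier are hypothetical
names, and the rest of the real condition (truncated after `&&` in the hunk)
is not reproduced. The only point is that the expensive bucket size check is
evaluated lazily, after the cheap checks have passed.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Hive's Initiator: names and return values are assumptions.
public class RebalanceGate {

    public enum Decision { MAJOR, REBALANCE, NONE }

    /**
     * @param majorRequired     whether the earlier checks already decided on a major compaction
     * @param executionEngine   configured execution engine name, e.g. "tez"
     * @param bucketsImbalanced stand-in for the expensive bucket size calculation;
     *                          only invoked when the cheap checks pass
     */
    public static Decision decide(boolean majorRequired,
                                  String executionEngine,
                                  BooleanSupplier bucketsImbalanced) {
        // A table that still needs major compaction may have many deltas:
        // return early and never touch the bucket sizes.
        if (majorRequired) {
            return Decision.MAJOR;
        }
        // The condition in the hunk starts with a check that the engine is Tez.
        if (!"tez".equalsIgnoreCase(executionEngine)) {
            return Decision.NONE;
        }
        // Only now pay for the bucket size calculation.
        return bucketsImbalanced.getAsBoolean() ? Decision.REBALANCE : Decision.NONE;
    }

    public static void main(String[] args) {
        // The supplier throws to show it is never evaluated on the early-return paths.
        BooleanSupplier mustNotRun = () -> {
            throw new IllegalStateException("bucket sizes should not be calculated");
        };
        System.out.println(decide(true, "tez", mustNotRun));
        System.out.println(decide(false, "mr", mustNotRun));
        System.out.println(decide(false, "TEZ", () -> true));
    }
}
```

   In this simplified version `NONE` stands for "no rebalance"; the real
method continues with its other compaction type decisions.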





