deniskuzZ commented on code in PR #5540:
URL: https://github.com/apache/hive/pull/5540#discussion_r1932471699
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergCompactionUtil.java:
##########
@@ -94,4 +105,48 @@ public static List<DeleteFile> getDeleteFiles(Table table, String partitionPath)
return Lists.newArrayList(CloseableIterable.transform(filteredDeletesScanTasks,
t -> ((PositionDeletesScanTask) t).file()));
}
+
+ /**
+ * Returns target file size as following:
+ * In case of Minor compaction:
+ * 1. When COMPACTION_FILE_SIZE_THRESHOLD is defined, returns it.
+ * 2. Otherwise, calculates the file size threshold as:
+ * COMPACTION_FILE_SIZE_THRESHOLD * TableProperties.HIVE_ICEBERG_COMPACTION_TARGET_FILE_SIZE
+ * This makes Compaction evaluator consider data files with size less than file size threshold as undersized
+ * segment files eligible for minor compaction (as per Amoro compaction evaluator, which is minor compaction
+ * in Hive).
+ * In case of Major compaction returns -1.
+ * @param ci the compaction info
+ * @param conf Hive configuration
+ */
+ public static long getFileSizeThreshold(CompactionInfo ci, HiveConf conf) {
+ switch (ci.type) {
+ case MINOR:
+ return Optional.ofNullable(ci.getProperty(CompactorContext.COMPACTION_FILE_SIZE_THRESHOLD))
Review Comment:
````
Optional.ofNullable(
        ci.getProperty(CompactorContext.COMPACTION_FILE_SIZE_THRESHOLD))
    .map(HiveConf::toSizeBytes)
    .orElse(HiveConf.getSizeVar(conf, HiveConf.ConfVars.HIVE_ICEBERG_COMPACTION_TARGET_FILE_SIZE)
        * TableProperties.SELF_OPTIMIZING_MIN_TARGET_SIZE_RATIO_DEFAULT)
````
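
For readers unfamiliar with the pattern in the suggestion, here is a minimal, self-contained sketch of the same fallback logic: use the explicit threshold when one is set, otherwise derive it from the target file size multiplied by a ratio. It does not use the Hive or Amoro classes. `FileSizeThresholdSketch`, `parseBytes`, and `resolveThreshold` are hypothetical names, the parser accepts only plain byte counts (it is not a stand-in for the full behavior of `HiveConf.toSizeBytes`), and the 0.75 ratio and 128 MiB target are values assumed purely for illustration. The explicit `(long)` cast is there because the sketch multiplies a `long` by a `double`.

```java
import java.util.Optional;

public class FileSizeThresholdSketch {

  // Hypothetical parser: handles plain byte counts only (no "64mb"-style suffixes).
  static long parseBytes(String value) {
    return Long.parseLong(value.trim());
  }

  // explicitThreshold may be null, meaning "no threshold property was set".
  static long resolveThreshold(String explicitThreshold, long targetFileSize, double minTargetSizeRatio) {
    return Optional.ofNullable(explicitThreshold)
        .map(FileSizeThresholdSketch::parseBytes)
        // Fallback: a fraction of the target file size, truncated to a whole number of bytes.
        .orElseGet(() -> (long) (targetFileSize * minTargetSizeRatio));
  }

  public static void main(String[] args) {
    long target = 128L * 1024 * 1024; // assumed 128 MiB target, for illustration only
    System.out.println(resolveThreshold("1048576", target, 0.75)); // explicit value is used
    System.out.println(resolveThreshold(null, target, 0.75));      // falls back to target * ratio
  }
}
```

The sketch uses `orElseGet` rather than `orElse` so that the fallback is computed only when no explicit threshold is present. With a cheap computation like this one either form behaves the same.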
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]