okumin commented on code in PR #4477:
URL: https://github.com/apache/hive/pull/4477#discussion_r1269422641


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java:
##########
@@ -86,6 +108,13 @@ public static ReduceWork createReduceWork(
 
     float maxPartitionFactor =
         context.conf.getFloatVar(HiveConf.ConfVars.TEZ_MAX_PARTITION_FACTOR);
+    
+    if (context.parseContext.getContext().getOperation() == Context.Operation.DELETE &&
+            isRestrictReducerExtrapolation(context)) {
+      LOG.debug("Overriding maxPartitionFactor to 1.0 to prevent creation of small files after delete operation");
+      maxPartitionFactor = 1f;

Review Comment:
   I quickly double-checked it works as expected.
   
   ACID
   
   ```
   $ beeline -e "
   > DROP TABLE IF EXISTS test;
   > CREATE TABLE test (id INT) STORED AS ORC TBLPROPERTIES ('transactional'='true');
   > INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
   > "
   ...
   $ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
   ...
   INFO  : 2023-07-20 12:52:55,637      Map 1: -/-      Reducer 2: 0/1  
   INFO  : 2023-07-20 12:52:57,161      Map 1: 0/1      Reducer 2: 0/1
   ```
   
   Iceberg
   ```
   $ beeline -e "
   > DROP TABLE IF EXISTS test;
   > CREATE TABLE test (id INT) STORED BY ICEBERG TBLPROPERTIES('format-version'='2');
   > INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
   > "
   ...
   $ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
   ...
   INFO  : 2023-07-20 12:55:56,104      Map 1: -/-      Reducer 2: 0/1  
   INFO  : 2023-07-20 12:55:58,337      Map 1: 0/1      Reducer 2: 0/1
   ```
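
To illustrate why a single reducer in the logs above is the expected outcome, here is a minimal sketch. It assumes that, with auto reducer parallelism enabled, the upper bound Tez may pick is the estimated reducer count scaled by `maxPartitionFactor` and then capped. The class `ReducerBounds`, the method `bound`, the cap of 1009, and the factor 2.0 are illustrative names and values, not taken from the PR.

```java
public class ReducerBounds {
    // Simplified stand-in for the bound computation (an assumption, not the PR's code):
    // scale the estimated reducer count by a factor, keep at least 1,
    // and cap the result at the configured reducer limit.
    static int bound(int estimatedReducers, float factor, int maxReducers) {
        int scaled = Math.max(1, (int) (estimatedReducers * factor));
        return Math.min(scaled, maxReducers);
    }

    public static void main(String[] args) {
        int estimated = 1;       // one reducer estimated, as in the logs above
        int maxReducers = 1009;  // illustrative cap only

        // With a factor above 1.0 the upper bound can exceed the estimate.
        System.out.println(bound(estimated, 2.0f, maxReducers)); // prints 2

        // With the factor overridden to 1.0 the upper bound equals the estimate.
        System.out.println(bound(estimated, 1.0f, maxReducers)); // prints 1
    }
}
```

Under these assumptions, forcing the factor to 1.0 means the DELETE never fans out to more reducers than estimated, which is consistent with `Reducer 2: 0/1` in both runs.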



##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLUtils.java:
##########
@@ -231,10 +231,17 @@ public static void validateTableIsIceberg(org.apache.hadoop.hive.ql.metadata.Tab
   }
 
   public static boolean isIcebergTable(Table table) {
-    return table.isNonNative() && 
-            table.getStorageHandler().getType() == StorageHandlerTypes.ICEBERG;
+    return table.isNonNative() &&
+            ((table.getStorageHandler() != null && table.getStorageHandler().getType() == StorageHandlerTypes.ICEBERG) || 
+                    isIcebergTableType(table.getTTable().getParameters()));

Review Comment:
   Is there any case where `StorageHandler#getType` != ICEBERG but `isIcebergTableType` returns true?
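
To make the question concrete, here is a minimal sketch of the new predicate with stand-in types. `HandlerType`, the `IcebergCheck` class, and the `table_type` key with value `ICEBERG` are assumptions made for illustration. A `null` handler type models `getStorageHandler()` returning null.

```java
import java.util.Map;

public class IcebergCheck {
    // Stand-in for StorageHandlerTypes; only two values are needed here.
    enum HandlerType { ICEBERG, OTHER }

    // Hypothetical stand-in for the table-parameter check;
    // the parameter key and value are assumptions, not taken from the PR.
    static boolean isIcebergTableType(Map<String, String> params) {
        return params != null && "ICEBERG".equalsIgnoreCase(params.get("table_type"));
    }

    // Mirrors the shape of the condition in the diff:
    // non-native AND (handler says ICEBERG OR parameters say ICEBERG).
    static boolean isIcebergTable(boolean nonNative, HandlerType handlerType,
                                  Map<String, String> params) {
        return nonNative
            && (handlerType == HandlerType.ICEBERG || isIcebergTableType(params));
    }

    public static void main(String[] args) {
        Map<String, String> icebergParams = Map.of("table_type", "ICEBERG");
        Map<String, String> noParams = Map.of();

        // Handler reports ICEBERG: true regardless of parameters.
        System.out.println(isIcebergTable(true, HandlerType.ICEBERG, noParams));
        // No handler available, parameters say ICEBERG: true only via the new branch.
        System.out.println(isIcebergTable(true, null, icebergParams));
        // The case asked about: handler is not ICEBERG but parameters say ICEBERG.
        System.out.println(isIcebergTable(true, HandlerType.OTHER, icebergParams));
        // Native table: always false.
        System.out.println(isIcebergTable(false, HandlerType.ICEBERG, icebergParams));
    }
}
```

In this model the two branches only disagree in the second and third cases, so whether either one can occur in practice decides whether the extra parameter check is needed.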



##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLUtils.java:
##########
@@ -231,10 +231,17 @@ public static void validateTableIsIceberg(org.apache.hadoop.hive.ql.metadata.Tab
   }
 
   public static boolean isIcebergTable(Table table) {

Review Comment:
   Wow, I originally thought the change would also be available for Delta Lake, Hudi, etc., but it looks like we hardcode many things for Iceberg...
   I will create a ticket to push this kind of logic into StorageHandler...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

