okumin commented on code in PR #4477:
URL: https://github.com/apache/hive/pull/4477#discussion_r1266998178


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java:
##########
@@ -148,6 +148,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
     // TODO: remove once we have both Fanout and ClusteredWriter available: HIVE-25948
     HiveConf.setIntVar(configuration, HiveConf.ConfVars.HIVEOPTSORTDYNAMICPARTITIONTHRESHOLD, 1);
     HiveConf.setVar(configuration, HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
+
+    Context.Operation operation = HiveCustomStorageHandlerUtils.getWriteOperation(configuration,
+            serDeProperties.getProperty(Catalogs.NAME));
+
+    if (operation != null) {
+      HiveConf.setFloatVar(configuration, HiveConf.ConfVars.TEZ_MAX_PARTITION_FACTOR, 1f);

Review Comment:
   I personally think it is reasonable to inject the logic explicitly into
   GenTezUtils or somewhere similar. My one request is to make it pluggable,
   because other table formats would hit the same issue. As far as I checked on
   my machine, Hive ACID has the same problem. Note that these are my opinions;
   committers could have different ideas, or they might consider this expected
   behavior.
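   
   To illustrate what "pluggable" could look like, here is a minimal sketch. It
   is not existing Hive code: `PartitionFactorPolicy`, `RowLevelWritePolicy`,
   `WriteOperation`, and `resolveMaxPartitionFactor` are hypothetical names, and
   the enum only stands in for the `Context.Operation` value read in the diff.
   The idea is that the planner asks the table format for an optional override
   instead of the SerDe mutating the configuration.
   
   ```java
   import java.util.Optional;

   public class PartitionFactorHookSketch {

     /** Hypothetical stand-in for the Context.Operation value read in the diff. */
     public enum WriteOperation { DELETE, UPDATE, MERGE }

     /** Hypothetical extension point: a table format may pin the max partition factor. */
     public interface PartitionFactorPolicy {
       Optional<Float> maxPartitionFactorOverride(WriteOperation operation);
     }

     /** Mirrors the diff: any non-null write operation pins the factor to 1. */
     public static final class RowLevelWritePolicy implements PartitionFactorPolicy {
       @Override
       public Optional<Float> maxPartitionFactorOverride(WriteOperation operation) {
         return operation == null ? Optional.empty() : Optional.of(1f);
       }
     }

     /** Planner side: use the override if the format supplies one, else the configured value. */
     public static float resolveMaxPartitionFactor(
         float configured, PartitionFactorPolicy policy, WriteOperation operation) {
       if (policy == null) {
         return configured;
       }
       return policy.maxPartitionFactorOverride(operation).orElse(configured);
     }

     public static void main(String[] args) {
       PartitionFactorPolicy policy = new RowLevelWritePolicy();
       // 2f is only an example of a configured value.
       System.out.println(resolveMaxPartitionFactor(2f, policy, WriteOperation.DELETE));
       System.out.println(resolveMaxPartitionFactor(2f, policy, null));
     }
   }
   ```
   
   With something along these lines, Iceberg and Hive ACID could each supply
   their own policy, and formats without one would keep the configured value.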
   
   As for the parameter, I guess you tested it with 4.0.0-alpha-2, since the
   parameter was merged only recently. It enforces auto reducer parallelism.
   
   ```
   $ beeline -e "
   > DROP TABLE IF EXISTS test;
   > CREATE TABLE test (id INT) STORED BY ICEBERG TBLPROPERTIES('format-version'='2');
   > INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
   > "
   ```
   
   ```
   $ beeline -e 'DELETE FROM test WHERE id = 5' \
       --hiveconf hive.server2.in.place.progress=false \
       --hiveconf hive.tez.auto.reducer.parallelism=true
   ...
   INFO  : 2023-07-18 14:19:45,200      Map 1: 0(+1)/1  Reducer 2: 0/2  
   INFO  : 2023-07-18 14:19:48,227      Map 1: 1/1      Reducer 2: 0(+1)/2
   ...
   $ beeline -e 'DELETE FROM test WHERE id = 5' \
       --hiveconf hive.server2.in.place.progress=false \
       --hiveconf hive.tez.auto.reducer.parallelism=true \
       --hiveconf hive.tez.auto.reducer.parallelism.min.threshold=0.0
   ...
   INFO  : 2023-07-18 14:20:23,730      Map 1: 0(+1)/1  Reducer 2: 0/2  
   INFO  : 2023-07-18 14:20:27,271      Map 1: 1/1      Reducer 2: 0(+1)/1
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

