Rajesh Balamohan created HIVE-27050:
---------------------------------------

             Summary: Iceberg: MOR: Restrict reducer extrapolation to contain 
number of small files being created
                 Key: HIVE-27050
                 URL: https://issues.apache.org/jira/browse/HIVE-27050
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Rajesh Balamohan


Scenario:
 # Create a simple Iceberg table in MOR (merge-on-read) mode, e.g. 
store_sales_delete_1.
 # Insert some data into it.
 # Run an update statement such as:
 ## "update store_sales_delete_1 set ss_sold_time_sk=699060 where 
ss_sold_time_sk=69906"

Hive estimates the number of reducers as "1". However, because 
"hive.tez.max.partition.factor" defaults to "2.0", that estimate is doubled 
and the update runs with 2 reducers.
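
The arithmetic behind the doubling can be sketched roughly as follows. This is 
a simplified illustration only, not Hive's actual planner code: the class name, 
the method name and the cap of 100 reducers are all assumptions made for the 
example.

```java
/** Simplified illustration only; not Hive's actual planner code. */
final class ReducerExtrapolationSketch {

    /**
     * Scales an estimated reducer count by a partition factor.
     * All names are hypothetical; maxReducers stands in for a global cap.
     */
    static int extrapolate(int estimatedReducers, float maxPartitionFactor, int maxReducers) {
        // Scale the estimate, but never drop below one reducer.
        int scaled = Math.max(1, (int) (estimatedReducers * maxPartitionFactor));
        // Respect the overall cap on the number of reducers.
        return Math.min(scaled, maxReducers);
    }

    public static void main(String[] args) {
        // Estimate of 1 with the default factor of 2.0 from the description.
        System.out.println(extrapolate(1, 2.0f, 100));
        // Same estimate with the factor pinned to 1.0.
        System.out.println(extrapolate(1, 1.0f, 100));
    }
}
```

Under these assumptions, an estimate of 1 becomes 2 reducers with a factor of 
"2.0" and stays at 1 with a factor of "1.0".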

To put this in perspective, the update then writes very small positional 
delete files spread across the reducers. This causes problems during reads, 
because every one of these files has to be opened.

Proposal:
 # When Iceberg MOR tables are involved in update/delete/merge statements, 
disable "hive.tez.max.partition.factor", or set it to "1.0" irrespective of 
the user setting.
 # Add explicit logs for easier debugging, so that users are not confused 
about why their setting is not taking effect.
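
One possible shape for the two points above is sketched below. It is a 
hypothetical, self-contained illustration rather than a patch against Hive: the 
class, the method, the "icebergMorWrite" flag and the log wording are all 
assumptions, and java.util.logging is used only to keep the example standalone.

```java
import java.util.logging.Logger;

/** Hypothetical sketch; names and log text are not taken from Hive. */
final class MorPartitionFactorSketch {
    private static final Logger LOG =
        Logger.getLogger(MorPartitionFactorSketch.class.getName());

    /**
     * Returns the factor to apply: the user's value for ordinary queries,
     * and 1.0 for update/delete/merge on an Iceberg MOR table.
     */
    static float effectiveMaxPartitionFactor(float userFactor, boolean icebergMorWrite) {
        if (!icebergMorWrite) {
            return userFactor;
        }
        if (userFactor != 1.0f) {
            // Tell the user why the configured value is being ignored.
            LOG.info(String.format(
                "Overriding hive.tez.max.partition.factor from %s to 1.0 for an "
                    + "Iceberg MOR write to limit small positional delete files",
                userFactor));
        }
        return 1.0f;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxPartitionFactor(2.0f, true));
        System.out.println(effectiveMaxPartitionFactor(2.0f, false));
    }
}
```

Logging only when the user's value differs from "1.0" keeps the message tied to 
the case where the setting is actually being overridden.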



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
