Steve Carlin created HIVE-26283:
-----------------------------------
Summary: Need better decision making for creating SortedDynPartitionOptimizer
Key: HIVE-26283
URL: https://issues.apache.org/jira/browse/HIVE-26283
Project: Hive
Issue Type: Bug
Components: Logical Optimizer
Reporter: Steve Carlin

When the hive.optimize.sort.dynamic.partition.threshold parameter is set to 0,
the optimizer decides on its own whether to create the
SortedDynPartitionOptimizer.

In production, we've seen it make the wrong decision for a simple
INSERT..SELECT into a partitioned table when the data being inserted is skewed
towards one partition.

In this case, it still creates the SortedDynPartitionOptimizer. This forces a
reducer step, and all the data for the skewed partition gets sent to the same
reducer.

To reproduce this, you may also have to turn off "autogather" stats
(hive.stats.autogather), since stats gathering also creates a reducer step.

What we ultimately want is just a mapper step, so that the load is evenly
distributed across the mappers.
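
The effect described above can be illustrated with a toy model. This is only a
sketch and not Hive code: the function names, the CRC32 hash, the row counts and
the partition values are all assumptions made for illustration. It compares how
many rows each task receives when rows are shuffled by partition key (reducer
step) versus split by input position (mapper-only step).

```python
import zlib
from collections import Counter


def reducer_load(partition_keys, num_reducers):
    """Rows per reducer when rows are shuffled by a hash of the partition key.

    Toy stand-in for a shuffle on the dynamic partition column; Hive's real
    hashing and reducer assignment differ.
    """
    load = Counter({r: 0 for r in range(num_reducers)})
    for key in partition_keys:
        # Every row with the same partition key lands on the same reducer.
        load[zlib.crc32(key.encode("utf-8")) % num_reducers] += 1
    return load


def mapper_load(partition_keys, num_mappers):
    """Rows per mapper when rows are split by input position, ignoring the key."""
    load = Counter({m: 0 for m in range(num_mappers)})
    for i, _ in enumerate(partition_keys):
        load[i % num_mappers] += 1
    return load


# Hypothetical skewed input: 990 of 1000 rows belong to a single partition.
rows = ["2022-06-01"] * 990 + ["2022-06-02"] * 10

reducers = reducer_load(rows, num_reducers=4)
mappers = mapper_load(rows, num_mappers=4)

print("rows per reducer:", dict(reducers))
print("rows per mapper: ", dict(mappers))
```

In this model at least 990 rows always end up on one reducer regardless of how
many reducers are configured, while the mapper-only split stays even because it
does not depend on the partition key.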