Jesus Camacho Rodriguez created HIVE-23365:
----------------------------------------------
Summary: Put RS deduplication optimization under cost based decision
Key: HIVE-23365
URL: https://issues.apache.org/jira/browse/HIVE-23365
Project: Hive
Issue Type: Improvement
Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Currently, RS deduplication is always executed whenever it is semantically
correct. However, it could be beneficial to leave both RS operators in the
plan, e.g., if the NDV of the second RS is very low. Thus, we would like this
decision to be cost-based. We could use a simple heuristic that would work fine
for most cases without introducing regressions for existing ones, e.g., if the
NDV for the partition column is less than the estimated parallelism in the
second RS, do not execute deduplication.
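
The proposed heuristic could be sketched roughly as follows. This is only an
illustration, not Hive code: the class name RsDedupHeuristic, the method
shouldDeduplicate, and its parameters are hypothetical. Treating a non-positive
NDV as "statistics unavailable" and falling back to the current
always-deduplicate behavior is also an assumption, made so that existing plans
would not change.

```java
// Hypothetical sketch of the heuristic described above; none of these names
// exist in Hive.
final class RsDedupHeuristic {

    private RsDedupHeuristic() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Decides whether two ReduceSink (RS) operators should be merged.
     *
     * @param partitionColumnNdv   estimated number of distinct values of the
     *                             partition column of the second RS;
     *                             a value <= 0 is assumed to mean "unknown"
     * @param estimatedParallelism estimated parallelism of the second RS
     * @return true if deduplication should be applied
     */
    public static boolean shouldDeduplicate(long partitionColumnNdv,
                                            int estimatedParallelism) {
        // Assumption: without statistics, keep today's behavior and deduplicate.
        if (partitionColumnNdv <= 0) {
            return true;
        }
        // Skip deduplication when the NDV is lower than the estimated
        // parallelism of the second RS.
        return partitionColumnNdv >= estimatedParallelism;
    }
}
```

For example, with an NDV of 3 and an estimated parallelism of 10, this sketch
would keep both RS operators; with an NDV of 1000 it would deduplicate as
today.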
--
This message was sent by Atlassian Jira
(v8.3.4#803005)