Seonggon Namgung created HIVE-29159:
---------------------------------------
Summary: Consider DPP optimization when computing the benefit of the SemiJoin branch
Key: HIVE-29159
URL: https://issues.apache.org/jira/browse/HIVE-29159
Project: Hive
Issue Type: Improvement
Reporter: Seonggon Namgung
Assignee: Seonggon Namgung
To minimize the amount of shuffled data, Hive uses dynamic partition pruning
(DPP) and dynamic semijoin reduction (DSR). These optimization techniques are
useful in most cases, but we often observe that DSR is less efficient than
expected when a TableScan is targeted by both DPP and DSR. This happens because
the computation of the DSR benefit does not take DPP into account, so it
overestimates the number of rows the TableScan produces.
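To illustrate the overestimation, here is a minimal sketch in Python. It uses a
deliberately simplified cost model (rows removed by the semijoin filter minus
rows read to build it), which is an assumption for illustration and not the
formula Hive actually uses. All names and numbers are hypothetical.

```python
def dsr_net_benefit(scan_rows, semijoin_selectivity, filter_build_rows):
    """Simplified, hypothetical model: rows the semijoin filter removes
    from the scan, minus the rows read to build the filter."""
    rows_removed = scan_rows * (1.0 - semijoin_selectivity)
    return rows_removed - filter_build_rows


# Hypothetical statistics, chosen only to show the effect.
table_rows = 1_000_000        # TableScan row count before any pruning
dpp_selectivity = 0.25        # fraction of rows left after DPP
semijoin_selectivity = 0.5    # fraction of rows left after the semijoin filter
filter_build_rows = 200_000   # rows scanned to build the semijoin filter

# Benefit estimated from the unpruned row count (DPP ignored).
benefit_ignoring_dpp = dsr_net_benefit(
    table_rows, semijoin_selectivity, filter_build_rows)

# Benefit estimated after scaling the row count by the DPP selectivity.
benefit_with_dpp = dsr_net_benefit(
    table_rows * dpp_selectivity, semijoin_selectivity, filter_build_rows)

print(benefit_ignoring_dpp, benefit_with_dpp)
```

Under these assumed numbers the estimate is positive when DPP is ignored and
negative once DPP is accounted for, so a semijoin branch that looks worthwhile
could in fact cost more than it saves.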
This JIRA aims to improve the computation of the benefit of DSR by adjusting
the statistics based on DPP branches. The current plan for this JIRA consists
of three steps:
1. Adjust the statistics of TableScan operators targeted by DPP.
2. Propagate the updated statistics from the TableScan operators to their
descendants.
3. Adjust the order of the DPP branch removal steps if needed, and implement a
fallback mechanism in case query execution fails due to DPP failures.
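Steps 1 and 2 could be sketched roughly as follows. This is only an
illustration in Python, not Hive code: the Op class, the apply_dpp function,
and the assumption that every descendant's row count scales by the same factor
as the TableScan are all hypothetical simplifications, and the operator graph
is assumed to be a simple tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Hypothetical stand-in for an operator carrying a row-count statistic."""
    name: str
    rows: float
    children: List["Op"] = field(default_factory=list)


def apply_dpp(scan: Op, dpp_selectivity: float) -> None:
    """Scale the TableScan's row count by the DPP selectivity (step 1) and
    push the same scaling down to all descendants (step 2).
    Assumes a tree: each operator is reachable through exactly one parent."""
    scan.rows *= dpp_selectivity
    for child in scan.children:
        apply_dpp(child, dpp_selectivity)


# Hypothetical plan fragment: TableScan -> Filter -> ReduceSink.
scan = Op("TS", 1_000_000, [Op("FIL", 400_000, [Op("RS", 400_000)])])
apply_dpp(scan, 0.25)
print(scan.rows, scan.children[0].rows, scan.children[0].children[0].rows)
```

A real implementation would presumably recompute each descendant's statistics
with its own estimation rules instead of applying one uniform factor; the
uniform factor is used here only to keep the sketch short.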