Stamatis Zampetakis created HIVE-28911:
------------------------------------------
Summary: Improve SEARCH expansion to exploit <> operator
Key: HIVE-28911
URL: https://issues.apache.org/jira/browse/HIVE-28911
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
During various CBO transformations (especially during simplifications) the
internal SEARCH (CALCITE-4173) operator is introduced in the plan. The SEARCH
operator cannot be executed directly and must be expanded (using
SearchTransformer) to an equivalent form for further processing.
The SEARCH operator can be used to represent many types of range predicates
including the inequality operator (<>).
+Example+
{code:sql}
explain cbo
select d_date_sk
from date_dim
where d_dom <> 10 and d_dom <> 20;
{code}
The intermediate plan before SEARCH expansion is shown below.
{noformat}
HiveProject(d_date_sk=[$0])
  HiveFilter(condition=[SEARCH($9, Sarg[(-∞..10), (10..20), (20..+∞)])])
    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
{noformat}
The two inequalities were converted to a Sarg with three ranges.
The final plan after SEARCH expansion is shown below.
{noformat}
HiveProject(d_date_sk=[$0])
  HiveFilter(condition=[OR(<($9, 10), >($9, 20), AND(>($9, 10), <($9, 20)))])
    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
{noformat}
The conversion to ranges/Sarg is useful because it allows the optimizer to
perform much more powerful simplifications, especially for complex predicates.
However, the expanded expression for this simple range is sub-optimal.
Ideally, the final filter condition after expansion should be the following:
{noformat}
AND(<>($9, 10), <>($9, 20))
{noformat}
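As a quick sanity check, the current expansion and the proposed one can be compared on plain integers. The class and method names below are made up for illustration and are not part of Hive or Calcite. The sketch ignores NULL values and only covers the two constants from the example.

```java
// Illustrative only: compares the two filter conditions from the example
// on non-null integer inputs. Names here are hypothetical.
final class ExpansionEquivalenceCheck {

  // Current expansion: OR(<(x, 10), >(x, 20), AND(>(x, 10), <(x, 20)))
  static boolean rangeForm(int x) {
    return x < 10 || x > 20 || (x > 10 && x < 20);
  }

  // Proposed expansion: AND(<>(x, 10), <>(x, 20))
  static boolean notEqualsForm(int x) {
    return x != 10 && x != 20;
  }

  // True if both forms give the same answer for every x in [from, to].
  static boolean agreeOn(int from, int to) {
    for (int x = from; x <= to; x++) {
      if (rangeForm(x) != notEqualsForm(x)) {
        return false;
      }
    }
    return true;
  }
}
```

The range form needs up to four comparisons per row, while the inequality form needs at most two.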
The goal of this ticket is to be able to exploit the inequality operator when
expanding ranges to generate simpler and slightly more efficient expressions.
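One possible way to detect the pattern is sketched below. This is not the SearchTransformer implementation and does not use the Calcite Sarg API. OpenRange, excludedPoints and expand are hypothetical names, and the sketch assumes the ranges are sorted, non-overlapping, open on both ends, and over integers. When the ranges cover everything except a finite set of points, each gap between consecutive ranges collapses to a single value that can be emitted as an inequality.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch, independent of Hive and Calcite classes.
final class NotEqualsExpansionSketch {

  /** Open interval (lower, upper); null means unbounded on that side. */
  static final class OpenRange {
    final Integer lower;
    final Integer upper;

    OpenRange(Integer lower, Integer upper) {
      this.lower = lower;
      this.upper = upper;
    }
  }

  /**
   * Returns the excluded points if the sorted ranges cover the whole domain
   * except a finite set of points; returns null otherwise.
   */
  static List<Integer> excludedPoints(List<OpenRange> ranges) {
    if (ranges.size() < 2
        || ranges.get(0).lower != null
        || ranges.get(ranges.size() - 1).upper != null) {
      return null;
    }
    List<Integer> points = new ArrayList<>();
    for (int i = 0; i + 1 < ranges.size(); i++) {
      Integer gapStart = ranges.get(i).upper;
      Integer gapEnd = ranges.get(i + 1).lower;
      // The gap must be exactly one point: (.., p) followed by (p, ..).
      if (gapStart == null || !gapStart.equals(gapEnd)) {
        return null;
      }
      points.add(gapStart);
    }
    return points;
  }

  /** Renders the <> form, or null when the generic expansion should be used. */
  static String expand(String ref, List<OpenRange> ranges) {
    List<Integer> points = excludedPoints(ranges);
    if (points == null) {
      return null;
    }
    if (points.size() == 1) {
      return "<>(" + ref + ", " + points.get(0) + ")";
    }
    StringJoiner and = new StringJoiner(", ", "AND(", ")");
    for (Integer p : points) {
      and.add("<>(" + ref + ", " + p + ")");
    }
    return and.toString();
  }
}
```

For the three ranges in the example, expand("$9", ...) produces the AND of two inequalities. Any other shape returns null so that the caller can fall back to the existing range-based expansion.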