[https://issues.apache.org/jira/browse/HIVE-28911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945246#comment-17945246]
Stamatis Zampetakis commented on HIVE-28911:
--------------------------------------------
This ticket is only relevant once the Calcite version is upgraded to 1.33.0
(HIVE-27102).
> Improve SEARCH expansion to exploit <> operator
> ------------------------------------------------
>
> Key: HIVE-28911
> URL: https://issues.apache.org/jira/browse/HIVE-28911
> Project: Hive
> Issue Type: Improvement
> Components: CBO
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> During various CBO transformations (especially during simplifications), the
> internal SEARCH operator (CALCITE-4173) is introduced in the plan. The SEARCH
> operator cannot be executed directly and must be expanded (using
> SearchTransformer) into an equivalent form for further processing.
> The SEARCH operator can represent many types of range predicates, including
> those expressed with the inequality operator (<>).
> +Example+
> {code:sql}
> explain cbo
> select d_date_sk
> from date_dim
> where d_dom <> 10 and d_dom <> 20;
> {code}
> The intermediate plan before SEARCH expansion is shown below.
> {noformat}
> HiveProject(d_date_sk=[$0])
> HiveFilter(condition=[SEARCH($9, Sarg[(-∞..10), (10..20), (20..+∞)])])
> HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
> {noformat}
> The two inequalities were converted to a Sarg with three ranges.
> The final plan after SEARCH expansion is shown below.
> {noformat}
> HiveProject(d_date_sk=[$0])
> HiveFilter(condition=[OR(<($9, 10), >($9, 20), AND(>($9, 10), <($9, 20)))])
> HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
> {noformat}
> The conversion to ranges/Sarg is useful because it allows the optimizer to
> perform much more powerful simplifications, especially for complex predicates.
> However, the expanded expression for this simple case is sub-optimal: it uses
> four comparisons combined with OR and AND, whereas two inequalities suffice.
> Ideally, the final filter condition after expansion should be the following:
> {noformat}
> AND(<>($9, 10), <>($9, 20))
> {noformat}
> The goal of this ticket is to exploit the inequality operator when expanding
> ranges, in order to generate simpler and slightly more efficient expressions.
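A minimal sketch of the idea, assuming a hypothetical, simplified range model instead of Calcite's real Sarg/RexNode classes (the Range record and the expandNaive, expandWithNotEquals and isComplementOfPoints methods are illustrative names, not Hive or Calcite APIs). It assumes the ranges are sorted and non-overlapping, and it ignores null semantics. When the ranges are exactly the complement of a finite set of points (the first range is unbounded below, the last is unbounded above, all bounds are open, and each range ends where the next one begins), the excluded points can be emitted as a conjunction of <> comparisons; otherwise the sketch falls back to one disjunct per range.

```java
import java.util.ArrayList;
import java.util.List;

final class SargExpansionSketch {

    /** Hypothetical simplified range over integers; a null bound means unbounded. */
    record Range(Integer lower, boolean lowerClosed, Integer upper, boolean upperClosed) {}

    /** Naive expansion: one disjunct per range, each a conjunction of its bounds. */
    static String expandNaive(String ref, List<Range> ranges) {
        List<String> disjuncts = new ArrayList<>();
        for (Range r : ranges) {
            List<String> conj = new ArrayList<>();
            if (r.lower() != null) {
                conj.add((r.lowerClosed() ? ">=(" : ">(") + ref + ", " + r.lower() + ")");
            }
            if (r.upper() != null) {
                conj.add((r.upperClosed() ? "<=(" : "<(") + ref + ", " + r.upper() + ")");
            }
            if (conj.isEmpty()) {
                disjuncts.add("true"); // (-inf..+inf) matches every non-null value
            } else {
                disjuncts.add(conj.size() == 1 ? conj.get(0) : "AND(" + String.join(", ", conj) + ")");
            }
        }
        return disjuncts.size() == 1 ? disjuncts.get(0) : "OR(" + String.join(", ", disjuncts) + ")";
    }

    /** True if the ranges cover everything except a finite set of isolated points. */
    static boolean isComplementOfPoints(List<Range> ranges) {
        if (ranges.size() < 2) return false;
        if (ranges.get(0).lower() != null) return false;
        if (ranges.get(ranges.size() - 1).upper() != null) return false;
        for (int i = 0; i + 1 < ranges.size(); i++) {
            Range cur = ranges.get(i);
            Range next = ranges.get(i + 1);
            if (cur.upper() == null || next.lower() == null) return false;
            if (cur.upperClosed() || next.lowerClosed()) return false; // the point itself must be excluded
            if (!cur.upper().equals(next.lower())) return false;       // no gap other than the point
        }
        return true;
    }

    /** Expansion that emits <> comparisons when the pattern applies, else falls back. */
    static String expandWithNotEquals(String ref, List<Range> ranges) {
        if (!isComplementOfPoints(ranges)) {
            return expandNaive(ref, ranges);
        }
        List<String> conj = new ArrayList<>();
        for (int i = 0; i + 1 < ranges.size(); i++) {
            conj.add("<>(" + ref + ", " + ranges.get(i).upper() + ")");
        }
        return conj.size() == 1 ? conj.get(0) : "AND(" + String.join(", ", conj) + ")";
    }

    public static void main(String[] args) {
        // Sarg[(-inf..10), (10..20), (20..+inf)] from the example in the ticket
        List<Range> sarg = List.of(
            new Range(null, false, 10, false),
            new Range(10, false, 20, false),
            new Range(20, false, null, false));
        System.out.println(expandNaive("$9", sarg));
        System.out.println(expandWithNotEquals("$9", sarg));
    }
}
```

Running main (Java 16+ for records) prints the naive form OR(<($9, 10), AND(>($9, 10), <($9, 20)), >($9, 20)), which has the same comparisons as the current plan though not necessarily in the same order, followed by AND(<>($9, 10), <>($9, 20)). A real change to SearchTransformer would also have to respect the Sarg's null handling, which this sketch leaves out.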