Stamatis Zampetakis created HIVE-29657:
------------------------------------------
Summary: Improve SEARCH expansion by considering negation
Key: HIVE-29657
URL: https://issues.apache.org/jira/browse/HIVE-29657
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Stamatis Zampetakis
SEARCH is an internal logical operator that is used by CBO to represent many
type of range predicates. Currently, it doesn't have a physical equivalent so
we have to expand it to existing primitive operators (via
[SearchTransformer|https://github.com/apache/hive/blob/45bdea4f8bb62873ab2b2f4a8f7afc4c780d4aa3/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/SearchTransformer.java]).
A single SEARCH expression can be expanded in different ways with potentially
different performance implications.
+Example:+
{noformat}
SEARCH($0, Sarg[(-∞..10), (10..20), (20..30], [50..+∞)])
{noformat}
+Expansion A+
{code:sql}
... WHERE x < 10 OR (x > 10 AND x < 20) OR (x > 20 and x <= 30) OR (x >= 50)
{code}
However, negating the SEARCH can gives us simpler and potentially more
efficient expressions. The following SEARCH expression is equivalent with the
above.
+Example:+
{noformat}
NOT SEARCH($0, Sarg[10, 20, (30..50)])
{noformat}
+Expansion B+
{code:sql}
... WHERE NOT ( x = 10 OR x = 20 OR (x > 30 AND x < 50))
... WHERE x <> 10 AND x <> 20 AND (x <= 30 OR x >= 50)
... WHERE x NOT IN (10, 20) AND (x <= 30 OR x >= 50)
{code}
Expansions in category B are undeniably simpler and better in terms of
memory/CPU consumption.
The goal of this ticket is to consider both the positive and negative SEARCH
expression during the expansion (e.g., using
org.apache.calcite.util.Sarg#negate) and pick the one that is simpler and more
efficient.
We have to be careful though not to introduce an overly complex and expensive
expansion check cause that would beat the entire purpose of this
transformation; compilation should remain fast. If that is not possible then we
should close this ticket as won't fix.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)