Stamatis Zampetakis created HIVE-29657:
------------------------------------------

             Summary: Improve SEARCH expansion by considering negation
                 Key: HIVE-29657
                 URL: https://issues.apache.org/jira/browse/HIVE-29657
             Project: Hive
          Issue Type: Improvement
          Components: CBO
            Reporter: Stamatis Zampetakis


SEARCH is an internal logical operator that is used by CBO to represent many 
type of range predicates. Currently, it doesn't have a physical equivalent so 
we have to expand it to existing primitive operators (via 
[SearchTransformer|https://github.com/apache/hive/blob/45bdea4f8bb62873ab2b2f4a8f7afc4c780d4aa3/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/SearchTransformer.java]).

A single SEARCH expression can be expanded in different ways with potentially 
different performance implications. 

+Example:+
{noformat}
SEARCH($0, Sarg[(-∞..10), (10..20), (20..30], [50..+∞)])
{noformat}

+Expansion A+
{code:sql}
... WHERE x < 10 OR (x > 10 AND x < 20) OR (x > 20 and x <= 30) OR (x >= 50)
{code}

However, negating the SEARCH can gives us simpler and potentially more 
efficient expressions. The following SEARCH expression is equivalent with the 
above.

+Example:+
{noformat}
NOT SEARCH($0, Sarg[10, 20, (30..50)])
{noformat}

+Expansion B+
{code:sql}
... WHERE NOT ( x = 10 OR x = 20 OR (x > 30 AND x < 50))
... WHERE x <> 10 AND x <> 20 AND (x <= 30 OR x >= 50)
... WHERE x NOT IN (10, 20) AND (x <= 30 OR x >= 50)
{code}

Expansions in category B are undeniably simpler and better in terms of 
memory/CPU consumption.

The goal of this ticket is to consider both the positive and negative SEARCH 
expression during the expansion (e.g., using 
org.apache.calcite.util.Sarg#negate) and pick the one that is simpler and more 
efficient. 

We have to be careful though not to introduce an overly complex and expensive 
expansion check cause that would beat the entire purpose of this 
transformation; compilation should remain fast. If that is not possible then we 
should close this ticket as won't fix.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to