Thai Bui created HIVE-21074:
-------------------------------

             Summary: Hive bucketed table query pruning does not work for IS NOT NULL condition
                 Key: HIVE-21074
                 URL: https://issues.apache.org/jira/browse/HIVE-21074
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 3.1.1, 3.1.0, 3.0.0
            Reporter: Thai Bui
            Assignee: Thai Bui


When evaluating an AND logical operator, the current version of bucket pruning 
skips all of the predicates as soon as it detects that one of them is a 
compound expression (e.g. NOT(IS_NULL)).

This logic is faulty: as long as one of the AND operands is an equality on a 
bucketed column (_col_ = *literal*), the *literal* value of that _col_ should 
be considered in the bucket pruning optimization no matter what. For example:

SELECT * FROM tbl WHERE bucketed_col = 1 AND (some_compound_expr)

Then the value *'1'* should be considered for pruning in the query plan. 
This limitation shows up in a simpler case: a table that I am trying to 
optimize using bucketing gets no benefit when IS NOT NULL is used. Since 
IS NOT NULL is parsed into NOT(IS_NULL) (a compound expression), the pruning 
phase is completely skipped, causing unnecessary tasks to be spawned. 
For instance:

SELECT * FROM tbl WHERE bucketed_col = 1 AND some_other_col IS NOT NULL

This query will not trigger the bucket pruning logic and will perform a full 
table scan.
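
The difference between the reported behaviour and the expected one can be 
illustrated with a small model. This is only a sketch in Python, not Hive's 
actual Java code: the classes Eq, Compound and And and the functions 
prune_strict and prune_lenient are hypothetical names invented for this 
illustration. A return value of None stands for "no pruning, scan every 
bucket".

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Eq:
    """Models `column = literal`."""
    column: str
    literal: int


@dataclass(frozen=True)
class Compound:
    """Models any expression the pruner cannot use, e.g. NOT(IS_NULL(col))."""
    text: str


@dataclass(frozen=True)
class And:
    """Models a conjunction of predicates."""
    children: Tuple[Union["Eq", "Compound", "And"], ...]


def prune_lenient(expr, bucket_col: str) -> Optional[Set[int]]:
    """Expected behaviour: unusable conjuncts are ignored, usable ones kept."""
    if isinstance(expr, Eq):
        return {expr.literal} if expr.column == bucket_col else None
    if isinstance(expr, And):
        result: Optional[Set[int]] = None
        for child in expr.children:
            literals = prune_lenient(child, bucket_col)
            if literals is None:
                continue  # this conjunct adds no constraint on the bucket column
            result = literals if result is None else result & literals
        return result
    return None  # Compound: no usable constraint


def prune_strict(expr, bucket_col: str) -> Optional[Set[int]]:
    """Reported behaviour: one compound conjunct disables pruning entirely."""
    if isinstance(expr, And) and any(isinstance(c, Compound) for c in expr.children):
        return None
    return prune_lenient(expr, bucket_col)


# WHERE bucketed_col = 1 AND some_other_col IS NOT NULL
query = And((Eq("bucketed_col", 1), Compound("NOT(IS_NULL(some_other_col))")))
print(prune_strict(query, "bucketed_col"))
print(prune_lenient(query, "bucketed_col"))
```

In this model, prune_strict gives up on the whole AND, while prune_lenient 
still returns the literal 1 for bucketed_col. Dropping a conjunct is safe 
under AND because it can only make the candidate bucket set larger, never 
smaller, so no matching row is missed.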



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
