Thai Bui created HIVE-21074:
-------------------------------
Summary: Hive bucketed table query pruning does not work for IS NOT NULL condition
Key: HIVE-21074
URL: https://issues.apache.org/jira/browse/HIVE-21074
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 3.1.1, 3.1.0, 3.0.0
Reporter: Thai Bui
Assignee: Thai Bui

The current version of bucket pruning skips all the predicates when, while evaluating an AND operator, it detects that one of the operands is a compound expression (e.g. NOT(IS_NULL)). This logic is faulty: as long as one of the AND operands is an equality on the bucketed column (_col_ = *literal*), the *literal* value of that _col_ should be used for bucket pruning regardless of what the other operands are. For example:

SELECT * FROM tbl WHERE bucketed_col = 1 AND (some_compound_expr)

In this case, the value '*1*' should be considered for pruning in the query plan.

This limitation shows up in a simpler case: bucketing a table that I am trying to optimize is not effective when IS NOT NULL is used. Since IS NOT NULL is parsed into NOT(IS_NULL) (a compound expression), the pruning phase is completely skipped, causing unnecessary tasks to be spawned. For instance:

SELECT * FROM tbl WHERE bucketed_col = 1 AND some_other_col IS NOT NULL

will not trigger the bucket pruning logic and will perform a full table scan.

-- 
This message was sent by Atlassian JIRA (v7.6.3#76005)
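To make the expected AND handling concrete, here is a minimal Java sketch (Java 16+, for records and pattern matching in `instanceof`). It is not Hive's implementation: the expression classes (`Expr`, `Eq`, `IsNull`, `Not`, `And`), the method names, and the bucket function (non-negative `Integer.hashCode` modulo the bucket count) are hypothetical stand-ins chosen for illustration. The point it shows is that a conjunct which does not constrain the bucketed column, such as NOT(IS_NULL(...)), is skipped instead of disabling pruning for the whole AND.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class BucketPruningSketch {
    // Hypothetical, minimal expression tree (NOT Hive's ExprNodeDesc classes).
    interface Expr {}
    record Eq(String column, int literal) implements Expr {}
    record IsNull(String column) implements Expr {}
    record Not(Expr child) implements Expr {}
    record And(List<Expr> children) implements Expr {}

    /**
     * Returns the set of values the bucketed column must take for the predicate
     * to hold, or Optional.empty() if the predicate does not constrain it.
     */
    static Optional<Set<Integer>> constrainedValues(Expr e, String bucketCol) {
        if (e instanceof Eq eq && eq.column().equals(bucketCol)) {
            return Optional.of(Set.of(eq.literal()));
        }
        if (e instanceof And conj) {
            Optional<Set<Integer>> result = Optional.empty();
            for (Expr child : conj.children()) {
                Optional<Set<Integer>> c = constrainedValues(child, bucketCol);
                if (c.isEmpty()) {
                    // Unconstrained conjunct (e.g. NOT(IS_NULL(x))): skip it, do not bail out.
                    continue;
                }
                if (result.isEmpty()) {
                    result = c;
                } else {
                    // Both conjuncts must hold, so intersect the candidate values.
                    Set<Integer> inter = new HashSet<>(result.get());
                    inter.retainAll(c.get());
                    result = Optional.of(inter);
                }
            }
            return result;
        }
        // Anything else (OR, NOT, predicates on other columns) is conservatively unconstrained.
        return Optional.empty();
    }

    // Assumed, simplified bucket function for illustration only.
    static int bucketFor(int value, int numBuckets) {
        return (Integer.hashCode(value) & Integer.MAX_VALUE) % numBuckets;
    }

    /** Buckets that must be scanned; all of them when the column is unconstrained. */
    static Set<Integer> bucketsToScan(Expr predicate, String bucketCol, int numBuckets) {
        Optional<Set<Integer>> values = constrainedValues(predicate, bucketCol);
        Set<Integer> buckets = new HashSet<>();
        if (values.isEmpty()) {
            for (int b = 0; b < numBuckets; b++) {
                buckets.add(b);
            }
        } else {
            for (int v : values.get()) {
                buckets.add(bucketFor(v, numBuckets));
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // bucketed_col = 1 AND some_other_col IS NOT NULL
        Expr where = new And(List.of(
                new Eq("bucketed_col", 1),
                new Not(new IsNull("some_other_col"))));
        System.out.println(bucketsToScan(where, "bucketed_col", 4)); // prints [1]
    }
}
```

Intersecting the candidate sets of the constraining conjuncts keeps the sketch conservative: any expression it does not understand simply contributes no constraint, so in the worst case every bucket is scanned, which is what happens today for the whole query.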