[ https://issues.apache.org/jira/browse/HIVE-21074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469926#comment-17469926 ]
Ádám Szita commented on HIVE-21074: ----------------------------------- [~thaibui] - I don't think this is solved by HIVE-19097 and I don't even see how that would help this issue. I can take this over unless you have cycles to work on this of course. > Hive bucketed table query pruning does not work for IS NOT NULL condition > ------------------------------------------------------------------------- > > Key: HIVE-21074 > URL: https://issues.apache.org/jira/browse/HIVE-21074 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer > Affects Versions: 3.1.0, 3.0.0, 3.1.1 > Reporter: Thai Bui > Assignee: Thai Bui > Priority: Minor > Fix For: 4.0.0 > > Attachments: HIVE-21074.patch > > > The current version of bucket pruning skips all the predicates when it > detects that one of the predicates is a compound type (e.g. NOT(IS_NULL) ) > when evaluating AND logical operators. > This logic is faulty since as long as one of the AND operators is a bucketed > column (_col_ = *literal*), the *literal* value of that _col_ should be > considered in the bucket pruning optimization no matter what. For example: > SELECT * FROM tbl WHERE bucketed_col = 1 AND (some_compound_expr) > Then the the value '*1'* should be considered for pruning in the query plan. > This limitation has manifested into a simpler case where a table that I am > trying to optimized using bucketing technique is not effective when IS NOT > NULL is used. Since IS NOT NULL is parsed into NOT(IS_NULL) (a compound > expression), the pruning phase is completed skipped causing unnecessary tasks > to be spawned. For instance: > SELECT * FROM tbl WHERE bucketed_col = 1 AND some_other_col IS NOT NULL > Will not trigger bucket pruning logic and perform a full table scan. -- This message was sent by Atlassian Jira (v8.20.1#820001)