[ https://issues.apache.org/jira/browse/HIVE-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438552#comment-15438552 ]

Jesus Camacho Rodriguez commented on HIVE-14652:
------------------------------------------------

Thanks for looking into this [~sershe].

The problem seems to have been present for IN clauses before HIVE-11424 went 
in; that change just added the single-column case. In fact, as you said, the 
logic for the multi-column (struct) IN clause is expected to be broken too.

I think the source of the problem is an assumption the IN logic makes about 
WalkState: it treats TRUE as meaning that the condition can be removed (see the 
comment at line 423 in the original code, line 359 after applying your patch). 
However, WalkState seems to be a global summary of the results of the child 
expressions, so that assumption does not hold.
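To illustrate why a global summary cannot stand in for a per-partition answer, 
here is a toy model in Python. It is a sketch under stated assumptions, not 
Hive's PcrExprProcFactory code: the helpers in_results, collapse, 
prune_not_in_correct and prune_not_in_collapsed are hypothetical. It contrasts 
negating a per-partition result vector with negating a single state collapsed 
over all partitions, and shows one way the latter can prune every partition for 
_s not in ('bar')_.

```python
# Toy model only: these helpers are hypothetical and do not reproduce Hive's
# PcrExprProcFactory. They contrast a per-partition result vector with a
# single state collapsed over all partitions.
from typing import Dict, List


def in_results(partitions: Dict[str, str], values: List[str]) -> Dict[str, bool]:
    """Evaluate `col IN (values)` separately for each partition."""
    return {name: value in values for name, value in partitions.items()}


def collapse(results: Dict[str, bool]) -> bool:
    """Hypothetical global state: True if the IN held for any partition."""
    return any(results.values())


def prune_not_in_correct(partitions: Dict[str, str], values: List[str]) -> List[str]:
    """Negate per partition, then keep the partitions where NOT IN holds."""
    results = in_results(partitions, values)
    return sorted(name for name, hit in results.items() if not hit)


def prune_not_in_collapsed(partitions: Dict[str, str], values: List[str]) -> List[str]:
    """Flawed variant: negate the collapsed state and apply it to every partition."""
    keep_all = not collapse(in_results(partitions, values))
    return sorted(partitions) if keep_all else []


parts = {"s=foo": "foo", "s=bar": "bar"}
print(prune_not_in_correct(parts, ["bar"]))    # ['s=foo']
print(prune_not_in_collapsed(parts, ["bar"]))  # []
```

In the toy model the collapsed state is TRUE because _s=bar_ matched the IN, so 
its negation wrongly prunes _s=foo_ as well, leaving nothing to scan.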

I checked the patch and the changes look good to me, but I have a couple of 
questions.
1. Does the patch still take into account the synthetic predicates that the 
dynamic partition pruner generates for IN clauses with a single column? 
Previously there was some special handling for this case, but it no longer 
seems to be there. Maybe it is handled generically, like any other predicate?
2. I would extend the patch to cover multi-column IN clauses so that we fix all 
the issues. That would mean extending the logic in lines 359-364 after applying 
your patch (it seems straightforward) and adding another test case.
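For the multi-column case, a per-partition, three-valued check could look 
roughly like the following. This is a hypothetical sketch, not code from the 
patch, and it ignores NULL semantics: struct_in_state takes the IN-list tuples 
and the partition-column values of one partition, and returns True (the 
condition can be removed for that partition), False (the partition can be 
pruned) or None (the result depends on non-partition columns, so the condition 
must stay).

```python
from typing import Dict, Optional, Sequence, Tuple


def struct_in_state(
    columns: Sequence[str],
    tuples: Sequence[Tuple[object, ...]],
    partition_values: Dict[str, object],
) -> Optional[bool]:
    """Three-valued result of `(c1, ..., cn) IN (tuples)` for one partition.

    partition_values maps each partition column to its value in this partition;
    any column missing from it is a regular column, unknown at compile time.
    """
    all_partition_cols = all(col in partition_values for col in columns)
    for tup in tuples:
        # A tuple can still match if it agrees with every partition column;
        # regular columns are treated as "could match anything".
        compatible = all(
            partition_values[col] == lit
            for col, lit in zip(columns, tup)
            if col in partition_values
        )
        if compatible:
            # TRUE only when no regular column is involved; otherwise unknown.
            return True if all_partition_cols else None
    # No tuple can match in this partition, so it can be pruned.
    return False


# (a, b) IN ((5, 1), (3, 2)) with partition column b:
print(struct_in_state(("a", "b"), [(5, 1), (3, 2)], {"b": 1}))  # None: depends on a
print(struct_in_state(("a", "b"), [(5, 1), (3, 2)], {"b": 4}))  # False: prune
print(struct_in_state(("b",), [(1,), (2,)], {"b": 2}))          # True: drop condition
```

A NOT IN over the same struct would then negate this result per partition 
(True and False swap, None stays None), rather than negating a global state.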

--

Concerning the logic behind pcr (the partition condition remover): if I 
understand your question correctly, we need to evaluate them because partition 
pruning does not necessarily make the filter condition redundant. For instance, 
consider a table with partition column _b_ and the predicate 
_(a = 5 and b = 1) or (a = 3 and b = 2)_. We can infer that we only need 
partitions _b=1_ and _b=2_. However, if both partitions exist, we cannot remove 
any part of the predicate. If, instead, only _b=1_ exists, the final predicate 
would be _a=5_.
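The example above can be sketched as follows, assuming predicates are kept in a 
simple disjunctive normal form of column = literal equalities (Hive's pcr works 
on expression trees instead). The names simplify_for_partition and 
remove_partition_conditions are hypothetical. Each existing partition gets its 
own simplified predicate; partitions whose predicate becomes FALSE are pruned, 
and, as an assumption of this sketch, the filter is only simplified when all 
remaining partitions agree on the same simplified form.

```python
from typing import Dict, List, Tuple

# Predicate in disjunctive normal form: a list of disjuncts, each a dict
# {column: literal} meaning "col1 = lit1 AND col2 = lit2 ...".
# [] is FALSE; [{}] is TRUE.
Dnf = List[Dict[str, object]]


def simplify_for_partition(pred: Dnf, part_col: str, part_value: object) -> Dnf:
    """Substitute the partition column's value into the predicate and simplify."""
    out: Dnf = []
    for conj in pred:
        if part_col in conj and conj[part_col] != part_value:
            continue  # this disjunct is FALSE for the partition, drop it
        # Any remaining equality on part_col is TRUE here, so drop that conjunct.
        out.append({c: v for c, v in conj.items() if c != part_col})
    return out


def remove_partition_conditions(
    pred: Dnf, part_col: str, existing_values: List[object]
) -> Tuple[List[object], Dnf]:
    """Return (partition values to scan, filter to apply to the scanned rows)."""
    per_part = {v: simplify_for_partition(pred, part_col, v) for v in existing_values}
    scanned = sorted(v for v, p in per_part.items() if p)  # [] means FALSE: prune
    simplified = [per_part[v] for v in scanned]
    if simplified and all(p == simplified[0] for p in simplified):
        return scanned, simplified[0]  # every scanned partition agrees
    return scanned, pred  # partitions disagree: keep the original filter


# (a = 5 and b = 1) or (a = 3 and b = 2), with partition column b
pred = [{"a": 5, "b": 1}, {"a": 3, "b": 2}]
print(remove_partition_conditions(pred, "b", [1, 2, 3]))  # ([1, 2], original pred)
print(remove_partition_conditions(pred, "b", [1]))        # ([1], [{'a': 5}])
```

With partitions _b=1_ and _b=2_ present, the two partitions simplify to 
different filters (_a=5_ and _a=3_), so the original predicate is kept; with 
only _b=1_ present, the filter reduces to _a=5_.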

Btw, we had some discussion with [~ashutoshc] about moving pcr to the logical 
optimization phase (Calcite), but until the return path is in place, we cannot 
complete this task.

> incorrect results for not in on partition columns
> -------------------------------------------------
>
>                 Key: HIVE-14652
>                 URL: https://issues.apache.org/jira/browse/HIVE-14652
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: stephen sprague
>            Assignee: Sergey Shelukhin
>            Priority: Blocker
>         Attachments: HIVE-14652.patch
>
>
> {noformat}
> create table foo (i int) partitioned by (s string);
> insert overwrite table foo partition(s='foo') select cint from alltypesorc 
> limit 10;
> insert overwrite table foo partition(s='bar') select cint from alltypesorc 
> limit 10;
> select * from foo where s not in ('bar');
> {noformat}
> No results. IN ... works correctly



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
