[ 
https://issues.apache.org/jira/browse/IMPALA-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-8030:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Remove duplicate in-clause values for selectivity calcs
> -------------------------------------------------------
>
>                 Key: IMPALA-8030
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8030
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> If an IN clause has duplicate values, they should be removed so that 
> selectivity estimates are based only on unique values.
> {noformat}
> select *
> from tpch.customer c
> where c.c_custkey in (10, 20, 30, 30, 10, 20)
> ---- PLAN
> PLAN-ROOT SINK
> |
> 00:SCAN HDFS [tpch.customer c]
>    partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=6
>    predicates: c.c_custkey IN (10, 20, 30, 30, 10, 20)
> {noformat}
> Expected:
> {noformat}
> 00:SCAN HDFS [tpch.customer c]
>    partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=3
> {noformat}
> Note that the current planner counts every value in the list, duplicate or 
> not, as a separate match (cardinality=6). The expected plan counts each 
> duplicate only once, so the estimate reflects just the three unique values.
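
The arithmetic behind the two estimates can be sketched as follows. This is only an illustration, not Impala's planner code: the function name `estimate_in_cardinality` is hypothetical, and it assumes a uniform model in which each distinct IN-list value matches `row_count / ndv` rows. The figures of 150,000 rows and 150,000 distinct c_custkey values are also assumptions (the usual TPC-H scale factor 1 sizes), chosen because they reproduce the cardinalities shown in the plans above.

```python
def estimate_in_cardinality(row_count, ndv, values, dedupe=True):
    """Estimate the rows matched by `col IN (values)`.

    Hypothetical helper for illustration only. Assumes values are uniformly
    distributed, so each distinct value matches row_count / ndv rows.
    """
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    # With dedupe, repeated literals are counted once.
    matching_values = len(set(values)) if dedupe else len(values)
    # An IN list cannot match more distinct values than the column holds.
    matching_values = min(matching_values, ndv)
    return round(row_count * matching_values / ndv)


in_list = [10, 20, 30, 30, 10, 20]
# Assumed statistics for tpch.customer.c_custkey (not stated in the issue).
ROW_COUNT = 150_000
NDV = 150_000

current = estimate_in_cardinality(ROW_COUNT, NDV, in_list, dedupe=False)
expected = estimate_in_cardinality(ROW_COUNT, NDV, in_list, dedupe=True)
print(current, expected)
```

Under these assumptions the only difference between the two estimates is whether the list length or the number of unique literals is used as the count of matching values.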



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
