[ https://issues.apache.org/jira/browse/IMPALA-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned IMPALA-8030:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Remove duplicate in-clause values for selectivity calcs
> -------------------------------------------------------
>
>                 Key: IMPALA-8030
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8030
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> If an IN clause has duplicate values, they should be removed so that
> selectivity estimates are based only on unique values.
> {noformat}
> select *
> from tpch.customer c
> where c.c_custkey in (10, 20, 30, 30, 10, 20)
> ---- PLAN
> PLAN-ROOT SINK
> |
> 00:SCAN HDFS [tpch.customer c]
>    partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=6
>    predicates: c.c_custkey IN (10, 20, 30, 30, 10, 20)
> {noformat}
> Expected:
> {noformat}
> 00:SCAN HDFS [tpch.customer c]
>    partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=3
> {noformat}
> Notice that in the current version, we treat each value, duplicate or not, as
> a match. In the expected result, we notice that duplicate values match only
> once and we return matches for the unique values.
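
The arithmetic behind the two cardinalities can be sketched as follows. This is a minimal illustration in Python, not Impala's planner code (the frontend is written in Java). It assumes that the selectivity of an IN predicate is estimated as the number of listed values divided by the column's number of distinct values (NDV), capped at 1.0, and that tpch.customer has 150,000 rows with a unique c_custkey. The function names are hypothetical.

```python
def dedupe_in_list(values):
    """Return the IN-list values with duplicates removed, keeping first-seen order."""
    return list(dict.fromkeys(values))


def in_list_selectivity(values, ndv, dedupe=True):
    """Estimate the fraction of rows matched by `col IN (values)`.

    Assumption: values are uniformly distributed, so each distinct listed
    value matches 1/ndv of the rows. The result is capped at 1.0.
    """
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    count = len(dedupe_in_list(values)) if dedupe else len(values)
    return min(1.0, count / ndv)


def estimated_cardinality(row_count, selectivity):
    """Scale the table's row count by the predicate's selectivity."""
    return round(row_count * selectivity)


# Assumed statistics for tpch.customer (not stated in the issue).
ROW_COUNT = 150_000
NDV_C_CUSTKEY = 150_000

in_values = [10, 20, 30, 30, 10, 20]

# Counting every listed value, duplicates included.
current = estimated_cardinality(
    ROW_COUNT, in_list_selectivity(in_values, NDV_C_CUSTKEY, dedupe=False))

# Counting only the unique values.
expected = estimated_cardinality(
    ROW_COUNT, in_list_selectivity(in_values, NDV_C_CUSTKEY))

print(current, expected)
```

Under these assumptions the first estimate counts all six listed values and the second counts only the three unique ones, which corresponds to the difference between the current and the expected plan.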