[ https://issues.apache.org/jira/browse/IMPALA-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned IMPALA-8031:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Remove redundant inequalities for selectivity calcs
> ---------------------------------------------------
>
>                 Key: IMPALA-8031
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8031
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> IMPALA-8035 describes how Impala currently estimates inequalities: it lumps
> all non-equality predicates together and assumes a single 0.1 selectivity for
> the whole group. As we try to fix that, we hit another issue. This issue
> assumes that inequalities are already treated correctly on a per-predicate
> basis.
> If a query has two inequalities on the same column, and they are of the same
> “direction”, then only the more restrictive one applies: the one with the
> smaller bound for < and <=, or the one with the larger bound for > and >=.
> Selectivity estimates should reflect this fact.
> {noformat}
> select *
> from tpch.customer c
> where c.c_custkey < 1234
>   and c.c_custkey < 2345
> ---- PLAN
> PLAN-ROOT SINK
> |
> 00:SCAN HDFS [tpch.customer c]
>    partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=28.44K
>    predicates: c.c_custkey < 1234, c.c_custkey < 2345
> {noformat}
> Expected:
> {noformat}
> 00:SCAN HDFS [tpch.customer c]
>    partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=49.50K
> {noformat}
> The calcs don't even need to do the math. Just noticing that two expressions
> on the same column point in the same direction is sufficient: count only one
> of them toward the overall selectivity; it doesn't matter which one.
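
For illustration only, the idea in the description could be sketched roughly as below. This is not Impala's planner code: the `Pred` type, the helper names, and the 0.33 per-predicate selectivity are all hypothetical, and the sketch simply multiplies selectivities rather than reproducing whatever combination rule the planner actually uses. It only shows the "count one inequality per column and direction" step.

```python
from typing import List, NamedTuple


class Pred(NamedTuple):
    """Hypothetical stand-in for a single-column inequality predicate."""
    column: str
    op: str       # one of "<", "<=", ">", ">="
    value: float


# Assumed per-predicate default, chosen only for this sketch.
ASSUMED_INEQUALITY_SELECTIVITY = 0.33


def direction(op: str) -> str:
    """Classify an inequality operator as an upper or a lower bound."""
    if op in ("<", "<="):
        return "upper"
    if op in (">", ">="):
        return "lower"
    raise ValueError(f"not an inequality operator: {op}")


def drop_redundant_inequalities(preds: List[Pred]) -> List[Pred]:
    """Keep one predicate per (column, direction) pair.

    As the description notes, it does not matter which one is kept for
    estimation purposes, so the first one seen is retained.
    """
    seen = set()
    kept = []
    for p in preds:
        key = (p.column, direction(p.op))
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def estimate_selectivity(preds: List[Pred]) -> float:
    """Multiply the assumed selectivity once per non-redundant predicate."""
    sel = 1.0
    for _ in drop_redundant_inequalities(preds):
        sel *= ASSUMED_INEQUALITY_SELECTIVITY
    return sel


# The two predicates from the example query collapse to one for estimation.
preds = [Pred("c_custkey", "<", 1234), Pred("c_custkey", "<", 2345)]
print(len(drop_redundant_inequalities(preds)))
print(estimate_selectivity(preds))
```

With the two `c_custkey` predicates from the query above, only one contributes to the estimate, so the result is the same as for a single inequality. A predicate in the opposite direction on the same column (for example `c_custkey > 10`) would still be counted separately.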