[ https://issues.apache.org/jira/browse/IMPALA-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned IMPALA-8035:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Planner estimate incorrect for non-equi-join case
> -------------------------------------------------
>
>                 Key: IMPALA-8035
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8035
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> The code in {{JoinNode.getJoinCardinality()}} makes a bold (and incorrect) assumption:
> {code:java}
> if (eqJoinConjunctSlots.isEmpty()) {
>   // There are no eligible equi-join conjuncts. Optimistically assume FK/PK with a
>   // join selectivity of 1.
>   return probeCard;
> }
> {code}
> Suppose we have a join of the form:
> {code:sql}
> SELECT * FROM t1, t2
> {code}
> or:
> {code:sql}
> SELECT *
> FROM t1, t2
> WHERE t1.a > t2.b
> {code}
> In both cases the code assumes that each {{t1}} row matches exactly one {{t2}} row, which is very unlikely.
>
> There are well-known ways to estimate these cases. The first example is a Cartesian product, with cardinality {{|T1| * |T2|}}. The second scales that product by the selectivity of the join predicate:
> {noformat}
> |T1 ⋈ T2| = |T1| * |T2| * sel(T1.a > T2.b)
> {noformat}
> Without a histogram we cannot estimate this cardinality accurately (but see IMPALA-8032). We can, however, assume that the predicate removes some rows; otherwise the user would not have included it. Most systems assume a selectivity of 0.1 or 0.45 for an inequality. See IMPALA-7601 and [Ramakrishnan and Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html]. However, as noted in IMPALA-7601, Impala does not yet estimate the selectivity of inequalities accurately, so that also needs work.
>
> This bug presumably causes few problems for users because most real-world queries use equi-joins. However, some kinds of analysis queries join on other predicates, and Impala should support these use cases.
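A minimal Java sketch of the two estimates described in the issue (Cartesian product, and product scaled by a predicate selectivity). The class NonEquiJoinEstimate, its methods, the use of -1 to mark an unknown cardinality, and the 0.1 default selectivity are all hypothetical choices for illustration; they are not part of Impala's {{JoinNode}} API. The product saturates at Long.MAX_VALUE rather than overflowing.

```java
/**
 * Hypothetical sketch, not Impala code: cardinality estimates for joins
 * that have no equi-join conjuncts.
 */
final class NonEquiJoinEstimate {
  // Assumption: -1 marks an unknown cardinality.
  static final long UNKNOWN = -1L;

  // Assumed default selectivity for one inequality predicate
  // (the issue cites 0.1 or 0.45 as common choices).
  static final double DEFAULT_INEQUALITY_SELECTIVITY = 0.1;

  private NonEquiJoinEstimate() {}

  /** Cartesian product: |T1| * |T2|, saturating at Long.MAX_VALUE. */
  static long crossJoin(long leftCard, long rightCard) {
    if (leftCard < 0 || rightCard < 0) return UNKNOWN;
    try {
      return Math.multiplyExact(leftCard, rightCard);
    } catch (ArithmeticException overflow) {
      return Long.MAX_VALUE;
    }
  }

  /** Non-equi join: |T1| * |T2| * sel, with sel clamped to [0, 1]. */
  static long inequalityJoin(long leftCard, long rightCard, double selectivity) {
    long product = crossJoin(leftCard, rightCard);
    if (product == UNKNOWN) return UNKNOWN;
    double sel = Math.max(0.0, Math.min(1.0, selectivity));
    // Math.round(double) returns a long and saturates at Long.MAX_VALUE.
    return Math.round(product * sel);
  }

  public static void main(String[] args) {
    System.out.println(crossJoin(1_000L, 50L));  // 50000
    System.out.println(
        inequalityJoin(1_000L, 50L, DEFAULT_INEQUALITY_SELECTIVITY));  // 5000
  }
}
```

For a 1,000-row {{t1}} joined to a 50-row {{t2}}, the sketch reports 50,000 rows with no predicate and 5,000 rows under the assumed 0.1 inequality selectivity, whereas returning the probe cardinality (with {{t1}} as the probe side) would report 1,000 in both cases.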
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)