[ 
https://issues.apache.org/jira/browse/IMPALA-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-8035:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Planner estimate incorrect for non-equi-join case
> -------------------------------------------------
>
>                 Key: IMPALA-8035
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8035
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> The code in {{JoinNode.getJoinCardinality()}} makes a bold (and incorrect) 
> assumption:
> {code:java}
>     if (eqJoinConjunctSlots.isEmpty()) {
>       // There are no eligible equi-join conjuncts. Optimistically assume FK/PK with a
>       // join selectivity of 1.
>       return probeCard;
>     }
> {code}
> Suppose we have a join of the form:
> {code:sql}
> SELECT * FROM t1, t2
> {code}
> Or
> {code:sql}
> SELECT *
> FROM t1, t2
> WHERE t1.a > t2.b
> {code}
> By returning {{probeCard}}, the code assumes that each t1 row will match 
> exactly one t2 row, which seems very unlikely for either query.
> In fact, there are well-known algorithms to estimate this case. The first 
> example is a Cartesian product with cardinality {{|T1| * |T2|}}.
> The second uses the selectivity of the expression:
> {noformat}
> |T1 ⋈ T2| = |T1| * |T2| * sel(T1.a > T2.b)
> {noformat}
> Without a histogram, we cannot estimate the cardinality accurately (but see 
> IMPALA-8032). We can, however, assume that the predicate provides some 
> reduction; otherwise the user would not have included it. Most systems 
> assume a value of 0.1 or 0.45 for inequality. See IMPALA-7601 and 
> [Ramakrishnan and 
> Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html].
> However, as noted in IMPALA-7601, we don’t accurately estimate the 
> selectivity of inequalities, so work is needed there also.
> This bug presumably does not cause problems for most users because most 
> real-world queries use equi-joins. However, some kinds of analysis queries 
> use other predicates, and Impala should support these use cases.
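
The two estimates described in the issue above could be sketched roughly as follows. This is only an illustration, not Impala's actual code: the class name, the method names, the 0.1 default (one of the guesses quoted in the issue) and the use of -1 to mean "cardinality unknown" are all assumptions made for the example.

```java
// Hypothetical sketch; none of these names come from the Impala code base.
final class JoinCardinalitySketch {
  // Assumed default guess for an inequality predicate, taken from the issue text.
  static final double DEFAULT_INEQUALITY_SELECTIVITY = 0.1;

  private JoinCardinalitySketch() {}

  // Multiplies two non-negative cardinalities, saturating at Long.MAX_VALUE
  // instead of overflowing.
  static long multiplySaturating(long a, long b) {
    try {
      return Math.multiplyExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  // Estimates |T1 join T2| = |T1| * |T2| * sel when there are no equi-join
  // conjuncts. Pass a selectivity of 1.0 for a pure Cartesian product.
  // Returns -1 when either input cardinality is unknown (assumed convention).
  static long estimate(long probeCard, long buildCard, double selectivity) {
    if (probeCard < 0 || buildCard < 0) {
      return -1;
    }
    if (selectivity < 0.0 || selectivity > 1.0) {
      throw new IllegalArgumentException("selectivity must be in [0, 1]");
    }
    long cross = multiplySaturating(probeCard, buildCard);
    if (selectivity == 1.0) {
      // Cartesian product: avoid a lossy round trip through double.
      return cross;
    }
    return Math.round(cross * selectivity);
  }
}
```

With |T1| = 1000 and |T2| = 200, the sketch estimates 200,000 rows for the Cartesian product and 20,000 rows for the inequality join at the assumed 0.1 selectivity, compared with the 1000 rows that the current {{probeCard}} shortcut returns.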



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
