[ https://issues.apache.org/jira/browse/IMPALA-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers reassigned IMPALA-8262:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Join cardinality not decreased by join filter selectivity
> ---------------------------------------------------------
>
>                 Key: IMPALA-8262
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8262
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> Consider a subset of the plan for TPC-H query 7. (See {{tpch-all.test}} for 
> details.)
> {noformat}
> 11:AGGREGATE [FINALIZE]
> |  output: sum(l_extendedprice * (1 - l_discount))
> |  group by: n1.n_name, n2.n_name, year(l_shipdate)
> |  row-size=58B cardinality=575.77K
> |
> 10:HASH JOIN [INNER JOIN]
> |  hash predicates: c_nationkey = n2.n_nationkey
> |  other predicates: ((n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY') OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE'))
> |  row-size=132B cardinality=575.77K
> |
> |--05:SCAN HDFS [tpch.nation n2]
> |     row-size=21B cardinality=25
> |
> 09:HASH JOIN [INNER JOIN]
> |  hash predicates: s_nationkey = n1.n_nationkey
> |  row-size=111B cardinality=575.77K
> {noformat}
> Here, join 09 feeds 576K rows into join 10, and all 576K rows pass along to
> aggregate 11. Notice, however, that join 10 has a filter (its "other
> predicates") that picks out 2 of the 25 countries in each of two paths. The
> selectivity of the filter should be something like 2 * 2/25 = 0.16. Thus,
> the output cardinality of join 10 should be about 576K * 0.16 = 92K.
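The arithmetic in the estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it reuses the report's rough 2 * 2/25 selectivity and the 575.77K input cardinality shown for join 09, and the variable names are illustrative.

```python
# Rough estimate from the report: join 10's "other predicates" keep
# 2 of the 25 nations on each of two paths.
NATIONS = 25
input_rows = 575_770  # cardinality of join 09 (575.77K in the plan)

selectivity = 2 * 2 / NATIONS  # the report's rough estimate, 0.16
output_rows = input_rows * selectivity

print(f"selectivity = {selectivity:.2f}")
print(f"estimated join 10 output = {output_rows / 1000:.1f}K")
```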
> The problem is that the join cardinality calculations don't consider join 
> filter selectivity.
> It may be that this was done to handle the outer join case, in which filters
> applied in the outer-side scan must be re-applied at the join. Omitting the
> filters from the join's estimate avoids counting their selectivity twice.
> But that case is special and should be handled separately as part of
> IMPALA-8213. Except for correlated filters, the planner *should* apply join
> filter selectivity in the join output cardinality calculation.
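A minimal sketch of the proposed rule, assuming a hypothetical `JoinEstimate` record and `join_output_cardinality()` function (neither is Impala's actual frontend API): scale the equi-join estimate by the selectivity of the join's other predicates, treat those predicates as independent, and leave outer joins unchanged as a placeholder for the special handling deferred to IMPALA-8213.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class JoinEstimate:
    """Hypothetical inputs to a join cardinality estimate (not Impala's API)."""
    hash_join_rows: float  # estimated rows after the hash (equi-join) predicates alone
    other_pred_selectivities: List[float] = field(default_factory=list)
    is_outer_join: bool = False


def join_output_cardinality(est: JoinEstimate) -> float:
    """Scale the equi-join estimate by the join's 'other predicates' selectivity."""
    if est.is_outer_join:
        # Placeholder: outer joins are left unchanged here because their
        # filters may already be counted in the outer-side scan (the report
        # defers this case to IMPALA-8213).
        return est.hash_join_rows
    # Assumes the predicates are independent; correlated filters would need
    # a different combination rule. prod([]) == 1, so no predicates means
    # no reduction.
    return est.hash_join_rows * prod(est.other_pred_selectivities)


inner = JoinEstimate(hash_join_rows=575_770, other_pred_selectivities=[0.16])
outer = JoinEstimate(hash_join_rows=575_770, other_pred_selectivities=[0.16],
                     is_outer_join=True)
print(join_output_cardinality(inner))
print(join_output_cardinality(outer))
```

Multiplying independent selectivities is the simplest choice; it is exactly what breaks down for the correlated filters the report excludes.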
> This error has consequences. The filter should reduce the number of rows
> through the join. Because it does so, the join should come early in the join
> tree to reduce the set of rows processed. But because selectivity is ignored,
> the planner does not see the join as a filter and ends up putting join 10 at
> the top of the join tree. (See the test file for the full plan.) The result
> is that Impala schleps around many more rows than necessary, only to discard
> them near the top of the DAG.
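To illustrate why the ignored selectivity matters for join ordering, here is a toy greedy heuristic that applies the most reducing join first. It is not Impala's join-ordering code: `greedy_join_order`, `rows_processed`, and `join_a` (a stand-in for some other non-reducing join) are all hypothetical, and only the 575.77K input and the 0.16 ratio come from the report.

```python
from typing import Dict, List, Tuple


def greedy_join_order(input_rows: float,
                      ratios: Dict[str, float]) -> List[Tuple[str, float]]:
    """Toy heuristic: repeatedly apply the join with the smallest estimated
    output/input ratio. Returns (join name, rows after that join) in order."""
    remaining = dict(ratios)
    rows = input_rows
    plan = []
    while remaining:
        # min() returns the first-inserted name on ties, so a join whose
        # filter is ignored (ratio 1.0) gets no priority at all.
        name = min(remaining, key=remaining.__getitem__)
        rows *= remaining.pop(name)
        plan.append((name, rows))
    return plan


def rows_processed(input_rows: float, plan: List[Tuple[str, float]]) -> float:
    """Total rows fed into the joins of the plan (a crude cost proxy)."""
    total, rows = 0.0, input_rows
    for _, out_rows in plan:
        total += rows
        rows = out_rows
    return total


# Hypothetical example: join_10's ratio is either ignored (1.0) or the
# report's rough 0.16 estimate.
rows_in = 575_770
ignored = greedy_join_order(rows_in, {"join_a": 1.0, "join_10": 1.0})
counted = greedy_join_order(rows_in, {"join_a": 1.0, "join_10": 0.16})
print([name for name, _ in ignored], rows_processed(rows_in, ignored))
print([name for name, _ in counted], rows_processed(rows_in, counted))
```

With the filter ignored, both joins look identical and join 10 is ordered last; once the 0.16 ratio is counted, it moves first and the rows fed into later joins shrink accordingly.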



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
