[ 
https://issues.apache.org/jira/browse/IMPALA-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved IMPALA-12006.
---------------------------------
    Fix Version/s: Impala 4.3.0
       Resolution: Fixed

> Outer/inner join cardinality highly overestimated
> -------------------------------------------------
>
>                 Key: IMPALA-12006
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12006
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.2.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: Impala 4.3.0
>
>
> In one of the use cases, we have seen the cardinality estimate for left outer 
> join highly overestimated.  The plan is complex and only a partial output is 
> shown below (with the column names anonymized): 
> {noformat}
>   57:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
>   |  hash-table-id=121
>   |  hash predicates: a.id = a.id
>   |  fk/pk conjuncts: none
>   |  mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
>   |  tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N,29N,31N 
> row-size=2.63KB cardinality=3.90T
>   |  in pipelines: 06(GETNEXT), 26(OPEN)
>   |
>   |--F1253:PLAN FRAGMENT hosts=13 instances=13
>   |  |  Per-Instance Resources: mem-estimate=1.10GB mem-reservation=204.00MB 
> thread-reservation=1
>   |  JOIN BUILD
>   |  |  join-table-id=121 plan-id=122 cohort-id=25
>   |  |  build expressions: a.id
>   |  |  mem-estimate=1.08GB mem-reservation=204.00MB spill-buffer=2.00MB 
> thread-reservation=0
>   |  |
>   |  1758:EXCHANGE [BROADCAST]
>   |  |  mem-estimate=20.87MB mem-reservation=0B thread-reservation=0
>   |  |  tuple-ids=29,31 row-size=85B cardinality=9.56M
>   |  |  in pipelines: 26(GETNEXT)
>   |  |
>   ...
>   ...
>   ...
>   56:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
>   |  hash predicates: ifnull(a.id, a.id) = a.id
>   |  fk/pk conjuncts: assumed fk/pk
>   |  mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
>   |  tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N row-size=2.55KB 
> cardinality=14.97G
>   |  in pipelines: 06(GETNEXT), 22(OPEN)
> {noformat}
> Note that the left input of the join is estimated as 14.97G rows, right input 
> as 9.56M rows but the LOJ estimate is 3.9T rows.  We need to investigate why 
> that is so and fix it.  The NDV of the based column involved in the join is 
> 36661  but in the lower join there are functions involved in the join 
> condition. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to