[ https://issues.apache.org/jira/browse/IMPALA-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha resolved IMPALA-12006. --------------------------------- Fix Version/s: Impala 4.3.0 Resolution: Fixed > Outer/inner join cardinality highly overestimated > ------------------------------------------------- > > Key: IMPALA-12006 > URL: https://issues.apache.org/jira/browse/IMPALA-12006 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Affects Versions: Impala 4.2.0 > Reporter: Aman Sinha > Assignee: Aman Sinha > Priority: Major > Fix For: Impala 4.3.0 > > > In one of the use cases, we have seen the cardinality estimate for left outer > join highly overestimated. The plan is complex and only a partial output is > shown below (with the column names anonymized): > {noformat} > 57:HASH JOIN [LEFT OUTER JOIN, BROADCAST] > | hash-table-id=121 > | hash predicates: a.id = a.id > | fk/pk conjuncts: none > | mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB > thread-reservation=0 > | tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N,29N,31N > row-size=2.63KB cardinality=3.90T > | in pipelines: 06(GETNEXT), 26(OPEN) > | > |--F1253:PLAN FRAGMENT hosts=13 instances=13 > | | Per-Instance Resources: mem-estimate=1.10GB mem-reservation=204.00MB > thread-reservation=1 > | JOIN BUILD > | | join-table-id=121 plan-id=122 cohort-id=25 > | | build expressions: a.id > | | mem-estimate=1.08GB mem-reservation=204.00MB spill-buffer=2.00MB > thread-reservation=0 > | | > | 1758:EXCHANGE [BROADCAST] > | | mem-estimate=20.87MB mem-reservation=0B thread-reservation=0 > | | tuple-ids=29,31 row-size=85B cardinality=9.56M > | | in pipelines: 26(GETNEXT) > | | > ... > ... > ... > 56:HASH JOIN [LEFT OUTER JOIN, BROADCAST] > | hash predicates: ifnull(a.id, a.id) = a.id > | fk/pk conjuncts: assumed fk/pk > | mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB > thread-reservation=0 > | tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N row-size=2.55KB > cardinality=14.97G > | in pipelines: 06(GETNEXT), 22(OPEN) > {noformat} > Note that the left input of the join is estimated as 14.97G rows, right input > as 9.56M rows but the LOJ estimate is 3.9T rows. We need to investigate why > that is so and fix it. The NDV of the based column involved in the join is > 36661 but in the lower join there are functions involved in the join > condition. -- This message was sent by Atlassian Jira (v8.20.10#820010)