Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: [WIP] Fix join cardinality if one side is scalar
......................................................................


Patch Set 1:

> Patch Set 1:
>
> Thanks Aman for working on this.
>
> I've found the following queries where the cardinality for LEFT SEMI JOIN and 
> INNER JOIN still differs.
>
> This kind of query is especially important for us:
>
>  explain select * from store_sales
>          inner join (select max(s_store_sk) as max_store_sk from store
>                      union
>                      select min(s_store_sk) as max_store_sk from store) v
>          on ss_store_sk = max_store_sk;
>
> In the above query LHS NDV is 6 while RHS cardinality is 2, therefore join 
> output cardinality should be LHS CARD / 3, just like LEFT SEMI JOIN 
> calculates.
>
> The planner also calculates wrong cardinalities for this:
>
>  explain select * from store_sales
>          inner join (select max(s_store_sk) as max_store_sk
>                      from store group by s_market_id limit 3) v
>          on ss_store_sk = max_store_sk;

Thanks Zoltan for reviewing. I didn't get a chance to get back sooner, but I 
have started reworking the logic to handle the above cases. Previously, I 
wanted to be conservative and handle only the one-side-scalar case described 
in the JIRA. I agree that if the NDV is usable for semi joins (the semi join 
(SJ) cardinality estimate does not seem to depend on the source scan slots, 
which the inner join (IJ) estimate does depend on), we could do a better job 
for the inner join estimation. If there are no duplicates on the right side, 
then in theory the estimates for SJ and IJ should match (or be close enough).
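
To make the arithmetic concrete, here is a rough sketch of the estimates being 
compared. It uses the generic textbook formulas, not the planner's actual 
code; the function names and the 3,000,000-row LHS cardinality are made up for 
illustration, while the NDV of 6 and the RHS cardinality of 2 come from the 
first query above.

```python
def semi_join_card(lhs_card: int, lhs_ndv: int, rhs_ndv: int) -> int:
    """Textbook LEFT SEMI JOIN estimate (assumed formula, not Impala's code).

    Each LHS row survives if its key is among the RHS distinct values,
    so the surviving fraction is min(lhs_ndv, rhs_ndv) / lhs_ndv.
    """
    return lhs_card * min(lhs_ndv, rhs_ndv) // lhs_ndv


def inner_join_card(lhs_card: int, rhs_card: int,
                    lhs_ndv: int, rhs_ndv: int) -> int:
    """Textbook equi-join estimate: |L| * |R| / max(NDV_L, NDV_R)."""
    return lhs_card * rhs_card // max(lhs_ndv, rhs_ndv)


# Hypothetical LHS row count; NDV 6 and RHS cardinality 2 are from the thread.
LHS_CARD = 3_000_000
LHS_NDV = 6

# No duplicates on the right side: RHS cardinality equals RHS NDV (2).
sj = semi_join_card(LHS_CARD, LHS_NDV, rhs_ndv=2)
ij = inner_join_card(LHS_CARD, rhs_card=2, lhs_ndv=LHS_NDV, rhs_ndv=2)

# Duplicates on the right side: 4 rows but still only 2 distinct values.
ij_dup = inner_join_card(LHS_CARD, rhs_card=4, lhs_ndv=LHS_NDV, rhs_ndv=2)

print(sj, ij, ij_dup)
```

Under these assumptions both estimates come out to LHS CARD / 3 when the right 
side has no duplicates, and the inner join estimate only grows past the semi 
join estimate once the right side has repeated values.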


--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 1
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Sun, 09 May 2021 02:13:38 +0000
Gerrit-HasComments: No
