Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve inner join cardinality estimates
......................................................................


Patch Set 5:

(9 comments)

Thanks Zoltan and Qifan for the review. I addressed some of your comments in 
the latest patch set. Will do another round a bit later today.

http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@20
PS3, Line 20:
> nit. extra
Done


http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@21
PS3, Line 21:  MAX(int_col) and the right input does
            : not have a group-by, so right NDV = 1 can be used. (b) if it
            : has a group-b
> Just wonder if this case is handled in the code.
The group by NDV was available but was not being used because of the MAX 
expression. I clarified the commit message to indicate that this patch is 
leveraging those stats while previously it was only looking for corresponding 
scan slots and ignoring the available NDV.


http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@38
PS3, Line 38: shape of a plan (e.g whether the join order changed).
            :  - Preliminary tests with a TPC-DS 10 GB scale factor on a single
> Just out of curiosity I did a single node perf run on TPCDS with scale 10
Thanks Zoltan for doing the preliminary validation.  It looks promising, so I 
feel comfortable moving ahead. I am in touch with the performance team to run a 
larger scale benchmark but it may take some time.
I took the liberty of citing your perf measurement in the commit message.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@447
PS3, Line 447: g joinCard = Math.round((l
> I think this part is fixed by https://gerrit.cloudera.org/#/c/16349/
Done


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@507
PS3, Line 507:
             :
> nit: Since we return in the 'true' branch, we don't need to nest the statem
Done. Thanks for pointing out. I had a different control flow at one point and  
forgot to change it.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@550
PS3, Line 550: return (hasNdvStats(slotDesc) && hasNumRowsStats(slotDesc));
             :     }
             :
> nit. May shorten to return (hasNdvStats() && hasNumRowsStats()).
Yes. Thanks for pointing out.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@591
PS3, Line 591:    * cardinality estimations. The ndv values are upper bounded
> nit: Could you please add a brief comment?
Done


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@606
PS3, Line 606: s;
> nit. May adjust the lhsNdv_ in the cstr to avoid repeated computation.
Moved it.  When I wrote this I was thinking there may be a setter also for 
these member variables but at this point I am not seeing a  good reason for it.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@607
PS3, Line 607: s;
> Same as above.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 May 2021 17:26:37 +0000
Gerrit-HasComments: Yes

Reply via email to