Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 )
Change subject: IMPALA-10681: Improve inner join cardinality estimates ...................................................................... Patch Set 5: (9 comments) Thanks Zoltan and Qifan for the review. I addressed some of your comments in the latest patch set. Will do another round a bit later today. http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@20 PS3, Line 20: > nit. extra Done http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@21 PS3, Line 21: MAX(int_col) and the right input does : not have a group-by, so right NDV = 1 can be used. (b) if it : has a group-b > Just wonder if this case is handled in the code. The group by NDV was available but was not being used because of the MAX expression. I clarified the commit message to indicate that this patch is leveraging those stats while previously it was only looking for corresponding scan slots and ignoring the available NDV. http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@38 PS3, Line 38: shape of a plan (e.g whether the join order changed). : - Preliminary tests with a TPC-DS 10 GB scale factor on a single > Just out of curiosity I did a single node perf run on TPCDS with scale 10 Thanks Zoltan for doing the preliminary validation. It looks promising, so I feel comfortable moving ahead. I am in touch with the performance team to run a larger scale benchmark but it may take some time. I took the liberty of citing your perf measurement in the commit message. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@447 PS3, Line 447: g joinCard = Math.round((l > I think this part is fixed by https://gerrit.cloudera.org/#/c/16349/ Done http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@507 PS3, Line 507: : > nit: Since we return in the 'true' branch, we don't need to nest the statem Done. Thanks for pointing out. I had a different control flow at one point and forgot to change it. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@550 PS3, Line 550: return (hasNdvStats(slotDesc) && hasNumRowsStats(slotDesc)); : } : > nit. May shorten to return (hasNdvStats() && hasNumRowsStats()). Yes. Thanks for pointing out. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@591 PS3, Line 591: * cardinality estimations. The ndv values are upper bounded > nit: Could you please add a brief comment? Done http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@606 PS3, Line 606: s; > nit. May adjust the lhsNdv_ in the cstr to avoid repeated computation. Moved it. When I wrote this I was thinking there may be a setter also for these member variables but at this point I am not seeing a good reason for it. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@607 PS3, Line 607: s; > Same as above. Done -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 19 May 2021 17:26:37 +0000 Gerrit-HasComments: Yes