Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16266 )
Change subject: IMPALA-5022 part 1/2: Outer join simplification ...................................................................... Patch Set 19: (9 comments) Overall this looks very good and prior reviewers have given great feedback. I had a few additional ones below. Sorry, my review is coming a little late in the cycle but I think other than the default value of config option, rest of the comments are relatively minor. http://gerrit.cloudera.org:8080/#/c/16266/19/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/16266/19/common/thrift/ImpalaInternalService.thrift@452 PS19, Line 452: 112: optional bool enable_outer_join_to_inner_transformation = true; I am leaning towards having this flag default to False in the first cut for a few reasons: (a) we should do performance testing with such flags which affect plans in a major way even though this functionality should in theory improve performance..but we should still verify with a larger scale workload, (b) the large number of test baseline changes in this patch. It would be helpful to have only the targeted Oj-to-IJ plans with this flag set explicitly to True in the test. Some of the existing tests that are modified by this patch may contain outer joins for specific reasons, e.g just core OJ functionality or something else like cardinality estimation. (c) There's a separate functional testing that Impala's QE team does with various BI tool generated queries. Those require extra setup, so would be good to pass it through that before turning the flag on by default. What do you guys think ? We could also get additional input on this. http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3252 PS19, Line 3252: Set<Set<TupleId nit: In the comment above, could you elaborate on the returned set of sets ? What is each item in the outer set ? http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3253 PS19, Line 3253: Set<Set<TupleId>> set = new HashSet<Set<TupleId>>(); It is strange to instantiate this set within a recursive call. It would be cleaner if the caller could create this outer HashSet and pass it as a function argument and have this function populate the inner set and the return type can be void. http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3259 PS19, Line 3259: e.getIds(tids, null); It seems you could directly call Expr.getIdsHelper(Set<TupleId>, Set<SlotId>) since getIds() itself is a wrapper around getIdsHelper. This way you don't need to create the intermediate tids list. http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3269 PS19, Line 3269: for (TupleId tid : tids1) { This is O(n^2), although understandably the sizes are small. It seems the caller could have propagated a set instead of list for one of the lists. That would make this intersect O(n). http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3349 PS19, Line 3349: continue; nit: The 'continue' is not needed. http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3458 PS19, Line 3458: } catch (InternalException ex) { Similar to other such try-catch, can you log a warning here ? Also 'continue' is not needed. http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3475 PS19, Line 3475: return simplifyOuterJoins(tableRefs, nonnullableTidList); Can we just propagate the set directly instead of the list ? This will allow the subsequent intersect operation to be more efficient. http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@321 PS19, Line 321: static boolean isConditionalBuiltinFnName(String fnName) { It would be better to keep this list of functions in a static HashSet for quick lookup. -- To view, visit http://gerrit.cloudera.org:8080/16266 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa7804033fac68e93f33c387dc68ef67f803e93e Gerrit-Change-Number: 16266 Gerrit-PatchSet: 19 Gerrit-Owner: Xianqing He <hexianqing...@126.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Xianqing He <hexianqing...@126.com> Gerrit-Comment-Date: Thu, 10 Sep 2020 02:48:22 +0000 Gerrit-HasComments: Yes