Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16266 )

Change subject: IMPALA-5022 part 1/2: Outer join simplification
......................................................................


Patch Set 19:

(9 comments)

Overall this looks very good and prior reviewers have given great feedback. I 
had a few additional ones below. Sorry, my review is coming a little late in 
the cycle but I think other than the default value of config option, rest of 
the comments are relatively minor.

http://gerrit.cloudera.org:8080/#/c/16266/19/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/16266/19/common/thrift/ImpalaInternalService.thrift@452
PS19, Line 452:   112: optional bool enable_outer_join_to_inner_transformation 
= true;
I am leaning towards having this flag default to False in the first cut for a 
few reasons:
(a) we should do performance testing with such flags which affect plans in a 
major way even though this functionality should in theory improve 
performance..but we should still verify with a larger scale workload,
(b) the large number of test baseline changes in this patch.  It would be 
helpful to have only the targeted Oj-to-IJ plans with this flag set explicitly 
to True in the test.  Some of the existing tests that are modified by this 
patch may contain outer joins for specific reasons, e.g just core OJ 
functionality or something else like cardinality estimation.
(c) There's a separate functional testing that Impala's QE team does with 
various BI tool generated queries.  Those require extra setup, so would be good 
to pass it through that before turning the flag on by default.

What do you guys think ? We could also get additional input on this.


http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3252
PS19, Line 3252: Set<Set<TupleId
nit: In the comment above, could you elaborate on the returned  set of sets ? 
What is each item in the outer set ?


http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3253
PS19, Line 3253:     Set<Set<TupleId>> set = new HashSet<Set<TupleId>>();
It is strange to instantiate this set within a recursive call.  It would be 
cleaner if the caller could create this outer HashSet and pass it as a function 
argument and have this function populate the inner set  and the return type can 
be void.


http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3259
PS19, Line 3259:       e.getIds(tids, null);
It seems you could directly call Expr.getIdsHelper(Set<TupleId>, Set<SlotId>) 
since  getIds() itself is a wrapper around getIdsHelper. This way you don't 
need to create the intermediate tids list.


http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3269
PS19, Line 3269:     for (TupleId tid : tids1) {
This is O(n^2), although understandably the sizes are small. It seems the 
caller could have propagated a set instead of list for one of the lists. That 
would make this intersect O(n).


http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3349
PS19, Line 3349:           continue;
nit: The 'continue' is not needed.


http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3458
PS19, Line 3458:       } catch (InternalException ex) {
Similar to other such try-catch,  can you log a warning here ?  Also 'continue' 
 is not needed.


http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3475
PS19, Line 3475:       return simplifyOuterJoins(tableRefs, nonnullableTidList);
Can we just propagate the set directly instead of the list ?  This will allow 
the subsequent intersect operation to be more efficient.


http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/16266/19/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@321
PS19, Line 321:   static boolean isConditionalBuiltinFnName(String fnName) {
It would be better to keep this list of functions in a static HashSet for quick 
lookup.



--
To view, visit http://gerrit.cloudera.org:8080/16266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa7804033fac68e93f33c387dc68ef67f803e93e
Gerrit-Change-Number: 16266
Gerrit-PatchSet: 19
Gerrit-Owner: Xianqing He <hexianqing...@126.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Xianqing He <hexianqing...@126.com>
Gerrit-Comment-Date: Thu, 10 Sep 2020 02:48:22 +0000
Gerrit-HasComments: Yes

Reply via email to