Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 )
Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE ...................................................................... Patch Set 10: (10 comments) http://gerrit.cloudera.org:8080/#/c/11760/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11760/8//COMMIT_MSG@43 PS8, Line 43: (IMPALA-7737) > are there examples of these fns in our benchmarks to quantify the regressio Done http://gerrit.cloudera.org:8080/#/c/11760/8//COMMIT_MSG@45 PS8, Line 45: Still, the fix provides most of what the JIRA ticket requested > I'd skip these next three paragraphs. Done http://gerrit.cloudera.org:8080/#/c/11760/8//COMMIT_MSG@57 PS8, Line 57: > pls make a section for this called "Testing" so its easier to jump to. also Done http://gerrit.cloudera.org:8080/#/c/11760/8/be/src/exprs/conditional-functions.h File be/src/exprs/conditional-functions.h: http://gerrit.cloudera.org:8080/#/c/11760/8/be/src/exprs/conditional-functions.h@76 PS8, Line 76: since : /// various bugs mean that this implementation is still sometimes used. But : /// the goal is to remove these classes at some point. > simpler: "until their use is eliminated by the frontend". Done http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java File fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java: http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@37 PS8, Line 37: vanish > is this accurate given the comments in the commit message about order by? Yes, they vanish after the rewrite. If the rewrite does not occur (due to bugs or options), then the functions do not vanish. http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@48 PS8, Line 48: planner runs the rule to simplify CASE : * after this rule. Where that other rule can perform simplifications, : * those simplifications are omitted here > simplify and use the specific rule name for concreteness. Rewrote. Since many rules apply, did not seem to add value to try to enumerate them. http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@106 PS8, Line 106: return rewriteCoalesceFn(e > clarify whether you think this happens after the rewrite or before. If its Done http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@130 PS8, Line 130: Lists.ne > isn't this all that's done here (the most general case) and we'll depend on Rewrote comment http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@172 PS8, Line 172: ESCE(a, literal) when literal > The simplest rewrite here would be to not look at the child exprs for the v I found that I could strip this function down as you suggest, but the results were sub-optimal due to weaknesses in other parts of the code. coalesce(id) --> CASE ELSE id END (expected: id) coalesce(id, null) --> CASE WHEN id IS NOT NULL THEN ID ELSE NULL END (expected: id) coalesce(null, id) --> same as above coalesce(null is null, id) --> 1 (expected: TRUE) coalesce(10 + null, id) --> CASE WHEN NULL IS NOT NULL THEN NULL ELSE id END (expected: id) coalesce(42, min(distinct tinyint_col)) --> CASE WHEN FALSE THEN NULL WHEN TRUE THEN 42 ELSE min(distinct tinyint_col, 42)) END (Expected: CASE WHEN min(distinct tinyint_col) IS NOT NULL THEN min(distinct tinyint_col) ELSE 42 END) And so on. One solution is to restore the original coalesce() function rewrite which produces somewhat better results. Even that function has holes which this version fixed. So, we'd lose those optimizations. As it turns out, all this is moot because the BE CASE implementation has a bug so we can't even use CASE to replace coalesce()... For now, I did rework the function to the simple form requested, then disabled it. We can tackle any other concerns if/when it is a priority. http://gerrit.cloudera.org:8080/#/c/11760/8/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test: http://gerrit.cloudera.org:8080/#/c/11760/8/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1137 PS8, Line 1137: | other predicates: functional.alltypestiny.tinyint_col + functional.alltypestiny.smallint_col + functional.alltypestiny.int_col > 10, CASE WHEN functional.alltypestiny.tinyint_col + functional.alltypestiny.bigint_col IS NULL THEN 1 ELSE functional.alltypestiny.tinyint_col + functional.alltypestiny.bigint_col END = 1 > so this is the example of the performance regression (same work on multiple Yes. See IMPALA-7737. My concern with the coalesce() rewrite is that, without IMPALA-7737, and without VERY clever optimizations in LLVM, we'd end up doing duplicate work. As noted elsewhere, the BE has a bug that prevents us from rewriting coalesce(), so I had to revert many of these test revisions. May be the best path is to fix the underlying issues, then do a clean version of this change. -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 10 Gerrit-Owner: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Wed, 31 Oct 2018 21:38:24 +0000 Gerrit-HasComments: Yes