Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, Vuk Ercegovac,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11760 to look at the new patch set (#5). Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE ...................................................................... IMPALA-7655: Rewrite if, isnull, coalesce to use CASE See IMPALA-7655 for backgound. Tim found that the current interpreted forms of if, isnull and coalesce are slow compared to the code-generated CASE statement. This patch rewrites the above functions into the equivalent CASE structure. The rewrite engine has many bugs that are beyond the scope of this change to fix. This change codes around those bugs. The result is that the conditional rewrite happens some of the time, and sometimes produces less-than perfect optimizations. Conditionals in the top-level ORDER BY clause are not rewritten (IMPALA-7753), but those one or more levels down are. Some expressions involving NULL are not simplified (IMPALA-7769). Several hacks were used to work around the fact that the rewrite engine ignores unanalyzed expressions, yet the rewrite engine does not, in general, re-analyze the expressions it produces, causing simplifications to be skipped (IMPALA-7754). And so on. As a result, the BE retains the original interpreted forms that are still used in two cases: 1) top-leel conditions in the ORDER BY clause, and 2) if the user disables rewrites. Further, code generation does not occur for CASE statements in the SELECT clause when it is in the root fragment (the most common case in simple tests.) This is another known bug (IMPALA-4356). One possible performance regression is that the new form of the code evaluates some expressions twice, where the original interpreted code evaluated the argument once. E.g. coalesce(id, 10) is rewritten to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated twice. If the "id" were replaced by a complex sub-expression, the gain from compilation could be offset by doing work twice. (IMPALA-7737) Still, the fix provides most of what the JIRA ticket requested within the limitations of the existing code. Conditional function rewrites are moved into a new class, RewriteConditionalsRule in order to keep things simple. Most functions use the simplest possible rewrite, relying on the existing rewrite rules for further simplification. The one exception is coalesce(): the existing code relies on the semantics of the function and so was retained and slightly improved. The code was extended to produce a CASE statement directly, retaining existing simplifications. Tests for conditional functions were in one large function along with other rewrite tests. Moved them into a new file, then broke up the tests by function to allow much easier debugging of each function one-by-one. This required moving the common test mechanims into a new common base class. Existing tests focus on one or two rules at a time. The conditional function rewrite, however, relies on the entire set of rules being applied repeatedly. So, added a new FullRewriteTest case to verify this behavior. This class contains several commented-out tests that cannot pass due to existing rewrite bugs noted above. Changing the rewrite cause the PlannerTest to produce different plans than previously. Changed the expected results file to match the new rewrite rules. Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 16 files changed, 831 insertions(+), 309 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/5 -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 5 Gerrit-Owner: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>