[ 
https://issues.apache.org/jira/browse/CALCITE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991128#comment-16991128
 ] 

Feng Zhu commented on CALCITE-3224:
-----------------------------------

Hi, [~zabetak], thanks for your review. Let me answer your comments.

*"Apparently there are certain optimizations that we were performing before and 
that the current code generator does not handle for the moment."*
In new code generator, the optimizations can be naturally achieved. 
The old code generator uses mutable state for null (i.e., Map<? extends 
RexNode, Boolean> exprNullableMap) and produces redundant code when traversing 
the RexNode tree.
Consequently, more and more optimizations are required, e.g., expression reuse, 
NullAs state, NullPolicy.AND/OR/NOT and etc.
In new code generator, we remove these mutable states and generate the code in 
one pass.
Therefore, the optimizations, some tricks and even unusual (but not erroneous) 
situation (AlwaysNull) are not required any more.

 

*"What optims remain to be done in the future?"*
(1) As in the new code generator generates code strictly corresponds to RexNode 
(type), it is not incline to conduct unboxing. Therefore, it may ignore this 
kind of optimations in some places, for instance test case in PR we discussed. 
However, this kind of optimations can be done immediately during the review. It 
is not necessary to leave them in the future.
(2) The special case-when semantic. I found current code generator implements a 
wrong semantic for case-when clause. The operator is SQL’s way of handling 
if/then logic. Different with other \{@code RexCall}s, it is not safe to 
implement its operands first. The right generated code will be the form as 
below. However, this kind of code generated is not simplified as current 
inlined one-line code.
{code:java}
  /**
   * Case statements of the form:
   * {@code CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END}.
   * When {@code a = true}, returns {@code b};
   * when {@code c = true}, returns {@code d};
   * else returns {@code e}.
   *
   * We generate code that looks like: {@code
   *      int case_when_value;
   *      ......code for a......
   *      if (!a_isNull && a_value) {
   *          ......code for b......
   *          case_when_value = res(b_isNull, b_value);
   *      } else {
   *          ......code for c......
   *          if (!c_isNull && c_value) {
   *              ......code for d......
   *              case_when_value = res(d_isNull, d_value);
   *          } else {
   *              ......code for e......
   *              case_when_value = res(e_isNull, e_value);
   *          }
   *      }
   * }
   */{code}
 

*"What will not be able to handle due to the new approach (if any)?"*
I have not found any cases that fail to be handled in new approach so far. In 
addition, during implementing the new code generator, I find some issues 
(CALCITE-3513, CALCITE-3498, CALCITE-3520) which are concealed by current code 
generator's _{color:#FF0000}AlwaysNull{color}_ mechanism.

 

 

> New RexNode-to-Expression CodeGen Implementation
> ------------------------------------------------
>
>                 Key: CALCITE-3224
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3224
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.20.0
>            Reporter: Feng Zhu
>            Assignee: Feng Zhu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.22.0
>
>         Attachments: RexNode-CodeGen.pdf, codegen.png
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> h3. *Background*
>     Current RexNode-to-Expression implementation relies on BlockBuilder's 
> incorrect “optimizations” to inline unsafe operations. As illustrated in 
> CALCITE-3173, when this cooperation is broken in some special cases, it will 
> cause exceptions like NPE, such as CALCITE-3142, CALCITE-3143, CALCITE-3150.
>     Though we can fix these problems under current implementation framework 
> with some efforts like the PR in CALCITE-3142, the logic will become more and 
> more complex. To pursue a thorough and elegant solution, we implement a new 
> one. Moreover, it also ensures the correctness for non-optimized code.
> h3. *Major Features*
>  * *Visitor Pattern*: Each RexNode will be visited only once in a bottom-up 
> way, rather than recursively visiting a RexNode many times with different 
> NullAs settings.
>  * *Conditional Semantic*: It can naturally guarantee the correctness even 
> without BlockBuilder’s “optimizings”. Each line of code generated for a 
> RexNode is null safe.
>  * *Interface Compatibility*: The implementation only updates 
> _RexToLixTranslator_ and _RexImpTable_. Interfaces such as CallImplementor 
> keep unchanged.
> h3. *Implementation*
>     For each RexNode, the visitor will generally generate two declaration 
> statements, one for value and one for nullable. The code snippet is like:
> {code:java}
> {valueVariable} = {valueExpression}
> {isNullVariable} = {isNullExpression}
> {code}
> The visitor’s result will be the variable pair (*_isNullVariable_*, 
> *_valueVariable_*).
> *Other changes:*
> (1) ReImplement different RexCall implementations (e.g., CastImplementor, 
> BinaryImplementor and etc.) as seperated files and remove them into the newly 
> created package _org.apache.calcite.adapter.enumerable.rex,_ and organize 
> them in RexCallImpTable.
> (2) move some util functions into EnumUtils.
> h3. *Example Demonstration*
> Take a simple test case as example, in which the "commission" column is 
> nullable.
> {code:java}
> @Test public void testNPE() {
>   CalciteAssert.hr()
>     .query("select \"commission\" + 10 as s\n"
>       + "from \"hr\".\"emps\"")
>     .returns("S=1010\nS=510\nS=null\nS=260\n");
> }
> {code}
> The codegen progress and non-optimized code are demonstrated in the figure 
> below.
> !codegen.png!
>  # When visiting *RexInputRef (commission)*, the visitor generates three 
> lines of code, the result is a pair of ParameterExpression (*_input_isNull_*, 
> *_input_value_*).
>  # Then the visitor visits *RexLiteral (10)* and generates two lines of code. 
> The result is (*_literal_isNull_*, *_literal_value_*).
>  # After that, when visiting *RexCall(Add)*, (_*input_isNull*_, 
> _*input_value*_) and (_*literal_isNull*_, _*literal_value*_) can be used to 
> implement the logic. The visitor also generates two lines of code and return 
> the variable pair.
> In the end, the result Expression is constructed based on 
> (_*binary_call_isNull*_, _*binary_call_value*_)
> [^RexNode-CodeGen.pdf]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to