[ 
https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773921#comment-17773921
 ] 

Zhizhen Hou commented on SPARK-45478:
-------------------------------------

There are three children in If: predicate, trueValue and falseValue.

There are two execution paths. 1: predicate and trueValue. 2: predicate and 
falseValue.

There are three conbinations of possible common subexpression. 1: predicate and 
trueValue. 2: predicate and falseValue. 3: trueValue and falseValue.

So if all possible common subexpression be eliminated, there is 2 of 3 
possibility to improve the performance. For example, if there is common 
subexpression in predicate and falseValue, and common subexpression is executed 
only once, and it can improve the performance. Only there is common 
subexpression in trueValue and falseValue will not improve the performance and 
it will not draw back the performance, since whether trueValue and falseValue 
will be executed.

So, it looks good to check all three children in If. Any suggestions?

> codegen sum(decimal_column / 2) computes div twice
> --------------------------------------------------
>
>                 Key: SPARK-45478
>                 URL: https://issues.apache.org/jira/browse/SPARK-45478
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Zhizhen Hou
>            Priority: Minor
>
> *The SQL to reproduce the result*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>  
> *Reasons may cause the result:* 
>  
> Function sum use If expression in updateExpressions:
>  `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>  
> The three variables in if expression like this.
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
> true]falseValue: (input[0, decimal(26,6), true] + 
> cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In sub expression elimination, only predicate is evaluated in 
> EquivalentExpressions# childrenToRecurse
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match 
> {
>   case _: CodegenFallback => Nil
>   case i: If => i.predicate :: Nil
>   case c: CaseWhen => c.children.head :: Nil
>   case c: Coalesce => c.children.head :: Nil
>   case other => other.children
> } {code}
> I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
> i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.
>  
> But the following comment in `childrenToRecurse`  makes me not sure it will 
> cause any other problems.
> {code:java}
> // 2. If: common subexpressions will always be evaluated at the beginning, 
> but the true and
> // false expressions in `If` may not get accessed, according to the predicate
> // expression. We should only recurse into the predicate expression. {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to