[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhizhen Hou updated SPARK-45478: -------------------------------- Description: *The SQL to reproduce the result* {code:java} create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; {code} *Reasons may cause the result:* Function sum use If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three variables in if expression like this. {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), true]falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In sub expression elimination, only predicate is evaluated in EquivalentExpressions# childrenToRecurse {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result. But the following comment in `childrenToRecurse` makes me not sure it will cause any other problems. {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} was: *The SQL to reproduce* {code:java} create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; {code} *Reasons may cause the result:* Function sum use If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three variables in if expression like this. {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), true]falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In sub expression elimination, only predicate is evaluated in EquivalentExpressions# childrenToRecurse {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result. But the following comment in `childrenToRecurse` makes me not sure it will cause any other problems. {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} > codegen sum(decimal_column / 2) computes div twice > -------------------------------------------------- > > Key: SPARK-45478 > URL: https://issues.apache.org/jira/browse/SPARK-45478 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.0 > Reporter: Zhizhen Hou > Priority: Minor > > *The SQL to reproduce the result* > {code:java} > create table t_dec (c1 decimal(6,2)); > insert into t_dec values(1.0),(2.0),(null),(3.0); > explain codegen select sum(c1/2) from t_dec; {code} > > *Reasons may cause the result:* > > Function sum use If expression in updateExpressions: > `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` > > The three variables in if expression like this. > {code:java} > predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), > true]falseValue: (input[0, decimal(26,6), true] + > cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} > In sub expression elimination, only predicate is evaluated in > EquivalentExpressions# childrenToRecurse > {code:java} > private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match > { > case _: CodegenFallback => Nil > case i: If => i.predicate :: Nil > case c: CaseWhen => c.children.head :: Nil > case c: Coalesce => c.children.head :: Nil > case other => other.children > } {code} > I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => > i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result. > > But the following comment in `childrenToRecurse` makes me not sure it will > cause any other problems. > {code:java} > // 2. If: common subexpressions will always be evaluated at the beginning, > but the true and > // false expressions in `If` may not get accessed, according to the predicate > // expression. We should only recurse into the predicate expression. {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org