[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice

2023-10-17 Thread Zhizhen Hou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated SPARK-45478:

Description: 
After eliminates all common subexpression of `If`, it reduces computation time 
from 93.727s to 55.498s over 15393291 records. ** 

*The SQL to reproduce the result*
{code:java}
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec; {code}
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 

  was:
*The SQL to reproduce the result*
{code:java}
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec; {code}
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 


> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Zhizhen Hou
>Priority: Minor
>  Labels: pull-request-available
>
> After eliminates all common subexpression of `If`, it reduces computation 
> time from 93.727s to 55.498s over 15393291 records. ** 
> *The SQL to reproduce the result*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>  
> *Reasons may cause the result:* 
>  
> Function sum use If expression in updateExpressions:
>  `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>  
> The three variables in if expression like this.
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
> true]falseValue: (input[0, decimal(26,6), true] + 
> cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2

[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice

2023-10-17 Thread Zhizhen Hou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated SPARK-45478:

Description: 
After eliminates all common subexpression of `If`, it reduces computation time 
from 93.727s to 55.498s over 15393291 records. 

*The SQL to reproduce the result*
{code:java}
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec; {code}
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 

  was:
After eliminates all common subexpression of `If`, it reduces computation time 
from 93.727s to 55.498s over 15393291 records. ** 

*The SQL to reproduce the result*
{code:java}
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec; {code}
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 


> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Zhizhen Hou
>Priority: Minor
>  Labels: pull-request-available
>
> After eliminates all common subexpression of `If`, it reduces computation 
> time from 93.727s to 55.498s over 15393291 records. 
> *The SQL to reproduce the result*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>  
> *Reasons may cause the result:* 
>  
> Function sum use If expression in updateExpressions:
>  `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>  
> The three variables in if expression like this.
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
> true]falseValue

[jira] [Commented] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice

2023-10-10 Thread Zhizhen Hou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773921#comment-17773921
 ] 

Zhizhen Hou commented on SPARK-45478:
-

There are three children in If: predicate, trueValue and falseValue.

There are two execution paths. 1: predicate and trueValue. 2: predicate and 
falseValue.

There are three conbinations of possible common subexpression. 1: predicate and 
trueValue. 2: predicate and falseValue. 3: trueValue and falseValue.

So if all possible common subexpression be eliminated, there is 2 of 3 
possibility to improve the performance. For example, if there is common 
subexpression in predicate and falseValue, and common subexpression is executed 
only once, and it can improve the performance. Only there is common 
subexpression in trueValue and falseValue will not improve the performance and 
it will not draw back the performance, since whether trueValue and falseValue 
will be executed.

So, it looks good to check all three children in If. Any suggestions?

> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Zhizhen Hou
>Priority: Minor
>
> *The SQL to reproduce the result*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>  
> *Reasons may cause the result:* 
>  
> Function sum use If expression in updateExpressions:
>  `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>  
> The three variables in if expression like this.
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
> true]falseValue: (input[0, decimal(26,6), true] + 
> cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In sub expression elimination, only predicate is evaluated in 
> EquivalentExpressions# childrenToRecurse
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match 
> {
>   case _: CodegenFallback => Nil
>   case i: If => i.predicate :: Nil
>   case c: CaseWhen => c.children.head :: Nil
>   case c: Coalesce => c.children.head :: Nil
>   case other => other.children
> } {code}
> I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
> i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.
>  
> But the following comment in `childrenToRecurse`  makes me not sure it will 
> cause any other problems.
> {code:java}
> // 2. If: common subexpressions will always be evaluated at the beginning, 
> but the true and
> // false expressions in `If` may not get accessed, according to the predicate
> // expression. We should only recurse into the predicate expression. {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice

2023-10-10 Thread Zhizhen Hou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated SPARK-45478:

Description: 
*The SQL to reproduce the result*
{code:java}
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec; {code}
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 

  was:
*The SQL to reproduce* 
{code:java}
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec; {code}
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 


> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Zhizhen Hou
>Priority: Minor
>
> *The SQL to reproduce the result*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>  
> *Reasons may cause the result:* 
>  
> Function sum use If expression in updateExpressions:
>  `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>  
> The three variables in if expression like this.
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
> true]falseValue: (input[0, decimal(26,6), true] + 
> cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In sub expression elimination, only predicate is evaluated in 
> EquivalentExpressions# childrenToRecurse
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match 
> {
>   case _: CodegenFallback => Nil
>   case i: If => i.predic

[jira] [Commented] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice

2023-10-10 Thread Zhizhen Hou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773579#comment-17773579
 ] 

Zhizhen Hou commented on SPARK-45478:
-

The new compiled spark 4.0.0-SNAPSHOT from master branch also produce the same 
result.

> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Zhizhen Hou
>Priority: Minor
>
> *The SQL to reproduce* 
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec;
>  
> *Reasons may cause the result:* 
>  
> Function sum use If expression in updateExpressions:
>  `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>  
> The three variables in if expression like this.
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
> true]falseValue: (input[0, decimal(26,6), true] + 
> cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In sub expression elimination, only predicate is evaluated in 
> EquivalentExpressions# childrenToRecurse
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match 
> {
>   case _: CodegenFallback => Nil
>   case i: If => i.predicate :: Nil
>   case c: CaseWhen => c.children.head :: Nil
>   case c: Coalesce => c.children.head :: Nil
>   case other => other.children
> } {code}
> I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
> i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.
>  
> But the following comment in `childrenToRecurse`  makes me not sure it will 
> cause any other problems.
> {code:java}
> // 2. If: common subexpressions will always be evaluated at the beginning, 
> but the true and
> // false expressions in `If` may not get accessed, according to the predicate
> // expression. We should only recurse into the predicate expression. {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice

2023-10-10 Thread Zhizhen Hou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated SPARK-45478:

Description: 
*The SQL to reproduce* 
{code:java}
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec; {code}
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 

  was:
*The SQL to reproduce* 
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec;
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 


> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Zhizhen Hou
>Priority: Minor
>
> *The SQL to reproduce* 
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>  
> *Reasons may cause the result:* 
>  
> Function sum use If expression in updateExpressions:
>  `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>  
> The three variables in if expression like this.
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
> true]falseValue: (input[0, decimal(26,6), true] + 
> cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
> true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In sub expression elimination, only predicate is evaluated in 
> EquivalentExpressions# childrenToRecurse
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match 
> {
>   case _: CodegenFallback => Nil
>   case i: If => i.predicate :: Nil
>   case c: CaseWhen => c.ch

[jira] [Created] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice

2023-10-10 Thread Zhizhen Hou (Jira)
Zhizhen Hou created SPARK-45478:
---

 Summary: codegen sum(decimal_column / 2) computes div twice
 Key: SPARK-45478
 URL: https://issues.apache.org/jira/browse/SPARK-45478
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Zhizhen Hou


*The SQL to reproduce* 
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec;
 
*Reasons may cause the result:* 
 
Function sum use If expression in updateExpressions:
 `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
 
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), 
true]falseValue: (input[0, decimal(26,6), true] + 
cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), 
true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in 
EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => 
i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.

 

But the following comment in `childrenToRecurse`  makes me not sure it will 
cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but 
the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org