[ 
https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Kun updated SPARK-42551:
----------------------------
    Description: 
h1. *Design Sketch*
 * Collect all common expressions from the input expressions by recursively 
visiting every subexpression, regardless of whether the current expression is a 
conditional expression.
 * For each common expression:
 ** Add a new boolean variable *subExprInit_n* that indicates whether the common 
expression has already been evaluated, and reset it to *false* at the start of 
operator.consume().
 ** Add a new wrapper function *subExpr_n* for the common subexpression, and 
replace every occurrence of the common subexpression with a call to the wrapper 
function.

|private void subExpr_n(${argList.mkString(", ")}) {
 if (!subExprInit_n) {
   ${eval.code}
   subExprInit_n = true;
   subExprIsNull_n = ${eval.isNull};
   subExprValue_n = ${eval.value};
 }
}|
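
For illustration, the template above might expand into something like the following hand-written Java for a common subexpression such as v + 1. This is only a minimal sketch, not Spark's actual generated code: the class name SubExprSketch, the evalCount counter, and the processRow method (a stand-in for operator.consume()) are assumptions made for the example.

```java
// Hand-written sketch of the lazy wrapper. Only the subExpr*_0 names follow
// the template; SubExprSketch, evalCount and processRow are hypothetical.
final class SubExprSketch {
    private boolean subExprInit_0;
    private boolean subExprIsNull_0;
    private int subExprValue_0;

    // Counts how often the common subexpression is really computed.
    int evalCount = 0;

    // Wrapper for the common subexpression `v + 1`.
    private void subExpr_0(Integer v) {
        if (!subExprInit_0) {
            evalCount++;
            boolean isNull = (v == null);      // plays the role of ${eval.code}
            int value = isNull ? -1 : v + 1;
            subExprInit_0 = true;
            subExprIsNull_0 = isNull;
            subExprValue_0 = value;
        }
    }

    // Stand-in for operator.consume(). Evaluates
    // CASE WHEN v + 2 > 1 THEN 1 WHEN v + 1 > 2 THEN 2 WHEN v + 1 > 3 THEN 3 END
    Integer processRow(Integer v) {
        subExprInit_0 = false;                 // reset at the start of each row
        if (v != null && v + 2 > 1) return 1;  // v + 2 is not a common expression
        subExpr_0(v);
        if (!subExprIsNull_0 && subExprValue_0 > 2) return 2;
        subExpr_0(v);                          // cached: no second evaluation
        if (!subExprIsNull_0 && subExprValue_0 > 3) return 3;
        return null;
    }

    public static void main(String[] args) {
        SubExprSketch s = new SubExprSketch();
        System.out.println(s.processRow(-5) + ", evaluations = " + s.evalCount);
    }
}
```

Because the wrapper is only called from the branch that needs it, a row that is decided by the first branch never computes v + 1 at all, and a row that reaches both later branches computes it once.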
h1. *Newly supported subexpression elimination patterns*
h2. *Support subexpression elimination with conditional expressions*

|SELECT case when v + 2 > 1 then 1
            when v + 1 > 2 then 2
            when v + 1 > 3 then 3 END vv
FROM values(1) as t2(v)|

We can reuse the result of the expression *v + 1*.

|SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) min_bc
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a|

We can reuse the result of the expression *b + c*.
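
The first step of the design sketch, collecting common expressions while descending into conditional branches, could look roughly like the sketch below. It is a simplification and not Spark's implementation: the Expr class, the string-based structural key, and all method names are hypothetical, and the sketch does not prune the children of an expression that is already known to be common.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: find subexpressions that occur more than once.
final class CommonSubExprSketch {
    // Tiny expression tree; a leaf (column or literal) has no children.
    static final class Expr {
        final String op;
        final List<Expr> children;

        Expr(String op, Expr... children) {
            this.op = op;
            this.children = Arrays.asList(children);
        }

        // Structural key: structurally equal trees produce equal strings.
        String key() {
            if (children.isEmpty()) return op;
            StringBuilder sb = new StringBuilder(op).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(children.get(i).key());
            }
            return sb.append(')').toString();
        }
    }

    // Visits every non-leaf node, including those inside conditional branches.
    static void count(Expr e, Map<String, Integer> counts) {
        if (e.children.isEmpty()) return;
        counts.merge(e.key(), 1, Integer::sum);
        for (Expr child : e.children) count(child, counts);
    }

    // Returns the keys of all subexpressions seen at least twice.
    static List<String> commonSubExprs(Expr root) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        count(root, counts);
        List<String> common = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) common.add(entry.getKey());
        }
        return common;
    }

    static Expr leaf(String name) { return new Expr(name); }

    // v + n > limit
    static Expr branch(String n, String limit) {
        return new Expr(">", new Expr("+", leaf("v"), leaf(n)), leaf(limit));
    }

    // CASE WHEN v + 2 > 1 THEN 1 WHEN v + 1 > 2 THEN 2 WHEN v + 1 > 3 THEN 3 END
    static Expr caseWhenExample() {
        return new Expr("case",
            branch("2", "1"), leaf("1"),
            branch("1", "2"), leaf("2"),
            branch("1", "3"), leaf("3"));
    }

    public static void main(String[] args) {
        System.out.println(commonSubExprs(caseWhenExample())); // prints [+(v, 1)]
    }
}
```

For the CASE WHEN query above, the only key that appears twice is the one for v + 1, which matches the expression the description says can be reused.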
h2. *Support subexpression elimination in FilterExec*

|SELECT * FROM (
  SELECT v * v + 1 v1 from values(1) as t2(v)
) t
where v1 > 5 and v1 < 10|

We can reuse the result of the expression v * v + 1.
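
As a rough illustration of the filter case, the sketch below assumes that v1 has been substituted by its definition in the predicate, so that v * v + 1 appears in both conjuncts. The class FilterSketch, the keep method, and the evalCount counter are hypothetical, and null handling is left out to keep the example short.

```java
// Hypothetical sketch of a filter predicate that shares one subexpression.
final class FilterSketch {
    private boolean subExprInit_0;
    private double subExprValue_0;

    // Counts how often v * v + 1 is really computed.
    long evalCount = 0;

    // Lazily computes v * v + 1 at most once per row.
    private double subExpr_0(double v) {
        if (!subExprInit_0) {
            evalCount++;
            subExprValue_0 = v * v + 1;
            subExprInit_0 = true;
        }
        return subExprValue_0;
    }

    // Predicate: (v * v + 1 > 5) AND (v * v + 1 < 10)
    boolean keep(double v) {
        subExprInit_0 = false; // reset for every input row
        return subExpr_0(v) > 5 && subExpr_0(v) < 10;
    }

    public static void main(String[] args) {
        FilterSketch f = new FilterSketch();
        System.out.println(f.keep(2.5) + ", evaluations = " + f.evalCount);
    }
}
```

A row that passes the first conjunct references the expression twice but computes it only once.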
h2. *Support subexpression elimination in JoinExec*

|SELECT *
FROM values(1, 1) as t1(a, b)
join values(1, 2) as t2(x, y)
ON b * y between 2 and 3|

We can reuse the result of the expression b * y.
h2. *Support subexpression elimination in ExpandExec*

|SELECT a, count(b),
    count(distinct case when b > 1 then b + c else null end) as count_bc_1,
    count(distinct case when b < 0 then b + c else null end) as count_bc_2
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a|

We can reuse the result of the expression *b + c*.

  was:
h1. *Design Sketch*
 * Collect all common expressions from the input expressions by recursively 
visiting every subexpression, regardless of whether the current expression is a 
conditional expression.
 * For each common expression:
 ** Add a new boolean variable *subExprInit_n* that indicates whether the common 
expression has already been evaluated, and reset it to *false* at the start of 
operator.consume().
 ** Add a new wrapper function *subExpr_n* for the common subexpression, and 
replace every occurrence of the common subexpression with a call to the wrapper 
function.

|private void subExpr_n(${argList.mkString(", ")}) {
 if (!subExprInit_n) {
   ${eval.code}
   subExprInit_n = true;
   subExprIsNull_n = ${eval.isNull};
   subExprValue_n = ${eval.value};
 }
}|
h1. *Newly supported subexpression elimination patterns*
h2. *Support subexpression elimination with conditional expressions*

|SELECT case when v + 2 > 1 then 1
            when v + 1 > 2 then 2
            when v + 1 > 3 then 3 END vv
FROM values(1) as t2(v)|

We can reuse the result of the expression *v + 1*.

|SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) min_bc
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a|

We can reuse the result of the expression *b + c*.
h2. *Support subexpression elimination in FilterExec*

|SELECT * FROM (
  SELECT v * v + 1 v1 from values(1) as t2(v)
) t
where v1 > 5 and v1 < 10|

We can reuse the result of the expression v * v + 1.
h2. *Support subexpression elimination in JoinExec*

|WITH t1 ( SELECT * FROM values(1, 1) as t(a, b)),
t2 ( SELECT * FROM values(1, 2) as t(x, y))
SELECT * FROM t1 join t2
ON b * y between 2 and 3|

We can reuse the result of the expression b * y.
h2. *Support subexpression elimination in ExpandExec*

|SELECT a, count(b),
    count(distinct case when b > 1 then b + c else null end) as count_bc_1,
    count(distinct case when b < 0 then b + c else null end) as count_bc_2
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a|

We can reuse the result of the expression *b + c*.


> Support more subexpression elimination cases
> --------------------------------------------
>
>                 Key: SPARK-42551
>                 URL: https://issues.apache.org/jira/browse/SPARK-42551
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: Wan Kun
>            Priority: Major


