[ https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wan Kun updated SPARK-42551:
----------------------------
    Description:
h1. *Design Sketch*
* Get all common expressions from the input expressions. Recursively visit all subexpressions, regardless of whether the current expression is a conditional expression.
* For each common expression:
** Add a new boolean variable *subExprInit_n* that indicates whether the common expression has already been evaluated, and reset it to *false* at the start of operator.consume().
** Add a new wrapper function subExpr_n for the common subexpression, and replace every occurrence of the common subexpression with a call to the wrapper function.

|private void subExpr_n(${argList.mkString(", ")}) {
  if (!subExprInit_n) {
    ${eval.code}
    subExprInit_n = true;
    subExprIsNull_n = ${eval.isNull};
    subExprValue_n = ${eval.value};
  }
}|

h1. *Newly supported subexpression elimination patterns*

h2. *Support subexpression elimination with conditional expressions*
|SELECT case when v + 2 > 1 then 1
  when v + 1 > 2 then 2
  when v + 1 > 3 then 3 END vv
FROM values(1) as t2(v)|
We can reuse the result of expression *v + 1*

|SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) min_bc
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a|
We can reuse the result of expression *b + c*

h2. *Support subexpression elimination in FilterExec*
|SELECT * FROM (
  SELECT v * v + 1 v1 from values(1) as t2(v)
) t
where v1 > 5 and v1 < 10|
We can reuse the result of expression *v * v + 1*

h2. *Support subexpression elimination in JoinExec*
|SELECT *
FROM values(1, 1) as t1(a, b)
join values(1, 2) as t2(x, y) ON b * y between 2 and 3|
We can reuse the result of expression *b * y*

h2.
*Support subexpression elimination in ExpandExec*
|SELECT a, count(b),
  count(distinct case when b > 1 then b + c else null end) as count_bc_1,
  count(distinct case when b < 0 then b + c else null end) as count_bc_2
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a|
We can reuse the result of expression *b + c*

  was:
h1. *Design Sketch*
* Get all common expressions from the input expressions. Recursively visit all subexpressions, regardless of whether the current expression is a conditional expression.
* For each common expression:
** Add a new boolean variable *subExprInit_n* that indicates whether the common expression has already been evaluated, and reset it to *false* at the start of operator.consume().
** Add a new wrapper function subExpr_n for the common subexpression, and replace every occurrence of the common subexpression with a call to the wrapper function.

|private void subExpr_n(${argList.mkString(", ")}) {
  if (!subExprInit_n) {
    ${eval.code}
    subExprInit_n = true;
    subExprIsNull_n = ${eval.isNull};
    subExprValue_n = ${eval.value};
  }
}|

h1. *Newly supported subexpression elimination patterns*

h2. *Support subexpression elimination with conditional expressions*
|SELECT case when v + 2 > 1 then 1
  when v + 1 > 2 then 2
  when v + 1 > 3 then 3 END vv
FROM values(1) as t2(v)|
We can reuse the result of expression *v + 1*

|SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) min_bc
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a|
We can reuse the result of expression *b + c*

h2. *Support subexpression elimination in FilterExec*
|SELECT * FROM (
  SELECT v * v + 1 v1 from values(1) as t2(v)
) t
where v1 > 5 and v1 < 10|
We can reuse the result of expression *v * v + 1*

h2.
*Support subexpression elimination in JoinExec*
|WITH t1 (
  SELECT * FROM values(1, 1) as t(a, b)),
t2 (
  SELECT * FROM values(1, 2) as t(x, y))
SELECT * FROM t1 join t2 ON b * y between 2 and 3|
We can reuse the result of expression *b * y*

h2. *Support subexpression elimination in ExpandExec*
|SELECT a, count(b),
  count(distinct case when b > 1 then b + c else null end) as count_bc_1,
  count(distinct case when b < 0 then b + c else null end) as count_bc_2
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a|
We can reuse the result of expression *b + c*

> Support more subexpression elimination cases
> --------------------------------------------
>
>                 Key: SPARK-42551
>                 URL: https://issues.apache.org/jira/browse/SPARK-42551
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: Wan Kun
>            Priority: Major
>
> h1. *Design Sketch*
> * Get all common expressions from the input expressions. Recursively visit all subexpressions, regardless of whether the current expression is a conditional expression.
> * For each common expression:
> ** Add a new boolean variable *subExprInit_n* that indicates whether the common expression has already been evaluated, and reset it to *false* at the start of operator.consume().
> ** Add a new wrapper function subExpr_n for the common subexpression, and replace every occurrence of the common subexpression with a call to the wrapper function.
>
> |private void subExpr_n(${argList.mkString(", ")}) {
>   if (!subExprInit_n) {
>     ${eval.code}
>     subExprInit_n = true;
>     subExprIsNull_n = ${eval.isNull};
>     subExprValue_n = ${eval.value};
>   }
> }|
>
> h1. *Newly supported subexpression elimination patterns*
>
> h2.
> *Support subexpression elimination with conditional expressions*
> |SELECT case when v + 2 > 1 then 1
>   when v + 1 > 2 then 2
>   when v + 1 > 3 then 3 END vv
> FROM values(1) as t2(v)|
> We can reuse the result of expression *v + 1*
>
> |SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) min_bc
> FROM values(1, 1, 1) as t(a, b, c)
> GROUP BY a|
> We can reuse the result of expression *b + c*
>
> h2. *Support subexpression elimination in FilterExec*
> |SELECT * FROM (
>   SELECT v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v1 > 5 and v1 < 10|
> We can reuse the result of expression *v * v + 1*
>
> h2. *Support subexpression elimination in JoinExec*
> |SELECT *
> FROM values(1, 1) as t1(a, b)
> join values(1, 2) as t2(x, y) ON b * y between 2 and 3|
> We can reuse the result of expression *b * y*
>
> h2. *Support subexpression elimination in ExpandExec*
> |SELECT a, count(b),
>   count(distinct case when b > 1 then b + c else null end) as count_bc_1,
>   count(distinct case when b < 0 then b + c else null end) as count_bc_2
> FROM values(1, 1, 1) as t(a, b, c)
> GROUP BY a|
> We can reuse the result of expression *b + c*
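
The wrapper-function idea from the design sketch can be illustrated with a small hand-written Java class. This is only a sketch under stated assumptions, not the code Spark generates: the class name SubExprDemo, the consume(int) signature, and the evalCount counter are hypothetical and exist purely for demonstration. It models the first CASE WHEN query, where *v + 1* appears in two branches but should be computed at most once per row, and only if a branch that needs it is actually reached.

```java
// Hand-written illustration of the guarded-wrapper pattern.
// All names here are hypothetical; this is not Spark's generated code.
class SubExprDemo {
    // State for the common subexpression "v + 1" (n = 0).
    private boolean subExprInit_0;
    private boolean subExprIsNull_0;
    private int subExprValue_0;

    // Demonstration only: counts how often "v + 1" is really computed.
    int evalCount = 0;

    // Wrapper: evaluates "v + 1" at most once until the flag is reset.
    private void subExpr_0(int v) {
        if (!subExprInit_0) {
            evalCount++;
            subExprInit_0 = true;
            subExprIsNull_0 = false; // a primitive int input is never null
            subExprValue_0 = v + 1;
        }
    }

    // CASE WHEN v + 2 > 1 THEN 1
    //      WHEN v + 1 > 2 THEN 2
    //      WHEN v + 1 > 3 THEN 3 END
    // Returns null when no branch matches.
    Integer consume(int v) {
        subExprInit_0 = false; // reset at the start of every row
        if (v + 2 > 1) {
            return 1;          // "v + 1" is never evaluated on this path
        }
        subExpr_0(v);
        if (!subExprIsNull_0 && subExprValue_0 > 2) {
            return 2;
        }
        subExpr_0(v);          // second call reuses the cached value
        if (!subExprIsNull_0 && subExprValue_0 > 3) {
            return 3;
        }
        return null;
    }
}
```

Because the flag is checked inside the wrapper, every call site can invoke subExpr_0 unconditionally. A row that matches the first branch never pays for *v + 1*, and a row that falls through both later branches computes it once rather than twice.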