[ https://issues.apache.org/jira/browse/SPARK-32939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-32939: ------------------------------------ Assignee: (was: Apache Spark) > Avoid re-compute expensive expression > ------------------------------------- > > Key: SPARK-32939 > URL: https://issues.apache.org/jira/browse/SPARK-32939 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.0 > Reporter: angerszhu > Priority: Major > > {code:java} > test("Pushdown demo") { > withTable("t") { > withTempDir { loc => > sql( > s"""CREATE TABLE t(c1 INT, s STRING) PARTITIONED BY(P1 STRING) > | LOCATION '${loc.getAbsolutePath}' > |""".stripMargin) > sql( > """ > |SELECT c1, > |case > | when get_json_object(s,'$.a')=1 then "a" > | when get_json_object(s,'$.a')=2 then "b" > |end as s_type > |FROM t > |WHERE get_json_object(s,'$.a') in (1, 2) > """.stripMargin).explain(true) > } > } > } > will got plan as > == Physical Plan == > *(1) Project [c1#1, CASE WHEN (cast(get_json_object(s#2, $.a) as int) = 1) > THEN a WHEN (cast(get_json_object(s#2, $.a) as int) = 2) THEN b END AS > s_type#0] > +- *(1) Filter get_json_object(s#2, $.a) IN (1,2) > +- Scan hive default.t [c1#1, s#2], HiveTableRelation `default`.`t`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1, s#2], [P1#3], > Statistics(sizeInBytes=8.0 EiB) > we can see that get_json_object(s#2, $.a) will be computed tree times > Always there are expensive expressions are re-computed many times in such > grammar。 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org