[ https://issues.apache.org/jira/browse/SPARK-36763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418386#comment-17418386 ]

Yuming Wang commented on SPARK-36763:
-------------------------------------

Benchmark code and results:
{code:scala}
import org.apache.spark.benchmark.Benchmark
val numRows = 1024 * 1024 * 10
spark.sql(s"CREATE TABLE t1 using parquet AS select id AS a, id AS b FROM range(${numRows}L)")
val benchmark = new Benchmark("Benchmark pull out ordering expressions", numRows, minNumIters = 5)

Seq(false, true).foreach { pullOutEnabled =>
  val state = if (pullOutEnabled) "Enabled" else "Disabled"
  val name = s"Pull out ordering expressions ($state)"
  benchmark.addCase(name) { _ =>
    withSQLConf("spark.sql.pullOutOrderingExpressions" -> s"$pullOutEnabled") {
      spark.sql("SELECT t1.* FROM t1 ORDER BY translate(t1.a, '123', 'abc')")
        .write.format("noop").mode("Overwrite").save()
    }
  }
}
benchmark.run()
{code}
{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark pull out ordering expressions:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Pull out ordering expressions (Disabled)           9232           9753         867          1.1         880.4       1.0X
Pull out ordering expressions (Enabled)            7084           7462         370          1.5         675.5       1.3X
{noformat}
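
For intuition, the effect being benchmarked can be approximated by hand with the DataFrame API: compute the ordering expression once in a projection, sort on the resulting column, then drop it. This is only a rough sketch of the idea, not the optimizer rule itself. The column name {{_sort_key}} is hypothetical, the table is a tiny in-memory stand-in for {{t1}}, and the explicit {{CAST}} is added here so the query does not depend on implicit type coercion settings.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

// Local session for the sketch; in spark-shell the existing `spark` can be used instead.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("pull-out-ordering-sketch")
  .getOrCreate()

// Same shape as the benchmark table (a, b), but only 10 rows.
val t1 = spark.range(10L).selectExpr("id AS a", "id AS b")

// Ordering expression from the benchmark query, with an explicit cast (assumption).
val sortExpr = "translate(CAST(a AS STRING), '123', 'abc')"

// Baseline: the expression stays inside the sort order.
val inSort = t1.orderBy(expr(sortExpr))

// Manual approximation of "pulling out": project the expression into a
// (hypothetically named) column, sort on that column, then drop it.
val pulledOut = t1
  .withColumn("_sort_key", expr(sortExpr))
  .orderBy(col("_sort_key"))
  .drop("_sort_key")

// Compare the physical plans of the two variants.
inSort.explain()
pulledOut.explain()
{code}
Both variants should return the same rows in the same order; comparing the two {{explain}} outputs shows whether the sort is performed on an expression or on a plain attribute.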

> Pull out ordering expressions
> -----------------------------
>
>                 Key: SPARK-36763
>                 URL: https://issues.apache.org/jira/browse/SPARK-36763
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> Similar to 
> [PullOutGroupingExpressions|https://github.com/apache/spark/blob/7fd3f8f9ec55b364525407213ba1c631705686c5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PullOutGroupingExpressions.scala#L48],
> we can pull out ordering expressions to improve sort performance. For example:
> {code:scala}
> sql("create table t1(a int, b int) using parquet")
> sql("insert into t1 values (1, 2)")
> sql("insert into t1 values (3, 4)")
> sql("select * from t1 order by a - b").explain
> {code}
> {noformat}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Sort [(a#12 - b#13) ASC NULLS FIRST], true, 0
>    +- Exchange rangepartitioning((a#12 - b#13) ASC NULLS FIRST, 5), ENSURE_REQUIREMENTS, [id=#39]
>       +- FileScan parquet default.t1[a#12,b#13]
> {noformat}
> The {{Subtract}} expression ({{a - b}}) will be evaluated 4 times.


