[GitHub] [spark] maropu edited a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions

GitBox Thu, 05 Sep 2019 14:48:49 -0700

maropu edited a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
URL: https://github.com/apache/spark/pull/20965#issuecomment-528350052
 
 
   I checked the TPC-DS performance numbers (sf=5) and couldn't find much difference with and without this PR:
   https://docs.google.com/spreadsheets/d/10eCV0PHeAaWGaXwpKPDDYCSLzsogl4IIiGiitqvw8PY/edit?usp=sharing
   In fact, the numbers for q66 didn't change between the two. Other PRs have probably already improved it (I haven't dug into it, though).
   
   To make sure this PR is still beneficial, I ran the microbenchmark below:
   ```
   $ ./bin/spark-shell --master=local[1] --conf spark.driver.memory=8g --conf spark.sql.shuffle.partitions=1 -v

   scala> val numCols = 50
   scala> val colExprs = (0 until numCols).map { i => s"id AS _c$i" }
   scala> spark.range(5000000).selectExpr(colExprs: _*).createOrReplaceTempView("t")

   scala> val aggExprs = (0 until numCols).map { i => s"AVG(_c$i)" }

   scala> sql("SET spark.sql.codegen.aggregate.splitAggregateFunc.enabled=true")
   scala> timer { sql(s"SELECT ${aggExprs.mkString(", ")} FROM t").write.format("noop").save() }
   Elapsed time: 0.997808995s

   scala> sql("SET spark.sql.codegen.aggregate.splitAggregateFunc.enabled=false")
   scala> timer { sql(s"SELECT ${aggExprs.mkString(", ")} FROM t").write.format("noop").save() }
   Elapsed time: 25.77200574s
   ```
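
   The `timer` helper used above is not shown in the session. A minimal sketch of what it might look like, assuming it simply measures wall-clock time with `System.nanoTime` and prints it in seconds (the name matches the session, but the body here is a guess, not the definition actually used):
   ```scala
   // Hypothetical definition: the original session does not show how `timer` is implemented.
   // Evaluates `block` once, prints the elapsed wall-clock time in seconds, and returns the result.
   def timer[R](block: => R): R = {
     val start = System.nanoTime()
     val result = block
     val elapsedSec = (System.nanoTime() - start) / 1e9
     println(s"Elapsed time: ${elapsedSec}s")
     result
   }
   ```
   Since it times a single run, JIT warm-up is included in the measurement, so the numbers are only a rough comparison.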


