maropu edited a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
URL: https://github.com/apache/spark/pull/20965#issuecomment-528350052

I checked the TPCDS performance numbers (sf=5) and couldn't find much difference with and without this PR:
https://docs.google.com/spreadsheets/d/10eCV0PHeAaWGaXwpKPDDYCSLzsogl4IIiGiitqvw8PY/edit?usp=sharing

In fact, the numbers for q66 didn't change between the two runs. Other PRs have probably already improved it (I didn't dig into this, though). To make sure this PR is still beneficial, I ran the microbenchmark below:

```
$ ./bin/spark-shell --master=local[1] --conf spark.driver.memory=8g --conf spark.sql.shuffle.partitions=1 -v

scala> val numCols = 50
scala> val colExprs = (0 until numCols).map { i => s"id AS _c$i" }
scala> spark.range(5000000).selectExpr(colExprs: _*).createOrReplaceTempView("t")
scala> val aggExprs = (0 until numCols).map { i => s"AVG(_c$i)" }

scala> sql("SET spark.sql.codegen.aggregate.splitAggregateFunc.enabled=true")
scala> timer { sql(s"SELECT ${aggExprs.mkString(", ")} FROM t").write.format("noop").save() }
Elapsed time: 0.997808995s

scala> sql("SET spark.sql.codegen.aggregate.splitAggregateFunc.enabled=false")
scala> timer { sql(s"SELECT ${aggExprs.mkString(", ")} FROM t").write.format("noop").save() }
Elapsed time: 25.77200574s
```
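
The session above calls a `timer` helper that the comment does not define. A minimal sketch of what such a helper might look like, assuming it simply wraps a by-name block with `System.nanoTime` and prints the elapsed seconds (the name and the `Elapsed time:` output format come from the transcript; the implementation itself is a guess, not the one actually used):

```scala
// Hypothetical helper: the comment does not show how `timer` is defined.
// This sketch evaluates a by-name block once, prints the elapsed
// wall-clock time in seconds, and returns the block's result.
def timer[T](block: => T): T = {
  val start = System.nanoTime()
  val result = block
  val elapsedSec = (System.nanoTime() - start) / 1e9
  println(s"Elapsed time: ${elapsedSec}s")
  result
}
```

Pasting this into `spark-shell` before the `timer { ... }` calls should be enough to reproduce the shape of the run. A single timed call also includes code generation, compilation, and JIT warm-up, so repeating each measurement a few times would give steadier numbers.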