mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-525314313

@cloud-fan unfortunately the performance overhead is very significant. I tried it and ran the benchmarks in both modes. Here is the code:

```
runBenchmark("aggregate without grouping") {
  val N = 500L << 22
  withSQLConf(SQLConf.SUM_DECIMAL_BUFFER_FOR_LONG.key -> "false") {
    codegenBenchmark("agg w/o group long buffer", N) {
      spark.range(N).selectExpr("sum(id)").collect()
    }
  }
  withSQLConf(SQLConf.SUM_DECIMAL_BUFFER_FOR_LONG.key -> "true") {
    codegenBenchmark("agg w/o group decimal buffer", N) {
      spark.range(N).selectExpr("sum(id)").collect()
    }
  }
}
```

and here is the output (as you can see, the overhead is more than 10x on a simple sum of longs):

```
[info] Running benchmark: agg w/o group long buffer
[info]   Running case: agg w/o group long buffer wholestage off
[info]   Stopped after 2 iterations, 105407 ms
[info]   Running case: agg w/o group long buffer wholestage on
[info]   Stopped after 5 iterations, 6282 ms
[info]
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
[info] Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
[info] agg w/o group long buffer:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] agg w/o group long buffer wholestage off            48538          52704         NaN         43,2          23,1       1,0X
[info] agg w/o group long buffer wholestage on              1231           1257          28       1703,6           0,6      39,4X
[info]
[info] Running benchmark: agg w/o group decimal buffer
[info]   Running case: agg w/o group decimal buffer wholestage off
[info]   Stopped after 2 iterations, 1276890 ms
[info]   Running case: agg w/o group decimal buffer wholestage on
[info]   Stopped after 5 iterations, 496100 ms
[info]
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
[info] Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
[info] agg w/o group decimal buffer:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] agg w/o group decimal buffer wholestage off        633585         638445         NaN          3,3         302,1       1,0X
[info] agg w/o group decimal buffer wholestage on          92037          99220         NaN         22,8          43,9       6,9X
```
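
As a rough cross-check of the "more than 10x" figure, the slowdown can be derived from the Best Time(ms) column of the two tables above. The snippet below is only a sketch: the object and class names (`SumBufferSlowdown`, `BenchmarkCase`) are hypothetical helpers for illustration and are not part of Spark or of the PR; only the four timings are taken from the benchmark output.

```scala
// Hypothetical helper, not part of Spark: compares the best times reported above.
object SumBufferSlowdown {
  // One benchmark mode, with the best time (ms) for each buffer type.
  final case class BenchmarkCase(name: String, longBufferMs: Double, decimalBufferMs: Double) {
    // How many times slower the decimal buffer is than the long buffer.
    def slowdown: Double = decimalBufferMs / longBufferMs
  }

  // Best Time(ms) values copied from the benchmark output in the comment.
  val cases: Seq[BenchmarkCase] = Seq(
    BenchmarkCase("wholestage off", 48538.0, 633585.0),
    BenchmarkCase("wholestage on", 1231.0, 92037.0)
  )

  def main(args: Array[String]): Unit =
    cases.foreach { c =>
      println(f"${c.name}%-15s decimal/long = ${c.slowdown}%.1fx")
    }
}
```

On these numbers the decimal buffer is roughly 13x slower with whole-stage codegen off and roughly 75x slower with it on, so the gap is widest in the code-generated path.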