panbingkun commented on PR #48237: URL: https://github.com/apache/spark/pull/48237#issuecomment-2373403603
## size(map_from_arrays(...))

### Benchmark code:

```scala
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

object SizeBenchmark extends SqlBasedBenchmark {
  private val N = 1_000_000 // rows scanned per iteration
  private val M = 100       // iterations of the benchmark case
  private val path = "/Users/panbingkun/Developer/spark/spark-community/SizeBenchmark"

  // Six int columns: id, id1 .. id5
  private val df = spark.range(N).to(new StructType().add("id", "int")).
    withColumn("id1", col("id") + 1).
    withColumn("id2", col("id") + 2).
    withColumn("id3", col("id") + 3).
    withColumn("id4", col("id") + 4).
    withColumn("id5", col("id") + 5)
  // Overwrite so that the benchmark can be re-run against the same path
  df.write.mode("overwrite").parquet(path)
  private val table = spark.read.parquet(path)

  private def doBenchmark(): Unit = {
    table.selectExpr("size(map_from_arrays(array(id, id1, id2), array(id3, id4, id5)))").noop()
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("size") {
      val benchmark = new Benchmark("size", N, output = output)
      benchmark.addCase("optimize", M) { _ => doBenchmark() }
      benchmark.run()
    }
  }
}
```

### Result

#### Before

```shell
Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 15653 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            142            157           8          7.0         142.0       1.0X

Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 17672 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            160            177          25          6.3         159.9       1.0X

Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 15140 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            141            151          13          7.1         140.6       1.0X
```

#### After

```shell
Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 3923 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             24             39          13         42.4          23.6       1.0X

Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 3778 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             31             38           7         32.1          31.2       1.0X

Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 3040 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             23             30           7         42.8          23.4       1.0X
```
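
The `Relative` column reads `1.0X` in every run because each run contains a single case, so it does not show the before/after difference. As a rough summary derived only from the six runs reported above (this arithmetic is not part of the benchmark output, and three runs per side is a small sample), averaging the best and average times gives:

$$
\text{speedup}_{\text{best}} \approx \frac{(142 + 160 + 141)/3}{(24 + 31 + 23)/3} = \frac{443}{78} \approx 5.7\times
$$

$$
\text{speedup}_{\text{avg}} \approx \frac{(157 + 177 + 151)/3}{(39 + 38 + 30)/3} = \frac{485}{107} \approx 4.5\times
$$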