panbingkun commented on PR #48237: URL: https://github.com/apache/spark/pull/48237#issuecomment-2373414681
## size(map_from_entries(array(...)))

### Benchmark code:

```scala
// Imports are assumed here; the original snippet omitted them.
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

object SizeBenchmark extends SqlBasedBenchmark {
  private val N = 10_000_00 // 1,000,000 rows
  private val M = 100       // iterations per case

  private val path = "/Users/panbingkun/Developer/spark/spark-community/SizeBenchmark"

  // Build an int-typed table with six columns and persist it as Parquet.
  private val df = spark.range(N).to(new StructType().add("id", "int")).
    withColumn("id1", col("id") + 1).
    withColumn("id2", col("id") + 2).
    withColumn("id3", col("id") + 3).
    withColumn("id4", col("id") + 4).
    withColumn("id5", col("id") + 5)
  df.write.parquet(path)

  private val table = spark.read.parquet(path)

  // The expression under test, evaluated against the noop sink.
  private def doBenchmark(): Unit = {
    table.selectExpr("size(map_from_entries(array(struct(id, id3))))").noop()
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("size") {
      val benchmark = new Benchmark("size", N, output = output)
      benchmark.addCase("optimize", M) { _ =>
        doBenchmark()
      }
      benchmark.run()
    }
  }
}
```

### Benchmark Result:

#### Before

```shell
Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 12723 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            105            127          19          9.5         104.9       1.0X

Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 13554 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            122            136           9          8.2         121.8       1.0X

Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 12055 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            105            121          12          9.5         105.3       1.0X
```

#### After

```shell
Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 3246 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             22             32           8         46.1          21.7       1.0X

Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 3312 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             23             33          18         42.7          23.4       1.0X

Running benchmark: size
  Running case: optimize
  Stopped after 100 iterations, 3236 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             20             32          15         48.9          20.4       1.0X
```