[ https://issues.apache.org/jira/browse/SPARK-39652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567087#comment-17567087 ]
Yang Jie commented on SPARK-39652:
----------------------------------

For this issue, I haven't found a workaround. We need to wait for the fix of [https://github.com/scala/bug/issues/12614] and then upgrade to a new Scala 2.13 version that includes it.

> For CoalescedRDDBenchmark, the test results using Scala 2.13 are slower than Scala 2.12
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-39652
>                 URL: https://issues.apache.org/jira/browse/SPARK-39652
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>         Attachments: CoalescedRDDBenchmark-212-results.txt, CoalescedRDDBenchmark-213-results.txt
>
>
> Typical results are as follows:
>
> *2.12.16*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Coalesced RDD:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ----------------------------------------------------------------------------------------------------------------------------
> Coalesce Num Partitions: 500 Num Hosts: 1              659            671          19          0.2        6588.2       0.4X
> Coalesce Num Partitions: 1000 Num Hosts: 1            1111           1133          25          0.1       11114.3       0.3X
> Coalesce Num Partitions: 5000 Num Hosts: 1            4573           4580          12          0.0       45727.1       0.1X
> Coalesce Num Partitions: 10000 Num Hosts: 1           8949           9107         240          0.0       89494.7       0.0X
> {code}
>
> *2.13.8*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Coalesced RDD:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ----------------------------------------------------------------------------------------------------------------------------
> Coalesce Num Partitions: 500 Num Hosts: 1              964           1131         284          0.1        9643.3       0.3X
> Coalesce Num Partitions: 1000 Num Hosts: 1            1732           1742          10          0.1       17318.2       0.2X
> Coalesce Num Partitions: 5000 Num Hosts: 1            7534           7539           4          0.0       75339.8       0.0X
> Coalesce Num Partitions: 10000 Num Hosts: 1          14524          14565          37          0.0      145245.0       0.0X
> {code}
>
> With Scala 2.13.8, the best times are about 1.5x to 1.65x those with Scala 2.12.16.
>
> From the analysis of the JFR results, the hot path is `ArrayBuffer.min`, called in
> [https://github.com/apache/spark/blob/ded5981823ac8e8e9339291415d9828dfcc6e062/core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala#L224-L225]:
>
> {code:java}
> // Pick the smallest PartitionGroup (by its implicit Ordering) among the
> // groups registered for location `key`, if there are any.
> def getLeastGroupHash(key: String): Option[PartitionGroup] =
>   groupHash.get(key).filter(_.nonEmpty).map(_.min)
> {code}
>
> Then I wrote a simple new benchmark to test `ArrayBuffer.min` and `ArrayBuffer.max`:
>
> {code:java}
> import scala.collection.mutable
>
> import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
>
> // Illustrative object name; the original snippet showed only the two methods below.
> object ArrayBufferMinMaxBenchmark extends BenchmarkBase {
>
>   private def min(numIters: Int, bufferSize: Int, loops: Int): Unit = {
>     val benchmark = new Benchmark("Array Buffer", loops, output = output)
>     // Fill the buffer with 0 until bufferSize.
>     val buffer = new mutable.ArrayBuffer[Int]()
>     (0 until bufferSize).foreach(i => buffer += i)
>     benchmark.addCase(s"buffer min with $bufferSize items", numIters) { _ =>
>       (0 until loops).foreach(_ => buffer.min)
>     }
>     benchmark.addCase(s"buffer sorted head with $bufferSize items", numIters) { _ =>
>       (0 until loops).foreach(_ => buffer.sorted.head)
>     }
>     benchmark.addCase(s"buffer max with $bufferSize items", numIters) { _ =>
>       (0 until loops).foreach(_ => buffer.max)
>     }
>     benchmark.addCase(s"buffer sorted last with $bufferSize items", numIters) { _ =>
>       (0 until loops).foreach(_ => buffer.sorted.last)
>     }
>     benchmark.run()
>   }
>
>   override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
>     val numIters = 3
>     val loops = 20000
>     runBenchmark("ArrayBuffer min and max") {
>       Seq(1000, 10000, 100000).foreach { bufferSize =>
>         min(numIters, bufferSize, loops)
>       }
>     }
>   }
> }
> {code}
>
> The results are as follows:
>
> *Scala 2.12*
>
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 1000 items                          121            121           0          0.2        6043.9       1.0X
> buffer max with 1000 items                          134            134           0          0.1        6705.5       0.9X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 10000 items                        1272           1272           0          0.0       63601.9       1.0X
> buffer max with 10000 items                        1339           1340           0          0.0       66972.2       0.9X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 100000 items                      12785          12788           3          0.0      639232.4       1.0X
> buffer max with 100000 items                      13433          13434           1          0.0      671654.2       1.0X
> {code}
>
> *Scala 2.13*
>
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 1000 items                          273            273           0          0.1       13646.5       1.0X
> buffer max with 1000 items                          162            162           0          0.1        8076.9       1.7X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 10000 items                        1676           1676           0          0.0       83782.1       1.0X
> buffer max with 10000 items                        1610           1610           0          0.0       80485.5       1.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 100000 items                      16849          16856          10          0.0      842469.3       1.0X
> buffer max with 100000 items                      15513          15515           4          0.0      775641.0       1.1X
> {code}
>
> Comparing best times, `min` is about 1.3x to 2.3x slower with Scala 2.13, and `max` about 1.15x to 1.2x slower.
>
> I have already submitted an issue to Scala: [https://github.com/scala/bug/issues/12614]
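A standalone way to compare `ArrayBuffer.min`/`max` across Scala versions outside the Spark build (for example, to check whether a new 2.13 release includes the fix for [https://github.com/scala/bug/issues/12614]) might look like the sketch below. It is a hypothetical harness, not the Spark benchmark above: the object `ArrayBufferMinMaxSketch` and its helpers `build` and `timeMs` are illustrative names, timing uses `System.nanoTime` with one warm-up pass instead of Spark's `Benchmark` class, and only the `min`/`max` cases at 1000 and 10000 items are kept to bound runtime. Its numbers are rough indications only.

```scala
import scala.collection.mutable

// Hypothetical standalone harness (not Spark's Benchmark class): it times the
// `min` and `max` cases of the benchmark above with System.nanoTime, so the
// numbers are rough wall-clock figures, not rigorous measurements.
object ArrayBufferMinMaxSketch {

  // Fill an ArrayBuffer with 0 until bufferSize, as the Spark benchmark does.
  def build(bufferSize: Int): mutable.ArrayBuffer[Int] = {
    val buffer = new mutable.ArrayBuffer[Int]()
    (0 until bufferSize).foreach(i => buffer += i)
    buffer
  }

  // Evaluate `body` `loops` times and return the elapsed time in milliseconds.
  def timeMs(loops: Int)(body: => Int): Double = {
    var sink = 0L // accumulate results so the JIT cannot discard the work
    val start = System.nanoTime()
    var i = 0
    while (i < loops) {
      sink += body
      i += 1
    }
    val elapsedMs = (System.nanoTime() - start) / 1e6
    if (sink == Long.MinValue) println(sink) // never true here; keeps `sink` observable
    elapsedMs
  }

  def main(args: Array[String]): Unit = {
    val loops = 20000 // same loop count as the Spark benchmark
    println(s"Scala library ${scala.util.Properties.versionNumberString}")
    Seq(1000, 10000).foreach { bufferSize =>
      val buffer = build(bufferSize)
      val cases = Seq[(String, () => Int)](
        s"buffer min with $bufferSize items" -> (() => buffer.min),
        s"buffer max with $bufferSize items" -> (() => buffer.max)
      )
      cases.foreach { case (name, f) =>
        timeMs(loops)(f()) // warm-up pass so the JIT compiles the hot loop first
        val ms = timeMs(loops)(f())
        println(f"$name%-40s $ms%10.1f ms")
      }
    }
  }
}
```

Running the same file once with Scala 2.12 and once with Scala 2.13 and comparing the printed lines (each run prints the Scala library version first) may show whether the `min` gap persists on a given release.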