[ https://issues.apache.org/jira/browse/SPARK-39652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567087#comment-17567087 ]

Yang Jie commented on SPARK-39652:
----------------------------------

I haven't found a workaround for this issue. We need to wait for the fix of [https://github.com/scala/bug/issues/12614] and then upgrade to a newer Scala 2.13 version.

 

> For CoalescedRDDBenchmark, the test results using Scala 2.13 are slower than Scala 2.12
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-39652
>                 URL: https://issues.apache.org/jira/browse/SPARK-39652
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>         Attachments: CoalescedRDDBenchmark-212-results.txt, CoalescedRDDBenchmark-213-results.txt
>
>
> Typical results are as follows:
> *2.12.16*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Coalesced RDD:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ----------------------------------------------------------------------------------------------------------------------------
> Coalesce Num Partitions: 500 Num Hosts: 1               659            671          19          0.2        6588.2       0.4X
> Coalesce Num Partitions: 1000 Num Hosts: 1             1111           1133          25          0.1       11114.3       0.3X
> Coalesce Num Partitions: 5000 Num Hosts: 1             4573           4580          12          0.0       45727.1       0.1X
> Coalesce Num Partitions: 10000 Num Hosts: 1            8949           9107         240          0.0       89494.7       0.0X
> {code}
>  
> *2.13.8*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Coalesced RDD:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ----------------------------------------------------------------------------------------------------------------------------
> Coalesce Num Partitions: 500 Num Hosts: 1               964           1131         284          0.1        9643.3       0.3X
> Coalesce Num Partitions: 1000 Num Hosts: 1             1732           1742          10          0.1       17318.2       0.2X
> Coalesce Num Partitions: 5000 Num Hosts: 1             7534           7539           4          0.0       75339.8       0.0X
> Coalesce Num Partitions: 10000 Num Hosts: 1           14524          14565          37          0.0      145245.0       0.0X
> {code}
>  
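To put a number on "slower", divide each Scala 2.13 Best Time(ms) by the matching Scala 2.12 value from the two tables above. The short Scala script below only reuses the reported values (it takes no new measurements). By this measure 2.13 is roughly 1.46x to 1.65x slower, and the factor is larger at 5000 and 10000 partitions than at 500.

```scala
// Best Time(ms) values copied from the two CoalescedRDDBenchmark tables above,
// keyed by the number of partitions (Num Hosts is 1 in every case).
val bestTime212 = Map(500 -> 659.0, 1000 -> 1111.0, 5000 -> 4573.0, 10000 -> 8949.0)
val bestTime213 = Map(500 -> 964.0, 1000 -> 1732.0, 5000 -> 7534.0, 10000 -> 14524.0)

// Slowdown of Scala 2.13 relative to Scala 2.12; a value above 1.0 means 2.13 took longer.
val slowdown: Map[Int, Double] =
  bestTime212.map { case (n, t212) => n -> bestTime213(n) / t212 }

slowdown.toSeq.sortBy(_._1).foreach { case (n, r) =>
  println(f"partitions=$n%5d  2.13/2.12 = $r%.2f")
}
```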
> According to the JFR profiling results, the hot path is `ArrayBuffer.min`, called from `getLeastGroupHash` in `CoalescedRDD.scala`:
> [https://github.com/apache/spark/blob/ded5981823ac8e8e9339291415d9828dfcc6e062/core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala#L224-L225]
>  
> {code:java}
> def getLeastGroupHash(key: String): Option[PartitionGroup] =
>   groupHash.get(key).filter(_.nonEmpty).map(_.min) {code}
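A simplified sketch of why this call is hot. `PartitionGroup`, the ordering, the host name and the counts below are hypothetical stand-ins, assuming (as the method name implies) that groups are compared by how many partitions they already hold. Because `_.min` walks the whole buffer on every call, assigning P partitions across the G groups of one host costs on the order of P × G comparisons. That growth is plausibly consistent with the Per Row(ns) column increasing with the partition count in the tables above.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Spark's PartitionGroup: only the
// number of partitions already assigned to the group matters here.
final class PartitionGroup(val prefLoc: Option[String]) {
  val partitions: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer.empty[Int]
  def numPartitions: Int = partitions.size
}

// Counts how many times the ordering is consulted, to make the scan cost visible.
var comparisons = 0L

// Assumption: groups are ordered by how many partitions they hold,
// so `min` returns the least-loaded group for a host.
implicit val partitionGroupOrdering: Ordering[PartitionGroup] =
  new Ordering[PartitionGroup] {
    override def compare(a: PartitionGroup, b: PartitionGroup): Int = {
      comparisons += 1
      Integer.compare(a.numPartitions, b.numPartitions)
    }
  }

// host -> the groups whose preferred location is that host
val groupHash = mutable.Map.empty[String, mutable.ArrayBuffer[PartitionGroup]]

// Same shape as the method quoted above: `_.min` scans the whole buffer on every call.
def getLeastGroupHash(key: String): Option[PartitionGroup] =
  groupHash.get(key).filter(_.nonEmpty).map(_.min)

// Usage: one host with 100 groups; each of 1000 partitions goes to the
// currently least-loaded group, so every assignment pays for a full scan.
val numGroups = 100
val numParts = 1000
groupHash("host1") = mutable.ArrayBuffer.fill(numGroups)(new PartitionGroup(Some("host1")))
(0 until numParts).foreach { p =>
  getLeastGroupHash("host1").foreach(_.partitions += p)
}
println(s"ordering consulted $comparisons times for $numParts partitions and $numGroups groups")
```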
>  
> Then I wrote a simple benchmark to test `ArrayBuffer.min` and `ArrayBuffer.max`, with `sorted.head` and `sorted.last` as comparison cases:
> {code:java}
> import scala.collection.mutable
>
> import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
>
> // The object name is illustrative; only the two methods below were given.
> object ArrayBufferBenchmark extends BenchmarkBase {
>   // Times min/max against sorted.head/sorted.last on a buffer of `bufferSize` ints;
>   // every case repeats its operation `loops` times per iteration.
>   private def min(numIters: Int, bufferSize: Int, loops: Int): Unit = {
>     val benchmark = new Benchmark("Array Buffer", loops, output = output)
>     val buffer = new mutable.ArrayBuffer[Int]()
>     (0 until bufferSize).foreach(i => buffer += i)
>     benchmark.addCase(s"buffer min with $bufferSize items", numIters) { _ =>
>       (0 until loops).foreach(_ => buffer.min)
>     }
>     benchmark.addCase(s"buffer sorted head with $bufferSize items", numIters) { _ =>
>       (0 until loops).foreach(_ => buffer.sorted.head)
>     }
>     benchmark.addCase(s"buffer max with $bufferSize items", numIters) { _ =>
>       (0 until loops).foreach(_ => buffer.max)
>     }
>     benchmark.addCase(s"buffer sorted last with $bufferSize items", numIters) { _ =>
>       (0 until loops).foreach(_ => buffer.sorted.last)
>     }
>     benchmark.run()
>   }
>
>   override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
>     val numIters = 3
>     val loops = 20000
>     runBenchmark("ArrayBuffer min and max") {
>       Seq(1000, 10000, 100000).foreach { bufferSize =>
>         min(numIters, bufferSize, loops)
>       }
>     }
>   }
> } {code}
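The benchmark above depends on Spark's internal `Benchmark`/`BenchmarkBase` test utilities. To check the `min`/`max` gap outside a Spark build, a rough standalone sketch like the following could be run once under each Scala version; it uses only APIs that should exist in both 2.12 and 2.13. It is an approximation, not the Spark harness: it times with `System.nanoTime`, does one crude warm-up pass, uses far fewer loops, drops the `sorted` cases, and adds a hand-written `loopMin` baseline (hypothetical, not part of the original benchmark) to separate the cost of the generic `min` from the cost of simply scanning the buffer.

```scala
import scala.collection.mutable

// Hand-written baseline that is NOT part of the original benchmark:
// a plain while loop over the buffer, to compare against the generic `min`.
def loopMin(buffer: mutable.ArrayBuffer[Int]): Int = {
  require(buffer.nonEmpty, "empty buffer")
  var result = buffer(0)
  var i = 1
  while (i < buffer.length) {
    if (buffer(i) < result) result = buffer(i)
    i += 1
  }
  result
}

// Results are accumulated here so the measured work is not trivially dead code.
var blackhole = 0L

// Evaluates `body` `times` times and returns the elapsed wall-clock time in ms.
def timeMs(times: Int)(body: => Int): Long = {
  val start = System.nanoTime()
  var i = 0
  while (i < times) {
    blackhole += body
    i += 1
  }
  (System.nanoTime() - start) / 1000000L
}

val loops = 200 // far fewer than the 20000 used above, to keep one run short
for (bufferSize <- Seq(1000, 10000, 100000)) {
  val buffer = new mutable.ArrayBuffer[Int]()
  (0 until bufferSize).foreach(i => buffer += i)
  timeMs(loops)(buffer.min) // crude warm-up pass, timing discarded
  val cases = Seq(
    "buffer min" -> timeMs(loops)(buffer.min),
    "buffer max" -> timeMs(loops)(buffer.max),
    "while-loop min" -> timeMs(loops)(loopMin(buffer))
  )
  cases.foreach { case (name, ms) =>
    println(s"$name with $bufferSize items: $ms ms for $loops calls")
  }
}
```

For numbers comparable to the tables in this issue, Spark's own harness (or JMH) is still the better tool, since it controls warm-up and repeats each case.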
>  
> The results are as follows:
> *Scala 2.12*
>  
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 1000 items                          121            121           0          0.2        6043.9       1.0X
> buffer max with 1000 items                          134            134           0          0.1        6705.5       0.9X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 10000 items                        1272           1272           0          0.0       63601.9       1.0X
> buffer max with 10000 items                        1339           1340           0          0.0       66972.2       0.9X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 100000 items                      12785          12788           3          0.0      639232.4       1.0X
> buffer max with 100000 items                      13433          13434           1          0.0      671654.2       1.0X
> {code}
>  
> *Scala 2.13*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 1000 items                          273            273           0          0.1       13646.5       1.0X
> buffer max with 1000 items                          162            162           0          0.1        8076.9       1.7X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 10000 items                        1676           1676           0          0.0       83782.1       1.0X
> buffer max with 10000 items                        1610           1610           0          0.0       80485.5       1.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 100000 items                      16849          16856          10          0.0      842469.3       1.0X
> buffer max with 100000 items                      15513          15515           4          0.0      775641.0       1.1X
> {code}
>  
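Dividing each Scala 2.13 Best Time(ms) by the matching Scala 2.12 value in the ArrayBuffer tables above (reported values only, no new measurements) shows that `min` regressed more than `max` at every buffer size: about 2.26x vs 1.21x at 1000 items, 1.32x vs 1.20x at 10000 items, and 1.32x vs 1.15x at 100000 items. A small Scala script for that arithmetic:

```scala
// Best Time(ms) from the ArrayBuffer tables above, keyed by buffer size.
val min212 = Map(1000 -> 121.0, 10000 -> 1272.0, 100000 -> 12785.0)
val min213 = Map(1000 -> 273.0, 10000 -> 1676.0, 100000 -> 16849.0)
val max212 = Map(1000 -> 134.0, 10000 -> 1339.0, 100000 -> 13433.0)
val max213 = Map(1000 -> 162.0, 10000 -> 1610.0, 100000 -> 15513.0)

// Ratio above 1.0 means Scala 2.13 took longer than Scala 2.12 for the same case.
def ratio(slow: Map[Int, Double], base: Map[Int, Double]): Map[Int, Double] =
  base.map { case (size, t) => size -> slow(size) / t }

val minRatio = ratio(min213, min212)
val maxRatio = ratio(max213, max212)

for (size <- Seq(1000, 10000, 100000))
  println(f"$size%6d items: min 2.13/2.12 = ${minRatio(size)}%.2f, max 2.13/2.12 = ${maxRatio(size)}%.2f")
```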
> I have already submitted an issue to Scala: [https://github.com/scala/bug/issues/12614]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
