Github user Koraseg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22894#discussion_r229790796
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -189,13 +190,12 @@ private[spark] class HighlyCompressedMapStatus private (
         emptyBlocks.readExternal(in)
         avgSize = in.readLong()
         val count = in.readInt()
    -    val hugeBlockSizesArray = mutable.ArrayBuffer[Tuple2[Int, Byte]]()
    +    hugeBlockSizes = new util.HashMap[Int, Byte](count).asScala
    --- End diff --
    
    scala.collection.mutable.HashMap does not have a way to set the initial
    capacity out of the box. Performance gets worse with it, probably because
    the hash table is resized as it fills.
    
    scala.collection.mutable.OpenHashMap does accept an initial size, but it
    is still slower than java.util.HashMap.
    
    However, if a tradeoff between code cleanliness and performance is
    needed, I would use one of the variants above.
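
    For reference, a minimal sketch of the three variants discussed above,
    side by side. The object and method names (PresizedMaps, plainScala,
    openHash, javaBacked, fill) are hypothetical and only for illustration;
    they are not part of the PR. It assumes the Scala 2.11/2.12 collections
    and scala.collection.JavaConverters, as in the diff.

```scala
import java.util
import scala.collection.JavaConverters._
import scala.collection.mutable

// Hypothetical helper object, not part of the PR.
object PresizedMaps {
  // Variant 1: plain Scala HashMap, created without any capacity hint.
  def plainScala(): mutable.Map[Int, Byte] =
    mutable.HashMap.empty[Int, Byte]

  // Variant 2: OpenHashMap, which takes an initial size in its constructor.
  def openHash(count: Int): mutable.Map[Int, Byte] =
    new mutable.OpenHashMap[Int, Byte](count)

  // Variant 3: java.util.HashMap with an initial capacity, wrapped as a
  // Scala mutable.Map (the approach used in the diff).
  def javaBacked(count: Int): mutable.Map[Int, Byte] =
    new util.HashMap[Int, Byte](count).asScala

  // Inserts `count` entries, mimicking the read loop that fills the map.
  def fill(m: mutable.Map[Int, Byte], count: Int): mutable.Map[Int, Byte] = {
    var i = 0
    while (i < count) {
      m(i) = (i % 128).toByte
      i += 1
    }
    m
  }
}
```

    Note that the java.util.HashMap constructor argument is an initial
    capacity, not an expected entry count, so with the default load factor
    of 0.75 inserting `count` entries can still trigger a resize.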

