Github user Koraseg commented on a diff in the pull request: https://github.com/apache/spark/pull/22894#discussion_r229790796

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -189,13 +190,12 @@ private[spark] class HighlyCompressedMapStatus private (
 emptyBlocks.readExternal(in)
 avgSize = in.readLong()
 val count = in.readInt()
- val hugeBlockSizesArray = mutable.ArrayBuffer[Tuple2[Int, Byte]]()
+ hugeBlockSizes = new util.HashMap[Int, Byte](count).asScala
--- End diff --

The scala.collection.mutable.HashMap implementation does not provide a way to set the initial capacity out of the box, and performance gets worse with it, probably because the hash table has to be resized as it grows. The scala.collection.mutable.OpenHashMap implementation does accept an initial size, but it is still slower than java.util.HashMap. However, if some tradeoff between code cleanliness and performance is acceptable, I would use one of these Scala variants.
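For comparison, here is a minimal sketch of the three options discussed, assuming a Scala 2.12 standard library (in 2.13, OpenHashMap is deprecated and mutable.HashMap gained an initial-capacity constructor). The object HugeBlockSizesSketch and its method names are hypothetical; only `hugeBlockSizes`, `count` and the `new util.HashMap[Int, Byte](count).asScala` expression come from the diff. Each method fills a map of (block index, compressed size) pairs the way the deserialization loop would.

```scala
import java.util
import scala.collection.JavaConverters._
import scala.collection.mutable

// Hypothetical helpers, not Spark code: each builds the same map of
// (block index -> compressed size byte) using a different map implementation.
object HugeBlockSizesSketch {

  // Variant from the diff: presize a java.util.HashMap with `count`, then wrap
  // it with asScala. The wrapper is a view over the Java map; nothing is copied.
  // Note: with the default 0.75 load factor, a capacity of exactly `count`
  // can still trigger one resize before all entries are inserted.
  def javaBacked(pairs: Seq[(Int, Byte)]): mutable.Map[Int, Byte] = {
    val m = new util.HashMap[Int, Byte](pairs.size).asScala
    pairs.foreach { case (k, v) => m.update(k, v) }
    m
  }

  // OpenHashMap accepts an initial size in its constructor.
  def openHashMap(pairs: Seq[(Int, Byte)]): mutable.Map[Int, Byte] = {
    val m = new mutable.OpenHashMap[Int, Byte](pairs.size)
    pairs.foreach { case (k, v) => m.update(k, v) }
    m
  }

  // Plain mutable.HashMap: no public initial-capacity parameter in 2.12,
  // so the table is resized repeatedly as entries are added.
  def scalaHashMap(pairs: Seq[(Int, Byte)]): mutable.Map[Int, Byte] = {
    val m = mutable.HashMap.empty[Int, Byte]
    pairs.foreach { case (k, v) => m.update(k, v) }
    m
  }
}
```

All three return a `mutable.Map[Int, Byte]`, so the choice is invisible to callers; the Java-backed variant trades a slightly less idiomatic construction for presizing, while the Scala variants read more naturally at some performance cost.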