Github user Koraseg commented on the issue: https://github.com/apache/spark/pull/22894

> Practically, though, it generates a whole copy of the map at every update, so for 10 items, the implementation in the PR generates 9 copies of 1, 2, 3, ... elements, while the current one generates only 1 copy, at the end of size 10. So the proposed change is worse than the current solution. If you create a benchmark, you can see this.

That is not how immutable persistent data structures, and the Scala Map in particular, handle updates. Moreover, as I mentioned above, it is exactly the same logic that runs under the hood of the ArrayBuffer -> Map conversion in the current implementation; I have only removed an intermediate layer.

I created a benchmark with cut-down versions of HighlyCompressedMapStatus (keeping only the empty-blocks bitmap and the huge-blocks map) and measured deserialization performance. The proposed version showed about a 10% performance boost across different block configurations. You can check out the results on the repo below and repeat the test:

https://github.com/Koraseg/mapstatus-benchmark
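To make the point concrete, here is a minimal sketch contrasting the two construction paths (the entry count and key/value types are illustrative, not taken from the PR). Scala's `toMap` itself folds the collection into an empty immutable map with one `+` per element, and each `+` on the immutable `Map` returns a new map that shares almost all structure with the old one, rather than copying it wholesale:

```scala
import scala.collection.mutable.ArrayBuffer

object MapBuildSketch {
  // Hypothetical (key -> size) pairs standing in for the huge-blocks entries.
  val entries: Seq[(Int, Long)] = (1 to 10).map(i => i -> i.toLong * 100)

  def main(args: Array[String]): Unit = {
    // Current approach: accumulate into an ArrayBuffer, convert once at the end.
    // The .toMap conversion still performs one persistent update per element.
    val buffer = new ArrayBuffer[(Int, Long)]
    entries.foreach(buffer += _)
    val viaBuffer: Map[Int, Long] = buffer.toMap

    // Proposed approach: update the immutable Map directly, skipping the buffer.
    // Each `+` shares structure with the previous map, so no full copy is made.
    val direct: Map[Int, Long] =
      entries.foldLeft(Map.empty[Int, Long])((m, kv) => m + kv)

    assert(viaBuffer == direct)
  }
}
```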