Github user Koraseg commented on the issue:

    https://github.com/apache/spark/pull/22894
  
    > Practically, though, it generates a whole copy of the map at every 
update, so for 10 items, the implementation in the PR generates 9 copies of 1, 
2, 3, ... elements, while the current one generates only 1 copy, at the end of 
size 10. So the proposed change is worse than the current solution. If you 
create a benchmark, you can see this.
    
    That is not how immutable persistent data structures, and the Scala map in 
particular, handle updates: adding an entry does not copy the whole map, it 
rebuilds only the path to the new entry and shares the rest of the structure 
with the previous version. Moreover, as I mentioned above, it is exactly the 
same logic that lies under the hood of the ArrayBuffer -> Map conversion in 
the current implementation; I have only removed an intermediate layer. 
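
    To make that concrete, here is a minimal sketch (not the actual MapStatus 
code; `numEntries` and `readPair` are hypothetical stand-ins for the real 
deserialization loop) of the two strategies being compared:

    ```scala
    import scala.collection.mutable.ArrayBuffer

    object DeserStrategies {
      // Current approach: collect the pairs into a buffer, then convert.
      // toMap itself folds `acc + pair` over the buffer, so the map is still
      // built by repeated persistent updates.
      def viaBuffer(numEntries: Int, readPair: () => (Int, Long)): Map[Int, Long] = {
        val buf = ArrayBuffer.empty[(Int, Long)]
        var i = 0
        while (i < numEntries) { buf += readPair(); i += 1 }
        buf.toMap
      }

      // Proposed approach: fold the pairs straight into the immutable map.
      // Each `+` rebuilds only the path to the new entry (structural sharing);
      // it does not produce a full copy of the map.
      def direct(numEntries: Int, readPair: () => (Int, Long)): Map[Int, Long] = {
        var result = Map.empty[Int, Long]
        var i = 0
        while (i < numEntries) { result += readPair(); i += 1 }
        result
      }
    }
    ```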
    
    I created a benchmark with cut-down versions of HighlyCompressedMapStatus 
(containing only the empty-blocks bitmap and the huge-blocks map) and measured 
deserialization performance. The proposed version showed about a 10% 
performance boost across different block configurations. 
    
    You can check out the results in the repo below and reproduce the test. 
    https://github.com/Koraseg/mapstatus-benchmark
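
    For a quick local comparison outside that repo, a rough timing sketch on 
synthetic data (the entry count is made up, and JMH should be preferred for 
real numbers) could look like this:

    ```scala
    import scala.collection.mutable.ArrayBuffer

    object MapBuildTiming {
      def main(args: Array[String]): Unit = {
        val n = 500000
        val pairs = Array.tabulate(n)(i => (i, i.toLong * 100L))

        // Crude wall-clock timer; good enough for a rough comparison only.
        def time[A](label: String)(body: => A): A = {
          val t0 = System.nanoTime()
          val res = body
          println(f"$label%-24s ${(System.nanoTime() - t0) / 1e6}%.1f ms")
          res
        }

        // A few repetitions as a primitive warm-up.
        for (_ <- 1 to 5) {
          time("ArrayBuffer then toMap") {
            val buf = ArrayBuffer.empty[(Int, Long)]
            pairs.foreach(buf += _)
            buf.toMap
          }
          time("fold directly into Map") {
            var m = Map.empty[Int, Long]
            pairs.foreach(m += _)
            m
          }
        }
      }
    }
    ```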

