dongjoon-hyun commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544772675

In this PR, ZSTD reduces the size as follows. If we choose `LZ4`, it seems to be a regression compared with the previous Apache Spark versions, which used `GZIP`. We need to distribute this file to all nodes, don't we? I guess that was the reason Apache Spark preferred `GZIP` before.
```
- Compressed Serialized MapStatus sizes: 131 MB
+ Compressed Serialized MapStatus sizes: 123 MB
```
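
To put the two figures above in perspective, here is a minimal Python sketch of the arithmetic. The 131 MB and 123 MB values come from the comment; the node count and the "every node fetches one full copy" model are hypothetical assumptions for illustration only, and the helper names are made up, not part of Spark.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Relative size reduction, as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100.0


def total_transfer_mb(size_mb: float, num_nodes: int) -> float:
    """Total data moved if every node fetches one full copy (naive model)."""
    return size_mb * num_nodes


# Sizes reported in the comment above.
before_mb = 131
after_mb = 123

# Hypothetical cluster size, not taken from the PR.
num_nodes = 100

print(f"Reduction: {reduction_percent(before_mb, after_mb):.1f}%")
print(f"Saved across {num_nodes} nodes: "
      f"{total_transfer_mb(before_mb - after_mb, num_nodes)} MB")
```

Under this naive model, the saving scales linearly with the number of nodes, which is why the compressed size matters as much as the compression speed when the serialized statuses have to reach every node.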