[GitHub] [spark] dongjoon-hyun commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-545034306

Got it. Thanks, @tgravescs.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544772675

In this PR, ZSTD reduces the size like the following. If we choose `LZ4`, it seems to be a regression over the previous Apache Spark versions using Gzip. We need to distribute this file to all nodes, don't we? I guess that was the reason Apache Spark preferred `GZIP` before.

```
- Compressed Serialized MapStatus sizes: 131 MB
+ Compressed Serialized MapStatus sizes: 123 MB
```
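The concern in this comment is that the serialized map statuses are shipped to every node, so the compressed size matters as well as the compression speed. As a rough illustration only, the sketch below measures how the size of a compressed payload changes with the compressor setting. It is not the benchmark from this PR: ZSTD and LZ4 are not in the Python standard library, so `zlib` at a fast level and `gzip` at a high level stand in for a speed-oriented and a size-oriented codec. The payload and the helper names (`fake_map_status_payload`, `compressed_sizes`) are hypothetical, and the numbers it prints are unrelated to the 131 MB and 123 MB figures above.

```python
import gzip
import random
import zlib


def fake_map_status_payload(num_blocks: int, seed: int = 0) -> bytes:
    """Build a synthetic stand-in for serialized map statuses.

    Assumption: one small "size class" byte per shuffle block, with a
    skewed distribution so the data is compressible.
    """
    rng = random.Random(seed)
    return bytes(rng.choice((0, 0, 0, 1, 2, 3, 40)) for _ in range(num_blocks))


def compressed_sizes(payload: bytes) -> dict:
    """Return the payload size in bytes, raw and under two compressor settings."""
    return {
        "raw": len(payload),
        "zlib level 1 (fast)": len(zlib.compress(payload, 1)),
        "gzip level 9 (small)": len(gzip.compress(payload, compresslevel=9)),
    }


payload = fake_map_status_payload(200_000)
sizes = compressed_sizes(payload)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Because every node receives a copy of the payload, a few percent difference in compressed size is multiplied by the number of receivers, which is the trade-off the comment raises against a faster codec.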
[GitHub] [spark] dongjoon-hyun commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544590037

No. I don't think it has been tried. To make sure, let's ping @dbtsai. :)
[GitHub] [spark] dongjoon-hyun commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544560483

That sounds like a reasonable idea. Could you create a JIRA and a PR for the configuration, @tgravescs? I can help you with the benchmark. Thanks!
[GitHub] [spark] dongjoon-hyun commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544291723

Thank you, @dbtsai, @tgravescs, @viirya, @MaxGekk, @advancedxy. Merged to master.
[GitHub] [spark] dongjoon-hyun commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-543979549

Hi, @dbtsai. I created a PR to your branch. Could you review and merge the updated benchmark result?

- https://github.com/dbtsai/spark/pull/7
[GitHub] [spark] dongjoon-hyun commented on issue #26085: [SPARK-29434] [Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-540880356

Thank you for pinging me, @dbtsai. I'll take a look tomorrow.