[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/18174 It would also be great to add the explicit performance tuning results to the PR description. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18174 The change looks reasonable, but you really need to update your PR description. You are not improving an existing config; you are introducing a new config for a value that was hard-coded before.
[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18174 cc @jiangxb1987 @cloud-fan
[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18174 @gatorsmile Can you review this code? Thanks.
[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18174 @srowen Yes, you're right: the numbers are times, and their unit is ms. Each number is the average time over 10 runs of `forceSorterToSpill`. I assume that copying a big buffer takes longer than copying a small buffer, even though the small buffer is copied many more times. Or the local file system takes longer to write a big buffer than a small one.
[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18174 There's no description of your test or what the numbers mean. I assume they're times. Why would a smaller buffer be faster?
[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18174 @srowen Thanks for reviewing it. In our performance tuning, we found records larger than 2M, so `initialSerBufferSize` needs to be configurable. However, changing `initialSerBufferSize` is not helpful for performance tuning, whereas changing the spill `diskWriteBufferSize` is. So I did a small experiment, setting `diskWriteBufferSize` to 1M, 512K, 256K, 128K, 64K, etc.

| diskWriteBufferSize | 1M | 512K | 256K | 128K | 64K | 32K | 16K | 8K | 4K |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RecordSize: 2.5M | 742 | 722 | 694 | 686 | 667 | 668 | 671 | 669 | 683 |
| RecordSize: 1M | 294 | 293 | 292 | 287 | 283 | 285 | 281 | 279 | 285 |

To reduce interference from other factors, each result is the average of 10 runs. Please review the code again. Thanks.
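
For readers following the thread, here is a minimal sketch of the kind of measurement the comment above describes: one record is copied to an output stream through a reusable write buffer of a given size, and the time is averaged over 10 runs. It is not the PR's test code and does not touch Spark. The helper names `write_in_chunks` and `average_ms` are hypothetical, and the sink is an in-memory stream rather than a spill file, so the timings it prints are not comparable with the numbers in the table.

```python
import io
import time


def write_in_chunks(record: bytes, out, buffer_size: int) -> int:
    """Copy `record` to `out` through a reusable buffer of `buffer_size` bytes.

    Returns the number of write calls issued (hypothetical helper).
    """
    buf = bytearray(buffer_size)
    pos = 0
    calls = 0
    while pos < len(record):
        n = min(buffer_size, len(record) - pos)
        buf[:n] = record[pos:pos + n]      # copy one chunk into the write buffer
        out.write(memoryview(buf)[:n])     # hand only the filled part to the sink
        pos += n
        calls += 1
    return calls


def average_ms(record: bytes, buffer_size: int, runs: int = 10) -> float:
    """Average wall-clock time in milliseconds over `runs` copies."""
    total = 0.0
    for _ in range(runs):
        sink = io.BytesIO()                # assumption: in-memory sink, not a disk file
        start = time.perf_counter()
        write_in_chunks(record, sink, buffer_size)
        total += time.perf_counter() - start
    return total / runs * 1000.0


if __name__ == "__main__":
    record = bytes(int(2.5 * 1024 * 1024))  # a 2.5M record, as in the first table row
    for size_kb in (1024, 512, 256, 128, 64, 32, 16, 8, 4):
        print(f"{size_kb}K: {average_ms(record, size_kb * 1024):.3f} ms")
```

Swapping the `io.BytesIO()` sink for a real file opened in binary mode would bring the sketch closer to a spill-to-disk scenario, at the cost of results that depend on the local file system and page cache.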