[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77072/testReport)** for PR 18031 at commit [`bfea9f5`](https://github.com/apache/spark/commit/bfea9f59fd7587b87de0ddb4601f76786671f38a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 @HyukjinKwon Thank you so much! Really helpful.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/testReport)** for PR 18031 at commit [`970421b`](https://github.com/apache/spark/commit/970421b2a5cb2278d60403f72dc165418e4faf87). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77070/ Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/testReport)** for PR 18031 at commit [`970421b`](https://github.com/apache/spark/commit/970421b2a5cb2278d60403f72dc165418e4faf87).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77069/testReport)** for PR 18031 at commit [`f6670d8`](https://github.com/apache/spark/commit/f6670d8a77be52ce510e26d326b9b57c7ff4cc9b). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77069/ Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77069/testReport)** for PR 18031 at commit [`f6670d8`](https://github.com/apache/spark/commit/f6670d8a77be52ce510e26d326b9b57c7ff4cc9b).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77058/ Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77058/testReport)** for PR 18031 at commit [`d5b8a21`](https://github.com/apache/spark/commit/d5b8a211affc3cbe26c1920efe54b8407c863ada). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 Gentle ping to @JoshRosen @cloud-fan @mridulm
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77058/testReport)** for PR 18031 at commit [`d5b8a21`](https://github.com/apache/spark/commit/d5b8a211affc3cbe26c1920efe54b8407c863ada).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 Jenkins, retest this please
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77056/ Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77056/testReport)** for PR 18031 at commit [`d5b8a21`](https://github.com/apache/spark/commit/d5b8a211affc3cbe26c1920efe54b8407c863ada). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 I'm trying to give users a way to control memory strictly, so that no block sizes are underestimated (by setting `spark.shuffle.accurateBlockThreshold=0` and `spark.shuffle.accurateBlockThresholdByTimesAverage=1`). For that reason I'm a little hesitant to remove the huge blocks from the numerator in the average-size calculation.
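One possible reading of the concern above, as a small Python sketch. It is not Spark's implementation: `reported_sizes` and the `exclude_huge_from_average` flag are hypothetical names, and the sketch assumes that a block not recorded accurately is reported as the average size of the non-empty blocks. Under that assumption, a threshold of 0 and a multiplier of 1 mean every block above the average is exact and every other block is reported as at least its real size. If the huge blocks are dropped from the average, the fallback value shrinks and a mid-sized block can be underestimated.

```python
def reported_sizes(sizes, threshold=0, times_average=1.0,
                   exclude_huge_from_average=False):
    """Hypothetical model of the sizes a compressed map status would report.

    Assumption: a block is recorded accurately when its size exceeds both
    `threshold` and `times_average` times the average non-empty block size;
    every other non-empty block is reported as a fallback average.
    """
    non_empty = [s for s in sizes if s > 0]
    if not non_empty:
        return [0.0 for _ in sizes]
    avg = sum(non_empty) / len(non_empty)

    def is_huge(s):
        return s > threshold and s > times_average * avg

    if exclude_huge_from_average:
        # Variant under discussion: huge blocks removed from the numerator.
        small = [s for s in non_empty if not is_huge(s)]
        fallback = sum(small) / len(small) if small else 0.0
    else:
        fallback = avg

    return [float(s) if is_huge(s) else (fallback if s > 0 else 0.0)
            for s in sizes]


sizes = [10, 10, 30, 100]
with_huge = reported_sizes(sizes)
without_huge = reported_sizes(sizes, exclude_huge_from_average=True)
```

With these sample sizes the average over all blocks is 37.5, so only the 100-unit block is exact and the rest are reported as 37.5. With the huge block excluded, the fallback drops to about 16.7 and the 30-unit block is reported below its real size.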
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 To resolve the comments in https://github.com/apache/spark/pull/16989:
> minimum size before we consider something a large block : if average is 10kb, and some blocks are > 20kb, spilling them to disk would be highly suboptimal.
> One edge-case to consider is the situation where every shuffle block is just over this threshold: in this case HighlyCompressedMapStatus won't really be doing any compression.
I propose two configs: `spark.shuffle.accurateBlockThreshold` and `spark.shuffle.accurateBlockThresholdByTimesAverage`. The sizes of blocks that exceed both thresholds will be recorded accurately.
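The rule proposed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in the PR: `accurately_recorded` is a hypothetical helper, `threshold` stands in for `spark.shuffle.accurateBlockThreshold`, `times_average` stands in for `spark.shuffle.accurateBlockThresholdByTimesAverage`, and the average is assumed to be taken over non-empty blocks.

```python
def accurately_recorded(sizes, threshold, times_average):
    """Return the indices of blocks whose size would be recorded accurately.

    Hypothetical sketch: a block qualifies only if it is larger than the
    absolute threshold AND larger than times_average * average block size.
    """
    non_empty = [s for s in sizes if s > 0]
    if not non_empty:
        return []
    avg = sum(non_empty) / len(non_empty)
    return [i for i, s in enumerate(sizes)
            if s > threshold and s > times_average * avg]


# One clear outlier passes both tests.
outlier = accurately_recorded([10, 10, 10, 10, 60], threshold=50, times_average=2)

# Every block just over the absolute threshold: the multiplier keeps them
# compressed, which addresses the second quoted comment.
all_just_over = accurately_recorded([101, 101, 101], threshold=100, times_average=2)

# Small average with a block a bit over twice the average: the absolute
# threshold keeps it from counting as large, per the first quoted comment.
small_blocks = accurately_recorded([10] * 9 + [25], threshold=1000, times_average=2)
```

Requiring both conditions is what lets each config cover the case the other one misses.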
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77056/testReport)** for PR 18031 at commit [`d5b8a21`](https://github.com/apache/spark/commit/d5b8a211affc3cbe26c1920efe54b8407c863ada).