[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 Thanks all for the work on this! I've merged this into master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 LGTM
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout @squito @markhamstra Thanks for all of your work for this patch. Really appreciate your help : )
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 LGTM! Thanks for finding this subtle bug and all of the hard work to fix it @jinxing64. I'll wait until tomorrow to merge this to give Mark and Imran a chance for any last comments.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72974/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72974/testReport)** for PR 16620 at commit [`6809d1f`](https://github.com/apache/spark/commit/6809d1ff5d09693e961087da35c8f6b3b50fe53c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 Yes, refined : )
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72974/testReport)** for PR 16620 at commit [`6809d1f`](https://github.com/apache/spark/commit/6809d1ff5d09693e961087da35c8f6b3b50fe53c).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 LGTM pending one last comment improvement
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72913/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72913/testReport)** for PR 16620 at commit [`d225565`](https://github.com/apache/spark/commit/d2255654b1f6ae43ba47c0ffcec0e6adc4beed82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72912/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72912/testReport)** for PR 16620 at commit [`e34cd85`](https://github.com/apache/spark/commit/e34cd85424d88daf96d0252d34f2ce28b956ddde). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72913/testReport)** for PR 16620 at commit [`d225565`](https://github.com/apache/spark/commit/d2255654b1f6ae43ba47c0ffcec0e6adc4beed82).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @squito Thanks a lot. I've refined the comment, please take another look.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72912/testReport)** for PR 16620 at commit [`e34cd85`](https://github.com/apache/spark/commit/e34cd85424d88daf96d0252d34f2ce28b956ddde).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout I've refined accordingly, please take another look : )
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72849/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72849/testReport)** for PR 16620 at commit [`ab8d13e`](https://github.com/apache/spark/commit/ab8d13efaf12182517d3b311d74b2f0a8d2fbef8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72849/testReport)** for PR 16620 at commit [`ab8d13e`](https://github.com/apache/spark/commit/ab8d13efaf12182517d3b311d74b2f0a8d2fbef8).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build started.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build triggered.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72797/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72797/testReport)** for PR 16620 at commit [`46ef5a3`](https://github.com/apache/spark/commit/46ef5a369902ce2ca8c0dfde64b973647f5fffeb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72797/testReport)** for PR 16620 at commit [`46ef5a3`](https://github.com/apache/spark/commit/46ef5a369902ce2ca8c0dfde64b973647f5fffeb).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72776/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72776/testReport)** for PR 16620 at commit [`3a5d60d`](https://github.com/apache/spark/commit/3a5d60d74b8e37966a859d5d02b74aefb7cbee4f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout Thanks a lot for the clear explanation. It makes great sense to me and helps me understand the logic a lot. I also think the way of testing is very good and makes the code very clear. I've already refined this PR; please take a look when the tests pass. Also, based on my understanding of your explanation above in >Scenario A (performance optimization, as discussed here already): This happens if a ShuffleMapStage gets re-run (e.g., because the first time it ran, it encountered a fetch failure, so the previous stage needed to be re-run to generate the missing output). ... I made #16901 to add a test checking that the success of a task from an old attempt is taken as valid and that the corresponding pending partition is removed. Please take a look.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 Thanks for all the investigation and the write up, @kayousterhout. This makes good sense to me, and should take us a long way toward both fixing the immediate bug and improving the code. We should also make sure that our intentions and understanding get preserved in documentation that is more obvious and accessible in the future than PR discussion threads. That probably means more comments in the source code that cover the essence of your "very long write up", but maybe we should consider creating an external documentation page (wiki or something) that covers in long form what we know and intend; then we can scale down the in-code comments to a shorter form that includes pointers to the long form.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 Also, if you implement the new change I proposed, I think it's relatively straightforward to write a new test in DAGSchedulerSuite for the new behavior (which will be pretty similar to the test I modified in #16892).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 tl;dr I don't think Mark's change is quite correct, which is why the tests were failing. Instead, I think we need to replace the failedEpoch if/else statement and the pendingPartitions update in DAGScheduler.handleTaskCompletion with:

`if (stageIdToStage(task.stageId).latestInfo.attemptId == task.stageAttemptId) {
  // This task was for the currently running attempt of the stage. Since the task
  // completed successfully from the perspective of the TaskSetManager, mark it as
  // no longer pending (the TaskSetManager may consider the task complete even
  // when the output needs to be ignored because the task's epoch is too small below).
  shuffleStage.pendingPartitions -= task.partitionId
}

if (failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)) {
  logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")
} else {
  // The epoch of the task is acceptable (i.e., the task was launched after the most
  // recent failure we're aware of for the executor), so mark the task's output as
  // available.
  shuffleStage.addOutputLoc(smt.partitionId, status)
  // Remove the task's partition from pending partitions. This may have already been
  // done above, but will not have been done yet in cases where the task attempt was
  // from an earlier attempt of the stage (i.e., not the attempt that's currently
  // running). This allows the DAGScheduler to mark the stage as complete when one
  // copy of each task has finished successfully, even if the currently active stage
  // still has tasks running.
  shuffleStage.pendingPartitions -= task.partitionId
}`

I submitted #16892 to attempt to clarify the test case where Mark's change originally failed (this PR shouldn't block on that -- that's just to clarify things for ourselves in the future), and also wrote a very long write up of what's going on below.
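The effect of the two checks in the snippet above can be tried out on a small stand-alone model. This is only a sketch under stated assumptions: `TaskEnd`, `ShuffleStageModel`, and `HandleTaskCompletionSketch` are hypothetical names invented for illustration, not Spark classes, and the executor id stands in for the real map status.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's internal types (assumptions for
// illustration only; these are not the real DAGScheduler data structures).
final case class TaskEnd(partitionId: Int, stageAttemptId: Int, epoch: Long, execId: String)

final class ShuffleStageModel(numPartitions: Int) {
  // Attempt id of the currently running attempt of the stage.
  var latestAttemptId: Int = 0
  // Partitions the scheduler still considers pending.
  val pendingPartitions: mutable.Set[Int] = mutable.Set.empty[Int] ++= (0 until numPartitions)
  // Registered map outputs: partition -> executor id (stand-in for a map status).
  val outputLocs: mutable.Map[Int, String] = mutable.Map.empty[Int, String]
}

object HandleTaskCompletionSketch {
  // Mirrors the two checks from the proposal on the simplified model.
  def handleSuccess(stage: ShuffleStageModel, failedEpoch: Map[String, Long], task: TaskEnd): Unit = {
    if (stage.latestAttemptId == task.stageAttemptId) {
      // Task belongs to the running attempt: no longer pending, even if its
      // output turns out to be ignored below.
      stage.pendingPartitions -= task.partitionId
    }
    if (failedEpoch.contains(task.execId) && task.epoch <= failedEpoch(task.execId)) {
      // Possibly bogus completion from a failed executor: do not register the output.
    } else {
      // Epoch is acceptable: register the output and clear the partition, which
      // also covers successful tasks from an earlier attempt of the stage.
      stage.outputLocs(task.partitionId) = task.execId
      stage.pendingPartitions -= task.partitionId
    }
  }
}
```

In this model, a success from an earlier attempt on a healthy executor both registers an output and clears the partition, while a success from the current attempt with a too-small epoch clears the partition without registering any output.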
-----

There are three relevant pieces of state to consider here:

(1) The tasks that the TaskSetManager (TSM) considers currently pending. The TSM encodes these pending tasks in its "successful" array. When a task set is launched, all of its tasks are considered pending, and all of the entries in the successful array are False. Tasks are no longer considered pending (and are marked as True in the "successful" array) if either (a) a copy of the task finishes successfully or (b) a copy of the task fails with a fetch failed (in which case the TSM assumes that the task will never complete successfully, because the previous stage needs to be re-run). Additionally, a task that previously completed successfully can be re-marked as pending if the stage is a shuffle map stage, and the executor where the task ran died (this is because the map output needs to be re-generated, and the TSM will re-schedule the task). The TSM notifies the DAGScheduler that the stage has completed if either (a) the stage fails (e.g., there's a fetch failure) or (b) all of the entries in "successful" are true (i.e., there are no more pending tasks).

(2) ShuffleMapStage.pendingPartitions. This variable is used by the DAGScheduler to track the pending tasks for a stage, and mostly is consistent with the TSM's pending tasks (described above). When a stage begins, the DAGScheduler marks all of the partitions that need to be computed as pending, and then removes them from pendingPartitions as the TSM notifies the DAGScheduler that tasks have successfully completed. When a TSM determines that a task needs to be re-run (because it's a shuffle map task that ran on a now-dead executor), the TSM sends a Resubmitted task completion event to the DAGScheduler, which causes the DAGScheduler to re-add the task to pendingPartitions (in doing so, the DAGScheduler is keeping pendingPartitions consistent with the TSM's pending tasks).
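As a rough illustration of how pieces (1) and (2) move together, here is a minimal sketch. `PendingStateModel` and its method names are hypothetical; it models only the transitions described above and is not Spark's TaskSetManager or DAGScheduler code.

```scala
import scala.collection.mutable

// Hypothetical model of the two pieces of state described above; every name here
// is an assumption made for illustration.
final class PendingStateModel(numTasks: Int) {
  // (1) TSM view: successful(i) == false means task i is still pending.
  private val successful: Array[Boolean] = Array.fill(numTasks)(false)
  // (2) DAGScheduler view: partitions still pending for the stage.
  val pendingPartitions: mutable.Set[Int] = mutable.Set.empty[Int] ++= (0 until numTasks)

  // A copy of the task finished successfully: both views stop tracking it.
  def taskSucceeded(i: Int): Unit = {
    successful(i) = true
    pendingPartitions -= i
  }

  // A copy of the task hit a fetch failure: the TSM stops treating it as pending.
  // The DAGScheduler-side set is left untouched in this sketch.
  def taskFetchFailed(i: Int): Unit = {
    successful(i) = true
  }

  // The executor that ran a finished shuffle map task died: the TSM re-marks the
  // task as pending, and the Resubmitted event re-adds it on the DAGScheduler side.
  def executorLost(i: Int): Unit = {
    successful(i) = false
    pendingPartitions += i
  }

  // Tasks the TSM still considers pending.
  def tsmPending: Seq[Int] = (0 until numTasks).filterNot(i => successful(i))

  // True once no task is pending from the TSM's point of view.
  def tsmDone: Boolean = successful.forall(identity)
}
```

After `taskSucceeded` is called for every task, both views are empty; a later `executorLost(i)` makes task `i` pending again in both, which is the consistency the Resubmitted event is meant to maintain.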
I believe there are two scenarios (currently) where ShuffleMapStage.pendingPartitions and the TSM's pending tasks become inconsistent:
- Scenario A (performance optimization, as discussed here already): This happens if a ShuffleMapStage gets re-run (e.g., because the first time it ran, it encountered a fetch failure, so the previous stage needed to be re-run to generate the missing output). Call the original attempt #0 and the currently running attempt #1. If there's a task from attempt #0 that's still running, and it is running on an executor that *was not* marked as failed (this is the condition captured by the failedEpoch if-statement), and it completes successfully, this event will be handled by the TSM for attempt #0. When the DAGScheduler hears that the task completed
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 I spent a long time looking at this and I think @markhamstra's solution is the way to go, and that we should update the tests to address the failures (I think the two tests that fail are actually partially testing for incorrect behavior, which is why they're failing). I'll post a longer writeup tomorrow midday, but wanted to provide a quick status update @jinxing64 so that you know progress is being made here! I'll also merge #16876 tomorrow, assuming tests pass, which will help slightly with implementing Mark's suggestion.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @markhamstra @squito @kayousterhout It would be great if you could comment further on the above so that I can continue working on this : )
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 As @squito mentioned:
> Before this, the DAGScheduler didn't really know anything about taskSetManagers. (In its current form, this pr uses a "leaked" handle via rootPool.getSortedTaskSetQueue). Is adding it here a mistake? An alternative would be to add a method to TaskScheduler like markTaskSetsForStageAsZombie(stageId: Int). But that is still basically exposing the idea of "zombie" tasksets to the dagscheduler, I dunno if its actually any cleaner.
I think this is a cleaner and simpler way to fix this bug, and we can avoid adding TSM info to the DAGScheduler.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout @squito @markhamstra Thanks a lot for reviewing this PR thus far. I do think the approach that throws away task results from earlier attempts that ran on failed executors, and treats `stage.pendingPartitions` as an exact mirror (in reverse) of the output locations for the stage, can really fix this bug and make the code quite clear. But my previous understanding of `stage.pendingPartitions` is a little different, as described in the comment in `Stage`:
```
/**
 * Partitions the [[DAGScheduler]] is waiting on before it tries to mark the stage / job as
 * completed and continue. Tasks' successes in both the active taskset or earlier attempts
 * for this stage can cause partition ids get removed from pendingPartitions. Finally, note
 * that when this is empty, it does not necessarily mean that stage is completed -- Some of
 * the map output from that stage may have been lost. But the [[DAGScheduler]] will check for
 * this condition and resubmit the stage if necessary.
 */
```
Any task's success can cause its partition to be removed from `pendingPartitions`, whether it ran on a valid executor or a failed one. Thus when `pendingPartitions` becomes empty, we can check whether the stage's output locations are all available, and resubmit if they are not. If we treat `stage.pendingPartitions` as an exact mirror (in reverse) of the output locations, some unit tests in DAGSchedulerSuite cannot pass (e.g. `"run trivial shuffle with out-of-band failure and retry"`). Consider the following:
1. A stage has ShuffleMapTask1 and ShuffleMapTask2, `pendingPartitions`=(0, 1)
2. ShuffleMapTask1 succeeded on executorA and returned to driver, `pendingPartitions`=(1)
3. ShuffleMapTask2 succeeded on executorA;
4. Driver heard executorA is lost;
5. ShuffleMapTask2's success returned to driver, still `pendingPartitions`=(1) and the stage cannot get rescheduled.
In my understanding, `pendingPartitions` helps us track the progress of the `TaskSetManager`: it tells us whether there are still tasks on the way that are worth waiting for, and when to check whether all output locations are available and whether to resubmit.
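The five steps above can be replayed in a toy model to compare the two readings of `pendingPartitions`. This is a sketch under simplifying assumptions, not Spark code: `Success`, `replay` and the `mirrorOutputLocs` flag are hypothetical, both tasks are assumed to carry epoch 0, the executor loss is assumed to record `failedEpoch("executorA") = 0`, and the Resubmitted event for the first task is left out, as it is in the steps.

```scala
import scala.collection.mutable

// Toy replay of the scenario; names and epochs are assumptions for illustration.
object OutOfBandFailureSketch {
  // A successful ShuffleMapTask completion reaching the driver.
  final case class Success(partitionId: Int, execId: String, epoch: Long)

  // mirrorOutputLocs = true: a partition leaves pendingPartitions only when its
  // output is accepted. false: any success removes the partition.
  def replay(mirrorOutputLocs: Boolean): Set[Int] = {
    val pendingPartitions = mutable.HashSet(0, 1)              // step 1
    val failedEpoch = mutable.HashMap.empty[String, Long]

    def handleSuccess(s: Success): Unit = {
      val possiblyBogus = failedEpoch.get(s.execId).exists(s.epoch <= _)
      if (!mirrorOutputLocs || !possiblyBogus) {
        pendingPartitions -= s.partitionId
      }
    }

    handleSuccess(Success(0, "executorA", epoch = 0L))         // step 2
    // step 3: the second task finishes on executorA; its result is in flight
    failedEpoch("executorA") = 0L                              // step 4
    handleSuccess(Success(1, "executorA", epoch = 0L))         // step 5
    pendingPartitions.toSet
  }
}
```

With `mirrorOutputLocs = true` partition 1 stays pending, which is the stuck state described in step 5; with `false` the set drains, so the scheduler reaches the point where it can check output availability and resubmit.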
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout yes, I also looked at duplicating `stage.pendingPartitions -= task.partitionId`. I could live with that.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 After thinking about this for a few more minutes, I'm going to retract my earlier statement about preferring my approach to yours. I think we can file a JIRA for the bigger problem of inconsistent state between the different components -- but no reason to force this PR to fix that bigger scheduler issue. Your approach (or the alternative I proposed immediately above) surgically fixes the problem and I think it's good to merge that bug-fix separately from a more significant re-thinking of the logic here.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 BTW Mark one slightly different version of your suggestion I'd considered is: (1) move stage.pendingPartitions -= task.partitionId so that it's duplicated in each of the two case statements below (2) for the ResultTask case, removing the partition can happen right at the beginning (3) for the ShuffleMapTask case, removing the partition can happen in the else statement on line 1196, where addOutputLoc is called. One benefit of that approach is that it makes it a little more obvious which state is related: that the pendingPartitions should mirror (in reverse) the output locations for the stage. It also consolidates the logic for handling previously failed executors into the one location.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout don't overestimate my enthusiasm for my own suggestion. I'm really just thinking aloud in search of a solution, and I agree with you that the TaskSetManager and DAGScheduler being in disagreement is not good.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 @markhamstra I prefer that approach to the approach in the existing PR, but I still have some hesitation about that because of the inconsistencies between the TaskSetManager (which still thinks tasks are running) and DAGScheduler (which thinks the stage is done), as mentioned in my comment above. It sounds like everyone else prefers that approach though -- perhaps we can at least add some better commenting so future readers of the code know the DAGSched and TSM will have different views of the world and that listeners may get duplicate stage completed messages as a result? Another argument for your approach is that it's *no worse* than the current code, and is the smallest change (I think) that can fix the bug. We can fix the larger issues in a separate PR.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 The way that I am thinking about this right now is that @kayousterhout is on the right track with the early return at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1141 , but that her proposed `stage...attemptId != task.stageAttemptId` is broader than it needs to be. My idea is that we want to be throwing away task results from earlier attempts that were run on executors that failed (on the presumption that one fetch failure means that other fetches from there are also going to fail), but that if the executor didn't fail, then the outputs from earlier attempts of tasks that complete late but successfully on still-good executors should still be valid and available, so we should accept them as though they were successful task completions for the current attempt. What you end up with is that if-statement now looking like:
```scala
val stageHasBeenCancelled = !stageIdToStage.contains(task.stageId)
val shuffleMapTaskIsFromFailedExecutor = task match {
  case smt: ShuffleMapTask =>
    val status = event.result.asInstanceOf[MapStatus]
    val execId = status.location.executorId
    failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)
  case _ => false
}
if (stageHasBeenCancelled || shuffleMapTaskIsFromFailedExecutor) {
  return
}
```
...and then the `failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)` check can be removed from `case smt: ShuffleMapTask =>`. If we can do it cleanly, I think we should be avoiding re-running Tasks that complete successfully and should still be available. This is a bit different from the intent of SPARK-14649, which I am reading as an effort not to ignore the results of long-running tasks that start and eventually complete on an executor on which some other tasks actually run into fetch failures.
I'm really only trying to preserve the results of successful tasks run on executors that haven't failed. Unfortunately, the DAGSchedulerSuite doesn't agree with my intentions, because the above change actually leads to multiple test failures.
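The early-return condition in this comment can be tried in isolation as a pure predicate. The following is a minimal sketch, not DAGScheduler code: `ToyTask`, `liveStageIds` and `shouldDrop` are hypothetical names, and the executor id is passed in directly instead of being read from a `MapStatus`.

```scala
// Toy version of the early-return test; the types are illustrative assumptions.
object EarlyReturnSketch {
  sealed trait ToyTask { def stageId: Int }
  final case class ToyShuffleMapTask(stageId: Int, epoch: Long) extends ToyTask
  final case class ToyResultTask(stageId: Int) extends ToyTask

  // True when the completion should be dropped before any state is updated.
  def shouldDrop(
      task: ToyTask,
      execId: String,
      liveStageIds: Set[Int],
      failedEpoch: Map[String, Long]): Boolean = {
    val stageHasBeenCancelled = !liveStageIds.contains(task.stageId)
    val shuffleMapTaskIsFromFailedExecutor = task match {
      // Launched at or before the executor's last known failure.
      case smt: ToyShuffleMapTask => failedEpoch.get(execId).exists(smt.epoch <= _)
      case _ => false
    }
    stageHasBeenCancelled || shuffleMapTaskIsFromFailedExecutor
  }
}
```

Note that the stage attempt id plays no part here: a late completion from an earlier attempt is kept as long as its executor has not failed, which is the behavior the comment argues for.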
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 @squito and @jinxing64 You're right -- with the existing code, if a task from an old attempt succeeded *and* didn't run on an executor where things already failed, the DAGScheduler will count the result (just realizing this based on [this if-statement](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1189)). That being said, I think this behavior is broken, because it leads to inconsistent state between the DAGScheduler (which thinks the stage is done and submits the next ones) and the TaskSetManager for the most recent version of the stage (which is still waiting on the more recent version of tasks to complete). When the TaskSetManager for most recent version of the stage finishes all of its tasks, it will tell the DAGScheduler -- again -- that the stage has finished, causing the DAGScheduler to update the finish time for the stage and send another (duplicate) SparkListenerStageCompleted message to the listeners (I think this will result in stages in the UI that appear to be finished yet still have running tasks), and re-update the outputs for the map stage. None of these things are obviously buggy (from a cursory look) but they violate a bunch of invariants in the scheduler, and I wouldn't be surprised if there were bugs lurking in this code path. Given the amount of debugging and reviewer time that gets dedicated to these subtle bugs, I'm in favor of the simpler solution that maintains consistent state between the DAGScheduler and TaskSetManager. @squito where has this behavior been argued against in the past?
My understanding is that a bunch of the scheduler code is based on an assumption that once some tasks in a stage fail with a FetchFailure, we ignore future successes from that stage because it makes the code much simpler (it's also hard, in some cases, to know whether the successes are "real", or delayed messages from machines that later failed). There was a bigger effort to fix that issue in [SPARK-14649](https://issues.apache.org/jira/browse/SPARK-14649), but there were a bunch of subtleties in getting that right, so for now effort on that has stopped. If someone wants to re-start the effort on that, it seems useful, but I think should be de-coupled from fixing this bug.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 I'll spend some time today trying to sort out the relative merits of the fix options; but in the meantime, there's also no good reason for `TaskSchedulerImpl.rootPool` to be a `var` initialized as `null`, nor any good reason for `TaskScheduler.rootPool` to be able to produce `null`. Cleaning that up also makes code in this PR slightly simpler: https://github.com/markhamstra/spark/commit/e11fe2a9817559492daee03c8c025879dc44d346
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user squito commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout I think your fix is correct, but I actually think it's a bigger change in behavior, one that has been explicitly argued *against* in the past. I think the idea is that if you've got a bunch of tasks completing from an old attempt for a stage, you don't want to throw all that work away, as @jinxing64 mentioned. You may have a large number of resources tied up computing tasks from a previous attempt, and the results are completely correct, but you still throw those results away. (It's especially bad since we're [still not canceling tasks from previous attempts](https://issues.apache.org/jira/browse/SPARK-2666).) I do think that code would be simplified with the change you are suggesting -- late task completions from an earlier stage have been the cause of more bugs in the past. And this is all only happening when there is a fetch failure, not when everything is running smoothly. But I do think it's a rather large change in behavior which we should weigh carefully. I was even worried that the change I was proposing would lead to some cases where tasks would get fully computed, and then the results would get thrown away, but it was necessary for correctness. Also I should mention that I'm not even 100% sure about this -- I have to admit I find the epoch logic to be confusing, and perhaps with a careful read we'll see there really isn't much more getting thrown away than was already being thrown away because of epochs.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72500/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72500/testReport)** for PR 16620 at commit [`ece3d01`](https://github.com/apache/spark/commit/ece3d01f7a77144ec8a543ab025c87f12739b3ac). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72498/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72498/testReport)** for PR 16620 at commit [`66686a7`](https://github.com/apache/spark/commit/66686a78a42def7c6777e464441af01edbd58606). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72497/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72497/testReport)** for PR 16620 at commit [`a02dd5c`](https://github.com/apache/spark/commit/a02dd5cc30c7c1999717ba06bc2e9adbdd020fea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout @squito @markhamstra Thanks a lot for the comments. I've already refined the PR accordingly. I still have one concern:
> If this is a correct description, I'd argue that (5) is the problem: that when ShuffleMapTask2 finishes, we should not be updating a bunch of state in the DAGScheduler saying that there's output ready as a result. If I'm understanding correctly, there's a relatively simple fix to this problem: In DAGScheduler.scala, in handleTaskCompletion, we should exit (and not update any state) when the task is from an earlier stage attempt that's not the current active attempt. This can be done by changing the if-statement on line 1141 to include: || stageIdToStage(task.stageId).latestInfo.attemptId != task.stageAttemptId
With the above, are we ignoring all the results from old stage attempts? As @squito mentioned:
> It also can potentially improve performance, since you may submit downstream stages more quickly, rather than waiting for all tasks in the active taskset to complete.
Might it be beneficial to count the results from old stage attempts?
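For comparison, the quoted attempt-id check can be sketched as a standalone predicate. This is an illustration under assumptions rather than the real if-statement: `ToyStage`, `ToyTask` and `shouldIgnore` are hypothetical, and `latestAttemptId` stands in for `latestInfo.attemptId`.

```scala
// Toy version of the attempt-id early return; names are illustrative assumptions.
object StaleAttemptSketch {
  final case class ToyStage(id: Int, latestAttemptId: Int)
  final case class ToyTask(stageId: Int, stageAttemptId: Int)

  // True when handleTaskCompletion would exit without updating any state.
  def shouldIgnore(task: ToyTask, stageIdToStage: Map[Int, ToyStage]): Boolean =
    stageIdToStage.get(task.stageId) match {
      case None => true // the stage has been cancelled
      case Some(stage) => stage.latestAttemptId != task.stageAttemptId
    }
}
```

Under this rule every completion from an older attempt is discarded, including ones from healthy executors, which is exactly the lost work the question above is asking about.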
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72500/testReport)** for PR 16620 at commit [`ece3d01`](https://github.com/apache/spark/commit/ece3d01f7a77144ec8a543ab025c87f12739b3ac).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72498/testReport)** for PR 16620 at commit [`66686a7`](https://github.com/apache/spark/commit/66686a78a42def7c6777e464441af01edbd58606).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72497/testReport)** for PR 16620 at commit [`a02dd5c`](https://github.com/apache/spark/commit/a02dd5cc30c7c1999717ba06bc2e9adbdd020fea).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16620 @mridulm yeah once I saw this it seemed like something that's probably been a lurking issue for a bunch of jobs!! Will be great to get this fixed -- thanks for finding it @jinxing64!
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout That is clearer, and I can see this being a problem (it probably explains some hung jobs I had seen a while earlier), thanks!
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @squito Thanks a lot for helping with this PR thus far. I've added a unit test in `DAGSchedulerSuite`, but I'm not sure it is exactly what you suggested. I created a `mockTaskSchedulerImpl`. Since a lot of state is maintained in `TaskSchedulerImpl`, I have to trigger the events via `resourceOffers`, `handleSuccessfulTask`, and `handleFailedTask`. Please take another look when you have time. I'd really appreciate your help.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72401/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72401/testReport)** for PR 16620 at commit [`6547773`](https://github.com/apache/spark/commit/654777345a58acad382f241502ff165c7a34dbe6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72401/testReport)** for PR 16620 at commit [`6547773`](https://github.com/apache/spark/commit/654777345a58acad382f241502ff165c7a34dbe6).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72377/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72377/testReport)** for PR 16620 at commit [`e7cbea0`](https://github.com/apache/spark/commit/e7cbea014783a4582c5cfdb40059a0f61910e9c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72377/testReport)** for PR 16620 at commit [`e7cbea0`](https://github.com/apache/spark/commit/e7cbea014783a4582c5cfdb40059a0f61910e9c8).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72376/ Test FAILed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72376/testReport)** for PR 16620 at commit [`56aa1ca`](https://github.com/apache/spark/commit/56aa1ca8a3eb9583b003e783434655491368a178). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test FAILed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72376/testReport)** for PR 16620 at commit [`56aa1ca`](https://github.com/apache/spark/commit/56aa1ca8a3eb9583b003e783434655491368a178).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user squito commented on the issue: https://github.com/apache/spark/pull/16620 Hi @jinxing64 I'm sorry I haven't had time to look again. The one big concern I still had was that test case -- I know you fixed up some of the things I complained about, but I still think it should probably be in `DAGSchedulerSuite`. I was hoping I would be able to help out by trying to write that test case myself, but maybe you could do that? I think it's fine if you have to make `MockTaskScheduler` replicate the behavior of failing when it receives conflicting task sets. Maybe it really can't be done for some reason I don't see yet.
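For readers following the thread, the behavior squito asks `MockTaskScheduler` to replicate can be sketched with a toy model. This is not Spark code: the Python classes `ToyTaskSet` and `ToyMockTaskScheduler`, the `ConflictingTaskSetError` exception, and the `is_zombie` flag are hypothetical names chosen for illustration, and the sketch assumes that "conflicting" means a second task set submitted for a stage that still has an active one.

```python
class ConflictingTaskSetError(Exception):
    """Raised when a stage already has an active (non-zombie) task set."""


class ToyTaskSet:
    """Hypothetical stand-in for one attempt at running a stage's tasks."""

    def __init__(self, stage_id, attempt):
        self.stage_id = stage_id
        self.attempt = attempt
        # Assumption: a zombie task set launches no new tasks and so
        # does not count as active.
        self.is_zombie = False


class ToyMockTaskScheduler:
    """Toy scheduler that refuses a second active task set for a stage."""

    def __init__(self):
        self.task_sets_by_stage = {}

    def submit_tasks(self, task_set):
        existing = self.task_sets_by_stage.setdefault(task_set.stage_id, [])
        active = [ts for ts in existing if not ts.is_zombie]
        if active:
            raise ConflictingTaskSetError(
                f"stage {task_set.stage_id} already has active attempt "
                f"{active[0].attempt}"
            )
        existing.append(task_set)
```

With a mock like this, a scheduler test can fail loudly whenever the code under test submits a new attempt for a stage before the previous attempt has been marked inactive.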
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @squito Would you please take another look at this? Please give some advice if possible and I can continue working on this : )
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @squito Thanks a lot for continuing to review this; your comments are very helpful. Regarding "when we encounter the condition where there are no pending partitions, but there is an active taskset -- we just mark that taskset as inactive": it's a good idea, and it makes the code much clearer. I've made the change; please take another look.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72231/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72231/testReport)** for PR 16620 at commit [`db354c7`](https://github.com/apache/spark/commit/db354c79eadbfe177291e14e9d020234b7cfd1c5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72227/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72227/testReport)** for PR 16620 at commit [`76961c3`](https://github.com/apache/spark/commit/76961c3ba64e19c43ebfc0b18651d68c54949edb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72231/testReport)** for PR 16620 at commit [`db354c7`](https://github.com/apache/spark/commit/db354c79eadbfe177291e14e9d020234b7cfd1c5).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test FAILed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72226/ Test FAILed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72226/testReport)** for PR 16620 at commit [`ed1791f`](https://github.com/apache/spark/commit/ed1791fd9b6e434ec69a6c118a433e2539fed7a4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72227/testReport)** for PR 16620 at commit [`76961c3`](https://github.com/apache/spark/commit/76961c3ba64e19c43ebfc0b18651d68c54949edb).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72226/testReport)** for PR 16620 at commit [`ed1791f`](https://github.com/apache/spark/commit/ed1791fd9b6e434ec69a6c118a433e2539fed7a4).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user squito commented on the issue: https://github.com/apache/spark/pull/16620 Hi @jinxing64 sorry to go back and forth on this numerous times -- I think I have another alternative, see https://github.com/squito/spark/tree/SPARK-19263_alternate It's mostly your changes but with one main difference: when we encounter the condition where there are no pending partitions, but there is an active taskset -- we just mark that taskset as inactive and continue as before (https://github.com/squito/spark/commit/bec061c8486a681dc16e8b92e553f79e486924e9). I think this makes it easier to follow, as there are fewer states to keep track of. It can also potentially improve performance, since you may submit downstream stages more quickly, rather than waiting for all tasks in the active taskset to complete. I also think it fixes a bug in your version with mapStageJobs (I'll point it out in the code). This passes all tests in `o.a.s.scheduler.*`, including your new test case. (I did come across a race in `SchedulerIntegrationSuite`, which I fixed in https://github.com/squito/spark/commit/9125e6738269df4e0d7e6292726bad2a294c86c0; it is not directly related to these changes.) Do you see any problems w/ that approach?
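The alternative described in this comment can be illustrated with a small toy model. This is only a sketch of the idea as stated in the comment, not the code in the linked branch: the Python names `ToyStage`, `ToyTaskSet`, `on_partition_finished`, and `is_zombie` are hypothetical, and the model assumes a stage tracks a set of pending partitions plus at most one active task set.

```python
class ToyTaskSet:
    """Hypothetical stand-in for one attempt at running a stage's tasks."""

    def __init__(self, stage_id, attempt):
        self.stage_id = stage_id
        self.attempt = attempt
        self.is_zombie = False  # inactive task sets launch no new tasks


class ToyStage:
    """Hypothetical stage with pending partitions and one active task set."""

    def __init__(self, stage_id, num_partitions):
        self.stage_id = stage_id
        self.pending_partitions = set(range(num_partitions))
        self.active_task_set = None


def on_partition_finished(stage, partition):
    """Record a finished partition; return True once the stage is complete.

    If nothing is pending but a task set is still active (for example,
    because a task from an older attempt finished the last partition),
    mark that task set inactive instead of waiting for its remaining tasks.
    """
    stage.pending_partitions.discard(partition)
    if stage.pending_partitions:
        return False
    task_set = stage.active_task_set
    if task_set is not None and not task_set.is_zombie:
        task_set.is_zombie = True
    return True  # the caller may now submit downstream stages
```

In this model the only state to reason about is "pending partitions" and "is the task set active", which matches the comment's point that there are fewer states to keep track of.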
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72160/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72160/testReport)** for PR 16620 at commit [`283373d`](https://github.com/apache/spark/commit/283373d722629681929ed9dd059a6cd22be1fb73). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72160/testReport)** for PR 16620 at commit [`283373d`](https://github.com/apache/spark/commit/283373d722629681929ed9dd059a6cd22be1fb73).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @squito ping for review~~
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @squito Could you please take another look at this? : )