[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19951 Since the target of the fix is silencing a misleading exception, handling that exception as I suggested before would be a feasible solution. But anything more complicated than that is overkill. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
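A minimal sketch of the "handle the exception" option: catch the endpoint-not-found error and stay silent only while shutting down. It assumes a `stopping` flag that the stop path sets *before* endpoints are unregistered. `EndpointNotFoundException`, `QuietOnStopNotifier` and the injected `send` function are hypothetical illustrative names, not Spark APIs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the error raised when the target endpoint is no
// longer registered (not Spark's actual exception type).
final class EndpointNotFoundException(name: String)
    extends RuntimeException(s"Could not find $name.")

// Hypothetical notifier; `send` models the driver's message to itself.
final class QuietOnStopNotifier(send: String => Unit) {
  // Assumed to be set by the shutdown path before endpoints are unregistered.
  @volatile var stopping: Boolean = false
  // Collects what would otherwise be written to the error log.
  val errorLog: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def notifyExecutorLost(executorId: String): Unit =
    try send(s"RemoveExecutor($executorId)")
    catch {
      // Expected during shutdown: swallow the error without logging.
      case _: EndpointNotFoundException if stopping => ()
      // The same error while the application is still running is logged.
      case e: EndpointNotFoundException => errorLog += e.getMessage
    }
}
```

The appeal of this design is that it does not need the flag check and the send to be atomic: whatever the interleaving, the exception is caught, and the flag only decides whether it is logged. That holds only under the stated assumption that the flag is raised before unregistration.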
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19951 It's not possible to distinguish the two race conditions @vanzin described in this code path without adding extra communication overhead. Since this should be a minor issue and there is no simple fix for it, I'd suggest we close this PR for now and revisit when someone comes up with a better solution. WDYT @vanzin @tgravescs @rezasafi @srowen @KaiXinXiaoLei ?
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88177/
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19951 Merged build finished. Test PASSed.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19951 **[Test build #88177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88177/testReport)** for PR 19951 at commit [`40bb11f`](https://github.com/apache/spark/commit/40bb11f3ec26e3ee2f3be62f048524cc94f14d44). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19951 **[Test build #88177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88177/testReport)** for PR 19951 at commit [`40bb11f`](https://github.com/apache/spark/commit/40bb11f3ec26e3ee2f3be62f048524cc94f14d44).
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19951 This is interesting. Thanks for working on it. It seems to me that the race here is benign, and eliminating every race case would require extra communication that may introduce additional overhead.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/19951 @vanzin Yeah, it is difficult to account for every race. So I continued analyzing the source code, and I think my other approach solves the problem better.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19951 I know what you're saying. I'm saying that you're not considering another race that also exists in this code.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/19951 I have another way to fix this problem: ![change coarsegrainedschedulerbackend](https://user-images.githubusercontent.com/9440626/33971859-9705974c-e0b5-11e7-95dd-499ff132e330.png)
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/19951 What I mean is: once `CoarseGrainedSchedulerBackend.stopExecutors()` is called, the executors exit. The driver does not need to treat these executors as disconnected and send a message; otherwise, the exception appears.
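The proposal above can be sketched as a toy endpoint that raises a flag when executors are told to stop and skips the self-message for disconnects seen afterwards. `ToyDriverEndpoint`, `stopping`, `selfMessages` and the `RemoveExecutor(...)` string are hypothetical names for illustration; this is not Spark's actual `CoarseGrainedSchedulerBackend` code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy model of the proposal, not Spark's CoarseGrainedSchedulerBackend.
final class ToyDriverEndpoint {
  // Raised by the shutdown path before executors are asked to exit.
  @volatile var stopping: Boolean = false
  // Messages the endpoint would otherwise send to itself.
  val selfMessages: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def stopExecutors(): Unit = {
    stopping = true
    // Asking the executors to exit is omitted in this sketch.
  }

  def onDisconnected(executorId: String): Unit =
    // Disconnects during shutdown are expected, so skip the self-message.
    if (!stopping) selfMessages += s"RemoveExecutor($executorId)"
}
```

A disconnect before `stopExecutors()` still produces a self-message, while one after it does not. Note that the flag is checked and acted on in two separate steps, which is what the reviewers' race concerns below are about.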
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19951 Yeah, but if an executor died right before the context was stopped, the "on disconnect" event might be in the queue when the stop call happens, and trigger the same code path that will throw the exception.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/19951 @tgravescs The job has ended, and then the error log appears. I think this error log misleads users into believing the task failed. @vanzin Your analysis is right, and I think my code can get rid of this exception: when things are stopping, `sc.stopped` is set to true first, so the driver endpoint does not need to send messages to itself.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19951 BTW, it might still be possible to hit that race even with the changes here. I'm not sure there's a way to completely get rid of it, though, so perhaps catching the exception and not logging it if things are stopping might be a more sure way to get rid of the logs.
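Why a stop flag alone may not close the race can be shown by replaying one fixed interleaving by hand. The dispatcher thread checks the flag while it is still false, the `stop()` thread then sets it and unregisters endpoints, and the dispatcher acts on its stale check. `ToyRpcEnv` and `CheckThenActReplay` are hypothetical names, not Spark's `RpcEnv`, and the two threads are collapsed into one sequential ordering for clarity.

```scala
import scala.collection.mutable

// Hypothetical toy RPC env (not Spark's RpcEnv): sending to a name that is
// not registered throws, mimicking the "Could not find ..." error.
final class ToyRpcEnv {
  private val registered = mutable.Set.empty[String]
  def register(name: String): Unit = registered += name
  def unregisterAll(): Unit = registered.clear()
  def send(to: String, msg: String): Unit =
    if (!registered.contains(to))
      throw new IllegalStateException(s"Could not find $to.")
}

object CheckThenActReplay {
  // Replays one fixed interleaving of the dispatcher thread and the stop()
  // thread; returns the error the self-send hits, if any.
  def replay(): Option[String] = {
    val env = new ToyRpcEnv
    env.register("CoarseGrainedScheduler")
    var stopping = false

    // Dispatcher thread: a queued "on disconnect" event is handled and the
    // stop flag is checked while it is still false.
    val shouldNotify = !stopping

    // stop() thread runs here: sets the flag, then unregisters endpoints.
    stopping = true
    env.unregisterAll()

    // Dispatcher thread resumes and acts on its now-stale check.
    if (!shouldNotify) None
    else
      try { env.send("CoarseGrainedScheduler", "RemoveExecutor(1)"); None }
      catch { case e: IllegalStateException => Some(e.getMessage) }
  }
}
```

Making the flag `@volatile` would not change this outcome, because the problem is that the check and the send are not atomic, not that the write is invisible. That is one reason catching the exception is the more robust way to silence the log.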
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19951 This seems like a race during shutdown:
- an executor disconnects, which causes an "on disconnect" event to be queued
- at the same time, the `stop()` thread ends up calling `Dispatcher.stop()`, which unregisters all endpoints and enqueues a message that stops each endpoint receiver
- the driver endpoint inbox is drained; the "on disconnect" callback is called, and the driver tries to send a message to itself, but because it has been unregistered above, it fails.
You could argue that what the RpcEnv is doing above is sort of fishy (delivering messages to an endpoint after it's already been unregistered), but this looks like an OK workaround.
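The three steps above can be replayed as a sequential toy model: a single inbox queue, a set of registered endpoint names, and a drain loop that delivers messages in order. `ShutdownRaceModel`, `OnDisconnected` and `OnStop` are hypothetical names for illustration, not Spark's `Dispatcher` or `Inbox`, and the concurrent threads are collapsed into one fixed ordering.

```scala
import scala.collection.mutable

// Hypothetical toy model of the shutdown sequence, not Spark's Dispatcher/Inbox.
object ShutdownRaceModel {
  sealed trait Msg
  case object OnDisconnected extends Msg
  case object OnStop extends Msg

  def run(): List[String] = {
    val registered = mutable.Set("CoarseGrainedScheduler")
    val inbox = mutable.Queue[Msg]()
    val events = mutable.ListBuffer[String]()

    // 1. An executor disconnects; the event is queued but not yet handled.
    inbox.enqueue(OnDisconnected)
    // 2. The stop() thread unregisters every endpoint, then queues OnStop.
    registered.clear()
    inbox.enqueue(OnStop)
    // 3. The inbox is drained in order; the disconnect handler runs after
    //    unregistration, so its self-send finds no endpoint.
    while (inbox.nonEmpty) inbox.dequeue() match {
      case OnDisconnected =>
        if (registered.contains("CoarseGrainedScheduler")) events += "sent RemoveExecutor"
        else events += "error: Could not find CoarseGrainedScheduler."
      case OnStop =>
        events += "endpoint stopped"
    }
    events.toList
  }
}
```

Running `ShutdownRaceModel.run()` yields the error before the stop event, which mirrors the "delivering messages to an endpoint after it's already been unregistered" behavior described in the comment.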
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/19951 Overall this seems to make sense, but I'd like a few more details. There were changes to the dispatcher to try to ignore some of these errors (https://github.com/apache/spark/pull/18547/files). These look like different messages than that one, as it mostly handled `RpcEnvStoppedException`. So when these errors come out, the dispatcher is not stopped yet? Do you see lots of these errors or just a single one? Do these cause job failure or just clutter the logs?
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19951 @yoonlee95 maybe @tgravescs does this make sense?
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/19951 @devaraj-kavali @vanzin Even with https://github.com/apache/spark/pull/19741, I still see the "Could not find CoarseGrainedScheduler" problem. I changed the code; please review, thanks.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84814/
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19951 Merged build finished. Test PASSed.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19951 **[Test build #84814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84814/testReport)** for PR 19951 at commit [`40bb11f`](https://github.com/apache/spark/commit/40bb11f3ec26e3ee2f3be62f048524cc94f14d44). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19951 **[Test build #84814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84814/testReport)** for PR 19951 at commit [`40bb11f`](https://github.com/apache/spark/commit/40bb11f3ec26e3ee2f3be62f048524cc94f14d44).
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19951 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84766/
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19951 Merged build finished. Test FAILed.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19951 **[Test build #84766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84766/testReport)** for PR 19951 at commit [`c4dcc19`](https://github.com/apache/spark/commit/c4dcc19ce8af02f99be18db8ddfe9b704086dd43). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19951 **[Test build #84766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84766/testReport)** for PR 19951 at commit [`c4dcc19`](https://github.com/apache/spark/commit/c4dcc19ce8af02f99be18db8ddfe9b704086dd43).