[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2018-03-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19951
  
Since the target of the fix is silencing a misleading exception, handling 
that exception as I suggested before would be a feasible solution. But anything 
more complicated than that is overkill.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2018-03-13 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19951
  
It's not possible to differentiate the two race conditions @vanzin 
described in this code path without adding extra communication load. Since 
this should be a minor issue and there is no simple fix for it, I'd suggest we 
close this PR for now and revisit it when someone comes up with a better 
solution. WDYT @vanzin @tgravescs @rezasafi @srowen @KaiXinXiaoLei?





---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2018-03-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19951
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88177/


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2018-03-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19951
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2018-03-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19951
  
**[Test build #88177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88177/testReport)** for PR 19951 at commit [`40bb11f`](https://github.com/apache/spark/commit/40bb11f3ec26e3ee2f3be62f048524cc94f14d44).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2018-03-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19951
  
**[Test build #88177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88177/testReport)** for PR 19951 at commit [`40bb11f`](https://github.com/apache/spark/commit/40bb11f3ec26e3ee2f3be62f048524cc94f14d44).


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-14 Thread rezasafi
Github user rezasafi commented on the issue:

https://github.com/apache/spark/pull/19951
  
This is interesting. Thanks for working on it. It seems to me that the race 
is benign here, and removing every race case would require extra 
communication that may introduce extra overhead.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread KaiXinXiaoLei
Github user KaiXinXiaoLei commented on the issue:

https://github.com/apache/spark/pull/19951
  
@vanzin Yeah, it is difficult to account for every race. I continued 
analyzing the source code, and I think my other approach solves the problem 
better.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19951
  
I know what you're saying. I'm saying that you're not considering another 
race that also exists in this code.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread KaiXinXiaoLei
Github user KaiXinXiaoLei commented on the issue:

https://github.com/apache/spark/pull/19951
  
I have another way to fix this problem: 
![change coarsegrainedschedulerbackend](https://user-images.githubusercontent.com/9440626/33971859-9705974c-e0b5-11e7-95dd-499ff132e330.png)



---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread KaiXinXiaoLei
Github user KaiXinXiaoLei commented on the issue:

https://github.com/apache/spark/pull/19951
  
What I mean is: once `CoarseGrainedSchedulerBackend.stopExecutors()` is 
called, the executors exit. The driver does not need to treat these executors 
as disconnected and send a message; otherwise the exception appears.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19951
  
Yeah, but if an executor died right before the context was stopped, the "on 
disconnect" event might be in the queue when the stop call happens, and trigger 
the same code path that will throw the exception.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread KaiXinXiaoLei
Github user KaiXinXiaoLei commented on the issue:

https://github.com/apache/spark/pull/19951
  
@tgravescs The error log appears after the job has finished. I think this 
error log will mislead users into believing that the task failed.

@vanzin Your analysis is right, and I think my code can get rid of this 
exception: when things are stopping, `sc.stopped` is set to true first, so 
the driver endpoint does not need to send messages to itself.
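
The idea in the comment above can be illustrated with a minimal sketch. This is a toy model in Python, not Spark's code: `ToyDriverEndpoint`, `on_disconnected`, and the `"remove-executor"` message are hypothetical names chosen for illustration. It only shows the shape of a "set the stopped flag first, then skip the self-send" guard.

```python
import threading


class ToyDriverEndpoint:
    """Toy stand-in for a driver endpoint; not Spark's implementation."""

    def __init__(self):
        self.stopped = threading.Event()  # hypothetical "stopped" flag
        self.sent = []                    # messages the driver sent to itself

    def stop(self):
        # Set the flag before anything else is torn down.
        self.stopped.set()

    def on_disconnected(self, executor_id):
        # Skip the self-send once shutdown has begun.
        if self.stopped.is_set():
            return False
        self.sent.append(("remove-executor", executor_id))
        return True


endpoint = ToyDriverEndpoint()
endpoint.on_disconnected("1")   # before stop: message is sent
endpoint.stop()
endpoint.on_disconnected("2")   # after stop: message is skipped
print(endpoint.sent)
```

A check-then-send guard like this narrows the window but is not atomic with the send itself, which is consistent with the concern raised elsewhere in this thread that some race may remain.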


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19951
  
BTW, it might still be possible to hit that race even with the changes 
here. I'm not sure there's a way to completely get rid of it, though, so 
perhaps catching the exception and not logging it if things are stopping might 
be a more sure way to get rid of the logs.
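
A rough sketch of that suggestion follows, written as a toy Python model rather than Spark code. `EndpointNotFound`, `send_quietly`, `failing_send`, and the `is_stopping` callback are all hypothetical names; the only point is that the failure is caught and logged solely when shutdown is not in progress.

```python
import logging


class EndpointNotFound(Exception):
    """Hypothetical error raised when the target endpoint is unregistered."""


class ListHandler(logging.Handler):
    """Collects log messages in a list so the behaviour is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("toy.driver")
logger.setLevel(logging.ERROR)
logger.propagate = False
captured = ListHandler()
logger.addHandler(captured)


def send_quietly(send, message, is_stopping):
    """Try send(message); log a failure only when not shutting down."""
    try:
        send(message)
        return True
    except EndpointNotFound as exc:
        if not is_stopping():
            logger.error("Failed to send %s: %s", message, exc)
        return False


def failing_send(message):
    # Simulates a send that fails because the endpoint is already gone.
    raise EndpointNotFound("Could not find CoarseGrainedScheduler.")


send_quietly(failing_send, "remove-executor", is_stopping=lambda: False)
send_quietly(failing_send, "remove-executor", is_stopping=lambda: True)
print(len(captured.messages))  # prints 1: only the non-shutdown failure is logged
```

Unlike a check made before sending, the decision here is taken after the failure has already happened, so it does not depend on winning the race.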


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19951
  
This seems like a race during shutdown:

- executor disconnects, which causes an "on disconnect" event to be queued
- at the same time, the `stop()` thread ends up calling `Dispatcher.stop()` 
which unregisters all endpoints and enqueues a message that stops each endpoint 
receiver
- driver endpoint inbox is drained; "on disconnect" callback is called, 
driver tries to send a message to itself, but because it has been unregistered 
above, it fails.

You could argue that what the RpcEnv is doing above is sort of fishy 
(delivering messages to the endpoint after it's already been unregistered), but 
this looks like an ok workaround.
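
The three steps above can be reproduced in a small toy model. This is a Python sketch under stated assumptions, not Spark's `Dispatcher`: `ToyDispatcher`, `EndpointNotFound`, `driver_handler`, and the `"remove-executor"` message are hypothetical, and the model is single-threaded, so it simply fixes the interleaving that the race would produce.

```python
from collections import deque


class EndpointNotFound(Exception):
    """Raised when a message targets a name that is no longer registered."""


class ToyDispatcher:
    """Toy stand-in for an RPC dispatcher; not Spark's Dispatcher."""

    def __init__(self):
        self.endpoints = {}  # name -> handler callable
        self.inboxes = {}    # name -> deque of pending messages

    def register(self, name, handler):
        self.endpoints[name] = handler
        self.inboxes[name] = deque()

    def post(self, name, message):
        if name not in self.endpoints:
            raise EndpointNotFound(f"Could not find {name}.")
        self.inboxes[name].append(message)

    def stop(self):
        # Step 2: unregister every endpoint, keeping the handlers aside
        # so that already-queued messages can still be delivered.
        handlers, self.endpoints = self.endpoints, {}
        errors = []
        # Step 3: drain each inbox although the endpoint is unregistered.
        for name, handler in handlers.items():
            inbox = self.inboxes.pop(name)
            while inbox:
                try:
                    handler(self, inbox.popleft())
                except EndpointNotFound as exc:
                    errors.append(str(exc))
        return errors


def driver_handler(dispatcher, message):
    # On disconnect, the driver sends a follow-up message to itself.
    if message == "on-disconnect":
        dispatcher.post("CoarseGrainedScheduler", "remove-executor")


dispatcher = ToyDispatcher()
dispatcher.register("CoarseGrainedScheduler", driver_handler)
dispatcher.post("CoarseGrainedScheduler", "on-disconnect")  # step 1
errors = dispatcher.stop()
print(errors)  # prints ['Could not find CoarseGrainedScheduler.']
```

The failure comes from delivering a queued message after unregistration, which is the behaviour described as "sort of fishy" above.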


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/19951
  
Overall this seems to make sense, but I would like a few more details.

There were changes to the dispatcher to try to ignore some of these errors:
https://github.com/apache/spark/pull/18547/files

These look like different messages than that one, as it handled mostly 
rpcenvstopped.

So when these errors come out, the dispatcher is not stopped yet?
Do you see lots of these errors or just a single one? Do these cause job 
failure or just clutter the logs?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19951
  
@yoonlee95, or maybe @tgravescs: does this make sense?


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread KaiXinXiaoLei
Github user KaiXinXiaoLei commented on the issue:

https://github.com/apache/spark/pull/19951
  
@devaraj-kavali @vanzin, even with https://github.com/apache/spark/pull/19741 
applied, I still hit the "Could not find CoarseGrainedScheduler" problem. I 
have changed the code; please review. Thanks.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19951
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84814/


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19951
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19951
  
**[Test build #84814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84814/testReport)** for PR 19951 at commit [`40bb11f`](https://github.com/apache/spark/commit/40bb11f3ec26e3ee2f3be62f048524cc94f14d44).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19951
  
**[Test build #84814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84814/testReport)** for PR 19951 at commit [`40bb11f`](https://github.com/apache/spark/commit/40bb11f3ec26e3ee2f3be62f048524cc94f14d44).


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19951
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84766/


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19951
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19951
  
**[Test build #84766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84766/testReport)** for PR 19951 at commit [`c4dcc19`](https://github.com/apache/spark/commit/c4dcc19ce8af02f99be18db8ddfe9b704086dd43).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19951
  
**[Test build #84766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84766/testReport)** for PR 19951 at commit [`c4dcc19`](https://github.com/apache/spark/commit/c4dcc19ce8af02f99be18db8ddfe9b704086dd43).


---
