[GitHub] spark issue #19396: [SPARK-22172][CORE] Worker hangs when the external shuff...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19396 Sorry I didn't notice it, will double-check next time. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19396 we should update PR description too, but it's too late now...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19396 OK, let me merge to master branch.
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19396 I'm OK with the current changes.
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19396 The change itself looks good to me, WDYT @jerryshao @cloud-fan ?
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/19396 @jiangxb1987 Thanks for the comment. I made the change, which now throws an exception and exits the worker.
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19396 IMO we should throw a new Exception in order to fail fast; running with an ESS that you can't connect to may cause some weird issues.
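The fail-fast behavior suggested in this comment can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the PR: the class `FailFastShuffleStart` and its methods are hypothetical names, and a plain `java.net.ServerSocket` stands in for the external shuffle service. The point is that a bind failure is rethrown so the caller (the Worker, in this discussion) can exit instead of continuing without a shuffle service.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class FailFastShuffleStart {

    // Hypothetical stand-in for starting the external shuffle service.
    // ServerSocket throws java.net.BindException when the port is already in use.
    public static ServerSocket startShuffleServer(int port) throws IOException {
        return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
    }

    // Fail fast: do not swallow the failure and keep running without the
    // service; wrap it and let the caller terminate.
    public static ServerSocket startOrFail(int port) {
        try {
            return startShuffleServer(port);
        } catch (IOException e) {
            throw new IllegalStateException(
                "Failed to start external shuffle service on port " + port, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, so the demo needs no fixed port.
        try (ServerSocket first = startOrFail(0)) {
            int port = first.getLocalPort();
            try {
                // A second bind on the same port simulates the port conflict.
                startOrFail(port).close();
                System.out.println("Second bind unexpectedly succeeded");
            } catch (IllegalStateException e) {
                // A real worker would log the error and exit at this point.
                System.out.println("Worker would exit: " + e.getMessage());
            }
        }
    }
}
```

The alternative raised elsewhere in the thread, matching the Node Manager by logging the error and continuing, would correspond to catching the exception in `startOrFail` and returning without a server, which is what this comment argues against.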
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19396 Sorry for the late response. I understand your purpose now. I think such a behavior discrepancy is not a big problem. I guess the reason the NM keeps running after the exception is that the NM serves not only Spark but also MR/TEZ, so a failure of the Spark external service should not affect MR. Based on your comment above, I don't have a strong preference for either; I think both are OK. Maybe you can ping others to get their feedback.
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/19396 @jerryshao Please let me know if you are not convinced by the above comment; I can change the PR to make the Worker go down on external shuffle service start failure.
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/19396 Thanks @jerryshao for the comment.

> IMO I think it might be better to throw an exception instead of not starting shuffle service. Since user want to use external shuffle explicitly, letting user to know the issues and fix the issue would be better.

I considered this before creating the PR, but the Node Manager continues to run when the Spark shuffle service gets a BindException, so I thought to make the behavior in line with the NM Spark shuffle. Please let me know if you think the Worker should go down on external shuffle service start failure, and I can update this PR.
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19396 IMO it might be better to throw an exception instead of shifting to another shuffle. Since the user wants to use external shuffle explicitly, letting the user know about the issue and fix it would be better. Besides, this will lead to a problem if the cluster manager changes to not starting the shuffle service while the Spark application still assumes external shuffle is on and tries to connect to the external shuffle service.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19396 Can one of the admins verify this patch?