[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-218654108 Thanks @markhamstra for the explanations. I think currently the thread just dies and we log the uncaught error. I can add a catch for NoClassDefFoundError and handle it the same way as ClassNotFoundException. But even with that, I think it's still better to inform the scheduler in finally to make sure the failed task is handled. Let me know what you think. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-218521768 @lirui-intel Sorry, I don't have anything specific at this time. In general, catching Errors is a bad idea. In this particular case, we may be able to handle NoClassDefFoundError well enough that we can effectively turn it into the equivalent of a ClassNotFoundException. I haven't even looked yet to figure out what we are doing with uncaught Throwables in this part of the code, but what I am thinking is that we may need a handler for those that will at least try to have the scheduler handle failed tasks and maybe also do some other things on the way toward what is likely an attempted clean shutdown -- but you're correct that we can't really expect such things to succeed for all Errors.
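One way to picture the "handler for uncaught Throwables" idea is the sketch below, written in Java for brevity rather than Spark's Scala. It assumes a plain worker thread and uses the standard `Thread.setUncaughtExceptionHandler` hook; `FailureListener`, `runTask`, and the thread name are hypothetical names introduced only for illustration and are not part of Spark. The notification is best effort: for severe Errors the callback may itself fail.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtSketch {
    /** Hypothetical callback standing in for "tell the scheduler this task failed". */
    interface FailureListener {
        void onTaskFailed(long taskId, Throwable cause);
    }

    /** Runs work on its own thread and returns the uncaught Throwable, or null if none. */
    static Throwable runTask(long taskId, Runnable work, FailureListener listener) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = new Thread(work, "result-getter-sketch-" + taskId);
        worker.setUncaughtExceptionHandler((thread, err) -> {
            seen.set(err);
            // Best effort only: for severe Errors this call may itself fail.
            listener.onTaskFailed(taskId, err);
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable t = runTask(3L,
            () -> { throw new NoClassDefFoundError("simulated/MissingClass"); },
            (id, cause) -> System.out.println("task " + id + " failed: " + cause));
        System.out.println("captured: " + t);
    }
}
```

The handler runs on the dying thread, so it only helps when the JVM is still healthy enough to execute it, which is the caveat raised in the comment above.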
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-218488089 Hey @markhamstra, anything specific that you think we should do in case of more severe errors? I think it doesn't hurt to handle the failed task in a finally block as some kind of best-effort to inform the scheduler a task has failed. If some fatal error prevents us from doing even that much, we probably won't expect to be able to do anything else. If we do want to distinguish errors based on level of severity and take actions accordingly, we can do it as a follow-on task.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217921869 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58146/ Test PASSed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217921867 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217921616 **[Test build #58146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58146/consoleFull)** for PR 12775 at commit [`cf7ef57`](https://github.com/apache/spark/commit/cf7ef57c9ceea7201e7f143ca3d8efc77344d88e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217887387 **[Test build #58146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58146/consoleFull)** for PR 12775 at commit [`cf7ef57`](https://github.com/apache/spark/commit/cf7ef57c9ceea7201e7f143ca3d8efc77344d88e).
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217886648 Update to add test.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217661490 Sorry, I haven't been completely clear while thinking through this issue. There are two basic considerations: 1) Is NoClassDefFoundError benign enough in this context that we can actually catch that particular Error here and handle it essentially the same as we currently do for ClassNotFoundException? 2) In the case of other Errors, instead of telling the scheduler about just this particular task failure, should we be doing something different or additional in a higher-level, more general case Error catcher?
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217660452 @kayousterhout I agree with that much. My concern is whether we can expect to successfully do even that much in the case of the more extreme Errors, or whether we should be letting those percolate up. In the case of NoClassDefFoundError, logging a little more information and continuing as we do with ClassNotFoundException seems like the right thing to do.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217660234 @markhamstra did you see that this is only in the case when we already know the task failed? So I think this approach is correct -- we should always be telling the scheduler that the task failed, even if we can't deserialize the reason why.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217660199 This looks good but can you write a small unit test for this (in TaskResultGetterSuite)?
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217659983 I'm not 100% convinced that we should be calling scheduler.handleFailedTask after any Error resulting from the attempt to get the TaskEndReason from the serializedData; but in the particular case of NoClassDefFoundError it seems to me that we should be handling this similarly to what we are already doing with ClassNotFoundException.
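The narrower alternative raised here, treating `NoClassDefFoundError` like `ClassNotFoundException` while letting every other Error propagate, might look roughly like the following. It is a Java sketch under stated assumptions, not Spark's Scala code: `ReasonLoader`, `reasonOrUnknown`, and the `"UnknownReason"` string are hypothetical stand-ins for the deserialization step and its fallback value.

```java
class CatchSketch {
    /** Hypothetical stand-in for the step that deserializes the failure reason. */
    interface ReasonLoader {
        String load() throws ClassNotFoundException;
    }

    /** Returns the loaded reason, or a fallback when the class cannot be found or defined. */
    static String reasonOrUnknown(ReasonLoader loader) {
        try {
            return loader.load();
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // Only these two class-loading failures are treated alike;
            // any other Error or exception is deliberately not caught here.
            System.err.println("Could not deserialize failure reason: " + e);
            return "UnknownReason";
        }
    }

    public static void main(String[] args) {
        System.out.println(reasonOrUnknown(() -> "ExceptionFailure"));
        System.out.println(reasonOrUnknown(() -> { throw new ClassNotFoundException("a.B"); }));
        System.out.println(reasonOrUnknown(() -> { throw new NoClassDefFoundError("a/B"); }));
    }
}
```

The multi-catch is legal because the two types are unrelated: `ClassNotFoundException` is a checked exception, while `NoClassDefFoundError` is a `LinkageError`.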
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217608341 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58053/ Test PASSed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217608339 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217608311 **[Test build #58053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58053/consoleFull)** for PR 12775 at commit [`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217602287 **[Test build #58053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58053/consoleFull)** for PR 12775 at commit [`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-215658222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57317/ Test PASSed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-215658218 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-215658048 **[Test build #57317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57317/consoleFull)** for PR 12775 at commit [`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-215637933 cc @kayousterhout
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-215637396 **[Test build #57317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57317/consoleFull)** for PR 12775 at commit [`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).
[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/12775

[SPARK-14958][Core] Failed task not handled when there's error deserializing failure reason

## What changes were proposed in this pull request?

TaskResultGetter tries to deserialize the TaskEndReason before handling the failed task. If an error is thrown during deserialization, the failed task won't be handled, which leaves the job hanging. The PR proposes to handle the failed task in a finally block.

## How was this patch tested?

In my case I hit a NoClassDefFoundError and the job hangs. Manually verified the patch can fix it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lirui-intel/spark SPARK-14958

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12775.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12775

commit d302df2a361b248091198472c68c294c87db3483
Author: Rui Li
Date: 2016-04-29T01:57:13Z

    SPARK-14958: fix

commit 3dd5dd9642761053b0edd32615e1563813a1162d
Author: Rui Li
Date: 2016-04-29T05:12:31Z

    Revert "SPARK-14958: fix"

    This reverts commit d302df2a361b248091198472c68c294c87db3483.

commit ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb
Author: Rui Li
Date: 2016-04-29T05:15:25Z

    handle in finally
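The pattern described in this pull request can be sketched as follows. This is a simplified Java illustration, not the actual patch and not Spark's Scala code: `Scheduler`, `deserializeReason`, `enqueueFailedTask`, and the `"UnknownReason"` fallback are hypothetical stand-ins. The only point shown is that a `finally` block still informs the scheduler when deserialization throws an Error.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FinallySketch {
    /** Hypothetical stand-in for the scheduler; records reported failures. */
    static class Scheduler {
        final List<String> failed = new ArrayList<>();

        void handleFailedTask(long taskId, String reason) {
            failed.add(taskId + ":" + reason);
        }
    }

    /** Hypothetical deserializer: empty input simulates a class that cannot be loaded. */
    static String deserializeReason(byte[] data) {
        if (data.length == 0) {
            throw new NoClassDefFoundError("simulated/MissingClass");
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    static void enqueueFailedTask(Scheduler scheduler, long taskId, byte[] data) {
        String reason = "UnknownReason"; // fallback if deserialization fails
        try {
            reason = deserializeReason(data);
        } finally {
            // Runs whether or not deserializeReason threw, so the failure is always reported.
            scheduler.handleFailedTask(taskId, reason);
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        enqueueFailedTask(scheduler, 1L, "ExceptionFailure".getBytes(StandardCharsets.UTF_8));
        try {
            enqueueFailedTask(scheduler, 2L, new byte[0]);
        } catch (NoClassDefFoundError e) {
            // The error still propagates; the finally block has already run.
        }
        System.out.println(scheduler.failed); // prints [1:ExceptionFailure, 2:UnknownReason]
    }
}
```

Note that `finally` does not swallow the Error: the calling thread still sees it, but the scheduler has been told about the failed task first, so the job is not left waiting on a task that will never report.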