[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-11 Thread lirui-intel
Github user lirui-intel commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-218654108
  
Thanks @markhamstra for the explanations. I think currently the thread just 
dies and we log the uncaught error. I can add a catch for NoClassDefFoundError 
and handle it the same way as ClassNotFoundException. But even with that, I 
think it's still better to inform the scheduler in a finally block to make sure 
the failed task is handled. Let me know what you think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-11 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-218521768
  
@lirui-intel Sorry, I don't have anything specific at this time.  In 
general, catching Errors is a bad idea.  In this particular case, we may be 
able to handle NoClassDefFoundError well enough that we can effectively turn it 
into the equivalent of a ClassNotFoundException.  I haven't even looked yet to 
figure out what we are doing with uncaught Throwables in this part of the code, 
but what I am thinking is that we may need a handler for those that will at 
least try to have the scheduler handle failed tasks and maybe also do some 
other things on the way toward what is likely an attempted clean shutdown -- 
but you're correct that we can't really expect such things to succeed for all 
Errors. 
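
As a rough illustration of the kind of handler described above, the following Java sketch installs a per-thread uncaught exception handler that gets a last chance to react before the thread dies. The class and method names are hypothetical and are not Spark's API; only `Thread.setUncaughtExceptionHandler` is a real JDK call, and a real handler would have to decide what it can safely do for each kind of Error.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these names come from Spark.
final class UncaughtHandlerSketch {
    // Runs `body` on a new thread and returns the Throwable that escaped it,
    // or null if the body completed normally.
    static Throwable runAndCapture(Runnable body) {
        AtomicReference<Throwable> escaped = new AtomicReference<>();
        Thread worker = new Thread(body, "result-getter-sketch");
        // The handler runs on the dying thread. A real handler would try to
        // notify the scheduler here and might then begin a clean shutdown.
        worker.setUncaughtExceptionHandler((thread, error) -> escaped.set(error));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return escaped.get();
    }
}
```

Calling `UncaughtHandlerSketch.runAndCapture(() -> { throw new NoClassDefFoundError("com/example/Missing"); })` hands the escaped error to the handler instead of only logging it, which is the hook where best-effort scheduler notification could go.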





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-11 Thread lirui-intel
Github user lirui-intel commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-218488089
  
Hey @markhamstra, is there anything specific that you think we should do in 
case of more severe errors?
I think it doesn't hurt to handle the failed task in a finally block as 
a best-effort attempt to inform the scheduler that a task has failed. If some 
fatal error prevents us from doing even that much, we probably can't expect to 
be able to do anything else.
If we do want to distinguish errors by severity and take action accordingly, 
we can do that as a follow-on task.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217921869
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58146/





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217921867
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217921616
  
**[Test build #58146 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58146/consoleFull)**
 for PR 12775 at commit 
[`cf7ef57`](https://github.com/apache/spark/commit/cf7ef57c9ceea7201e7f143ca3d8efc77344d88e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217887387
  
**[Test build #58146 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58146/consoleFull)**
 for PR 12775 at commit 
[`cf7ef57`](https://github.com/apache/spark/commit/cf7ef57c9ceea7201e7f143ca3d8efc77344d88e).





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-09 Thread lirui-intel
Github user lirui-intel commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217886648
  
Updated to add a test.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-07 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217661490
  
Sorry, I haven't been completely clear while thinking through this issue.  
There are two basic considerations: 1) Is NoClassDefFoundError benign enough in 
this context that we can actually catch that particular Error here and handle 
it essentially the same as we currently do for ClassNotFoundException? 2) In 
the case of other Errors, instead of telling the scheduler about just this 
particular task failure, should we be doing something different or additional 
in a higher-level, more general case Error catcher?  
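
For the first consideration, a minimal Java sketch of treating the two failures alike might look like the following. `NoClassDefFoundError` is an `Error` while `ClassNotFoundException` is an `Exception`, so a plain `catch (Exception e)` would miss the former. The `Loader` interface and the `describe` method are hypothetical names used only for illustration; they are not Spark code.

```java
// Hypothetical sketch: none of these names come from Spark.
final class MissingClassSketch {
    // Stand-in for "deserialize the failure reason".
    interface Loader {
        Object load() throws ClassNotFoundException;
    }

    static String describe(Loader loader) {
        try {
            loader.load();
            return "loaded";
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // Both mean a class could not be resolved, so log and carry on
            // the same way. Any other Error is deliberately not caught here.
            return "missing class: " + e.getMessage();
        }
    }
}
```

Other Errors, such as `OutOfMemoryError`, still propagate out of `describe`, which leaves the second consideration (what a more general Error handler should do) open.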





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-07 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217660452
  
@kayousterhout I agree with that much.  My concern is whether we can expect 
to successfully do even that much in the case of the more extreme Errors, or 
whether we should be letting those percolate up.  In the case of 
NoClassDefFoundError, logging a little more information and continuing as we do 
with ClassNotFoundException seems like the right thing to do.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-07 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217660234
  
@markhamstra did you see that this is only in the case when we already know 
the task failed?  So I think this approach is correct -- we should always be 
telling the scheduler that the task failed, even if we can't deserialize the 
reason why.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-07 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217660199
  
This looks good but can you write a small unit test for this (in 
TaskResultGetterSuite)?





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-07 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217659983
  
I'm not 100% convinced that we should be calling scheduler.handleFailedTask 
after any Error resulting from the attempt to get the TaskEndReason from the 
serializedData; but in the particular case of NoClassDefFoundError it seems to 
me that we should be handling this similarly to what we are already doing with 
ClassNotFoundException. 





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217608341
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58053/





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217608339
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217608311
  
**[Test build #58053 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58053/consoleFull)**
 for PR 12775 at commit 
[`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-217602287
  
**[Test build #58053 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58053/consoleFull)**
 for PR 12775 at commit 
[`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-215658222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57317/





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-215658218
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-215658048
  
**[Test build #57317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57317/consoleFull)**
 for PR 12775 at commit 
[`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-04-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-215637933
  
cc @kayousterhout





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-04-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12775#issuecomment-215637396
  
**[Test build #57317 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57317/consoleFull)**
 for PR 12775 at commit 
[`ee34cd2`](https://github.com/apache/spark/commit/ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb).





[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-04-28 Thread lirui-intel
GitHub user lirui-intel opened a pull request:

https://github.com/apache/spark/pull/12775

[SPARK-14958][Core] Failed task not handled when there's error 
deserializing failure reason

## What changes were proposed in this pull request?

TaskResultGetter tries to deserialize the TaskEndReason before handling the 
failed task. If an error is thrown during deserialization, the failed task 
won't be handled, which leaves the job hanging.
The PR proposes to handle the failed task in a finally block.
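
The idea can be sketched in Java as follows. This is an illustration under stated assumptions, not the actual patch: `RecordingScheduler`, `deserializeReason`, `enqueueFailedTask` as written here, and the `"UnknownReason"` placeholder string are all hypothetical stand-ins. The point is only that the `finally` block runs even when deserialization throws an Error, so the scheduler still hears about the failed task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the scheduler; it only records what it is told.
final class RecordingScheduler {
    final List<String> handled = new ArrayList<>();

    void handleFailedTask(long taskId, String reason) {
        handled.add(taskId + ":" + reason);
    }
}

// Hypothetical sketch: none of these names are taken from the patch.
final class FinallySketch {
    // Simulates deserialization failing with the Error reported in this PR.
    static String deserializeReason(byte[] serializedData) {
        throw new NoClassDefFoundError("com/example/MissingClass");
    }

    static void enqueueFailedTask(RecordingScheduler scheduler, long taskId,
                                  byte[] serializedData) {
        // Placeholder used when the real reason cannot be recovered.
        String reason = "UnknownReason";
        try {
            reason = deserializeReason(serializedData);
        } finally {
            // Runs whether or not deserialization threw, so the failed task
            // is always reported and the job does not wait on it forever.
            scheduler.handleFailedTask(taskId, reason);
        }
    }
}
```

Note that `finally` does not swallow the Error: after the scheduler is informed, the `NoClassDefFoundError` still propagates to the caller, so it can be logged or handled at a higher level.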


## How was this patch tested?

In my case, I hit a NoClassDefFoundError and the job hung. I manually 
verified that the patch fixes it.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lirui-intel/spark SPARK-14958

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12775.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12775


commit d302df2a361b248091198472c68c294c87db3483
Author: Rui Li 
Date:   2016-04-29T01:57:13Z

SPARK-14958: fix

commit 3dd5dd9642761053b0edd32615e1563813a1162d
Author: Rui Li 
Date:   2016-04-29T05:12:31Z

Revert "SPARK-14958: fix"

This reverts commit d302df2a361b248091198472c68c294c87db3483.

commit ee34cd2d67a98ef48b0453d6c0a77b88c9db12fb
Author: Rui Li 
Date:   2016-04-29T05:15:25Z

handle in finally



