GitHub user CodingCat opened a pull request:

    https://github.com/apache/spark/pull/186

    SPARK-1235: fail all jobs when DAGScheduler crashes for some reason

    https://spark-project.atlassian.net/browse/SPARK-1235
    
    In the current implementation, running jobs hang if the DAGScheduler 
crashes for some reason (e.g. eventProcessActor throws an exception in 
receive()).
    
    The reason is that Akka automatically restarts an actor when an uncaught 
exception is thrown in receive(), so the crash is never surfaced and the 
JobWaiters keep waiting for tasks that will never complete.
    
    In this patch, I override the actor's preRestart hook and fail all 
running jobs there when the DAGScheduler crashes and the actor restarts.
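    
    A minimal sketch of the idea, assuming a hypothetical 
failAllRunningJobs method on the scheduler (the actual method and class 
names in the patch may differ):
    
    ```scala
    import akka.actor.Actor
    
    // Illustrative sketch, not the actual patch: the event-processing actor
    // fails all running jobs before Akka restarts it after an uncaught
    // exception in receive().
    class DAGSchedulerEventProcessActor(dagScheduler: DAGScheduler)
      extends Actor {
    
      // Akka invokes preRestart on the crashed instance before creating the
      // replacement, so this is the last chance to notify waiting JobWaiters.
      override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
        dagScheduler.failAllRunningJobs(reason) // hypothetical method name
        super.preRestart(reason, message)
      }
    
      def receive = {
        case event: DAGSchedulerEvent => dagScheduler.processEvent(event)
      }
    }
    ```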
    
    Thanks to @kayousterhout and @markhamstra for the hints in JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-1235

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #186
    
----
commit b417b763b3dec602b1262ec4f28460181d32e5ff
Author: CodingCat <[email protected]>
Date:   2014-03-20T04:59:52Z

    fail all jobs when DAGScheduler crashes for some reason

----

