GitHub user tejasapatil opened a pull request:

    https://github.com/apache/spark/pull/14202

    [SPARK-16230] [CORE] CoarseGrainedExecutorBackend to self kill if there is an exception while creating an Executor

    ## What changes were proposed in this pull request?
    
    With the fix from SPARK-13112, I see that `LaunchTask` is always processed
    after `RegisteredExecutor` is done, so it gets a chance to do all retries to
    start up an executor. There is still a problem: if `Executor` creation
    itself fails with an exception, the failure goes unnoticed, and the executor
    is killed later when it tries to process `LaunchTask` because `executor` is
    null:
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L88
    So the logs do not show that there was a problem during `Executor` creation
    or that this is why the executor was killed.
    
    This PR explicitly catches exceptions thrown during `Executor` creation, logs
    a proper message, and then exits the JVM. I have also changed the
    `exitExecutor` method to accept a `reason`, so that backends can use it to
    do things such as logging to a DB to get an aggregate of such exits at the
    cluster level.
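    A minimal, self-contained sketch of the behaviour described above, written in Scala to match the codebase. It is not the code from this PR: `FakeExecutor`, `SketchBackend`, `RecordingBackend`, `onRegisteredExecutor`, `onLaunchTask` and the `exitExecutor(code, reason, throwable)` signature used here are hypothetical, simplified stand-ins. It shows executor creation wrapped in a try/catch that reports the real cause through `exitExecutor`, instead of failing later on a null `executor` when `LaunchTask` arrives.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stand-in for Spark's Executor: its constructor may throw.
final class FakeExecutor(val id: String, failOnCreate: Boolean) {
  if (failOnCreate) throw new IllegalStateException(s"cannot start executor $id")
}

// Simplified, hypothetical sketch of the backend flow described in the PR.
class SketchBackend(executorId: String, failOnCreate: Boolean) {
  // Remains null until creation succeeds, like the `executor` field in the PR text.
  @volatile var executor: FakeExecutor = null

  // Called once registration with the driver has succeeded.
  def onRegisteredExecutor(): Unit =
    try {
      executor = new FakeExecutor(executorId, failOnCreate)
    } catch {
      case NonFatal(e) =>
        // Report the real cause now instead of failing later on a null executor.
        exitExecutor(1, s"Unable to create executor due to ${e.getMessage}", e)
    }

  def onLaunchTask(taskId: Long): Unit =
    if (executor == null) exitExecutor(1, "Received LaunchTask command but executor was null")
    else println(s"Launching task $taskId on executor ${executor.id}")

  // Single exit point that carries a human-readable reason; subclasses may override it.
  protected def exitExecutor(code: Int, reason: String, throwable: Throwable = null): Unit = {
    if (throwable != null) System.err.println(s"ERROR $reason: $throwable")
    else System.err.println(s"ERROR $reason")
    System.exit(code)
  }
}

// Test-friendly subclass: records each exit and its reason instead of killing the JVM.
class RecordingBackend(id: String, fail: Boolean) extends SketchBackend(id, fail) {
  val exits = ArrayBuffer.empty[(Int, String)]
  override protected def exitExecutor(code: Int, reason: String, throwable: Throwable): Unit =
    exits += ((code, reason))
}
```

Overriding `exitExecutor`, as `RecordingBackend` does, illustrates the kind of hook the PR describes: a backend can persist `reason` (for example, to a DB) before exiting, while the failure cause is reported at creation time rather than at the first `LaunchTask`.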
    
    ## How was this patch tested?
    
    I am relying on existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark exit_executor_failure

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14202.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14202
    
----
commit 0c71699894d4b7920388056a1d05d2277a79cf38
Author: Tejas Patil <tej...@fb.com>
Date:   2016-07-14T14:36:36Z

    CoarseGrainedExecutorBackend to self kill if there is an exception while creating an Executor

----