Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/12078#issuecomment-204329728
  
    I see your point: defer registering the executor until it is fully 
created. But Spark does already take care of this issue with the code below, 
though, as I said, it is not an elegant way to handle the race condition.
    
    ```scala
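          // Existing guard in the executor backend's LaunchTask handler:
          // if a task arrives before the Executor instance has been created,
          // the whole executor process exits.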
          if (executor == null) {
            logError("Received LaunchTask command but executor was null")
            System.exit(1)
          } else {
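            // ...otherwise the task description is deserialized and handed
            // to the executor as usual.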
    ```
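    For comparison, here is a minimal model of the ordering this PR aims for, 
as I understand it (the class names below are hypothetical, not Spark's 
actual backend): the backend constructs its executor before it becomes 
visible to the driver, so a LaunchTask can never observe a null executor.

    ```scala
    object RegistrationOrdering {

      class Executor(id: String) {
        def launchTask(taskId: Long): Unit =
          println(s"executor $id launching task $taskId")
      }

      class ExecutorBackend(id: String, register: ExecutorBackend => Unit) {
        // Construct the executor first...
        private val executor: Executor = new Executor(id)
        // ...and only then announce the backend to the "driver", so any
        // task sent afterwards finds a fully initialized executor.
        register(this)

        def receiveLaunchTask(taskId: Long): Unit =
          executor.launchTask(taskId)
      }

      def main(args: Array[String]): Unit = {
        var registered: Option[ExecutorBackend] = None
        new ExecutorBackend("exec-1", b => registered = Some(b))
        registered.foreach(_.receiveLaunchTask(42L))
      }
    }
    ```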
    Looking at the description of the JIRA, the deeper problem is that the 
driver's scheduler is not aware of the bad machine and keeps assigning tasks 
to that node, which eventually fails the job. So in the short term this PR 
may fix the race condition, but the race only shows up on slow machines 
(which is why I haven't hit this problem before), so a more generic solution 
would be for the scheduler to be aware of bad executors/nodes, as in the 
sketch below. Just my two cents; not directly relevant to this PR.
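    A minimal sketch of the kind of node-level failure tracking I mean 
(hypothetical names, not Spark's actual scheduler API): count task failures 
per host and stop offering tasks to hosts that keep failing.

    ```scala
    import scala.collection.mutable

    class NodeFailureTracker(maxFailuresPerHost: Int) {
      // Running count of task failures per host.
      private val failuresByHost =
        mutable.Map.empty[String, Int].withDefaultValue(0)

      // Record a task failure that happened on the given host.
      def recordFailure(host: String): Unit =
        failuresByHost(host) = failuresByHost(host) + 1

      // The scheduler would check this before offering another task to the host.
      def isBad(host: String): Boolean =
        failuresByHost(host) >= maxFailuresPerHost
    }

    object NodeFailureTrackerExample extends App {
      val tracker = new NodeFailureTracker(maxFailuresPerHost = 2)
      tracker.recordFailure("slow-node-1")
      tracker.recordFailure("slow-node-1")
      println(tracker.isBad("slow-node-1"))  // true -> stop scheduling here
      println(tracker.isBad("healthy-node")) // false
    }
    ```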

