[ 
https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699467#comment-17699467
 ] 

Apache Spark commented on SPARK-42766:
--------------------------------------

User 'wangshengjie123' has created a pull request for this issue:
https://github.com/apache/spark/pull/40391

> YarnAllocator should filter excluded nodes when launching allocated containers
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-42766
>                 URL: https://issues.apache.org/jira/browse/SPARK-42766
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 3.3.2
>            Reporter: wangshengjie
>            Priority: Major
>
> In our production environment, we hit an issue like this:
> Suppose we request 10 containers from nodeA and nodeB. The first response from 
> Yarn returns 5 containers on nodeA and nodeB; then nodeA is excluded 
> (blacklisted). A second response from Yarn may still return containers on 
> nodeA, and they are launched anyway. When those containers (executors) start 
> up and send a register request to the Driver, the request is rejected, and 
> each such failure counts toward 
> {code:java}
> spark.yarn.max.executor.failures {code}
> , which can cause the application to fail with:
> {code:java}
> Max number of executor failures ($maxNumExecutorFailures) reached{code}
>  
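The proposed improvement is to have YarnAllocator filter out containers allocated on currently excluded nodes before launching executors on them, so their inevitable registration failures never occur. A minimal sketch of that filtering step, with illustrative names (`Container`, `excludedNodes`, `filterExcluded`) that are not Spark's actual API:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of the fix: before launching executors on containers
// returned by YARN, drop any container whose host is currently excluded.
// Dropped containers would be released back to YARN instead of launching
// executors that are doomed to fail registration with the Driver.
public class ExcludedNodeFilter {

    // Simplified stand-in for a YARN container allocation.
    record Container(String id, String host) {}

    static List<Container> filterExcluded(List<Container> allocated,
                                          Set<String> excludedNodes) {
        // Keep only containers on hosts that are not excluded.
        return allocated.stream()
                .filter(c -> !excludedNodes.contains(c.host()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Container> allocated = List.of(
                new Container("container_1", "nodeA"),
                new Container("container_2", "nodeB"));
        // nodeA has been excluded between the request and this allocation.
        Set<String> excluded = Set.of("nodeA");
        // Only the nodeB container survives and is used to launch an executor.
        System.out.println(filterExcluded(allocated, excluded));
    }
}
```

With this filtering in place, rejected registrations from excluded hosts no longer accumulate against spark.yarn.max.executor.failures.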



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org