[ https://issues.apache.org/jira/browse/SPARK-22199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198258#comment-16198258 ]
Saisai Shao commented on SPARK-22199:
-------------------------------------

Can you please list the steps to reproduce this issue? Also, please try with the latest master branch to see if the issue still exists.

> Spark Job on YARN fails with executors "Slave registration failed"
> ------------------------------------------------------------------
>
>                 Key: SPARK-22199
>                 URL: https://issues.apache.org/jira/browse/SPARK-22199
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.6.3
>            Reporter: Prabhu Joseph
>            Priority: Minor
>
> The Spark job on YARN failed after reaching the maximum number of executor failures.
>
> ApplicationMaster logs:
> {code}
> 17/09/28 04:18:27 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Max number of executor failures (3) reached)
> {code}
>
> The failed container logs show "Slave registration failed: Duplicate executor ID", whereas the driver logs show that it removed those executors because they had been idle for spark.dynamicAllocation.executorIdleTimeout.
>
> Executor logs:
> {code}
> 17/09/28 04:18:26 ERROR CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID: 122
> {code}
>
> Driver logs:
> {code}
> 17/09/28 04:18:21 INFO ExecutorAllocationManager: Removing executor 122 because it has been idle for 60 seconds (new desired total will be 133)
> {code}
>
> There are two issues here:
> 1. The error message in the executor, "Slave registration failed: Duplicate executor ID", is misleading, as the executor was actually removed for being idle.
> 2. The job failed because executors had been idle for spark.dynamicAllocation.executorIdleTimeout.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)