[jira] [Commented] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready
[ https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228899#comment-15228899 ]

Apache Spark commented on SPARK-13112:
--------------------------------------

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/12211

> CoarsedExecutorBackend register to driver should wait Executor was ready
> ------------------------------------------------------------------------
>
> Key: SPARK-13112
> URL: https://issues.apache.org/jira/browse/SPARK-13112
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.0
> Reporter: SuYan
>
> desc:
> When a host's disks are busy, an executor on that host can fail with a
> TimeoutException while trying to register with the shuffle server on that
> host. It then calls exit(1) when a task is launched on the null executor.
> If YARN cluster resources are also a little busy, YARN considers that host
> idle and prefers to allocate executors on the same host, so there is a
> chance that one task fails 4 times on the same host.
> Currently, CoarseGrainedExecutorBackend registers with the driver first and
> initializes the Executor only after that registration succeeds.
> If an exception occurs during Executor initialization, the driver does not
> know about it and still launches tasks on that executor, which then calls
> System.exit(1).
> {code}
> override def receive: PartialFunction[Any, Unit] = {
>   case RegisteredExecutor(hostname) =>
>     logInfo("Successfully registered with driver")
>     executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
>   ..
>   case LaunchTask(data) =>
>     if (executor == null) {
>       logError("Received LaunchTask command but executor was null")
>       System.exit(1)
> {code}
> It is more reasonable to register with the driver after the Executor is
> ready, and to make registerTimeout configurable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
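The ordering problem in the description above can be sketched with a small, self-contained Scala model. This is only an illustration under stated assumptions: ToyExecutor, ToyDriver and RegistrationOrder are hypothetical stand-ins invented for this sketch, not Spark classes, and the code is not taken from either pull request.

```scala
// Hypothetical stand-ins for illustration only; none of these are Spark classes.
final class ToyExecutor(diskBusy: Boolean) {
  // Simulates Executor initialization failing, e.g. a timeout on a busy host.
  if (diskBusy) throw new RuntimeException("timed out registering with shuffle server")
  def run(task: String): String = s"ran $task"
}

final class ToyDriver {
  private var registered = Set.empty[String]
  def register(executorId: String): Unit = registered += executorId
  def isRegistered(executorId: String): Boolean = registered(executorId)
}

object RegistrationOrder {
  // Ordering described in the issue: register first, initialize afterwards.
  // If initialization fails, the driver still believes the executor exists.
  def registerThenInit(driver: ToyDriver, id: String, diskBusy: Boolean): Option[ToyExecutor] = {
    driver.register(id)
    try Some(new ToyExecutor(diskBusy))
    catch { case _: RuntimeException => None }
  }

  // Ordering proposed in the issue: initialize first, register only on success.
  def initThenRegister(driver: ToyDriver, id: String, diskBusy: Boolean): Option[ToyExecutor] =
    try {
      val executor = new ToyExecutor(diskBusy)
      driver.register(id)
      Some(executor)
    } catch { case _: RuntimeException => None }
}
```

With diskBusy = true, registerThenInit leaves the id registered with the toy driver even though no executor exists, which corresponds to the case where the driver keeps launching tasks on a null executor. initThenRegister leaves the driver untouched in that case.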
[jira] [Commented] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready
[ https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219237#comment-15219237 ]

Apache Spark commented on SPARK-13112:
--------------------------------------

User 'viper-kun' has created a pull request for this issue:
https://github.com/apache/spark/pull/12078
[jira] [Commented] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready
[ https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217362#comment-15217362 ]

meiyoula commented on SPARK-13112:
----------------------------------

I agree. The problem occurs when CoarseGrainedExecutorBackend receives RegisterExecutorResponse late, after LaunchTask. I don't think we can guarantee that CoarseGrainedExecutorBackend receives RegisterExecutorResponse before LaunchTask. CoarseGrainedExecutorBackend may still be busy sending RegisterExecutorResponse to itself when the LaunchTask message arrives first.
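The message-ordering hazard raised in this comment can be illustrated with a toy receive loop. This is a minimal sketch under stated assumptions: ToyMsg, ToyRegistered, ToyLaunchTask and ToyBackend are hypothetical names made up for this example and do not reproduce Spark's RPC layer. Where the real backend would call System.exit(1), the sketch returns a Left instead.

```scala
// Hypothetical message types for illustration; these are not Spark's RPC messages.
sealed trait ToyMsg
case object ToyRegistered extends ToyMsg
final case class ToyLaunchTask(name: String) extends ToyMsg

final class ToyBackend {
  // Stands in for the `executor == null` check in the quoted receive handler.
  private var executorReady = false

  // Right(result) when the message can be handled; Left(reason) where the
  // backend quoted in the issue would log an error and exit.
  def receive(msg: ToyMsg): Either[String, String] = msg match {
    case ToyRegistered =>
      executorReady = true
      Right("executor created")
    case ToyLaunchTask(name) =>
      if (executorReady) Right(s"ran $name")
      else Left("Received LaunchTask command but executor was null")
  }
}
```

Feeding ToyRegistered and then ToyLaunchTask to one ToyBackend runs the task, while feeding ToyLaunchTask first to a fresh ToyBackend yields the Left branch. The outcome depends only on the arrival order of the two messages.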