[jira] [Commented] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready

2016-04-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228899#comment-15228899
 ] 

Apache Spark commented on SPARK-13112:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/12211

> CoarsedExecutorBackend register to driver should wait Executor was ready
> 
>
> Key: SPARK-13112
> URL: https://issues.apache.org/jira/browse/SPARK-13112
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.0
> Reporter: SuYan
>
> desc: 
> When a host's disks are busy, an executor's attempt to register with the 
> shuffle server on that host can fail with a TimeoutException, 
> and the executor then calls exit(1) when a task is launched on a null executor.
> When YARN cluster resources are also somewhat busy, YARN treats that host as 
> idle and prefers to allocate the next executor on the same host, so 
> one task can fail 4 times on the same host. 
> Currently, CoarseGrainedExecutorBackend registers with the driver first and 
> only initializes the Executor after registration succeeds. 
> If an exception occurs during Executor initialization, 
> the driver does not know about it and still launches tasks on that executor, 
> which then calls System.exit(1). 
> {code}
>  override def receive: PartialFunction[Any, Unit] = {
>    case RegisteredExecutor(hostname) =>
>      logInfo("Successfully registered with driver")
>      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
>    ..
>    case LaunchTask(data) =>
>      if (executor == null) {
>        logError("Received LaunchTask command but executor was null")
>        System.exit(1)
> {code}
> It would be more reasonable to register with the driver after the Executor 
> is ready, and to make registerTimeout configurable.
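
The ordering proposed in the description (initialize first, register second) could be sketched roughly as follows. This is a toy model, not Spark code: ToyExecutor, ToyDriver, and RegisterAfterInit are hypothetical names invented for illustration, and the real change would live in CoarseGrainedExecutorBackend.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-ins for illustration; none of these are Spark classes.
final class ToyExecutor(val hostname: String)

final class ToyDriver {
  private var registered = Set.empty[String]
  def register(executorId: String): Unit = registered += executorId
  def canLaunchOn(executorId: String): Boolean = registered.contains(executorId)
}

object RegisterAfterInit {
  // Build the executor first; register with the driver only if that succeeded.
  def start(
      executorId: String,
      driver: ToyDriver,
      init: () => ToyExecutor): Option[ToyExecutor] =
    Try(init()) match {
      case Success(executor) =>
        driver.register(executorId)
        Some(executor)
      case Failure(_) =>
        // The driver never learns about this executor, so it sends no task to it.
        None
    }
}
```

With this ordering, a failed initialization (for example a timeout while registering with the shuffle server) leaves the driver unaware of the executor, so it has no reason to launch a task there.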



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready

2016-03-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219237#comment-15219237
 ] 

Apache Spark commented on SPARK-13112:
--

User 'viper-kun' has created a pull request for this issue:
https://github.com/apache/spark/pull/12078







[jira] [Commented] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready

2016-03-29 Thread meiyoula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217362#comment-15217362
 ] 

meiyoula commented on SPARK-13112:
--

I agree. The problem occurs when CoarseGrainedExecutorBackend receives 
RegisterExecutorResponse late, after LaunchTask. 
I don't think we can guarantee that CoarseGrainedExecutorBackend receives 
RegisterExecutorResponse before LaunchTask. CoarseGrainedExecutorBackend may 
be busy sending RegisterExecutorResponse to itself and receive the LaunchTask 
message first.
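
A minimal sketch of the ordering problem described above, assuming a backend that handles messages strictly in arrival order. The message types here are toy copies that only borrow names from the quoted snippet, and ToyBackend is hypothetical; it returns a description string where the real code would call System.exit(1).

```scala
// Hypothetical message types and backend for illustration; not Spark's classes.
sealed trait Message
final case class RegisteredExecutor(hostname: String) extends Message
final case class LaunchTask(taskId: Long) extends Message

final class ToyBackend {
  // The hostname stands in for a real Executor instance.
  private var executor: Option[String] = None

  // Reports what happened instead of exiting the process.
  def receive(msg: Message): String = msg match {
    case RegisteredExecutor(hostname) =>
      executor = Some(hostname)
      "executor created"
    case LaunchTask(taskId) =>
      executor match {
        case Some(host) => s"task $taskId launched on $host"
        case None       => s"task $taskId rejected: executor was null"
      }
  }
}
```

If LaunchTask is handled before RegisteredExecutor, the toy backend takes the "executor was null" branch, which corresponds to the exit path in the quoted code.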



