[jira] [Comment Edited] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason

2014-10-10 Thread Denis Serduik (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166555#comment-14166555 ]

Denis Serduik edited comment on SPARK-2019 at 10/10/14 8:39 AM:


I have noticed the same problem with worker behavior. My installation: Spark
1.0.2-hadoop2.0.0-mr1-cdh4.2.0 on Mesos 0.13. In my case, workers fail when
there is an error while serializing the closure. Also, please note that we
run Spark in coarse-grained mode.


was (Author: dmaverick):
I have noticed the same problem with worker behavior. My installation: Spark
1.0.2-hadoop2.0.0-mr1-cdh4.2.0 on Mesos 0.13. In my case, workers fail when
there is an error while serializing the closure.
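
For illustration, a minimal, hypothetical Scala sketch of the failure mode mentioned above (the names are made up; spark.mesos.coarse is the Spark 1.x property for coarse-grained Mesos mode, and the master URL is assumed to be supplied via spark-submit): a non-serializable object captured in a task closure makes Spark reject the job with "Task not serializable" before any task runs.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object ClosureSerializationDemo {

  // Deliberately NOT java.io.Serializable, standing in for some driver-side resource.
  class ConnectionPool {
    def lookup(x: Int): Int = x * 2
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("closure-serialization-demo")
      .set("spark.mesos.coarse", "true") // coarse-grained Mesos mode (Spark 1.x setting)
    // Master URL (e.g. a mesos:// address) is expected via `spark-submit --master ...`.
    val sc = new SparkContext(conf)

    val pool = new ConnectionPool // lives only on the driver

    // The lambda captures `pool`; because ConnectionPool is not Serializable,
    // Spark throws org.apache.spark.SparkException: Task not serializable
    // while checking the closure, so the job fails before any task is launched.
    val doubled = sc.parallelize(1 to 10).map(i => pool.lookup(i)).collect()

    println(doubled.mkString(", "))
    sc.stop()
  }
}
{code}

An application-level error like this should only fail the job; the behaviour reported in this ticket is that it can take the workers down as well.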

 Spark workers die/disappear when job fails for nearly any reason
 

             Key: SPARK-2019
             URL: https://issues.apache.org/jira/browse/SPARK-2019
         Project: Spark
      Issue Type: Bug
Affects Versions: 0.9.0
        Reporter: sam

 We either have to reboot all the nodes, or run 'sudo service spark-worker
 restart' across our cluster.  I don't think this should happen - the job
 failures are often not even that bad.  There is a Stack Overflow question
 with 5 upvotes here:
 http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails
 
 We shouldn't be giving restart privileges to our devs, and therefore our
 sysadmin has to frequently restart the workers.  When the sysadmin is not
 around, there is nothing our devs can do.
 Many thanks




[jira] [Comment Edited] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason

2014-06-05 Thread sam (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018048#comment-14018048 ]

sam edited comment on SPARK-2019 at 6/5/14 9:47 AM:


Sorry. It's -0.9.1- 0.9.0


was (Author: sams):
Sorry. It's 0.9.1

 Spark workers die/disappear when job fails for nearly any reason
 

             Key: SPARK-2019
             URL: https://issues.apache.org/jira/browse/SPARK-2019
         Project: Spark
      Issue Type: Bug
Affects Versions: 0.9.0
        Reporter: sam
        Priority: Critical

 We either have to reboot all the nodes, or run 'sudo service spark-worker
 restart' across our cluster.  I don't think this should happen - the job
 failures are often not even that bad.  There is a Stack Overflow question
 with 5 upvotes here:
 http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails
 We shouldn't be giving restart privileges to our devs, and therefore our
 sysadmin has to frequently restart the workers.  When the sysadmin is not
 around, there is nothing our devs can do.
 Many thanks


