[ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018617#comment-14018617 ]

sam commented on SPARK-2019:
----------------------------

I double-checked with my sysadmin this morning, and although we have a ticket 
closed in our JIRA saying "upgrade to Spark 0.9.1", we don't actually have it on 
the cluster - it turned out he had neglected to mark the ticket "won't fix" by mistake!

Very sorry for the confusion; I hope you haven't wasted any time on this yet.

We are going to upgrade to 1.0.0 in a week or two ... I will keep an eye on 
things when we do and update this ticket.

> Spark workers die/disappear when job fails for nearly any reason
> ----------------------------------------------------------------
>
>                 Key: SPARK-2019
>                 URL: https://issues.apache.org/jira/browse/SPARK-2019
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: sam
>            Priority: Critical
>
> We either have to reboot all the nodes, or run 'sudo service spark-worker 
> restart' across our cluster.  I don't think this should happen - the job 
> failures are often not even that bad.  There is a Stack Overflow question 
> with 5 upvotes here: 
> http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails
> We shouldn't be giving restart privileges to our devs, so our sysadmin has to 
> restart the workers frequently.  When the sysadmin is not around, there is 
> nothing our devs can do.
> Many thanks
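
For reference, a minimal sketch of the kind of cluster-wide restart described in 
the report above, assuming passwordless SSH from an admin host and the 
spark-worker service name quoted there; the hostnames and the WORKER_HOSTS list 
are placeholders, not taken from the report.

    #!/usr/bin/env python
    # Sketch: restart the standalone Spark worker service on every node over SSH.
    # Assumes passwordless SSH as a user allowed to run the command via sudo, and
    # the "spark-worker" service name mentioned in the report. Hostnames are placeholders.
    import subprocess

    WORKER_HOSTS = ["worker-01", "worker-02", "worker-03"]  # placeholder hostnames

    def restart_worker(host):
        """Run 'sudo service spark-worker restart' on a single host; return True on success."""
        cmd = ["ssh", host, "sudo", "service", "spark-worker", "restart"]
        return subprocess.call(cmd) == 0

    if __name__ == "__main__":
        failed = [h for h in WORKER_HOSTS if not restart_worker(h)]
        if failed:
            print("Restart failed on: " + ", ".join(failed))
        else:
            print("All workers restarted.")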



--
This message was sent by Atlassian JIRA
(v6.2#6252)
