[ https://issues.apache.org/jira/browse/SPARK-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171331#comment-14171331 ]
Matt Cheah commented on SPARK-3736:
-----------------------------------

I was curious if anyone had any feedback on my above comment?

> Workers should reconnect to Master if disconnected
> --------------------------------------------------
>
>                 Key: SPARK-3736
>                 URL: https://issues.apache.org/jira/browse/SPARK-3736
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Andrew Ash
>            Assignee: Matthew Cheah
>            Priority: Critical
>
> In standalone mode, when a worker gets disconnected from the master for some
> reason it never attempts to reconnect. In this situation you have to bounce
> the worker before it will reconnect to the master.
> The preferred alternative is to follow what Hadoop does -- when there's a
> disconnect, attempt to reconnect at a particular interval until successful (I
> think it repeats indefinitely every 10sec).
> This has been observed by:
> - [~pkolaczk] in
> http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td6240.html
> - [~romi-totango] in
> http://apache-spark-user-list.1001560.n3.nabble.com/Re-Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td15335.html
> - [~aash]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
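
For illustration, the behavior proposed in the issue description (on disconnect, retry at a fixed interval until the connection succeeds) might look roughly like the sketch below. This is not Spark's code or the eventual fix. The names `reconnect_until_success` and `try_register` are hypothetical, and the 10-second default only mirrors the reporter's recollection of Hadoop's behavior.

```python
import time
from typing import Callable


def reconnect_until_success(
    try_register: Callable[[], bool],
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_register until it reports success; return the attempt count.

    try_register is a hypothetical callback that attempts to register the
    worker with the master and returns True on success. Connection errors
    are treated the same as a failed attempt. The loop never gives up,
    matching the "repeat indefinitely" behavior described in the issue.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            connected = try_register()
        except OSError:  # includes ConnectionError and its subclasses
            connected = False
        if connected:
            return attempts
        # Wait a fixed interval before the next attempt.
        sleep(interval_seconds)
```

Passing `sleep` as a parameter is only a convenience for testing the loop without real delays. A real worker would presumably also need to stop retrying on shutdown, which this sketch leaves out.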