[ https://issues.apache.org/jira/browse/SPARK-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171358#comment-14171358 ]

Nan Zhu commented on SPARK-3736:
--------------------------------

If the worker itself times out, the Master removes the worker from 
idToWorker.

When the worker resumes later and sends a heartbeat to the Master again, the 
Master detects this by failing to find the worker in idToWorker (search for 
logWarning("Got heartbeat from unregistered worker " + workerId) in 
Master.scala).

You can simply replace that logWarning with logic that sends a message to the 
worker asking it to re-register.
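As a rough sketch of that idea (not the actual Spark code; WorkerInfo, MasterModel, and the ReconnectWorker message are illustrative names I am assuming here), the heartbeat path could look like:

```scala
// Self-contained model of the Master's heartbeat handling described above.
// WorkerInfo, MasterModel, and ReconnectWorker are illustrative, not real
// Spark internals.
case class WorkerInfo(id: String)
case class ReconnectWorker(masterUrl: String) // hypothetical "please re-register" message

class MasterModel(masterUrl: String) {
  val idToWorker = scala.collection.mutable.Map[String, WorkerInfo]()

  // Returns the message (if any) the Master should send back to the worker.
  def handleHeartbeat(workerId: String): Option[ReconnectWorker] =
    idToWorker.get(workerId) match {
      case Some(_) => None // known worker: just record liveness
      case None =>
        // Previously: logWarning("Got heartbeat from unregistered worker " + workerId)
        // Instead, ask the worker to re-register:
        Some(ReconnectWorker(masterUrl))
    }
}
```

The point is only that the unregistered-heartbeat branch already exists, so the fix is local to that branch.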


> Workers should reconnect to Master if disconnected
> --------------------------------------------------
>
>                 Key: SPARK-3736
>                 URL: https://issues.apache.org/jira/browse/SPARK-3736
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Andrew Ash
>            Assignee: Matthew Cheah
>            Priority: Critical
>
> In standalone mode, when a worker gets disconnected from the master for some 
> reason, it never attempts to reconnect. In this situation you have to bounce 
> the worker before it will reconnect to the master.
>
> The preferred alternative is to follow what Hadoop does: when there's a 
> disconnect, attempt to reconnect at a particular interval until successful (I 
> think it repeats indefinitely every 10sec).
> This has been observed by:
> - [~pkolaczk] in 
> http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td6240.html
> - [~romi-totango] in 
> http://apache-spark-user-list.1001560.n3.nabble.com/Re-Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td15335.html
> - [~aash]
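The Hadoop-style behavior described in the quoted issue could be sketched as a retry loop like the one below; tryRegister, the 10-second interval, and maxAttempts are my assumptions, not Spark's actual API:

```scala
// Illustrative retry loop: on disconnect, attempt to re-register at a fixed
// interval until one attempt succeeds. Returns true on success, false if
// maxAttempts is exhausted.
def reconnectLoop(tryRegister: () => Boolean,
                  intervalMs: Long = 10000,
                  maxAttempts: Int = Int.MaxValue): Boolean = {
  var attempts = 0
  while (attempts < maxAttempts) {
    if (tryRegister()) return true // registered with the master: stop retrying
    attempts += 1
    Thread.sleep(intervalMs) // wait out the interval before the next attempt
  }
  false
}
```

In the worker this loop would be driven by whatever schedules the existing heartbeat, rather than a blocking Thread.sleep.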



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
