Hi all,
Regarding a post here a few months ago
http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-tp6240.html
Is there an answer to this?
I saw workers that were still running but did not reconnect after they lost
their connection to the master. I am using Spark 1.1.0.
If a master server is restarted, should the workers retry registering with
it?
Greetings
Hi,
Another problem we observed is that on a very heavily loaded cluster, if the
worker fails to respond to the heartbeat within 60 seconds, it gets
disconnected permanently from the master and never connects back again. It
is very easy to reproduce: just set up a Spark standalone cluster on a