Re: Workers disconnected from master sometimes and never reconnect back

2014-09-29 Thread Romi Kuntsman
Hi all, regarding a post here a few months ago: http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-tp6240.html Is there an answer to this? I saw workers that were still active but did not reconnect after they lost connection

Re: Workers disconnected from master sometimes and never reconnect back

2014-09-29 Thread Andrew Ash
-master-sometimes-and-never-reconnect-back-tp6240.html Is there an answer to this? I saw workers that were still active but did not reconnect after they lost connection to the master. I am using Spark 1.1.0. If the master server is restarted, should the workers retry registering with it? Greetings

Workers disconnected from master sometimes and never reconnect back

2014-05-22 Thread Piotr Kołaczkowski
Hi, Another problem we observed is that on a very heavily loaded cluster, if a worker fails to respond to the heartbeat within 60 seconds, it gets disconnected permanently from the master and never connects back again. It is very easy to reproduce: just set up a Spark standalone cluster on a