Re: Master not seeing recovered nodes (Got heartbeat from unregistered worker ....)

2014-06-16 Thread Piotr Kołaczkowski
We are having the same problem. We're running Spark 0.9.1 in standalone mode, and on some heavy jobs the workers become unresponsive and are marked as dead by the master, even though the worker process is still running. They then never rejoin the cluster, and the cluster becomes essentially unusable until we

Master not seeing recovered nodes (Got heartbeat from unregistered worker ....)

2014-06-13 Thread Yana Kadiyska
Hi, I see this has been asked before but has not received a satisfactory answer, so I'll try again (here is the original thread I found: http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3c1394044078706-2312.p...@n3.nabble.com%3E ). I have a set of workers dying and coming back

Re: Master not seeing recovered nodes (Got heartbeat from unregistered worker ....)

2014-06-13 Thread Mayur Rustagi
I have also had trouble with workers rejoining the working set. I have typically moved to a Mesos-based setup. Frankly, for high availability you are better off using a cluster manager. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi

Re: Master not seeing recovered nodes (Got heartbeat from unregistered worker ....)

2014-06-13 Thread Gino Bustelo
I have the same problem, but I'm running in a dev environment based on Docker scripts. The additional issue is that the worker processes do not die, so the Docker container does not exit. I end up with worker containers that are not participating in the cluster.