We are having the same problem. We're running Spark 0.9.1 in standalone
mode, and on some heavy jobs workers become unresponsive and are marked
as dead by the master, even though the worker process is still running.
They then never rejoin the cluster, and the cluster becomes essentially
unusable until we restart it.
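If it's the heartbeats timing out under load, one knob that may be worth
trying is spark.worker.timeout, the number of seconds the standalone master
waits without a heartbeat before marking a worker dead (default 60). A
minimal sketch, assuming your spark-env.sh passes daemon options this way:

    # spark-env.sh on the master -- a sketch, not a tested fix:
    # give busy workers more time to heartbeat before being declared dead
    SPARK_DAEMON_JAVA_OPTS="-Dspark.worker.timeout=300"

This only buys time; if the worker is wedged in a long GC pause it may
still miss the larger window.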
Hi, I see this has been asked before but has not gotten any satisfactory
answer, so I'll try again. Here is the original thread I found:
http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3c1394044078706-2312.p...@n3.nabble.com%3E
I have a set of workers dying and coming back
I have also had trouble with workers rejoining the working set. I have
typically moved to a Mesos-based setup; frankly, for high availability you
are better off using a cluster manager.
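For reference, a minimal sketch of what pointing Spark 0.9.x at Mesos looks
like, assuming a Mesos master at host:5050; the library and tarball paths
are placeholders for your environment:

    # spark-env.sh on the driver machine (paths are placeholders)
    export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
    export SPARK_EXECUTOR_URI=hdfs://namenode/path/spark-0.9.1.tar.gz

    # launch against the Mesos master instead of the standalone master
    MASTER=mesos://host:5050 ./bin/spark-shell

With this setup Mesos handles executor placement and restarts, which is the
high-availability benefit mentioned above.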
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
I get the same problem, but I'm running in a dev environment based on
Docker scripts. The additional issue is that the worker processes do not
die, so the Docker containers do not exit. I end up with worker containers
that are no longer participating in the cluster.
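A possible stopgap for the container case: have something inside the
container watch the master's worker list and kill the stale worker process,
so the container exits and your orchestration can restart it. A rough
sketch, assuming the master web UI is on port 8080, that its /json endpoint
lists registered workers, and that MASTER_HOST/WORKER_HOST stand in for
your real addresses:

    #!/bin/sh
    # Rough sketch: wait until the master stops listing this worker, then
    # kill the stale worker process so the container exits and is restarted.
    # MASTER_HOST and WORKER_HOST are placeholders for the real addresses.
    while curl -s "http://$MASTER_HOST:8080/json" | grep -q "$WORKER_HOST"; do
      sleep 60
    done
    pkill -f org.apache.spark.deploy.worker.Worker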