Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
Yes, you are right, I initially started from the master node, but what happened suddenly after 2 days that made the workers die is what I am interested in knowing. Is it possible that the workers got disconnected because of some network issue and then tried restarting themselves but kept failing?

Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Prabhu Joseph
Kartik, Spark workers won't start if SPARK_MASTER_IP is wrong. Maybe you used start-slaves.sh from the master node to start all the worker nodes, in which case the workers would have got the correct SPARK_MASTER_IP initially. Later, any restart from the slave nodes would have failed because of the wrong
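For reference, a minimal sketch of the setup Prabhu describes, assuming a standard Spark 1.5.x standalone layout; the hostname spark-master.example.com is a placeholder, not from the thread:

    # conf/spark-env.sh on every node (master and workers):
    # point the daemons at the standalone master, not at the local host
    export SPARK_MASTER_IP=spark-master.example.com
    export SPARK_MASTER_PORT=7077

    # run from the master node; it reads conf/slaves and launches a worker on each
    # slave against the master URL built from the master's own SPARK_MASTER_IP,
    # so the workers come up correctly even if their local spark-env.sh is wrong
    sbin/start-slaves.sh

That distinction is the crux of Prabhu's point: a worker launched from the master inherits the correct master URL, while a worker restarted locally falls back on its own (possibly wrong) spark-env.sh.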

Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
Thanks Prabhu, I had wrongly configured spark_master_ip on the worker nodes to `hostname -f`, which resolves to the worker and not the master. But now the question is *why was the cluster up initially for 2 days*, and why did the workers only notice this invalid configuration after 2 days? And why are the other workers still
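A sketch of the misconfiguration Kartik describes versus the intended setting; the correct hostname below is a placeholder:

    # conf/spark-env.sh on a worker node -- the misconfiguration described above:
    # `hostname -f` is evaluated on the worker itself, so the worker points at itself
    export SPARK_MASTER_IP=$(hostname -f)              # wrong on a worker node

    # intended setting: the master's address, identical on every node
    export SPARK_MASTER_IP=spark-master.example.com    # placeholder hostname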

Re: Spark worker abruptly dying after 2 days

2016-02-14 Thread Prabhu Joseph
Kartik, the exception stack trace *java.util.concurrent.RejectedExecutionException* will happen if SPARK_MASTER_IP on the worker nodes is configured wrongly, for example if SPARK_MASTER_IP is the hostname of the master node while the workers try to connect to the IP of the master node. Check whether SPARK_MASTER_IP in
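One way to do that check, as a rough sketch run on each worker node (again with a placeholder master hostname and the default conf directory assumed):

    grep SPARK_MASTER_IP conf/spark-env.sh        # what the worker is configured with
    getent hosts spark-master.example.com         # what that name resolves to

    # compare both against the URL shown at the top of the master web UI (port 8080),
    # e.g. spark://<host-or-ip>:7077 -- the workers must use exactly that form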

Spark worker abruptly dying after 2 days

2016-02-14 Thread Kartik Mathur
On Spark 1.5.2, I have a Spark standalone cluster with 6 workers. I left the cluster idle for 3 days, and after 3 days I saw only 4 workers on the Spark master UI; 2 workers died with the same exception. The strange part is that the cluster was running stable for 2 days, but on the third day 2 workers abruptly
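To see the full stack trace behind a dead worker, the daemon logs on the affected node are the place to look; this sketch assumes the default standalone log location under $SPARK_HOME:

    # on a node whose worker disappeared from the master UI
    ls $SPARK_HOME/logs/
    grep -A5 RejectedExecutionException \
        $SPARK_HOME/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out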