BTW, I'm using Spark 1.3.0. Thanks.
On Fri, May 8, 2015 at 5:22 PM, James King <jakwebin...@gmail.com> wrote:

> I have two hosts, host01 and host02 (let's call them that).
>
> I run one Master and two Workers on host01.
> I also run one Master and two Workers on host02.
>
> Now I have one LIVE Master on host01 and a STANDBY Master on host02.
> The LIVE Master is aware of all Workers in the cluster.
>
> Now I submit a Spark application using:
>
>     bin/spark-submit --class SomeApp --deploy-mode cluster --supervise \
>       --master spark://host01:7077 Some.jar
>
> This is to make the driver resilient to failure.
>
> Now the interesting part:
>
> If I stop the cluster (all daemons on all hosts) and restart the Master
> and Workers *only* on host01, the job resumes, as expected.
>
> But if I stop the cluster (all daemons on all hosts) and restart the
> Master and Workers *only* on host02, the job *does not* resume
> execution. Why?
>
> I can see the driver listed on the host02 WebUI, but there is no job
> execution. Please let me know why.
>
> Am I wrong to expect it to resume execution in this case?
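For context, a STANDBY Master in standalone mode only takes over if a recovery mode is configured. A minimal sketch of the spark-env.sh settings a ZooKeeper-backed HA pair like this would use (the ZooKeeper address zk01:2181 and the /spark directory are placeholders, not my actual values):

    # spark-env.sh on both host01 and host02
    # ZooKeeper recovery lets the standby Master take over and recover
    # registered Workers, applications, and supervised drivers
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk01:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

And if it matters, the standalone docs say a submit can list both Masters so the client reaches whichever one is live, e.g.:

    bin/spark-submit --class SomeApp --deploy-mode cluster --supervise \
      --master spark://host01:7077,host02:7077 Some.jar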