BTW I'm using Spark 1.3.0.

Thanks

On Fri, May 8, 2015 at 5:22 PM, James King <jakwebin...@gmail.com> wrote:

> I have two hosts, call them host01 and host02.
>
> I run one Master and two Workers on host01
> I also run one Master and two Workers on host02
>
> Now I have 1 LIVE Master on host01 and a STANDBY Master on host02
> The LIVE Master is aware of all Workers in the cluster
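(For reference, a LIVE/STANDBY master pair like this is normally wired up through a recovery mode in conf/spark-env.sh. A minimal sketch, assuming ZooKeeper-based recovery and two Worker daemons per host; the ZooKeeper address zk01:2181 and the /spark znode are placeholders, not from the original post:

    # conf/spark-env.sh on host01 and host02
    export SPARK_WORKER_INSTANCES=2   # two Worker daemons per host
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk01:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

The original post does not say which recovery mode is actually in use.)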
>
> Now I submit a Spark application using
>
> bin/spark-submit --class SomeApp --deploy-mode cluster --supervise
> --master spark://host01:7077 Some.jar
>
> This is to make the driver resilient to failure.
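(For what it's worth, the standalone master URL can list both masters so the submission can register with whichever one is alive. A sketch of the same command with both hosts, assuming the default port 7077 on each:

    bin/spark-submit --class SomeApp --deploy-mode cluster --supervise \
      --master spark://host01:7077,host02:7077 Some.jar

)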
>
> Now the interesting part:
>
> If I stop the cluster (all daemons on all hosts) and restart the
> Master and Workers *only* on host01, the job resumes, as expected.
>
> But if I stop the cluster (all daemons on all hosts) and restart the
> Master and Workers *only* on host02, the job *does not*
> resume execution. Why?
>
> I can see the driver listed on the host02 WebUI, but there is no job
> execution. Please let me know why.
>
> Am I wrong to expect it to resume execution in this case?
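(One way to see what each restarted master thinks is happening: the standalone Master web UI also serves its state as JSON. A diagnostic sketch, assuming the default UI port 8080:

    # The "status" field shows ALIVE vs STANDBY/RECOVERING, and "workers"
    # lists which Workers have re-registered with that master.
    curl http://host01:8080/json
    curl http://host02:8080/json

)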
>
