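1. For worker and master: workers send heartbeats to the master, and the
master considers a worker lost if it receives no heartbeat within this timeout.
spark.worker.timeout 60 (default, in seconds)
see: http://spark.apache.org/docs/latest/spark-standalone.html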

2. For executor and driver: each executor sends heartbeats to the driver at
this interval.
spark.executor.heartbeatInterval 10s (default)
see: http://spark.apache.org/docs/latest/configuration.html
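
To make the timing concrete, here is a minimal Python sketch of the worker/master liveness check. Only the property names and default values come from the Spark docs; the function names are hypothetical, and the "heartbeat every timeout / 4" rule is my reading of the standalone Worker implementation rather than something the docs state, so treat it as an assumption.

```python
# Defaults taken from the Spark documentation.
WORKER_TIMEOUT_S = 60        # spark.worker.timeout (seconds)
EXECUTOR_HEARTBEAT_S = 10    # spark.executor.heartbeatInterval


def worker_heartbeat_interval(timeout_s=WORKER_TIMEOUT_S):
    """Seconds between worker -> master heartbeats.

    Assumption: the worker heartbeats four times per timeout window.
    """
    return timeout_s / 4


def is_worker_lost(last_heartbeat_s, now_s, timeout_s=WORKER_TIMEOUT_S):
    """Master-side check: the worker counts as lost once more than
    timeout_s seconds have passed since its last heartbeat."""
    return now_s - last_heartbeat_s > timeout_s


if __name__ == "__main__":
    print("worker heartbeat every", worker_heartbeat_interval(), "seconds")
    print("lost after 61s of silence:", is_worker_lost(0, 61))
```

With the defaults, a worker would heartbeat roughly every 15 seconds, and the master would only drop it after a full 60 seconds of silence.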


Please correct me if I'm wrong.



On Thu, Apr 6, 2017 at 5:01 AM, map reduced <k3t.gi...@gmail.com> wrote:

> Hi,
>
> I was wondering how often a Worker pings the Master to check on the Master's
> liveness. Or is it the Master (resource manager) that pings the Workers to
> check on their liveness and, if any workers are dead, to spawn? Or is it
> both?
>
> Some info:
> Standalone cluster
> 1 Master - 8core 12Gb
> 32 workers - each 8 core and 8 Gb
>
> My main problem - Here's what happened:
>
> Master M - running with 32 workers
> Worker 1 and 2 died at 03:55:00 - so now the cluster is 30 workers
>
> Worker 1' came up at 03:55:12.000 AM - it connected to M
> Worker 2' came up at 03:55:16.000 AM - it connected to M
>
> Master M *dies* at 03:56:00 AM
> New master NM comes up at 03:56:30 AM
> Worker 1' and 2' - *DO NOT* connect to NM
> Remaining 30 workers connect to NM.
>
> So NM now has 30 workers.
>
> I was wondering why those two won't connect to the new master NM even
> though master M is definitely dead.
>
> PS: I have an LB set up for the Master, which means that whenever a new
> master comes up, the LB will start pointing to the new one.
>
> Thanks,
> KP
>
>
