1. For worker and master: spark.worker.timeout (default 60s), see: http://spark.apache.org/docs/latest/spark-standalone.html
2. For executor and driver: spark.executor.heartbeatInterval (default 10s), see: http://spark.apache.org/docs/latest/configuration.html

Please correct me if I'm wrong.

On Thu, Apr 6, 2017 at 5:01 AM, map reduced <k3t.gi...@gmail.com> wrote:

> Hi,
>
> I was wondering on how often does Worker pings Master to check on Master's
> liveness? Or is it the Master (Resource manager) that pings Workers to
> check on their liveness and if any workers are dead to spawn ? Or is it
> both?
>
> Some info:
> Standalone cluster
> 1 Master - 8core 12Gb
> 32 workers - each 8 core and 8 Gb
>
> My main problem - Here's what happened:
>
> Master M - running with 32 workers
> Worker 1 and 2 died at 03:55:00 - so now the cluster is 30 workers
>
> Worker 1' came up at 03:55:12.000 AM - it connected to M
> Worker 2' came up at 03:55:16.000 AM - it connected to M
>
> Master M *dies* at 03:56.00 AM
> New master NM' comes up at 03:56:30 AM
> Worker 1' and 2' - *DO NOT* connect to NM
> Remaining 30 workers connect to NM.
>
> So NM now has 30 workers.
>
> I was wondering on why those two won't connect to new master NM even
> though master M is dead for sure.
>
> PS:I have a LB setup for Master which means that whenever a new master
> comes in LB will start pointing to new one.
>
> Thanks,
> KP
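Coming back to the two settings at the top of this reply, here is a rough Python sketch of how I understand the timing to work. It does not talk to Spark at all; it only does the arithmetic. The function names are made up for illustration, and the "heartbeat every timeout / 4" rule is my assumption from reading the standalone Worker code, so please verify it against your Spark version. The date in the example is arbitrary; only the times are taken from your message.

```python
from datetime import datetime, timedelta

# Defaults as listed in the Spark docs linked above.
WORKER_TIMEOUT_S = 60        # spark.worker.timeout
EXECUTOR_HEARTBEAT_S = 10    # spark.executor.heartbeatInterval


def worker_heartbeat_interval(timeout_s: float = WORKER_TIMEOUT_S) -> float:
    """Seconds between Worker -> Master heartbeats.

    Assumption (not stated in the docs page): the standalone Worker
    sends a heartbeat every timeout / 4 seconds.
    """
    return timeout_s / 4


def is_considered_lost(last_heartbeat: datetime,
                       now: datetime,
                       timeout_s: float = WORKER_TIMEOUT_S) -> bool:
    """True if the Master would treat the worker as lost, i.e. no
    heartbeat has been received for more than timeout_s seconds."""
    return (now - last_heartbeat) > timedelta(seconds=timeout_s)


if __name__ == "__main__":
    died = datetime(2017, 4, 6, 3, 55, 0)         # Worker 1 and 2 died
    master_died = datetime(2017, 4, 6, 3, 56, 0)  # Master M died
    print("worker heartbeat interval (s):", worker_heartbeat_interval())
    print("old workers timed out before M died:",
          is_considered_lost(died, master_died))
```

If that assumption holds, a worker heartbeats roughly every 15 seconds with the default timeout, and the Master only gives up on a silent worker after the full 60 seconds.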