I’m getting the same issue on Spark 1.2.0. I set
“spark.core.connection.ack.wait.timeout” in spark-defaults.conf and verified
the value in the Environment tab of the job UI (port 4040), but I still get
the “no heartbeat in 60 seconds” error.
spark.core.connection.ack.wait.timeout=3600
15/01/22
For one of my Spark jobs, my workers/executors are dying and leaving the
cluster.
On the master, I see something like the following in the log file. I'm
surprised to see the '60' seconds in the master log below because I explicitly
set it to '600' (or so I thought) in my spark job (see
Hi Darin,
In our case, we were getting the error due to long GC pauses in our app.
Fixing the underlying code helped us remove this error. This is also
mentioned as point 1 in the link below:
Darin,
You might want to increase these config options also:
spark.akka.timeout 300
spark.storage.blockManagerSlaveTimeoutMs 30
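
For reference, here is a rough sketch of how the settings mentioned in this thread might look together in spark-defaults.conf. The values are illustrative assumptions, not recommendations. In the Spark 1.x documentation the first two keys are given in seconds, and the "Ms" suffix on the third suggests milliseconds, so the 300000 below is an assumed example rather than the value quoted above. Whether any of these settings affects the master's 60-second heartbeat check is not established in this thread.

```
# spark-defaults.conf (illustrative values only)

# Seconds to wait for a connection ack
spark.core.connection.ack.wait.timeout    3600

# Seconds; communication timeout between Spark nodes
spark.akka.timeout                        300

# Name ends in "Ms", so presumably milliseconds (assumed example value)
spark.storage.blockManagerSlaveTimeoutMs  300000
```

After restarting the job, the effective values can be checked in the Environment tab of the job UI (port 4040), as described earlier in the thread.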
On Thu, Nov 13, 2014 at 11:31 AM, Darin McBeath ddmcbe...@yahoo.com.invalid
wrote:
For one of my Spark jobs, my workers/executors are dying and leaving the