I'm running Spark 1.3.1 on AWS with a long-running application (a single Spark
context) that accepts and completes jobs fine. However, it crashes at
seemingly random times, anywhere from 1 hour to 6 days after startup. In the
latter case, the context ran and finished hundreds of jobs without an issue and
then suddenly crashed, with the following 2 lines in the executors' logs:

15/06/24 10:35:44 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@ip-***-**-36-70.us-west-2.compute.internal:59891] -> [akka.tcp://sparkDriver@ip-***-**-42-150.us-west-2.compute.internal:56572] disassociated! Shutting down.
15/06/24 10:35:44 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@ip-***-**-42-150.us-west-2.compute.internal:56572] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
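
Since the crashes look random, it may help to pull every disassociation event out of the executor logs and line the timestamps up against other activity. Below is a minimal sketch of how that could be done, assuming each log entry sits on one line and starts with a "yy/MM/dd HH:mm:ss LEVEL logger:" prefix like the two lines above; the function name find_disassociations is made up for illustration.

```python
import re
from datetime import datetime

# Assumed executor log prefix: "yy/MM/dd HH:mm:ss LEVEL logger: message".
# Only ERROR and WARN entries are considered.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (ERROR|WARN) (\S+): (.*)$"
)

def find_disassociations(lines):
    """Return (timestamp, level, logger) for each entry mentioning a disassociation."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and "disassociated" in line.lower():
            timestamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            events.append((timestamp, match.group(2), match.group(3)))
    return events
```

Feeding it the lines of an executor log file (for example via open(path)) gives a list that can be sorted or compared across executors to see whether they all lose the driver at the same moment.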

Following advice on the forums, I removed the SPARK_PUBLIC_DNS setting and
increased the following Akka configs:

spark.akka.failure-detector.threshold 30000
spark.akka.heartbeat.interval 100000
spark.akka.heartbeat.pauses 600000
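
For anyone who wants to try the same values without editing a defaults file, here is a rough sketch that renders the three settings as spark-submit `--conf` flags. It assumes the application is launched through spark-submit, which may not match every setup; the helper name to_conf_args is hypothetical, and the values are simply the ones listed above.

```python
# The three Akka settings listed above, kept as strings so they are
# passed through unchanged.
AKKA_SETTINGS = {
    "spark.akka.failure-detector.threshold": "30000",
    "spark.akka.heartbeat.interval": "100000",
    "spark.akka.heartbeat.pauses": "600000",
}

def to_conf_args(settings):
    """Return a flat list ['--conf', 'key=value', ...] in sorted key order."""
    args = []
    for key in sorted(settings):
        args.extend(["--conf", "{}={}".format(key, settings[key])])
    return args

if __name__ == "__main__":
    # Print the start of a command line; the application jar and its
    # arguments would still have to be appended.
    print(" ".join(["spark-submit"] + to_conf_args(AKKA_SETTINGS)))
```

Keeping the settings in one dictionary makes it easy to change a single value between runs and see which one actually affects the disassociation behaviour.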

This resulted in the context crashing after 2 hours, with different
warnings/errors *during* operation:

[2015-06-25 04:50:16,769] WARN  ient.AppClient$ClientActor []
[akka://JobServer/user/context-supervisor/spark-sql-context] - Connection to
akka.tcp://sparkMaster@ec2-***.us-west-2.compute.amazonaws.com:7077 failed;
waiting for master to reconnect...
[2015-06-25 04:50:17,400] ERROR cheduler.TaskSchedulerImpl []
[akka://JobServer/user/context-supervisor/spark-sql-context] - Lost executor
0 on ip-***.us-west-2.compute.internal: remote Akka client disassociated

Despite these, the log continues and shows a couple of jobs completing
afterwards. But then that is the end of the story: the context silently died.

Help with understanding and dealing with this would be greatly appreciated!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Akka-failures-Driver-Disassociated-tp23486.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
