Hi, I have written a Spark job that runs fine for almost an hour, after
which executors start getting lost because of a timeout. I see the
following statement in the log:

15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with no
recent heartbeats: 1051638 ms exceeds timeout 1000000 ms 

I don't see any errors, only the warning above. Because of it the executor
gets removed by YARN, and I then see an "Rpc client disassociated" error,
an IOException (connection refused), and a FetchFailedException.

After an executor is removed, I see it being added again and it starts
working, and then some other executors fail. My questions are: is it normal
for executors to get lost? What happens to the tasks that the lost executors
were working on? My Spark job is long and keeps running for around 4-5
hours. I have a very good cluster with 1.2 TB of memory and a good number of
CPU cores. To solve the timeout issue above I tried increasing
spark.akka.timeout to 1000 seconds, but no luck. I am new to Spark and am
using Spark 1.4.1. Please guide me. Thanks in advance.

I am using the following command to run my Spark job:

/spark-submit --class com.xyz.abc.MySparkJob \
  --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" \
  --driver-java-options -XX:MaxPermSize=512m \
  --driver-memory 4g \
  --master yarn-client \
  --executor-memory 25G --executor-cores 8 --num-executors 5 \
  --jars /path/to/spark-job.jar
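
The command above does not show the spark.akka.timeout setting mentioned
earlier. As a rough sketch only, a setting like that is normally passed to
spark-submit as a --conf key=value pair. This assumes spark.akka.timeout
takes a value in seconds, that spark-submit is on the PATH, and that the jar
is given as the application jar; it is not necessarily the exact invocation
that was used for this job.

```shell
# Sketch only, not the exact command used for this job.
# Assumption: spark.akka.timeout is interpreted in seconds (Spark 1.4.x).
# Assumption: the jar is passed as the application jar (last argument).
spark-submit --class com.xyz.abc.MySparkJob \
  --master yarn-client \
  --conf spark.akka.timeout=1000 \
  /path/to/spark-job.jar
```

Note that the warning in the log comes from spark.HeartbeatReceiver and
reports its own timeout (1000000 ms), so it may be governed by a different
setting than spark.akka.timeout.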



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-executor-lost-because-of-time-out-even-after-setting-quite-long-time-out-value-1000-seconds-tp24289.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
