We are using Spark Job Server to submit Spark jobs (our Spark version is 0.9.1).
After the job server has been running for a while, we often see the following
errors (executor lost) in its log. As a consequence, the Spark
driver (which runs inside the job server) gradually loses executors, and
eventually the job server is no longer able to submit jobs. We have searched
online for a solution but have had no luck so far. Please help if you have any ideas.
Thanks!

[2014-11-25 01:37:36,250] INFO  parkDeploySchedulerBackend [] [akka://JobServer/user/context-supervisor/next-staging] - Executor 6 disconnected, so removing it
[2014-11-25 01:37:36,252] ERROR cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/next-staging] - Lost executor 6 on XXXX: remote Akka client disassociated
[2014-11-25 01:37:36,252] INFO  ark.scheduler.DAGScheduler [] [] - Executor lost: 6 (epoch 8)
[2014-11-25 01:37:36,252] INFO  ge.BlockManagerMasterActor [] [] - Trying to remove executor 6 from BlockManagerMaster.
[2014-11-25 01:37:36,252] INFO  storage.BlockManagerMaster [] [] - Removed 6 successfully in removeExecutor
[2014-11-25 01:37:36,286] INFO  ient.AppClient$ClientActor [] [akka://JobServer/user/context-supervisor/next-staging] - Executor updated: app-20141125002023-0037/6 is now FAILED (Command exited with code 143)
