Try increasing the value of spark.yarn.executor.memoryOverhead. Its default value is 384 MB in Spark 1.1. This error generally occurs when your process's memory usage exceeds its maximum allocation. Use the following property to increase the memory overhead.
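For example, you might set it through SparkConf (a minimal sketch; the app name and the 1024 MB value are illustrative, not values taken from this thread, so tune them to your workload):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("graphx-job")                          // hypothetical app name
      .set("spark.yarn.executor.memoryOverhead", "1024") // per-executor off-heap overhead in MB

The same property can also be passed on the command line via spark-submit's --conf option.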
From: Yifan LI <iamyifa...@gmail.com>
Date: Friday, 6 February 2015 3:53 pm
To: Ankur Srivastava <ankur.srivast...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: how to debug this kind of error, e.g. "lost executor"?

Hi Ankur,

Thanks very much for your help, but I am using v1.2, so it is SORT…

Let me know if you have any other advice. :)

Best,
Yifan LI

On 05 Feb 2015, at 17:56, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:

Li,

I cannot tell you the reason for this exception, but I have seen this kind of error when using the HASH-based shuffle manager (the default until v1.2). Try the SORT shuffle manager. Hopefully that will help.

Thanks
Ankur

Yifan LI <iamyifa...@gmail.com> wrote:

Does anyone have an idea of where I can find the detailed log of that lost executor (why it was lost)? Thanks in advance!

On 05 Feb 2015, at 16:14, Yifan LI <iamyifa...@gmail.com> wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the memory is sufficient, and I have set the RDDs' StorageLevel to MEMORY_AND_DISK. But I found that some tasks failed with the following errors:

    java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or folders of this type)

    ExecutorLostFailure (executor 11 lost)

So, finally, that stage failed with:

    org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index

Does anyone have any pointers? Where can I get more details on this issue?

Best,
Yifan LI
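For reference, a minimal sketch combining the two suggestions discussed above, the SORT shuffle manager (the default from v1.2, so it must only be set explicitly on 1.1 and earlier) and MEMORY_AND_DISK persistence; the app name and input path are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Select the sort-based shuffle manager explicitly.
    val conf = new SparkConf()
      .setAppName("graphx-debug")           // hypothetical app name
      .set("spark.shuffle.manager", "sort")

    val sc = new SparkContext(conf)

    // Persist with MEMORY_AND_DISK so partitions that do not fit in memory
    // spill to disk rather than being dropped and recomputed.
    val edges = sc.textFile("hdfs:///path/to/edges") // hypothetical input path
      .persist(StorageLevel.MEMORY_AND_DISK)

    edges.count() // force materialization under the chosen storage level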