Re: how to debug this kind of error, e.g. lost executor?
Try increasing the value of spark.yarn.executor.memoryOverhead; its default value is 384 MB in Spark 1.1. This error generally occurs when your process's memory usage exceeds its maximum allocation. Use that property to increase the memory overhead.

From: Yifan LI <iamyifa...@gmail.com>
Date: Friday, 6 February 2015, 3:53 pm
To: Ankur Srivastava <ankur.srivast...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: how to debug this kind of error, e.g. lost executor?

Hi Ankur,

Thanks very much for your help, but I am using v1.2, so it is SORT…

Let me know if you have any other advice. :)

Best,
Yifan LI

On 05 Feb 2015, at 17:56, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:

Li,

I cannot tell you the reason for this exception, but I have seen these kinds of errors when using the HASH-based shuffle manager, which was the default until v1.2. Try the SORT shuffle manager; hopefully that will help.

Thanks,
Ankur

Does anyone have an idea of where I can find the detailed log of that lost executor (i.e. why it was lost)? Thanks in advance!

On 05 Feb 2015, at 16:14, Yifan LI <iamyifa...@gmail.com> wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the memory is sufficient, and I have set the RDDs' StorageLevel to MEMORY_AND_DISK. But I found that some tasks failed due to the following errors:

java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or folders of this type)

ExecutorLostFailure (executor 11 lost)

So, finally, that stage failed:

org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index

Does anyone have any pointers? Where can I get more details about this issue?

Best,
Yifan LI
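A minimal sketch of the suggestion above, passing the property on the spark-submit command line. The 1024 MB value, and the application class and jar name, are illustrative placeholders, not recommendations:

```shell
# Raise the YARN executor memory overhead above the 384 MB default
# (off-heap headroom YARN grants each executor container).
# 1024 MB is an example value; tune it to your workload.
spark-submit \
  --master yarn \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class com.example.MyGraphXApp \
  myapp.jar
```

The same property can also be set once for all jobs in `spark-defaults.conf`.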
Re: how to debug this kind of error, e.g. lost executor?
Could you find the shuffle files? Or were the files deleted by other processes?

Yours,
Xuefeng Wu 吴雪峰

On 5 Feb 2015, at 11:14 pm, Yifan LI <iamyifa...@gmail.com> wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the memory is sufficient, and I have set the RDDs' StorageLevel to MEMORY_AND_DISK. But I found that some tasks failed due to the following errors:

java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or folders of this type)

ExecutorLostFailure (executor 11 lost)

So, finally, that stage failed:

org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index

Does anyone have any pointers? Where can I get more details about this issue?

Best,
Yifan LI
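One way to act on the suggestion above: on the node that hosted the lost executor, check whether the file named in the stack trace still exists under the local scratch directory. This is a sketch; the paths come from the error messages in this thread, and the scratch root depends on your spark.local.dir setting:

```shell
# Does the missing shuffle index file from the stack trace still exist?
ls -l /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index

# List any remaining shuffle files under the local scratch root
# (adjust /data/spark/local to your spark.local.dir).
find /data/spark/local -name 'shuffle_*' 2>/dev/null | head
```

If the whole `spark-local-*` directory is gone, suspect an external cleaner (e.g. a tmp-reaper cron job) or the executor's shutdown having removed its scratch space.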
Re: how to debug this kind of error, e.g. lost executor?
Does anyone have an idea of where I can find the detailed log of that lost executor (i.e. why it was lost)? Thanks in advance!

On 05 Feb 2015, at 16:14, Yifan LI <iamyifa...@gmail.com> wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the memory is sufficient, and I have set the RDDs' StorageLevel to MEMORY_AND_DISK. But I found that some tasks failed due to the following errors:

java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or folders of this type)

ExecutorLostFailure (executor 11 lost)

So, finally, that stage failed:

org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index

Does anyone have any pointers? Where can I get more details about this issue?

Best,
Yifan LI
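On a YARN deployment (as the memoryOverhead suggestion elsewhere in this thread assumes), the lost executor's stdout/stderr can usually be retrieved with the YARN CLI once the application finishes and log aggregation is enabled. The application ID below is a placeholder:

```shell
# Find your application ID (placeholder shown in the next command).
yarn application -list -appStates ALL

# Fetch the aggregated container logs, including those of the lost
# executor, and search them for the failure reason.
yarn logs -applicationId application_1423130000000_0001 > app.log
grep -i -A5 'error\|killed\|OutOfMemory' app.log | head -50
```

While the application is still running, the same logs are reachable from the executor's stderr link on the Spark UI's Executors tab.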
Re: how to debug this kind of error, e.g. lost executor?
Li,

I cannot tell you the reason for this exception, but I have seen these kinds of errors when using the HASH-based shuffle manager, which was the default until v1.2. Try the SORT shuffle manager; hopefully that will help.

Thanks,
Ankur

Does anyone have an idea of where I can find the detailed log of that lost executor (i.e. why it was lost)? Thanks in advance!

On 05 Feb 2015, at 16:14, Yifan LI <iamyifa...@gmail.com> wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the memory is sufficient, and I have set the RDDs' StorageLevel to MEMORY_AND_DISK. But I found that some tasks failed due to the following errors:

java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or folders of this type)

ExecutorLostFailure (executor 11 lost)

So, finally, that stage failed:

org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index

Does anyone have any pointers? Where can I get more details about this issue?

Best,
Yifan LI
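A sketch of the switch Ankur suggests, for anyone on Spark 1.1 or earlier. The class and jar names are placeholders; from Spark 1.2 onward `sort` is already the default, so this flag is only needed on older versions:

```shell
# Switch from the hash-based shuffle manager (default through 1.1)
# to the sort-based one, which keeps far fewer shuffle files open.
spark-submit \
  --conf spark.shuffle.manager=sort \
  --class com.example.MyGraphXApp \
  myapp.jar
```

Equivalently, set `spark.shuffle.manager sort` in `spark-defaults.conf`.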