Re: how to debug this kind of error, e.g. lost executor?

2015-02-11 Thread Praveen Garg
Try increasing the value of spark.yarn.executor.memoryOverhead. Its default 
value is 384 MB in Spark 1.1. This error generally occurs when your process's 
memory usage exceeds its maximum allocation. Use the following property to 
increase the memory overhead.
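
For reference, a minimal sketch of setting this property programmatically; the 
1024 MB value and the application name below are illustrative assumptions, not 
values taken from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // Raise the per-executor off-heap headroom requested from YARN (value in MB).
    val conf = new SparkConf()
      .setAppName("graphx-app")                            // illustrative name
      .set("spark.yarn.executor.memoryOverhead", "1024")   // default is 384 in 1.1
    val sc = new SparkContext(conf)

The same property can also be set in spark-defaults.conf or passed to 
spark-submit with --conf.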

From: Yifan LI iamyifa...@gmail.com
Date: Friday, 6 February 2015 3:53 pm
To: Ankur Srivastava ankur.srivast...@gmail.com
Cc: user@spark.apache.org
Subject: Re: how to debug this kind of error, e.g. lost executor?

Hi Ankur,

Thanks very much for your help, but I am using v1.2, so it is SORT…

Let me know if you have any other advice, :)

Best,
Yifan LI





On 05 Feb 2015, at 17:56, Ankur Srivastava ankur.srivast...@gmail.com wrote:


Li, I cannot tell you the reason for this exception, but I have seen these kinds 
of errors when using the HASH-based shuffle manager (which is the default until 
v1.2). Try the SORT shuffle manager.

Hopefully that will help

Thanks
Ankur

Does anyone have an idea where I can find the detailed log of that lost executor 
(i.e. why it was lost)?

Thanks in advance!





On 05 Feb 2015, at 16:14, Yifan LI iamyifa...@gmail.com wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the 
memory is sufficient, and I set the RDDs’ StorageLevel to MEMORY_AND_DISK.

But I found that some tasks failed due to the following errors:

java.io.FileNotFoundException: 
/data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or 
folders of this type)

ExecutorLostFailure (executor 11 lost)


So, finally that stage failed:

org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: 
/data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index


Does anyone have pointers? Where can I get more details on this issue?


Best,
Yifan LI
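
For context, a rough sketch of persisting the GraphX vertex and edge RDDs at 
MEMORY_AND_DISK as described in the quoted message; the input path and 
application setup here are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.GraphLoader
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(new SparkConf().setAppName("graphx-app"))

    // Keep vertex and edge partitions at MEMORY_AND_DISK so they spill to
    // local disk instead of being dropped when executor memory runs short.
    val graph = GraphLoader.edgeListFile(
      sc, "hdfs:///path/to/edges.txt",                  // hypothetical input
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)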









Re: how to debug this kind of error, e.g. lost executor?

2015-02-05 Thread Xuefeng Wu
Could you find the shuffle files? Or were the files deleted by other processes?

Yours respectfully, Xuefeng Wu 吴雪峰

 On 5 February 2015, at 11:14 PM, Yifan LI iamyifa...@gmail.com wrote:
 
 Hi,
 
 I am running a GraphX application with heavy memory/CPU overhead. I think the 
 memory is sufficient, and I set the RDDs’ StorageLevel to MEMORY_AND_DISK.
 
 But I found that some tasks failed due to the following errors:
 
 java.io.FileNotFoundException: 
 /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or 
 folders of this type)
 
 ExecutorLostFailure (executor 11 lost)
 
 
 So, finally that stage failed:
 
 org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: 
 /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index
 
 
 Does anyone have pointers? Where can I get more details on this issue?
 
 
 Best,
 Yifan LI
 
 
 
 
 


Re: how to debug this kind of error, e.g. lost executor?

2015-02-05 Thread Yifan LI

Does anyone have an idea where I can find the detailed log of that lost executor 
(i.e. why it was lost)?

Thanks in advance!





 On 05 Feb 2015, at 16:14, Yifan LI iamyifa...@gmail.com wrote:
 
 Hi,
 
 I am running a GraphX application with heavy memory/CPU overhead. I think the 
 memory is sufficient, and I set the RDDs’ StorageLevel to MEMORY_AND_DISK.
 
 But I found that some tasks failed due to the following errors:
 
 java.io.FileNotFoundException: 
 /data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or 
 folders of this type)
 
 ExecutorLostFailure (executor 11 lost)
 
 
 So, finally that stage failed:
 
 org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: 
 /data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index
 
 
 Does anyone have pointers? Where can I get more details on this issue?
 
 
 Best,
 Yifan LI
 
 
 
 
 



Re: how to debug this kind of error, e.g. lost executor?

2015-02-05 Thread Ankur Srivastava
Li, I cannot tell you the reason for this exception, but I have seen these
kinds of errors when using the HASH-based shuffle manager (which is the default
until v1.2). Try the SORT shuffle manager.
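
As a minimal, illustrative sketch (the application name is hypothetical), the
shuffle manager can be switched through the standard spark.shuffle.manager
property:

    import org.apache.spark.{SparkConf, SparkContext}

    // On Spark releases before 1.2 the hash-based shuffle is the default;
    // this opts into the sort-based shuffle instead.
    val conf = new SparkConf()
      .setAppName("graphx-app")                // illustrative name
      .set("spark.shuffle.manager", "sort")
    val sc = new SparkContext(conf)

The same setting can also go into spark-defaults.conf.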

Hopefully that will help

Thanks
Ankur

Does anyone have an idea where I can find the detailed log of that lost
executor (i.e. why it was lost)?

Thanks in advance!





On 05 Feb 2015, at 16:14, Yifan LI iamyifa...@gmail.com wrote:

Hi,

I am running a GraphX application with heavy memory/CPU overhead. I think the
memory is sufficient, and I set the RDDs’ StorageLevel to MEMORY_AND_DISK.

But I found that some tasks failed due to the following errors:

java.io.FileNotFoundException:
/data/spark/local/spark-local-20150205151711-9700/09/rdd_3_275 (No files or
folders of this type)

ExecutorLostFailure (executor 11 lost)


So, finally that stage failed:

org.apache.spark.shuffle.FetchFailedException:
java.io.FileNotFoundException:
/data/spark/local/spark-local-20150205151711-587a/16/shuffle_11_219_0.index


Does anyone have pointers? Where can I get more details on this issue?


Best,
Yifan LI