Re: What are the likely causes of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle?

2015-06-26 Thread XianXing Zhang
Do we have any update on this thread? Has anyone met and solved similar problems before? Any pointers will be greatly appreciated! Best, XianXing On Mon, Jun 15, 2015 at 11:48 PM, Jia Yu jia...@asu.edu wrote: Hi Peng, I got exactly the same error! My shuffle data is also very large. Have you

Re: What are the likely causes of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle?

2015-06-26 Thread Eugen Cepoi
Are you using YARN? If so, increase the YARN memory overhead option. YARN is probably killing your executors. On 26 June 2015 at 20:43, XianXing Zhang xianxing.zh...@gmail.com wrote: Do we have any update on this thread? Has anyone met and solved similar problems before? Any pointers will be
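
For readers following up on this reply, here is a minimal sketch of how the memory overhead option might be set programmatically. It assumes a Spark 1.x deployment on YARN, where the property is named `spark.yarn.executor.memoryOverhead` and its value is in megabytes (later Spark releases renamed it to `spark.executor.memoryOverhead`). The application name and the value 2048 are illustrative assumptions, not figures from the thread.

```scala
import org.apache.spark.SparkConf

object MemoryOverheadSketch {
  def main(args: Array[String]): Unit = {
    // Build a configuration that reserves extra off-heap memory per executor.
    // The app name and the 2048 MB value are hypothetical; tune for your job.
    val conf = new SparkConf()
      .setAppName("memory-overhead-sketch")
      .set("spark.yarn.executor.memoryOverhead", "2048")

    // Print the value back to confirm what the configuration holds.
    println(conf.get("spark.yarn.executor.memoryOverhead"))
  }
}
```

The same property can also be passed on the command line with `spark-submit --conf spark.yarn.executor.memoryOverhead=2048`, which avoids recompiling the job when experimenting with different values.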

Re: What are the likely causes of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle?

2015-06-26 Thread XianXing Zhang
Yes, we deployed Spark on top of YARN. Your suggestion was very helpful: I increased the YARN memory overhead option and it helped in most cases. (Sometimes it still fails when the amount of data to be shuffled is large, but I guess if I continue increasing the YARN memory overhead

Re: What are the likely causes of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle?

2015-06-16 Thread Jia Yu
Hi Peng, I got exactly the same error! My shuffle data is also very large. Have you figured out a method to solve that? Thanks, Jia On Fri, Apr 24, 2015 at 7:59 AM, Peng Cheng pc...@uow.edu.au wrote: I'm deploying a Spark data processing job on an EC2 cluster. The job is small for the cluster

What are the likely causes of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle?

2015-04-24 Thread Peng Cheng
I'm deploying a Spark data processing job on an EC2 cluster. The job is small for the cluster (16 cores with 120G RAM in total); the largest RDD has only 76k+ rows, but it is heavily skewed in the middle (and thus requires repartitioning), and each row has around 100k of data after serialization. The job