Hi Peng, I got exactly the same error! My shuffle data is also very large. Have you figured out a way to solve it?
Thanks, Jia

On Fri, Apr 24, 2015 at 7:59 AM, Peng Cheng <pc...@uow.edu.au> wrote:
> I'm deploying a Spark data processing job on an EC2 cluster. The job is
> small for the cluster (16 cores with 120G RAM in total); the largest RDD
> has only 76k+ rows, but it is heavily skewed in the middle (thus requiring
> repartitioning), and each row carries around 100k of data after
> serialization. The job always gets stuck at the repartitioning stage:
> it repeatedly hits the following errors and retries:
>
> org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
> location for shuffle
>
> org.apache.spark.shuffle.FetchFailedException: Error in opening
> FileSegmentManagedBuffer
>
> org.apache.spark.shuffle.FetchFailedException:
> java.io.FileNotFoundException: /tmp/spark-...
>
> I've tried to identify the problem, but both memory and disk consumption
> on the machines throwing these errors are below 50%. I've also tried
> different configurations, including:
>
> - let driver/executor memory use 60% of total memory
> - let netty prioritize the JVM shuffle buffer
> - increase the shuffle streaming buffer to 128m
> - use KryoSerializer and max out all its buffers
> - increase the shuffle memoryFraction to 0.4
>
> But none of them work. The small job always triggers the same series of
> errors and maxes out its retries (up to 1000 times). How can I
> troubleshoot this kind of situation?
>
> Thanks a lot if you have any clue.
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/What-are-the-likely-causes-of-org-apache-spark-shuffle-MetadataFetchFailedException-Missing-an-outpu-tp22646.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
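For readers following along: the tweaks listed in Peng's message would correspond roughly to a spark-submit invocation like the sketch below. This is only an illustration, not Peng's actual command; the property keys follow Spark 1.x-era conventions, the memory sizes are placeholders for "60% of total memory" on a hypothetical node, and the mapping of "shuffle streaming buffer" to `spark.reducer.maxMbInFlight` is an assumption. Check every key against the configuration docs for your Spark version before using it.

```shell
# Sketch only: all property keys and values below are assumptions based on
# Spark 1.x-era configuration names; verify against your version's docs.
spark-submit \
  --driver-memory 20g \
  --executor-memory 20g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer.max.mb=512 \
  --conf spark.shuffle.memoryFraction=0.4 \
  --conf spark.shuffle.io.preferDirectBufs=false \
  --conf spark.reducer.maxMbInFlight=128 \
  my-job.jar
```

Note that several of these knobs were renamed or removed in later Spark releases (for example, `spark.shuffle.memoryFraction` was superseded by unified memory management), so the right incantation depends heavily on which version you are running.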