Hi Darin, you might increase spark.yarn.executor.memoryOverhead to see if it fixes the problem. Please take a look at this report: https://issues.apache.org/jira/browse/SPARK-4996
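For Spark on YARN that property can be set through SparkConf before the context is created; a minimal sketch in Java (the 2048 MB value and the app name are just illustrations, tune the overhead to your executor size):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Illustrative only: raise the off-heap memory YARN reserves per executor.
    SparkConf conf = new SparkConf()
        .setAppName("baseline-save")  // hypothetical app name
        .set("spark.yarn.executor.memoryOverhead", "2048");  // value is in MB
    JavaSparkContext sc = new JavaSparkContext(conf);

The same property can also be passed on the command line, e.g. spark-submit --conf spark.yarn.executor.memoryOverhead=2048.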
On Fri, Feb 27, 2015 at 12:38 AM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

> Can you share what error you are getting when the job fails?
>
> On Thu, Feb 26, 2015 at 4:32 AM, Darin McBeath <ddmcbe...@yahoo.com.invalid> wrote:
>
>> I'm using Spark 1.2 in a stand-alone cluster on EC2. I have a cluster of 8
>> r3.8xlarge machines but limit the job to only 128 cores. I have also tried
>> other things, such as setting 4 workers per r3.8xlarge with 67 GB each, but
>> this made no difference.
>>
>> The job frequently fails at the end in this step (saveAsHadoopFile). It
>> will sometimes work.
>>
>> finalNewBaselinePairRDD is hash-partitioned with 1024 partitions and a
>> total size of around 1 TB. There are about 13.5M records in
>> finalNewBaselinePairRDD, which is a JavaPairRDD<String, String>.
>>
>> JavaPairRDD<Text, Text> finalBaselineRDDWritable =
>>     finalNewBaselinePairRDD.mapToPair(new ConvertToWritableTypes())
>>         .persist(StorageLevel.MEMORY_AND_DISK_SER());
>>
>> // Save to HDFS as gzip-compressed sequence files
>> finalBaselineRDDWritable.saveAsHadoopFile("hdfs:///sparksync/",
>>     Text.class, Text.class, SequenceFileOutputFormat.class,
>>     org.apache.hadoop.io.compress.GzipCodec.class);
>>
>> If anyone has any tips for what I should look into, it would be
>> appreciated.
>>
>> Thanks.
>>
>> Darin.
>
> --
> Arush Kharbanda || Technical Teamlead
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
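Note for the archives: ConvertToWritableTypes isn't shown in the thread. For a <String, String> pair RDD like the one above, it would typically be a simple PairFunction along these lines, assuming it only wraps both sides of the pair in Hadoop Text writables:

    import org.apache.hadoop.io.Text;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    // Hypothetical reconstruction: wrap each String key/value in a Text
    // writable so the pairs can be written out with saveAsHadoopFile.
    class ConvertToWritableTypes
            implements PairFunction<Tuple2<String, String>, Text, Text> {
        public Tuple2<Text, Text> call(Tuple2<String, String> record) {
            return new Tuple2<Text, Text>(new Text(record._1()),
                                          new Text(record._2()));
        }
    }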