I'm using Spark 1.2 on a standalone cluster on EC2. The cluster has 8 r3.8xlarge machines, but I limit the job to 128 cores. I have also tried other configurations, such as 4 workers per r3.8xlarge with 67GB each, but it made no difference.
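For reference, this is roughly how I'm expressing those limits (standalone-mode worker settings in spark-env.sh plus spark.cores.max on the application side); the exact placement below is just a sketch:

    # spark-env.sh on each r3.8xlarge (the 4-worker variant I tried)
    SPARK_WORKER_INSTANCES=4
    SPARK_WORKER_MEMORY=67g

    # in spark-defaults.conf (or on the SparkConf), to cap the job at 128 cores
    spark.cores.max=128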
The job frequently fails at the end, in this step (saveAsHadoopFile), though it will sometimes succeed. finalNewBaselinePairRDD is a JavaPairRDD<String, String>, hash-partitioned into 1024 partitions, with about 13.5M records and a total size of around 1TB.

    JavaPairRDD<Text, Text> finalBaselineRDDWritable = finalNewBaselinePairRDD
        .mapToPair(new ConvertToWritableTypes())
        .persist(StorageLevel.MEMORY_AND_DISK_SER());

    // Save to HDFS as a gzip-compressed SequenceFile
    finalBaselineRDDWritable.saveAsHadoopFile("hdfs:///sparksync/",
        Text.class, Text.class,
        SequenceFileOutputFormat.class,
        org.apache.hadoop.io.compress.GzipCodec.class);

If anyone has any tips on what I should look into, it would be appreciated.

Thanks,
Darin
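P.S. For context, ConvertToWritableTypes is just a small PairFunction that wraps the String key and value in Hadoop Text writables so the pairs can be written to the SequenceFile; a minimal sketch of that conversion:

    import org.apache.hadoop.io.Text;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    // Sketch of the String -> Text conversion applied before saveAsHadoopFile.
    public class ConvertToWritableTypes
            implements PairFunction<Tuple2<String, String>, Text, Text> {
        @Override
        public Tuple2<Text, Text> call(Tuple2<String, String> record) {
            // Wrap key and value in Hadoop writables for SequenceFile output.
            return new Tuple2<Text, Text>(new Text(record._1()), new Text(record._2()));
        }
    }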