Re: job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1
Hi Darin, you might increase spark.yarn.executor.memoryOverhead to see if it fixes the problem. Please take a look at this report: https://issues.apache.org/jira/browse/SPARK-4996
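For example, you could pass it when submitting the job (the value here is just an illustration; tune it for your setup):

    spark-submit \
      --conf spark.yarn.executor.memoryOverhead=2048 \
      ... (rest of your submit command)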
Re: job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1
Can you share what error you are getting when the job fails?

--
Arush Kharbanda || Technical Teamlead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1
I'm using Spark 1.2 on a stand-alone cluster on EC2. I have a cluster of 8 r3.8xlarge machines but limit the job to only 128 cores. I have also tried other things, such as setting 4 workers per r3.8xlarge with 67GB each, but this made no difference.

The job frequently fails at the end in this step (saveAsHadoopFile), though it will sometimes work. finalNewBaselinePairRDD is a JavaPairRDD<String, String>, hash-partitioned with 1024 partitions, around 1TB in total size with about 13.5M records.

    JavaPairRDD<Text, Text> finalBaselineRDDWritable =
        finalNewBaselinePairRDD
            .mapToPair(new ConvertToWritableTypes())
            .persist(StorageLevel.MEMORY_AND_DISK_SER());

    // Save to hdfs (gzip)
    finalBaselineRDDWritable.saveAsHadoopFile("hdfs:///sparksync/",
        Text.class, Text.class, SequenceFileOutputFormat.class,
        org.apache.hadoop.io.compress.GzipCodec.class);

If anyone has any tips for what I should look into, it would be appreciated.

Thanks.

Darin.
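(For context, ConvertToWritableTypes is just a straightforward String-to-Text conversion, roughly along these lines:)

    import org.apache.hadoop.io.Text;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    // Converts each (String, String) pair into Hadoop-writable (Text, Text)
    // so the RDD can be written out as a SequenceFile.
    public class ConvertToWritableTypes
            implements PairFunction<Tuple2<String, String>, Text, Text> {
        public Tuple2<Text, Text> call(Tuple2<String, String> record) {
            return new Tuple2<Text, Text>(new Text(record._1()), new Text(record._2()));
        }
    }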