Re: job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1

2015-02-27 Thread Kelvin Chu
Hi Darin, you might increase spark.yarn.executor.memoryOverhead to see if
it fixes the problem. Please take a look at this report:
https://issues.apache.org/jira/browse/SPARK-4996
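
A minimal sketch of what that looks like when building the context (this
assumes a YARN deployment; the value is in MB, and 2048 is just
illustrative, so tune it for your executor size):

SparkConf conf = new SparkConf()
    // Extra off-heap memory requested per executor container, in MB.
    .set("spark.yarn.executor.memoryOverhead", "2048");
JavaSparkContext sc = new JavaSparkContext(conf);

The same setting can also be passed on the command line with
--conf spark.yarn.executor.memoryOverhead=2048.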

On Fri, Feb 27, 2015 at 12:38 AM, Arush Kharbanda 
ar...@sigmoidanalytics.com wrote:

 Can you share what error you are getting when the job fails?

 On Thu, Feb 26, 2015 at 4:32 AM, Darin McBeath 
 ddmcbe...@yahoo.com.invalid wrote:

 I'm using Spark 1.2 on a stand-alone cluster on EC2. I have a cluster of 8
 r3.8xlarge machines but limit the job to only 128 cores.  I have also tried
 other things, such as setting 4 workers per r3.8xlarge with 67GB each, but
 this made no difference.

 The job frequently fails at the end, in this step (saveAsHadoopFile).  It
 will sometimes work.

 finalNewBaselinePairRDD is hash-partitioned with 1024 partitions and has a
 total size of around 1TB.  There are about 13.5M records in
 finalNewBaselinePairRDD.  finalNewBaselinePairRDD is <String, String>.


 JavaPairRDD<Text, Text> finalBaselineRDDWritable =
     finalNewBaselinePairRDD.mapToPair(new ConvertToWritableTypes())
         .persist(StorageLevel.MEMORY_AND_DISK_SER());

 // Save to HDFS (gzip-compressed sequence files)
 finalBaselineRDDWritable.saveAsHadoopFile("hdfs:///sparksync/",
     Text.class, Text.class,
     SequenceFileOutputFormat.class,
     org.apache.hadoop.io.compress.GzipCodec.class);


 If anyone has any tips on what I should look into, it would be
 appreciated.

 Thanks.

 Darin.





 --


 *Arush Kharbanda* || Technical Teamlead

 ar...@sigmoidanalytics.com || www.sigmoidanalytics.com



Re: job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1

2015-02-27 Thread Arush Kharbanda
Can you share what error you are getting when the job fails?

On Thu, Feb 26, 2015 at 4:32 AM, Darin McBeath ddmcbe...@yahoo.com.invalid
wrote:

 I'm using Spark 1.2 on a stand-alone cluster on EC2. I have a cluster of 8
 r3.8xlarge machines but limit the job to only 128 cores.  I have also tried
 other things, such as setting 4 workers per r3.8xlarge with 67GB each, but
 this made no difference.

 The job frequently fails at the end, in this step (saveAsHadoopFile).  It
 will sometimes work.

 finalNewBaselinePairRDD is hash-partitioned with 1024 partitions and has a
 total size of around 1TB.  There are about 13.5M records in
 finalNewBaselinePairRDD.  finalNewBaselinePairRDD is <String, String>.


 JavaPairRDD<Text, Text> finalBaselineRDDWritable =
     finalNewBaselinePairRDD.mapToPair(new ConvertToWritableTypes())
         .persist(StorageLevel.MEMORY_AND_DISK_SER());

 // Save to HDFS (gzip-compressed sequence files)
 finalBaselineRDDWritable.saveAsHadoopFile("hdfs:///sparksync/",
     Text.class, Text.class,
     SequenceFileOutputFormat.class,
     org.apache.hadoop.io.compress.GzipCodec.class);


 If anyone has any tips on what I should look into, it would be appreciated.

 Thanks.

 Darin.





-- 


*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1

2015-02-25 Thread Darin McBeath
I'm using Spark 1.2 on a stand-alone cluster on EC2. I have a cluster of 8
r3.8xlarge machines but limit the job to only 128 cores.  I have also tried
other things, such as setting 4 workers per r3.8xlarge with 67GB each, but
this made no difference.
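
For reference, a cap like this is typically set on a stand-alone cluster via
spark.cores.max, e.g.:

SparkConf conf = new SparkConf()
    // Maximum total cores the application may claim across the cluster.
    .set("spark.cores.max", "128");
JavaSparkContext sc = new JavaSparkContext(conf);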

The job frequently fails at the end, in this step (saveAsHadoopFile).  It will
sometimes work.

finalNewBaselinePairRDD is hash-partitioned with 1024 partitions and has a total
size of around 1TB.  There are about 13.5M records in finalNewBaselinePairRDD.
finalNewBaselinePairRDD is <String, String>.


JavaPairRDD<Text, Text> finalBaselineRDDWritable =
    finalNewBaselinePairRDD.mapToPair(new ConvertToWritableTypes())
        .persist(StorageLevel.MEMORY_AND_DISK_SER());

// Save to HDFS (gzip-compressed sequence files)
finalBaselineRDDWritable.saveAsHadoopFile("hdfs:///sparksync/",
    Text.class, Text.class,
    SequenceFileOutputFormat.class,
    org.apache.hadoop.io.compress.GzipCodec.class);
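
ConvertToWritableTypes is essentially a thin PairFunction that wraps each
String in a Hadoop Text writable, along these lines (a sketch; the actual
class is not shown here):

import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// Converts a (String, String) record into a Hadoop-writable (Text, Text).
public class ConvertToWritableTypes
    implements PairFunction<Tuple2<String, String>, Text, Text> {
  public Tuple2<Text, Text> call(Tuple2<String, String> record) {
    return new Tuple2<Text, Text>(new Text(record._1()), new Text(record._2()));
  }
}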


If anyone has any tips on what I should look into, it would be appreciated.

Thanks.

Darin.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org