Hi Darin, you might increase spark.yarn.executor.memoryOverhead to see if it fixes the problem. Please take a look at this report: https://issues.apache.org/jira/browse/SPARK-4996
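For Spark on YARN that property can be set through SparkConf before the context is created; a minimal sketch in Java (the 2048 MB value and the app name are just illustrations, tune the overhead to your executor size):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Illustrative only: raise the off-heap memory YARN reserves per executor.
    SparkConf conf = new SparkConf()
        .setAppName("baseline-save")  // hypothetical app name
        .set("spark.yarn.executor.memoryOverhead", "2048");  // value is in MB
    JavaSparkContext sc = new JavaSparkContext(conf);

The same property can also be passed on the command line, e.g. spark-submit --conf spark.yarn.executor.memoryOverhead=2048.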
On Fri, Feb 27, 2015 at 12:38 AM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

> Can you share what error you are getting when the job fails?
>
> On Thu, Feb 26, 2015 at 4:32 AM, Darin McBeath <ddmcbe...@yahoo.com.invalid> wrote:
>
>> I'm using Spark 1.2 in a stand-alone cluster on EC2. I have a cluster of 8
>> r3.8xlarge machines but limit the job to only 128 cores. I have also tried
>> other things, such as setting 4 workers per r3.8xlarge with 67 GB each, but
>> this made no difference.
>>
>> The job frequently fails at the end in this step (saveAsHadoopFile). It
>> will sometimes work.
>>
>> finalNewBaselinePairRDD is hash-partitioned with 1024 partitions and a
>> total size of around 1 TB. There are about 13.5M records in
>> finalNewBaselinePairRDD, which is a JavaPairRDD<String, String>.
>>
>> JavaPairRDD<Text, Text> finalBaselineRDDWritable =
>>     finalNewBaselinePairRDD.mapToPair(new ConvertToWritableTypes())
>>         .persist(StorageLevel.MEMORY_AND_DISK_SER());
>>
>> // Save to HDFS as gzip-compressed sequence files
>> finalBaselineRDDWritable.saveAsHadoopFile("hdfs:///sparksync/",
>>     Text.class, Text.class, SequenceFileOutputFormat.class,
>>     org.apache.hadoop.io.compress.GzipCodec.class);
>>
>> If anyone has any tips for what I should look into, it would be
>> appreciated.
>>
>> Thanks.
>>
>> Darin.
>
> --
> Arush Kharbanda || Technical Teamlead
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
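Note for the archives: ConvertToWritableTypes isn't shown in the thread. For a <String, String> pair RDD like the one above, it would typically be a simple PairFunction along these lines, assuming it only wraps both sides of the pair in Hadoop Text writables:

    import org.apache.hadoop.io.Text;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    // Hypothetical reconstruction: wrap each String key/value in a Text
    // writable so the pairs can be written out with saveAsHadoopFile.
    class ConvertToWritableTypes
            implements PairFunction<Tuple2<String, String>, Text, Text> {
        public Tuple2<Text, Text> call(Tuple2<String, String> record) {
            return new Tuple2<Text, Text>(new Text(record._1()),
                                          new Text(record._2()));
        }
    }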