Re: job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1

2015-02-27 Thread Kelvin Chu
Hi Darin, you might increase spark.yarn.executor.memoryOverhead to see if it fixes the problem. Please take a look of this report: https://issues.apache.org/jira/browse/SPARK-4996 On Fri, Feb 27, 2015 at 12:38 AM, Arush Kharbanda ar...@sigmoidanalytics.com wrote: Can you share what error you

Re: job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1

2015-02-27 Thread Arush Kharbanda
Can you share what error you are getting when the job fails. On Thu, Feb 26, 2015 at 4:32 AM, Darin McBeath ddmcbe...@yahoo.com.invalid wrote: I'm using Spark 1.2, stand-alone cluster on ec2 I have a cluster of 8 r3.8xlarge machines but limit the job to only 128 cores. I have also tried

job keeps failing with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1

2015-02-25 Thread Darin McBeath
I'm using Spark 1.2, stand-alone cluster on ec2 I have a cluster of 8 r3.8xlarge machines but limit the job to only 128 cores. I have also tried other things such as setting 4 workers per r3.8xlarge and 67gb each but this made no difference. The job frequently fails at the end in this step