Re: Running out of space (when there's no shortage)

2015-02-27 Thread Kelvin Chu
Hi Joe, you might increase spark.yarn.executor.memoryOverhead to see if it fixes the problem. Please take a look at this report: https://issues.apache.org/jira/browse/SPARK-4996 Hope this helps. On Tue, Feb 24, 2015 at 2:05 PM, Yiannis Gkoufas johngou...@gmail.com wrote: No problem, Joe. There
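As a rough illustration of the suggestion above (not taken from the thread), the property could be passed to spark-submit through its `--conf` option. The sketch below only assembles the command line. The helper name `spark_submit_args`, the 1024 MB value, and the `my_job.py` path are assumptions made for the example, and all other submit options are omitted.

```python
def spark_submit_args(app_path, overhead_mb):
    """Build a spark-submit argument list that raises the YARN executor memory overhead.

    app_path and overhead_mb are placeholders chosen by the caller; the thread
    does not recommend a specific value.
    """
    return [
        "spark-submit",
        # The property named in the message above; the value is taken as megabytes.
        "--conf", "spark.yarn.executor.memoryOverhead={}".format(overhead_mb),
        app_path,
    ]


if __name__ == "__main__":
    # Hypothetical application path and overhead value, for illustration only.
    print(" ".join(spark_submit_args("my_job.py", 1024)))
```

Whether a larger overhead helps depends on the job, so the value is best treated as something to experiment with.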

Running out of space (when there's no shortage)

2015-02-24 Thread Joe Wass
I'm running a cluster of 3 Amazon EC2 machines (small number because it's expensive when experiments keep crashing after a day!). Today's crash looks like this (stacktrace at end of message). org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 On my

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Vladimir Rodionov
This usually happens on Linux when an application deletes a file without checking that there are no open FDs (a resource leak). In this case, Linux holds all the allocated space and does not release it until the application exits (crashes, in your case). You check the file system and everything is normal, you have
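The behaviour described in this message can be reproduced with a small Python sketch (not part of the thread, and assuming a Linux or other POSIX file system). It writes a temporary file, deletes its name while a descriptor is still open, and then inspects the descriptor. The function name `deleted_but_open_demo` is made up for the example.

```python
import os
import tempfile


def deleted_but_open_demo(size_bytes=4096):
    """Delete a file while it is still open and report what the open descriptor sees.

    Returns (path_exists, link_count, size_in_bytes) observed after the unlink.
    """
    fd, path = tempfile.mkstemp()
    try:
        data = b"x" * size_bytes
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])

        # Remove the directory entry; the open descriptor keeps the data alive.
        os.unlink(path)

        info = os.fstat(fd)
        return os.path.exists(path), info.st_nlink, info.st_size
    finally:
        # Only when the last descriptor is closed can the space be reclaimed.
        os.close(fd)


if __name__ == "__main__":
    exists, links, size = deleted_but_open_demo()
    print("path exists:", exists, "| link count:", links, "| bytes still held:", size)
```

The path no longer shows up in a directory listing, yet the descriptor still reports the full size, which is why listing the directory can look normal while the disk stays full. On a live system, `lsof +L1` lists open files whose link count has dropped to zero.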

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Ted Yu
Here is a tool which may give you some clue: http://file-leak-detector.kohsuke.org/ Cheers On Tue, Feb 24, 2015 at 11:04 AM, Vladimir Rodionov vrodio...@splicemachine.com wrote: Usually it happens in Linux when application deletes file w/o double checking that there are no open FDs (resource

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Yiannis Gkoufas
Hi there, I assume you are using Spark 1.2.1, right? I faced the exact same issue and switched to 1.1.1 with the same configuration and it was solved. On 24 Feb 2015 19:22, Ted Yu yuzhih...@gmail.com wrote: Here is a tool which may give you some clue: http://file-leak-detector.kohsuke.org/

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Yiannis Gkoufas
No problem, Joe. There you go https://issues.apache.org/jira/browse/SPARK-5081 And also there is this one https://issues.apache.org/jira/browse/SPARK-5715 which is marked as resolved On 24 February 2015 at 21:51, Joe Wass jw...@crossref.org wrote: Thanks everyone. Yiannis, do you know if