Hi Joe, you might increase spark.yarn.executor.memoryOverhead to see if it
fixes the problem. Please take a look at this report:
https://issues.apache.org/jira/browse/SPARK-4996
Hope this helps.
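For reference, that property can be set at submit time (or in spark-defaults.conf). This is only an illustrative sketch: the 1024 MB value, class name, and jar name below are placeholders, not recommendations.

```shell
# Illustrative only: increase the off-heap memory YARN reserves per executor.
# The class and jar names are placeholders; 1024 (MB) is just an example value.
spark-submit \
  --master yarn \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class com.example.YourApp \
  your-app.jar
```

The same setting can go in spark-defaults.conf as `spark.yarn.executor.memoryOverhead 1024`.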
On Tue, Feb 24, 2015 at 2:05 PM, Yiannis Gkoufas johngou...@gmail.com
wrote:
I'm running a cluster of 3 Amazon EC2 machines (small number because it's
expensive when experiments keep crashing after a day!).
Today's crash looks like this (stacktrace at end of message).
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
location for shuffle 0
Usually this happens on Linux when an application deletes a file without
double-checking that there are no open FDs (a resource leak). In that case,
Linux keeps the allocated space and does not release it until the application
exits (crashes, in your case). You check the file system and everything looks
normal, yet the space is still consumed.
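The behavior is easy to reproduce from a shell. This is a minimal sketch (assuming Linux and a POSIX shell): the directory entry disappears on `rm`, but the data stays readable, and the space stays allocated, until the last open FD is closed.

```shell
# Demonstrate that Linux keeps a deleted file's data while an FD is open.
tmp=$(mktemp)
printf 'x%.0s' $(seq 1 1024) > "$tmp"   # write 1 KiB to the file

exec 3<"$tmp"    # hold an open FD on the file
rm "$tmp"        # remove the directory entry

[ ! -e "$tmp" ] && echo "directory entry gone"

# The data is still readable through the open FD:
bytes=$(wc -c <&3 | tr -d ' ')
echo "still readable: $bytes bytes"

exec 3<&-        # closing the last FD is what actually frees the space
```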
Here is a tool which may give you some clue:
http://file-leak-detector.kohsuke.org/
Cheers
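If JVM-level instrumentation feels heavyweight, plain `lsof` on each worker node can show the same thing from the OS side (assuming lsof is installed; `+L1` lists open files with a link count of zero, i.e. deleted but still held open):

```shell
# Files that are deleted but still held open by some process:
lsof +L1
# Or grep the full listing for the "(deleted)" marker:
lsof -nP | grep '(deleted)'
```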
On Tue, Feb 24, 2015 at 11:04 AM, Vladimir Rodionov
vrodio...@splicemachine.com wrote:
Hi there,
I assume you are using Spark 1.2.1, right?
I faced the exact same issue and switched to 1.1.1 with the same
configuration, and it was solved.
On 24 Feb 2015 19:22, Ted Yu yuzhih...@gmail.com wrote:
No problem, Joe. There you go:
https://issues.apache.org/jira/browse/SPARK-5081
And there is also this one, https://issues.apache.org/jira/browse/SPARK-5715,
which is marked as resolved.
On 24 February 2015 at 21:51, Joe Wass jw...@crossref.org wrote:
Thanks everyone.
Yiannis, do you know if