On Fri, Aug 1, 2014 at 12:39 PM, Sean Owen <so...@cloudera.com> wrote:

> Isn't this your worker running out of its memory for computations,
> rather than for caching RDDs?
I’m not sure how to interpret the stack trace, but let’s say that’s true.
I’m even seeing this with something as simple as a = sc.textFile().cache()
followed by a.count(). Spark shouldn’t need that much memory for that kind
of work, no?
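
For concreteness, the repro is essentially the following (the S3 path is
just a placeholder; only the cache-then-count pattern matters):

    from pyspark import SparkContext

    sc = SparkContext(appName="cache-count-repro")

    a = sc.textFile("s3n://some-bucket/some-dataset/*")  # placeholder path
    a.cache()         # mark the RDD for in-memory storage
    print(a.count())  # materializes the RDD and fills the cache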

> then the answer is that you should tell
> it to use less memory for caching.
I can try that. That’s done by changing spark.storage.memoryFraction, right?
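
If I go that route, I assume it would look something like this (0.3 is just
an illustrative value, not a recommendation):

    from pyspark import SparkConf, SparkContext

    # Lower the fraction of the heap reserved for cached RDDs
    # (the default is 0.6).
    conf = SparkConf().set("spark.storage.memoryFraction", "0.3")
    sc = SparkContext(conf=conf)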

This still seems strange, though. The default fraction of the JVM heap left
for non-cache activity (1 - 0.6 = 40%
<http://spark.apache.org/docs/latest/configuration.html#execution-behavior>)
should be plenty for just counting elements. I’m using m1.xlarge nodes,
which have 15 GB of memory apiece.
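
Back-of-the-envelope, assuming the executor heap ends up somewhere around
10 GB on an m1.xlarge (a guess on my part; the real figure depends on
spark.executor.memory and what the EC2 scripts set):

    # Hypothetical heap size, just to illustrate the arithmetic.
    heap_gb = 10
    storage_fraction = 0.6                           # spark.storage.memoryFraction default
    non_cache_gb = heap_gb * (1 - storage_fraction)  # ~4 GB left for execution
    print(non_cache_gb)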

Nick
