Hmm — 33.6 GB is the sum of the memory used by the two cached RDDs. You're right: when I put serialized RDDs in the cache, the memory footprint for these RDDs becomes a lot smaller.
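For reference, a minimal sketch of how the serialized caching above can be requested in the Java API, assuming Spark 1.x on the classpath; the app name and input path are hypothetical, and the Kryo setting is an optional extra that typically shrinks the serialized footprint further:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class SerializedCacheSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("serialized-cache-demo"); // hypothetical name
        // Kryo usually serializes more compactly than default Java serialization.
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("hdfs:///some/input"); // hypothetical path
        // MEMORY_ONLY_SER is what the UI reports as "Memory Serialized 1x Replicated".
        lines.persist(StorageLevel.MEMORY_ONLY_SER());
        lines.count(); // an action, to actually materialize the cache

        sc.stop();
    }
}
```

The trade-off is CPU for memory: each access pays deserialization cost, but the stored bytes are much smaller, as the table below shows.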
Serialized memory footprint shown below:

RDD Name | Storage Level                   | Cached Partitions | Fraction Cached | Size in Memory | Size in Tachyon | Size on Disk
2        | Memory Serialized 1x Replicated | 239               | 100%            | 3.1 GB         | 0.0 B           | 0.0 B
5        | Memory Serialized 1x Replicated | 100               | 100%            | 1254.9 MB      | 0.0 B           | 0.0 B

I don't know what 73.7 is reflective of. In the application UI I can see "4.3 GB Used (73.7 GB Total)" for the cached RDDs, but I am not sure how that 73.7 is calculated. I have the following configuration:

conf.set("spark.storage.memoryFraction", "0.9");
conf.set("spark.shuffle.memoryFraction", "0.1");

Based on my understanding, 0.9 * 95 GB (memory allocated to the driver) = 85.5 GB should be the available memory, correct? Out of that, 10% is taken out for shuffle (85.5 - 8.55 = 76.95), which would leave 76.95 GB of usable memory. Is that right? The two cached RDDs are not using nearly that much.

The two systemic problems I am trying to avoid are the MAX_INTEGER limit and "Requested array size exceeds VM limit". No matter how much I tweak the parallelism/memory configuration, there seems to be little or no impact. Is there someone who can help me understand the internals, so that I can get this working? I know this platform is a great, viable solution for the use case we have in mind, if I can get it running successfully. At this point, the data size is not that huge compared to some published white papers, so I am thinking it boils down to the configuration and validating what I have with an expert. We can take this offline if need be. Please feel free to email me directly.

Thank you,
Ami

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Help-understanding-Not-enough-space-to-cache-rdd-tp20186p20269.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
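A possible explanation of the 73.7 GB figure, hedged: in Spark 1.x the storage pool is sized as heap * spark.storage.memoryFraction * spark.storage.safetyFraction (the safety fraction defaults to 0.9 and is easy to overlook), and the heap Spark sees is Runtime.getRuntime().maxMemory(), which the JVM reports lower than the configured -Xmx because of survivor-space accounting. The ~91 GB "reported heap" below is an assumed figure chosen to reproduce the UI number; the arithmetic itself is just this:

```java
public class StorageMemoryEstimate {
    public static void main(String[] args) {
        double memoryFraction = 0.9; // spark.storage.memoryFraction
        double safetyFraction = 0.9; // spark.storage.safetyFraction (default)

        double configuredHeapGb = 95.0;
        // Naive estimate, ignoring the safety fraction:
        System.out.printf("naive:       %.2f GB%n", configuredHeapGb * memoryFraction);                   // 85.50
        // With the safety fraction applied (matches the 76.95 figure in the post):
        System.out.printf("with safety: %.2f GB%n", configuredHeapGb * memoryFraction * safetyFraction);  // 76.95

        // Runtime.getRuntime().maxMemory() typically reports less than -Xmx;
        // ~91 GB here is an assumption, not a measured value.
        double reportedHeapGb = 91.0;
        System.out.printf("UI estimate: %.2f GB%n", reportedHeapGb * memoryFraction * safetyFraction);    // 73.71
    }
}
```

Under that assumption, 91 * 0.9 * 0.9 ≈ 73.7 GB, which would line up with what the UI shows.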
--------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org