hmm..
33.6 GB is the sum of the memory used by the two RDDs that are cached.  You're
right: when I put serialized RDDs in the cache, the memory footprint for
these RDDs becomes a lot smaller.

The serialized memory footprint is shown below:
RDD Name | Storage Level                   | Cached Partitions | Fraction Cached | Size in Memory | Size in Tachyon | Size on Disk
2        | Memory Serialized 1x Replicated | 239               | 100%            | 3.1 GB         | 0.0 B           | 0.0 B
5        | Memory Serialized 1x Replicated | 100               | 100%            | 1254.9 MB      | 0.0 B           | 0.0 B
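
For reference, this is roughly how the two RDDs are being cached serialized
(Java API sketch; the RDD name and input path are placeholders, not my actual
job):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

// sc is the existing JavaSparkContext; input path is a placeholder
JavaRDD<String> rdd = sc.textFile("hdfs://<input path>");
rdd.persist(StorageLevel.MEMORY_ONLY_SER());  // shows up as "Memory Serialized 1x Replicated"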

I don't know what the 73.7 figure reflects.  In the application UI I can see
4.3 GB Used out of 73.7 GB Total for the cached RDDs, but I am not sure how
that 73.7 is calculated.  I have the following configuration:

conf.set("spark.storage.memoryFraction", "0.9");
conf.set("spark.shuffle.memoryFraction","0.1");

Based on my understanding, 0.9 * 95 GB (memory allocated to the driver) = 85.5
GB should be the available memory, correct?  Out of which 10% is taken out
for shuffle (85.5 - 8.55 = 76.95), which would leave 76.95 GB of usable memory.
Is that right?  The two RDDs that are cached are not using nearly that much.
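
A sketch of the arithmetic as I understand it (the safety fraction below is
only my guess at where the lower UI figure might come from, not something I
have confirmed):

double heap = 95.0;                           // GB allocated to the driver
double storageFraction = 0.9;                 // spark.storage.memoryFraction
double safetyFraction = 0.9;                  // spark.storage.safetyFraction default, if it applies here
double myEstimate = heap * storageFraction;                    // 85.5 GB (my number above)
double withSafety = heap * storageFraction * safetyFraction;   // ~77 GB, still not 73.7
// Is the remaining gap because the total is based on the JVM's reported max
// memory rather than the configured 95 GB heap?
System.out.printf("estimate = %.1f GB, with safety fraction = %.1f GB%n",
                  myEstimate, withSafety);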

The two systemic problems that I am trying to avoid are the MAX_INTEGER
(Integer.MAX_VALUE) error and "Requested array size exceeds VM limit".  No
matter how much I tweak the parallelism/memory configuration, there seems to
be little or no impact.  Is there someone who can help me understand the
internals so that I can get this working?  I know this platform is a great,
viable solution for the use case we have in mind, if I can get it running
successfully.  At this point, the data size is not that huge compared to some
of the white papers that have been published, so I am thinking it boils down
to the configuration and validating what I have with an expert.  We can take
this offline if need be; please feel free to email me directly.
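
For completeness, this is the kind of parallelism tweaking I mean (Java API
sketch; the partition count and input path are placeholders, not the values
I actually run with):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

// Increase the partition count so no single cached block gets close to
// 2 GB (Integer.MAX_VALUE bytes), which is my guess at what the errors
// above are hitting; sc is the existing JavaSparkContext.
JavaRDD<String> input = sc.textFile("hdfs://<input path>", 2000);
JavaRDD<String> smallerBlocks = input.repartition(2000);
smallerBlocks.persist(StorageLevel.MEMORY_ONLY_SER());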

Thank you,

Ami



