Hmm — 33.6 GB is the sum of the memory used by the two cached RDDs. You're right: when I put serialized RDDs in the cache, the memory footprint for these RDDs becomes a lot smaller.
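For reference, a minimal sketch of how the serialized caching above can be requested in the Java API, assuming Spark 1.x on the classpath; the app name and input path are hypothetical, and the Kryo setting is an optional extra that typically shrinks the serialized footprint further:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class SerializedCacheSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("serialized-cache-demo"); // hypothetical name
        // Kryo usually serializes more compactly than default Java serialization.
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("hdfs:///some/input"); // hypothetical path
        // MEMORY_ONLY_SER is what the UI reports as "Memory Serialized 1x Replicated".
        lines.persist(StorageLevel.MEMORY_ONLY_SER());
        lines.count(); // an action, to actually materialize the cache

        sc.stop();
    }
}
```

The trade-off is CPU for memory: each access pays deserialization cost, but the stored bytes are much smaller, as the table below shows.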
Serialized memory footprint shown below:

RDD Name | Storage Level                   | Cached Partitions | Fraction Cached | Size in Memory | Size in Tachyon | Size on Disk
2        | Memory Serialized 1x Replicated | 239               | 100%            | 3.1 GB         | 0.0 B           | 0.0 B
5        | Memory Serialized 1x Replicated | 100               | 100%            | 1254.9 MB      | 0.0 B           | 0.0 B

I don't know what 73.7 is reflective of. In the application UI I can see "4.3 GB Used (73.7 GB Total)" for the cached RDDs, but I am not sure how that 73.7 is calculated. I have the following configuration:

conf.set("spark.storage.memoryFraction", "0.9");
conf.set("spark.shuffle.memoryFraction", "0.1");

Based on my understanding, 0.9 * 95 GB (memory allocated to the driver) = 85.5 GB should be the available memory, correct? Out of that, 10% is taken out for shuffle (85.5 - 8.55 = 76.95), which would leave 76.95 GB of usable memory. Is that right? The two cached RDDs are not using nearly that much.

The two systemic problems I am trying to avoid are the MAX_INTEGER limit and "Requested array size exceeds VM limit". No matter how much I tweak the parallelism/memory configuration, there seems to be little or no impact. Is there someone who can help me understand the internals, so that I can get this working? I know this platform is a great, viable solution for the use case we have in mind, if I can get it running successfully. At this point, the data size is not that huge compared to some published white papers, so I am thinking it boils down to the configuration and validating what I have with an expert. We can take this offline if need be. Please feel free to email me directly.

Thank you,
Ami

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Help-understanding-Not-enough-space-to-cache-rdd-tp20186p20269.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
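A possible explanation of the 73.7 GB figure, hedged: in Spark 1.x the storage pool is sized as heap * spark.storage.memoryFraction * spark.storage.safetyFraction (the safety fraction defaults to 0.9 and is easy to overlook), and the heap Spark sees is Runtime.getRuntime().maxMemory(), which the JVM reports lower than the configured -Xmx because of survivor-space accounting. The ~91 GB "reported heap" below is an assumed figure chosen to reproduce the UI number; the arithmetic itself is just this:

```java
public class StorageMemoryEstimate {
    public static void main(String[] args) {
        double memoryFraction = 0.9; // spark.storage.memoryFraction
        double safetyFraction = 0.9; // spark.storage.safetyFraction (default)

        double configuredHeapGb = 95.0;
        // Naive estimate, ignoring the safety fraction:
        System.out.printf("naive:       %.2f GB%n", configuredHeapGb * memoryFraction);                   // 85.50
        // With the safety fraction applied (matches the 76.95 figure in the post):
        System.out.printf("with safety: %.2f GB%n", configuredHeapGb * memoryFraction * safetyFraction);  // 76.95

        // Runtime.getRuntime().maxMemory() typically reports less than -Xmx;
        // ~91 GB here is an assumption, not a measured value.
        double reportedHeapGb = 91.0;
        System.out.printf("UI estimate: %.2f GB%n", reportedHeapGb * memoryFraction * safetyFraction);    // 73.71
    }
}
```

Under that assumption, 91 * 0.9 * 0.9 ≈ 73.7 GB, which would line up with what the UI shows.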
--------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org