I am running in local mode, on a Google n1-highmem-16 (16 vCPU, 104 GB memory) machine.
I have allocated SPARK_DRIVER_MEMORY=95g, and I see "Memory: 33.6 GB Used (73.7 GB Total)" for the executor. In the log output below, 33.6 GB of blocks are used by the 2 RDDs I have cached, so I should still have 40.1 GB left. However, I see messages like:

    14/12/02 18:15:04 WARN storage.MemoryStore: Not enough space to cache rdd_15_9 in memory! (computed 8.1 GB so far)
    14/12/02 18:15:04 INFO storage.MemoryStore: Memory use = 33.6 GB (blocks) + 40.1 GB (scratch space shared across 14 thread(s)) = 73.7 GB. Storage limit = 73.7 GB.
    14/12/02 18:15:04 WARN spark.CacheManager: Persisting partition rdd_15_9 to disk instead.

Further down I see:

    14/12/02 18:30:08 INFO storage.BlockManagerInfo: Added rdd_15_9 on disk on localhost:41889 (size: 6.9 GB)
    14/12/02 18:30:08 INFO storage.BlockManagerMaster: Updated info of block rdd_15_9
    14/12/02 18:30:08 ERROR executor.Executor: Exception in task 9.0 in stage 2.0 (TID 348)
    java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

I don't understand a couple of things:

1) In this case I am joining 2 RDDs (16.3 GB and 17.2 GB); both RDDs are created by reading HDFS files. Each .part file is 24.87 MB, and I am reading these files into 250 partitions, so no individual input partition should be over 25 MB. How could rdd_15_9 reach 8.1 GB?

2) Even if that partition really is 8.1 GB, Spark should have enough memory to write it, though I would expect to hit the Integer.MAX_VALUE 2 GB limitation. Yet I don't get that error at this point; instead a partial dataset (6.9 GB) is written to disk. I don't understand how and why only a partial dataset is written.

3) Why do I get "java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE" only after the partial dataset has been written?

I would love to hear from anyone who can shed some light on this...
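For what it's worth, here is a rough back-of-the-envelope check on the arithmetic in question 1, using the sizes quoted above. It assumes (my assumption, not something the logs confirm) that the join shuffles both RDDs, so post-shuffle partition sizes depend on the hash of the join keys rather than on the original 25 MB HDFS part sizes:

```python
# Sanity check on partition sizes, using the numbers from this post.
# Assumption: the join repartitions by key, so an input read in ~25 MB
# chunks can still produce one very large post-shuffle partition if
# many rows share the same (or few) join keys.

GB = 1024 ** 3
MB = 1024 ** 2

total_input = 16.3 * GB + 17.2 * GB   # the two RDDs being joined
partitions = 250

# With a perfectly even key distribution, each shuffle partition
# would hold roughly:
balanced = total_input / partitions
print(f"balanced partition: {balanced / MB:.0f} MB")

# The observed 8.1 GB partition would then mean one hash bucket
# received roughly this fraction of all the joined data:
skewed = 8.1 * GB
print(f"fraction in rdd_15_9: {skewed / total_input:.0%}")
```

Under that assumption, a balanced partition would be on the order of 140 MB, so an 8.1 GB partition suggests roughly a quarter of the data hashed into one bucket, i.e. heavy key skew rather than anything about the 24.87 MB input files.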
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Help-understanding-Not-enough-space-to-cache-rdd-tp20186.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.