I was having similar issues with my persisted RDDs. After some digging around, I noticed that the partitions were not balanced evenly across the available nodes. After a repartition(), the RDD was spread evenly across the memory of all available nodes. Not sure if that would help your use case, though. You could also increase spark.storage.memoryFraction if that is an option.
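Here is a rough sketch of what that looked like for me; the app name, input path, and parallelism factor are just illustrative, not specific to your job:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("repartition-and-cache")
  // Give cached blocks a larger share of the executor heap (default is 0.6 in 1.x).
  .set("spark.storage.memoryFraction", "0.7")
val sc = new SparkContext(conf)

val raw = sc.textFile("hdfs:///data/events")
// Spread the data evenly across the cluster before caching, so no single
// node ends up holding most of the cached partitions.
val balanced = raw.repartition(sc.defaultParallelism * 2)
balanced.persist(StorageLevel.MEMORY_ONLY)
balanced.count()  // materialize the cache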
- Ranga

On Wed, Dec 10, 2014 at 10:23 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> The ContextCleaner uncaches RDDs that have gone out of scope on the
> driver. So it's possible that the given RDD is no longer reachable in your
> program's control flow, or else it'd be a bug in the ContextCleaner.
>
> On Wed, Dec 10, 2014 at 5:34 PM, ankits <ankitso...@gmail.com> wrote:
>
>> I'm using Spark 1.1.0 and am seeing persisted RDDs being cleaned up too
>> fast. How can I inspect the size of an RDD in memory and get more
>> information about why it was cleaned up? There should be more than enough
>> memory available on the cluster to store them, and by default
>> spark.cleaner.ttl is infinite, so I want more information about why this
>> is happening and how to prevent it.
>>
>> Spark just logs this when removing RDDs:
>>
>> [2014-12-11 01:19:34,006] INFO spark.storage.BlockManager [] [] - Removing RDD 33
>> [2014-12-11 01:19:34,010] INFO pache.spark.ContextCleaner []
>> [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33
>> [2014-12-11 01:19:34,012] INFO spark.storage.BlockManager [] [] - Removing RDD 33
>> [2014-12-11 01:19:34,016] INFO pache.spark.ContextCleaner []
>> [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-being-cleaned-too-fast-tp20613.html
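P.S. On the "how can I inspect the size of an RDD in memory" part: the driver exposes the same numbers you see on the Storage tab of the web UI through SparkContext.getRDDStorageInfo. Below is a minimal sketch; the input path, the RDD name "events", and the surrounding object are made up for illustration. Since the ContextCleaner only cleans an RDD once the driver no longer holds a reference to it, keeping a long-lived reference (as in the field below) is the usual way to prevent early cleanup.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object CacheInspection {
  // Keep the reference in a long-lived field so the RDD stays reachable on
  // the driver and the ContextCleaner leaves its cached blocks alone.
  var cached: RDD[String] = _

  def run(sc: SparkContext): Unit = {
    cached = sc.textFile("hdfs:///data/events").setName("events")
    cached.persist(StorageLevel.MEMORY_ONLY)
    cached.count()  // materialize the cache

    // Same information as the web UI's Storage tab, per persisted RDD.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"RDD ${info.id} '${info.name}': " +
        s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
        s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
    }
  }
}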