I was having similar issues with my persisted RDDs. After some digging around, I noticed that the partitions were not balanced evenly across the available nodes. After a repartition(), the RDD was spread evenly across the memory of all available nodes. Not sure if that would help your use case, though. You could also increase spark.storage.memoryFraction if that is an option.
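Here is a rough sketch of what that looked like for me; the app name, input path, and parallelism factor are just illustrative, not specific to your job:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("repartition-and-cache")
  // Give cached blocks a larger share of the executor heap (default is 0.6 in 1.x).
  .set("spark.storage.memoryFraction", "0.7")
val sc = new SparkContext(conf)

val raw = sc.textFile("hdfs:///data/events")
// Spread the data evenly across the cluster before caching, so no single
// node ends up holding most of the cached partitions.
val balanced = raw.repartition(sc.defaultParallelism * 2)
balanced.persist(StorageLevel.MEMORY_ONLY)
balanced.count()  // materialize the cache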
- Ranga

On Wed, Dec 10, 2014 at 10:23 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> The ContextCleaner uncaches RDDs that have gone out of scope on the
> driver. So it's possible that the given RDD is no longer reachable in your
> program's control flow, or else it'd be a bug in the ContextCleaner.
>
> On Wed, Dec 10, 2014 at 5:34 PM, ankits <ankitso...@gmail.com> wrote:
>
>> I'm using Spark 1.1.0 and am seeing persisted RDDs being cleaned up too
>> fast. How can I inspect the size of an RDD in memory and get more
>> information about why it was cleaned up? There should be more than enough
>> memory available on the cluster to store them, and by default
>> spark.cleaner.ttl is infinite, so I want more information about why this
>> is happening and how to prevent it.
>>
>> Spark just logs this when removing RDDs:
>>
>> [2014-12-11 01:19:34,006] INFO spark.storage.BlockManager [] [] - Removing RDD 33
>> [2014-12-11 01:19:34,010] INFO pache.spark.ContextCleaner []
>> [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33
>> [2014-12-11 01:19:34,012] INFO spark.storage.BlockManager [] [] - Removing RDD 33
>> [2014-12-11 01:19:34,016] INFO pache.spark.ContextCleaner []
>> [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-being-cleaned-too-fast-tp20613.html
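P.S. On the "how can I inspect the size of an RDD in memory" part: the driver exposes the same numbers you see on the Storage tab of the web UI through SparkContext.getRDDStorageInfo. Below is a minimal sketch; the input path, the RDD name "events", and the surrounding object are made up for illustration. Since the ContextCleaner only cleans an RDD once the driver no longer holds a reference to it, keeping a long-lived reference (as in the field below) is the usual way to prevent early cleanup.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object CacheInspection {
  // Keep the reference in a long-lived field so the RDD stays reachable on
  // the driver and the ContextCleaner leaves its cached blocks alone.
  var cached: RDD[String] = _

  def run(sc: SparkContext): Unit = {
    cached = sc.textFile("hdfs:///data/events").setName("events")
    cached.persist(StorageLevel.MEMORY_ONLY)
    cached.count()  // materialize the cache

    // Same information as the web UI's Storage tab, per persisted RDD.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"RDD ${info.id} '${info.name}': " +
        s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
        s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
    }
  }
}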