[ https://issues.apache.org/jira/browse/SPARK-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rekha Joshi updated SPARK-10942: -------------------------------- Attachment: SPARK-10942_3.png SPARK-10942_2.png SPARK-10942_1.png SPARK-10942: TestStreaming job ran for checking cache and storage scenario.So far for my runs, the storage gets cleared out. > Not all cached RDDs are unpersisted > ----------------------------------- > > Key: SPARK-10942 > URL: https://issues.apache.org/jira/browse/SPARK-10942 > Project: Spark > Issue Type: Bug > Components: Streaming > Reporter: Nick Pritchard > Attachments: SPARK-10942_1.png, SPARK-10942_2.png, SPARK-10942_3.png > > > I have a Spark Streaming application that caches RDDs inside of a > {{transform}} closure. Looking at the Spark UI, it seems that most of these > RDDs are unpersisted after the batch completes, but not all. > I have copied a minimal reproducible example below to highlight the problem. > I run this and monitor the Spark UI "Storage" tab. The example generates and > caches 30 RDDs, and I see most get cleaned up. However in the end, some still > remain cached. There is some randomness going on because I see different RDDs > remain cached for each run. > I have marked this as Major because I haven't been able to workaround it and > it is a memory leak for my application. I tried setting {{spark.cleaner.ttl}} > but that did not change anything. > {code} > val inputRDDs = mutable.Queue.tabulate(30) { i => > sc.parallelize(Seq(i)) > } > val input: DStream[Int] = ssc.queueStream(inputRDDs) > val output = input.transform { rdd => > if (rdd.isEmpty()) { > rdd > } else { > val rdd2 = rdd.map(identity) > rdd2.setName(rdd.first().toString) > rdd2.cache() > val rdd3 = rdd2.map(identity) > rdd3 > } > } > output.print() > ssc.start() > ssc.awaitTermination() > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org