[ https://issues.apache.org/jira/browse/SPARK-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824360#comment-15824360 ]
Nicholas Chammas commented on SPARK-2141:
-----------------------------------------

I'd like to reopen this issue given that the Scala and [Java|http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#getPersistentRDDs()] APIs already have this method, and that several people have commented that they would find it useful.

I'll add my own use case to the list. In the course of working around various Spark bugs that are mitigated by strategically placed calls to {{persist()}} ([ex1|https://github.com/graphframes/graphframes/issues/159], [ex2|https://issues.apache.org/jira/browse/SPARK-18492]), it's useful to be able to unpersist all RDDs, or all RDDs you no longer hold active references to, in order to free up memory. As others have noted, you can do this today by dipping into the Java context:

{code}
for (id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
    rdd.unpersist()
{code}

But I think it makes sense to add this directly to the Python API.

> Add sc.getPersistentRDDs() to PySpark
> -------------------------------------
>
>                 Key: SPARK-2141
>                 URL: https://issues.apache.org/jira/browse/SPARK-2141
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 1.0.0
>            Reporter: Nicholas Chammas
>            Assignee: Kan Zhang
>
> PySpark does not appear to have {{sc.getPersistentRDDs()}}.
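
For convenience, the workaround quoted in the comment above can be wrapped in a small helper. This is only a sketch of the pattern, not a proposed API: the function name {{unpersist_all}} is made up for illustration, it assumes a live {{SparkContext}} (here named {{sc}}), and it relies on the internal {{_jsc}} attribute, which is not part of PySpark's public interface and may change between releases.

{code}
def unpersist_all(sc):
    """Unpersist every RDD currently cached in this SparkContext.

    Sketch only: goes through the internal _jsc gateway, which is not
    a stable public API.
    """
    # getPersistentRDDs() on the Java context returns a map of
    # RDD id -> Java-side RDD; unpersisting each one releases its storage.
    for rdd_id, java_rdd in sc._jsc.getPersistentRDDs().items():
        java_rdd.unpersist()

# Example usage, assuming an active SparkSession named `spark`:
# unpersist_all(spark.sparkContext)
{code}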