[ https://issues.apache.org/jira/browse/SPARK-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824360#comment-15824360 ]

Nicholas Chammas commented on SPARK-2141:
-----------------------------------------

I'd like to reopen this issue, since both the Scala and
[Java|http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#getPersistentRDDs()]
APIs already have this method, and several people have commented that they
would find it useful in PySpark.

I'll add my own use case to the list.

In the course of working around various Spark bugs that are mitigated by
strategically placed calls to {{persist()}}
([ex1|https://github.com/graphframes/graphframes/issues/159],
[ex2|https://issues.apache.org/jira/browse/SPARK-18492]), it's useful to be
able to free up memory by unpersisting all RDDs, or all RDDs you no longer
hold references to. As others have noted, you can do this today by dipping
into the Java context:

{code}
# _jsc.getPersistentRDDs() returns a java.util.Map of RDD id -> JavaRDD,
# exposed through py4j, so unpersist() below is called on the Java-side RDD.
for (rdd_id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
    rdd.unpersist()
{code}

But I think it makes sense to add this directly to the Python API.
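
For the sake of discussion, here's a rough sketch of what this could look like on the Python side. The helper name and the direct {{RDD}} wrapping are illustrative only; a real implementation would presumably live on {{SparkContext}} itself and handle serializers properly:

{code}
from pyspark import RDD

def get_persistent_rdds(sc):
    """Hypothetical helper: return {rdd_id: RDD} for all RDDs persisted in this context.

    Mirrors JavaSparkContext.getPersistentRDDs(). The returned wrappers are
    fine for metadata and unpersist() calls, but not for reading the data back,
    since the underlying JavaRDDs hold JVM objects, not pickled Python data.
    """
    return {rdd_id: RDD(jrdd, sc)
            for rdd_id, jrdd in sc._jsc.getPersistentRDDs().items()}

# Example: free up memory by unpersisting everything that's still cached.
for rdd in get_persistent_rdds(spark.sparkContext).values():
    rdd.unpersist()
{code}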

> Add sc.getPersistentRDDs() to PySpark
> -------------------------------------
>
>                 Key: SPARK-2141
>                 URL: https://issues.apache.org/jira/browse/SPARK-2141
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 1.0.0
>            Reporter: Nicholas Chammas
>            Assignee: Kan Zhang
>
> PySpark does not appear to have {{sc.getPersistentRDDs()}}.


