[ https://issues.apache.org/jira/browse/SPARK-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-18361.
-------------------------------
          Resolution: Fixed
       Fix Version/s: 2.1.0
    Target Version/s: 2.1.0

> Expose RDD localCheckpoint in PySpark
> -------------------------------------
>
>                 Key: SPARK-18361
>                 URL: https://issues.apache.org/jira/browse/SPARK-18361
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>            Reporter: Gabriel Huang
>            Assignee: Gabriel Huang
>             Fix For: 2.1.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> As of today, rdd.localCheckpoint() is not accessible from PySpark.
> This matters for machine learning workloads, which often run iterative
> algorithms and perform operations such as joins in each iteration.
> If the lineage is not truncated, it grows with every iteration, and memory
> usage and computation time grow with it. rdd.localCheckpoint() seems like
> the most straightforward way to truncate the lineage, but the Python API
> does not expose it.
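
The lineage growth described above can be illustrated with a small toy model.
This is a sketch in plain Python, not Spark code: LazyDataset, lineage_depth,
local_checkpoint and iterate are hypothetical names invented for this example.
It only mimics the idea that each transformation adds a parent link and that a
local checkpoint materializes the data and drops those links. In PySpark the
corresponding call would be rdd.localCheckpoint(), the method this issue asks
to expose.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark)."""

    def __init__(self, compute, parent=None):
        self._compute = compute  # zero-argument function producing the data
        self.parent = parent     # previous dataset in the lineage, if any

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda: data)

    def map(self, fn):
        # Lazy: nothing is computed here, we only record the dependency.
        return LazyDataset(lambda: [fn(x) for x in self.collect()], parent=self)

    def collect(self):
        # Walks back through every ancestor that has not been truncated.
        return list(self._compute())

    def lineage_depth(self):
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def local_checkpoint(self):
        # Materialize once, then forget how the data was derived.
        data = self.collect()
        self._compute = lambda: data
        self.parent = None
        return self


def iterate(steps, checkpoint_every=None):
    """Apply `steps` map operations, optionally truncating the lineage."""
    ds = LazyDataset.from_list([1, 2, 3])
    max_depth = 0
    for i in range(1, steps + 1):
        ds = ds.map(lambda x: x + 1)
        if checkpoint_every and i % checkpoint_every == 0:
            ds.local_checkpoint()
        max_depth = max(max_depth, ds.lineage_depth())
    return ds.collect(), max_depth


if __name__ == "__main__":
    print(iterate(20))                      # lineage grows with every step
    print(iterate(20, checkpoint_every=5))  # lineage stays bounded
```

Both runs produce the same data. Only the depth of the recorded lineage
differs, which is the effect the reporter wants from truncation in iterative
jobs. The toy model ignores everything that makes the real feature subtle,
such as partitioning, executor-local storage and fault tolerance.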



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org