[ https://issues.apache.org/jira/browse/SPARK-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-18361.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0
 Target Version/s: 2.1.0

> Expose RDD localCheckpoint in PySpark
> -------------------------------------
>
>                 Key: SPARK-18361
>                 URL: https://issues.apache.org/jira/browse/SPARK-18361
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>            Reporter: Gabriel Huang
>            Assignee: Gabriel Huang
>             Fix For: 2.1.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> As of today, I could not access rdd.localCheckpoint() in pyspark.
> This is an important issue for machine learning people, as we often have to
> iterate algorithms and perform operations like joins in each iteration.
> If the lineage is not truncated, the memory usage, the lineage, and
> computation time explode. rdd.localCheckpoint() seems like the most
> straightforward way of truncating the lineage, but the python API does not
> expose it.
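
For readers who land on this notification, here is a minimal sketch of the usage pattern the description asks for: truncating lineage inside an iterative job. It assumes a PySpark release that exposes `RDD.localCheckpoint()`. The helper name `iterate_with_local_checkpoint`, the `every` parameter, and the toy `demo()` driver are hypothetical and only for illustration; they are not part of the issue or of Spark.

```python
def iterate_with_local_checkpoint(rdd, step, num_iters, every=5):
    """Apply `step` to `rdd` `num_iters` times, truncating the lineage
    every `every` iterations (hypothetical helper, not a Spark API)."""
    if every < 1:
        raise ValueError("every must be >= 1")
    for i in range(1, num_iters + 1):
        rdd = step(rdd)
        if i % every == 0:
            # Mark the RDD for local checkpointing. This has to happen
            # before any job has been executed on this RDD.
            rdd.localCheckpoint()
            # localCheckpoint() is lazy, so run an action to materialize it.
            rdd.count()
    return rdd


def demo():
    # Toy driver (an assumption for illustration); requires pyspark.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    weights = sc.parallelize([(k, 1.0) for k in range(100)])
    updates = sc.parallelize([(k, 0.5) for k in range(100)]).cache()

    def step(rdd):
        # One toy "iteration": join against a fixed RDD, then combine values.
        return rdd.join(updates).mapValues(lambda pair: pair[0] * pair[1])

    result = iterate_with_local_checkpoint(weights, step, num_iters=20, every=5)
    print(result.isLocallyCheckpointed())
    sc.stop()
```

Calling `demo()` runs the loop against a local SparkContext. Note that local checkpointing trades fault tolerance for speed: the checkpointed data lives in executor storage rather than a reliable file system, so it may be the wrong choice when executors can be lost.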