[ https://issues.apache.org/jira/browse/SPARK-16921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414840#comment-15414840 ]
Nick Pentreath commented on SPARK-16921:
----------------------------------------

By the way, for broadcast (BC) variables, I wonder whether {{__exit__}} should call {{unpersist}} or {{destroy}}. Probably {{destroy}}, as it is closer to {{close}}-type semantics.

> RDD/DataFrame persist() and cache() should return Python context managers
> -------------------------------------------------------------------------
>
>                 Key: SPARK-16921
>                 URL: https://issues.apache.org/jira/browse/SPARK-16921
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core, SQL
>            Reporter: Nicholas Chammas
>            Priority: Minor
>
> [Context managers|https://docs.python.org/3/reference/datamodel.html#context-managers]
> are a natural way to capture closely related setup and teardown code in
> Python.
> For example, they are commonly used when doing file I/O:
> {code}
> with open('/path/to/file') as f:
>     contents = f.read()
>     ...
> {code}
> Once the program exits the with block, {{f}} is automatically closed.
> I think it makes sense to apply this pattern to persisting and unpersisting
> DataFrames and RDDs. There are many cases when you want to persist a
> DataFrame for a specific set of operations and then unpersist it immediately
> afterwards.
> For example, take model training. Today, you might do something like this:
> {code}
> labeled_data.persist()
> model = pipeline.fit(labeled_data)
> labeled_data.unpersist()
> {code}
> If {{persist()}} returned a context manager, you could rewrite this as
> follows:
> {code}
> with labeled_data.persist():
>     model = pipeline.fit(labeled_data)
> {code}
> Upon exiting the {{with}} block, {{labeled_data}} would automatically be
> unpersisted.
> This can be done in a backwards-compatible way since {{persist()}} would
> still return the parent DataFrame or RDD as it does today, but add two
> methods to the object: {{__enter__()}} and {{__exit__()}}
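
For illustration only, here is a minimal sketch of how the two proposed methods might behave. It does not use Spark at all: FakeDataFrame and FakeBroadcast are hypothetical stand-in classes that only track state, and the choice of destroy() in the broadcast case is the suggestion from the comment above, not a settled design.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame/RDD; it only tracks cache state."""

    def __init__(self):
        self.is_cached = False

    def persist(self):
        # As today, persist() returns the object itself, so existing
        # call sites such as `df = df.persist()` keep working.
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self

    def __enter__(self):
        # Nothing to set up here: persist() was already called.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Teardown runs whether or not the block raised.
        self.unpersist()
        return False  # do not swallow exceptions from the block


class FakeBroadcast:
    """Hypothetical stand-in for a broadcast variable."""

    def __init__(self, value):
        self.value = value
        self.destroyed = False

    def destroy(self):
        self.destroyed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumption: close-like semantics, so destroy rather than unpersist.
        self.destroy()
        return False


labeled_data = FakeDataFrame()
with labeled_data.persist() as df:
    cached_inside = df.is_cached
cached_after = labeled_data.is_cached

lookup = FakeBroadcast({"a": 1})
with lookup as bc:
    value_inside = bc.value["a"]
destroyed_after = lookup.destroyed
```

Because __exit__ returns False, an exception raised inside the with block still propagates after the cleanup has run, which matches what a manual try/finally around persist() and unpersist() would do.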