GitHub user holdenk commented on the issue: https://github.com/apache/spark/pull/14579

One minor thing to keep in mind: the subclassing-of-RDD approach could cause us to miss out on pipelining if the RDD were used again after it was unpersisted, but I think that is a relatively minor issue.

On the whole, I think modifying the base RDD and DataFrame classes (option A / option 4), which is what @MLnick has implemented here, is probably one of the more reasonable options: the `with` statement doesn't add anything if the RDD/DataFrame isn't persisted, but it can do cleanup if it is. That said, if there is a better way to do this I'd be excited to find out as well :)
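For readers following along, the pattern under discussion can be sketched in plain Python. The class below is a hypothetical stand-in for an RDD/DataFrame, not Spark's actual implementation in the PR: it shows how adding `__enter__`/`__exit__` to the base class makes the `with` statement a no-op for unpersisted data while guaranteeing cleanup for persisted data.

```python
class CacheableDataset:
    """Minimal stand-in for an RDD/DataFrame that tracks its storage state.

    Illustrative only: names and behavior are assumptions for this sketch,
    not Spark's API.
    """

    def __init__(self):
        self.is_cached = False

    def persist(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self

    def __enter__(self):
        # Entering the block does not persist anything by itself, so the
        # `with` statement "doesn't add anything" for unpersisted data...
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # ...but it guarantees cleanup on exit if the data was persisted.
        if self.is_cached:
            self.unpersist()
        return False  # do not swallow exceptions raised inside the block


# Usage: persist inside a `with` block; unpersist happens automatically.
with CacheableDataset().persist() as ds:
    assert ds.is_cached
assert not ds.is_cached
```

Because the methods live on the base class rather than a subclass, the object handed back by `persist()` is the same object the user already holds, which avoids the subclass-identity issues mentioned above.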