GitHub user holdenk commented on the issue: https://github.com/apache/spark/pull/14579

One minor thing to keep in mind: the subclassing-of-RDD approach could cause us to miss out on pipelining if the RDD were used again after it was unpersisted, but I think that is a relatively minor issue.

On the whole, I think modifying the base RDD and DataFrame classes (option A / option 4), which is what @MLnick has implemented here, is probably one of the more reasonable options: the `with` statement doesn't add anything if the RDD/DataFrame isn't persisted, but it can do cleanup if it is. That said, if there is a better way to do this I'd be excited to find out as well :)
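For readers following along, the pattern under discussion can be sketched in plain Python. The class below is a hypothetical stand-in for an RDD/DataFrame, not Spark's actual implementation in the PR: it shows how adding `__enter__`/`__exit__` to the base class makes the `with` statement a no-op for unpersisted data while guaranteeing cleanup for persisted data.

```python
class CacheableDataset:
    """Minimal stand-in for an RDD/DataFrame that tracks its storage state.

    Illustrative only: names and behavior are assumptions for this sketch,
    not Spark's API.
    """

    def __init__(self):
        self.is_cached = False

    def persist(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self

    def __enter__(self):
        # Entering the block does not persist anything by itself, so the
        # `with` statement "doesn't add anything" for unpersisted data...
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # ...but it guarantees cleanup on exit if the data was persisted.
        if self.is_cached:
            self.unpersist()
        return False  # do not swallow exceptions raised inside the block


# Usage: persist inside a `with` block; unpersist happens automatically.
with CacheableDataset().persist() as ds:
    assert ds.is_cached
assert not ds.is_cached
```

Because the methods live on the base class rather than a subclass, the object handed back by `persist()` is the same object the user already holds, which avoids the subclass-identity issues mentioned above.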