Github user nchammas commented on the issue:

https://github.com/apache/spark/pull/14579

Ah, I see. I don't fully understand how `PipelinedRDD` works or how it is used, so I'll have to defer to y'all on this. Does the `cached()` utility method have this same problem?

> We could possibly work around it with some type checking etc but it then starts to feel like adding more complexity than the feature is worth...

Agreed. At this point, actually, I'm beginning to feel this feature is not worth it.

Context managers seem to work best when the objects they operate on have clear open/close-style semantics. File handles, network connections, and the like fit this pattern well. In fact, the [doc for `with`](https://docs.python.org/3/reference/compound_stmts.html#the-with-statement) says:

> This allows common `try...except...finally` usage patterns to be encapsulated for convenient reuse.

RDDs and DataFrames, on the other hand, don't have a simple open/close or `try...except...finally` pattern. When we try to map one onto persist and unpersist, we get the various side effects we've been discussing here.
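To illustrate the mismatch being discussed: a persist/unpersist context manager would have to look roughly like the sketch below. The `persisted` helper and the `FakeRDD` stand-in are hypothetical (not PySpark API); real RDDs expose `persist()`/`unpersist()` with the same shape, but unpersisting on exit silently affects anything derived from the RDD that is consumed after the `with` block.

```python
from contextlib import contextmanager

# Hypothetical stand-in for an RDD-like object, so this sketch runs
# without Spark. Real PySpark RDDs have persist()/unpersist() too.
class FakeRDD:
    def __init__(self):
        self.persisted = False

    def persist(self):
        self.persisted = True
        return self

    def unpersist(self):
        self.persisted = False
        return self

@contextmanager
def persisted(rdd):
    """Hypothetical context manager: persist on entry, unpersist on exit."""
    try:
        yield rdd.persist()
    finally:
        rdd.unpersist()

rdd = FakeRDD()
with persisted(rdd) as r:
    assert r.persisted  # cached while inside the block
# The side effect under discussion: once the block exits, anything
# built on `r` but evaluated later has lost its cached parent.
assert not rdd.persisted
```

The `try...finally` here is exactly the pattern the `with` docs describe; the trouble is that "exit the block" and "done using the cached data" don't coincide for lazily evaluated RDDs.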