Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/19410

Hi @szhem.

I'm sorry I haven't been more responsive here. I can relate to your frustration, and I do want to help you make progress on this PR and merge it in. I have indeed been busy with other responsibilities, but I can rededicate time to reviewing this PR.

Of all the approaches you've proposed so far, I like the `ContextCleaner`-based one the best. Personally, I'm okay with setting `spark.cleaner.referenceTracking.cleanCheckpoints` to `true` by default for the next major Spark release and documenting this change of behavior in the release notes. However, that may not be okay with the senior maintainers. As an alternative I wonder if we could instead create a new config just for graph RDD checkpoint cleaning such as `spark.cleaner.referenceTracking.cleanGraphCheckpoints` and set that to `true` by default. Then use that config value in `PeriodicGraphCheckpointer` instead of `spark.cleaner.referenceTracking.cleanCheckpoints`.

Would you be willing to open another PR with your `ContextCleaner`-based approach? I'm not suggesting you close this PR. We can call each PR alternative solutions for the same JIRA ticket and cross-reference each PR. If you do that then I will try to debug the problem with the retained checkpoints.

Thank you.
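
To make the config alternative concrete, here is a rough sketch of how the two settings might be read side by side. It assumes `spark-core` is on the classpath. `spark.cleaner.referenceTracking.cleanGraphCheckpoints` is only the name proposed in this comment and is not an existing Spark setting, and the object name is made up for illustration; this is not how `PeriodicGraphCheckpointer` is written today.

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch, not code from Spark or from this PR.
object CheckpointCleaningConfSketch {
  // Existing Spark setting; it is off by default.
  val CleanCheckpointsKey = "spark.cleaner.referenceTracking.cleanCheckpoints"
  // Name proposed in this comment only; Spark does not define it.
  val CleanGraphCheckpointsKey = "spark.cleaner.referenceTracking.cleanGraphCheckpoints"

  def main(args: Array[String]): Unit = {
    // loadDefaults = false keeps the sketch independent of JVM system properties.
    val conf = new SparkConf(false)

    // General checkpoint cleaning keeps its current default of false.
    val cleanAll = conf.getBoolean(CleanCheckpointsKey, false)
    // Graph checkpoint cleaning would default to true under the proposal.
    val cleanGraph = conf.getBoolean(CleanGraphCheckpointsKey, true)
    println(s"general=$cleanAll graph=$cleanGraph")

    // A user who wants to keep graph checkpoints could still opt out explicitly.
    conf.set(CleanGraphCheckpointsKey, "false")
    println(s"graph after opt-out=${conf.getBoolean(CleanGraphCheckpointsKey, true)}")
  }
}
```

With a separate key, the default for general checkpoint cleaning stays untouched, so only graph checkpointing changes behavior.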