GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/10534
[SPARK-7689][WIP] Remove TTL-based metadata cleaning

This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code.

Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data that is still being used may be evicted or deleted, leading to hard-to-diagnose bugs.

For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timestamps in hashmaps, and a handful fewer threads.

This PR is WIP pending discussion, a cleanup in an unrelated test suite, and a second pass to check for any thread-safety issues (TimeStampedHashMap happened to be thread-safe, so we need to figure out whether its usages required that thread-safety and whether we preserved it).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark remove-ttl-based-cleaning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10534.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10534

----
commit 942763555eafaddd5dd9ef2f9a1117c4987c6e88
Author: Josh Rosen <joshro...@databricks.com>
Date: 2015-12-31T00:07:30Z

    Remove MapOutputTracker cleaner.

commit 23669a7f04c801da5e23fe6ac1f479e28016af2e
Author: Josh Rosen <joshro...@databricks.com>
Date: 2015-12-31T00:11:59Z

    Remove from HttpBroadcast

commit f2c2f5dd5820a41e31ef73b5b918299649f8cd72
Author: Josh Rosen <joshro...@databricks.com>
Date: 2015-12-31T00:14:03Z

    Remove from BlockManager.
commit 3940e976005cc6064ba903082d8e9918ed5708ff
Author: Josh Rosen <joshro...@databricks.com>
Date: 2015-12-31T00:19:54Z

    Delete TimeStampedHashSet

commit 98b732a554216e0164bf01bdb547387f25dea7d4
Author: Josh Rosen <joshro...@databricks.com>
Date: 2015-12-31T00:41:31Z

    All of the rest of the changes.
----
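The PR argues that a TTL set too low can evict data that is still in use, while reference-based cleanup (the `ContextCleaner` plus periodic GC) removes data only once nothing refers to it. The difference can be illustrated with a minimal Python sketch, written in Python only for brevity. This is not Spark's code. `TimestampedMap`, `clear_older_than`, `Handle`, and `ReachabilityCleaner` are hypothetical names that loosely mimic two ideas: evicting entries by write age, as `TimeStampedHashMap` allows, and cleaning an entry only after its owner has been garbage-collected, as the `ContextCleaner` does.

```python
import gc
import weakref


class TimestampedMap:
    """Hypothetical TTL-style map (not Spark's TimeStampedHashMap):
    every entry remembers when it was written."""

    def __init__(self):
        self._data = {}  # key -> (value, write timestamp)

    def put(self, key, value, now):
        self._data[key] = (value, now)

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[0]

    def clear_older_than(self, threshold):
        # TTL-style cleaning: drop every entry written before `threshold`,
        # regardless of whether anything still uses it.
        stale = [k for k, (_, ts) in self._data.items() if ts < threshold]
        for k in stale:
            del self._data[k]
        return stale


class Handle:
    """Hypothetical stand-in for a user-visible object such as an RDD."""

    def __init__(self, key):
        self.key = key


class ReachabilityCleaner:
    """Hypothetical reference-based cleaner: removes an entry only after
    the Handle that owns it has been garbage-collected."""

    def __init__(self):
        self.metadata = {}
        self.cleaned = []

    def register(self, handle, value):
        self.metadata[handle.key] = value
        # weakref.finalize invokes the callback once `handle` is collected;
        # it keeps only a weak reference to `handle` itself.
        weakref.finalize(handle, self._cleanup, handle.key)

    def _cleanup(self, key):
        self.metadata.pop(key, None)
        self.cleaned.append(key)


def ttl_demo():
    ttl = 10  # seconds; deliberately shorter than the job below
    m = TimestampedMap()
    m.put("shuffle_0", "map output statuses", now=0)
    # The job still needs shuffle_0 at t=15, but a TTL sweep at t=15
    # evicts everything written before t=5, including shuffle_0.
    m.clear_older_than(threshold=15 - ttl)
    return m.get("shuffle_0")


def reachability_demo():
    cleaner = ReachabilityCleaner()
    h = Handle("shuffle_0")
    cleaner.register(h, "map output statuses")
    still_there = "shuffle_0" in cleaner.metadata  # handle is live: kept
    del h
    # CPython's refcounting usually finalizes `h` right at `del`;
    # gc.collect() stands in for a periodic GC trigger, which matters
    # when objects are only reclaimed by a full collection.
    gc.collect()
    return still_there, cleaner.cleaned
```

With a TTL shorter than the job, `ttl_demo()` returns `None` because the entry was evicted while it was still needed. By contrast, `reachability_demo()` keeps the entry while the handle is alive and removes it only after the handle is collected. This is the trade-off the PR relies on when it proposes dropping `spark.cleaner.ttl`.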