GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/10534

    [SPARK-7689][WIP] Remove TTL-based metadata cleaning

    This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata
    cleaning code.
    
    Now that we have the `ContextCleaner` and a timer to trigger periodic GCs,
    I don't think that `spark.cleaner.ttl` is necessary anymore. TTL-based
    cleaning isn't enabled by default, isn't covered by our end-to-end tests,
    and has been a source of user confusion when misconfigured. If the TTL is
    set too low, data that is still in use may be evicted or deleted, leading
    to hard-to-diagnose bugs.
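    To illustrate the failure mode described above, here is a minimal Scala sketch of TTL-based cleaning. It assumes a simplified, hypothetical `TtlMap` class with a `clearOldValues(threshold)` sweep and explicitly passed timestamps (so the example is deterministic); it is not Spark's actual `TimeStampedHashMap` or metadata-cleaner code. The point it shows is that eviction depends only on an entry's age, not on whether anything still needs it.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical, simplified TTL-cleaned map; NOT Spark's TimeStampedHashMap.
// Timestamps are passed in explicitly so the example is deterministic.
class TtlMap[K, V] {
  // Concurrent map: a background cleaner thread and task threads may touch it at once.
  private val underlying = TrieMap.empty[K, (V, Long)]

  def put(key: K, value: V, now: Long): Unit =
    underlying.put(key, (value, now))

  def get(key: K): Option[V] =
    underlying.get(key).map(_._1)

  // Evict every entry inserted before `threshold`, regardless of whether
  // anything still needs it; this is what makes a too-low TTL dangerous.
  def clearOldValues(threshold: Long): Unit = {
    val stale = underlying.collect { case (k, (_, ts)) if ts < threshold => k }
    stale.foreach(k => underlying.remove(k))
  }
}

object TtlDemo {
  def main(args: Array[String]): Unit = {
    val ttlMs = 1000L // hypothetical TTL, deliberately too low for the job below
    val statuses = new TtlMap[Int, String]
    statuses.put(0, "map output locations for shuffle 0", now = 0L)

    // At t = 5000 ms a long-running job still needs shuffle 0, but the
    // periodic cleaner runs first with threshold = now - ttl.
    val now = 5000L
    statuses.clearOldValues(now - ttlMs)
    println(statuses.get(0)) // prints None: still-needed metadata is gone
  }
}
```

    Reference-based cleanup in the style of `ContextCleaner` avoids this by removing state only after the driver-side object that owns it has been garbage-collected, which is why a timer that triggers periodic GCs can stand in for the TTL.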
    
    For all of these reasons, I think that we should remove this functionality
    in Spark 2.0. Additional benefits include marginally reduced memory usage,
    since we no longer need to store timestamps in hashmaps, and a handful
    fewer threads.
    
    This PR is WIP pending discussion, a cleanup in an unrelated test suite,
    and a second pass to check for any thread-safety issues (`TimeStampedHashMap`
    happened to be thread-safe, so we need to figure out whether its usages
    required that thread-safety and whether we preserved it).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark remove-ttl-based-cleaning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10534
    
----
commit 942763555eafaddd5dd9ef2f9a1117c4987c6e88
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-12-31T00:07:30Z

    Remove MapOutputTracker cleaner.

commit 23669a7f04c801da5e23fe6ac1f479e28016af2e
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-12-31T00:11:59Z

    Remove from HttpBroadcast

commit f2c2f5dd5820a41e31ef73b5b918299649f8cd72
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-12-31T00:14:03Z

    Remove from BlockManager.

commit 3940e976005cc6064ba903082d8e9918ed5708ff
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-12-31T00:19:54Z

    Delete TimeStampedHashSet

commit 98b732a554216e0164bf01bdb547387f25dea7d4
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-12-31T00:41:31Z

    All of the rest of the changes.

----

