Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/89#discussion_r10559142
--- Diff: docs/configuration.md ---
@@ -487,6 +477,88 @@ Apart from these, the following properties are also available, and may be useful
</tr>
</table>
+
+The following properties can be used to schedule cleanup jobs at different levels.
+These metadata tuning parameters should be set with care and only where required,
+since scheduling metadata cleanup in the middle of a job can result in many unnecessary re-computations.
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+ <td>spark.cleaner.ttl</td>
+ <td>(infinite)</td>
+ <td>
+    Duration (seconds) of how long Spark will remember any metadata (stages generated, tasks generated, etc.).
+    Periodic cleanups will ensure that metadata older than this duration will be forgotten. This is
+    useful when running Spark for many hours / days (for example, running 24/7 in the case of Spark
+    Streaming applications). Note that any RDD that persists in memory for more than this duration
+    will be cleared as well.
+ </td>
+</tr>
+<tr>
+ <td>spark.cleaner.ttl.MAP_OUTPUT_TRACKER</td>
+  <td>spark.cleaner.ttl, with a minimum value of 10 seconds</td>
+ <td>
+    Cleans up the map containing the mapper information (the input block manager Id and the output result size) corresponding to a shuffle Id.
+ </td>
--- End diff --
same for rest ...
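
For anyone following along, here is a minimal sketch of how the `spark.cleaner.ttl` property described in the diff might be set programmatically. This assumes a standard SparkConf-based setup; the object name `CleanerTtlExample`, the `local[*]` master, and the 3600-second value are purely illustrative, not recommendations.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: enable periodic metadata cleanup via spark.cleaner.ttl.
// The one-hour TTL is illustrative only; as the docs above warn, too
// aggressive a value can force unnecessary re-computations mid-job.
object CleanerTtlExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cleaner-ttl-example")
      .setMaster("local[*]")
      .set("spark.cleaner.ttl", "3600") // forget metadata older than 1 hour

    val sc = new SparkContext(conf)
    // ... long-running / 24x7 workload (e.g. Spark Streaming) would go here ...
    sc.stop()
  }
}
```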