[ https://issues.apache.org/jira/browse/SPARK-20084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941012#comment-15941012 ]
Apache Spark commented on SPARK-20084:
--------------------------------------

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/17412

> Remove internal.metrics.updatedBlockStatuses accumulator from history files
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-20084
>                 URL: https://issues.apache.org/jira/browse/SPARK-20084
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Web UI
>    Affects Versions: 2.1.0
>            Reporter: Ryan Blue
>
> History files for large jobs can be hundreds of GB. These history files take
> too much space and create a backlog on the history server.
> Most of the size is from Accumulables in SparkListenerTaskEnd. The largest
> accumulable is internal.metrics.updatedBlockStatuses, which has a small
> update (the blocks that were changed) but a huge value (all known blocks).
> Nothing currently uses the accumulator value or update, so it is safe to
> remove it. Information for any block updated during a task is also recorded
> under Task Metrics / Updated Blocks.
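To check the size claim in the description against a real history file, one could total the serialized bytes of each accumulable across SparkListenerTaskEnd events. The sketch below is not part of the pull request. It assumes the event log is newline-delimited JSON in which each task-end event carries its accumulables under "Task Info" / "Accumulables", each with "Name", "Update", and "Value" fields. The helper name accumulable_bytes and the sample data are hypothetical.

```python
import json
from collections import Counter


def accumulable_bytes(lines):
    """Sum the serialized size of each accumulable over SparkListenerTaskEnd events.

    `lines` is any iterable of event-log lines (assumed: one JSON object per line).
    Returns a Counter mapping accumulable name -> total characters of JSON.
    """
    sizes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        # Assumed layout: accumulables are listed under "Task Info".
        for acc in event.get("Task Info", {}).get("Accumulables", []):
            name = acc.get("Name") or "<unnamed>"
            sizes[name] += len(json.dumps(acc))
    return sizes


# Hypothetical sample: a small update (one block) and a large value (many blocks).
blocks = [{"Block ID": "rdd_0_%d" % i, "Status": {"Memory Size": 1024}} for i in range(50)]
sample_lines = [
    json.dumps({"Event": "SparkListenerJobStart", "Job ID": 0}),
    json.dumps({
        "Event": "SparkListenerTaskEnd",
        "Task Info": {
            "Accumulables": [
                {"ID": 1, "Name": "internal.metrics.executorRunTime",
                 "Update": 5, "Value": 10},
                {"ID": 2, "Name": "internal.metrics.updatedBlockStatuses",
                 "Update": blocks[:1], "Value": blocks},
            ]
        },
    }),
]

sizes = accumulable_bytes(sample_lines)
for name, total in sizes.most_common():
    print(name, total)
```

Against an actual history file, the same function could be fed an open file handle, for example `accumulable_bytes(open(path))`, where `path` points at an uncompressed event log.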