[ https://issues.apache.org/jira/browse/SPARK-42204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen reassigned SPARK-42204:
----------------------------------

    Assignee: Josh Rosen

> Remove redundant logging of TaskMetrics internal accumulators in JsonProtocol event logs
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-42204
>                 URL: https://issues.apache.org/jira/browse/SPARK-42204
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>
> Spark's JsonProtocol event logs (used by the history server) contain
> redundancy in how TaskMetrics are represented in SparkListenerTaskEnd events:
> * The "Task Metrics" field is a map from metric names to values.
> * Under the hood, each metric is implemented using an accumulator and those
> accumulator values are redundantly stored in the `Task Info`.`Accumulables`
> field. These Accumulable entries contain the metric value from the task, plus
> the cumulative "sum-so-far" from the completed tasks in that stage.
>
> The Spark History Server doesn't rely on the redundant information in the
> Accumulables field.
>
> I believe that this redundancy was introduced back in SPARK-10620 when Spark
> 1.x's separate TaskMetrics implementation was replaced by the current
> accumulator-based version.
>
> I think that we should eliminate this redundancy by skipping JsonProtocol
> logging of the TaskMetric accumulators. Although I think it's somewhat
> unlikely that third-party code is relying on the presence of that redundant
> information, I think we should hedge by adding an internal configuration flag
> to re-enable the redundant logging if needed.
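To make the redundancy described in the issue concrete, here is a minimal Python sketch that post-processes a single SparkListenerTaskEnd event and drops the TaskMetrics-backed entries from `Task Info`.`Accumulables`. It is not the JsonProtocol change proposed in the issue, only an illustration of which data is duplicated. The `internal.metrics.` name prefix, the sample event, its metric values, and the function name `strip_redundant_accumulables` are assumptions made for illustration.

```python
import copy

# Assumption: accumulators that back TaskMetrics have names starting with this prefix.
INTERNAL_METRICS_PREFIX = "internal.metrics."


def strip_redundant_accumulables(event: dict) -> dict:
    """Return a copy of a task-end event without TaskMetrics-backed accumulables.

    Hypothetical helper: events of other types, or without "Task Info",
    are returned unchanged.
    """
    if event.get("Event") != "SparkListenerTaskEnd" or "Task Info" not in event:
        return event
    result = copy.deepcopy(event)  # leave the caller's event untouched
    accumulables = result["Task Info"].get("Accumulables", [])
    result["Task Info"]["Accumulables"] = [
        acc for acc in accumulables
        if not (acc.get("Name") or "").startswith(INTERNAL_METRICS_PREFIX)
    ]
    return result


# Illustrative event with made-up values: the run time appears once under
# "Task Metrics" and again as an internal accumulable.
sample_event = {
    "Event": "SparkListenerTaskEnd",
    "Task Info": {
        "Task ID": 7,
        "Accumulables": [
            {"ID": 1, "Name": "internal.metrics.executorRunTime",
             "Update": 120, "Value": 950},
            {"ID": 42, "Name": "my.custom.counter", "Update": 3, "Value": 10},
        ],
    },
    "Task Metrics": {"Executor Run Time": 120},
}

stripped_event = strip_redundant_accumulables(sample_event)
remaining_names = [a["Name"] for a in stripped_event["Task Info"]["Accumulables"]]
```

In this sketch the user-defined accumulator `my.custom.counter` is kept and the "Task Metrics" map is left alone, which mirrors the issue's point that only the internal metric accumulators are redundant.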